PostgreSQL has an extension to jsonpath: ** operator, which explores arbitrary depth finding your values everywhere. At the same time, there is a lax mode, defined by the standard, providing a “relaxed” way for working with json. In the lax mode, accessors automatically unwrap arrays; missing keys don’t trigger errors; etc. In short, it appears that the ** operator and lax mode aren’t designed to be together :)

The story started with the bug report. The simplified version is below. Jsonpath query is intended to select the value of key "y" everywhere. But it appears to select these values twice.

1
2
3
4
5
6
7
8
9
# SELECT * FROM jsonb_path_query('[{"x": "a", "y": [{"x":"b"}]}]'::jsonb,
                                 '$.**.x');
 jsonb_path_query
------------------
 "a"
 "a"
 "b"
 "b"
(4 rows)

This case looks like a bug. But is it? Let’s dig into details. Let’s split the jsonpath query into two parts: one containing the ** operator and another having the key accessor.

1
2
3
4
5
6
7
8
9
10
11
12
13
# SELECT var,
         jsonb_path_query_array(var, '$.x') key_x
  FROM jsonb_path_query('[{"x": "a", "y": [{"x":"b"}]}]'::jsonb,
                        '$.**') var;
               var               | key_x
---------------------------------+-------
 [{"x": "a", "y": [{"x": "b"}]}] | ["a"]
 {"x": "a", "y": [{"x": "b"}]}   | ["a"]
 "a"                             | []
 [{"x": "b"}]                    | ["b"]
 {"x": "b"}                      | ["b"]
 "b"                             | []
(6 rows)

As you can see, the ** operator selects every child in the json document as expected. The key accessor extracts corresponding values from both objects themselves and their wrapping arrays. And that’s also expected in the lax mode. So, it appears there is no bug; everything works as designed, although it’s surprising for users.

Finally, I’ve committed a paragraph to the docs, which explicitly clarifies this issue. It seems that lax mode and ** operator just aren’t designed to be used together. If you need ** operator, you can use strict mode. and everything is intuitively correct.

1
2
3
4
5
6
7
# SELECT * FROM jsonb_path_query('[{"x": "a", "y": [{"x":"b"}]}]'::jsonb,
                                 'strict $.**.x');
 jsonb_path_query
------------------
 "a"
 "b"
(2 rows)

Comments