JSONB in PostgreSQL queries

Earlier I wrote how to enable jsonb postgres/psycopg2. Today experimented with how to query data in columns of type JSON.
On this subject there is documentation but I was not quite clear how to operate different types of transaction:

the
CREATE TABLE json_test (
id serial primary key,
data jsonb
);

INSERT INTO json_test (data) VALUES 
('{}'),
('{"a": 1}'),
('{"a": 2, "b": ["c", "d"]}'),
('{"a": 1, "b": {"c": "d", "e": true}}'),
('{"b": 2}');


The query worked, let's get all the data to check:

the
SELECT * FROM json_test;
id | data 
----+--------------------------------------
1 | {}
2 | {"a": 1}
3 | {"a": 2, "b": ["c", "d"]}
4 | {"a": 1, "b": {"c": "d", "e": true}}
5 | {"b": 2}
(5 rows) 

Now make the filter results. There are several operators that we can use, and then see why as the type chose jsonb.

Equality
In jsonb, we can check that the two identical JSON object:

the
SELECT * FROM json_test WHERE data = '{"a":1}';

id | data 
----+------
1 | {"a": 1}
(1 row)


Restrictions
Also we can retrieve the json object containing the other, i.e. "is a subset":

the
SELECT * FROM json_test WHERE data @ > '{"a":1}';

Says: — Give us all the objects starting with the key and the value to 1:

the
 id | data 
----+--------------------------------------
2 | {"a": 1}
4 | {"a": 1, "b": {"c": "d", "e": true}}
(2 rows)

Restrictions in both directions:
In this case, the query displays an empty object and an exact match for the second:

the
SELECT * FROM json_test WHERE data <@ '{"a":1}';
id | data 
----+----------
1 | {}
2 | {"a": 1}
(2 rows)


Existence of a key/element
The last batch of operators will check the existence of a key (or item type string in the array).

the
 id | data 
----+--------------------------------------
2 | {"a": 1}
3 | {"a": 2, "b": ["c", "d"]}
4 | {"a": 1, "b": {"c": "d", "e": true}}
(3 rows)

Get the objects that have any keys from the list:

the
SELECT * FROM json_test WHERE data ?| array['a', 'b'];
id | data 
----+--------------------------------------
2 | {"a": 1}
3 | {"a": 2, "b": ["c", "d"]}
4 | {"a": 1, "b": {"c": "d", "e": true}}
5 | {"b": 2}
(4 rows)

All object values have exact matching key from the list:

the
SELECT * FROM json_test WHERE data ?& array['a', 'b'];
id | data 
----+--------------------------------------
3 | {"a": 2, "b": ["c", "d"]}
4 | {"a": 1, "b": {"c": "d", "e": true}}
(2 rows)

Keys to bypass
You can filter the records having a match key->path. In simple cases, the use of the restriction operator can be easier, but not challenging them enough. These operations we can use in the SELECT, but still interesting to use them in the WHERE clause.

the
SELECT * FROM json_test WHERE data ->> 'a' > '1';

Get all records values of the associative element with a key equal to 1.
Note the need to use a text value, not numbers:

the
 id | data 
----+---------------------------
3 | {"a": 2, "b": ["c", "d"]}
(1 row)

Can make a comparison between the primitive objects and arrays:

the
SELECT * FROM json_test WHERE data -> 'b' > '1';
id | data 
----+--------------------------------------
3 | {"a": 2, "b": ["c", "d"]}
4 | {"a": 1, "b": {"c": "d", "e": true}}
5 | {"b": 2}
(3 rows)

It turns out that arrays and objects are more than the numbers.
You can also see a deeper path:

the
SELECT * FROM json_test WHERE data # > '{b,c}' = '"d"';

Get the object where the element b is a child object c, and C is equal to the string "d":

the
id | data 
----+--------------------------------------
4 | {"a": 1, "b": {"c": "d", "e": true}}

Also, there are versions of these operators that return text, not JSON object. In the case of the last query it means that we don't need to compare it with the JSON object (when we really want to line):

the
SELECT * FROM json_test WHERE data #>> '{b,c}' = 'd';
id | data 
----+--------------------------------------
4 | {"a": 1, "b": {"c": "d", "e": true}}
(1 row)

Thus, up to this point all is well. We can work with different data and the same data can be used in indexes jsonb, too. However, a more attentive reader may have noticed that we are dealing with JSON data, which have the object path from the root. It doesn't have to be like this: arrays are also valid JSON, do these are any valid examples:
the
SELECT 
'null'::json 
'true'::json 
'false'::json 
'2'::json
'1.0001'::json
'"abc"'::json 
'1E7'::jsonb;

Note the last entry that is of type jsonb and converted to a canonical form:

the
 json | json | json | json | json | json | jsonb 
------+------+-------+------+---------+-------+----------
null | true | false| 2 | 1.00001 | "abc" | 10000000
(1 row)

The same JSON null is different from SQL NULL.
So, what happens when we keep the mixed "type" in a JSON column?

the
INSERT INTO json_test (data) 
VALUES ('[]'), ('[1,2,"a"]'), ('null'), ('1E7'), ('"abc"');
SELECT * FROM json_test;
id | data 
----+--------------------------------------
1 | {}
2 | {"a": 1}
3 | {"a": 2, "b": ["c", "d"]}
4 | {"a": 1, "b": {"c": "d", "e": true}}
5 | {"b": 2}
6 | []
7 | [1, 2, "a"]
8 | null
9 | 10000000
10 | "abc"
(10 rows)

The whole structure is deduced without problems. Let's see if we can work with these objects and queries?
The equality test works fine:

the
SELECT * FROM json_test WHERE data = '{"a":1}';
SELECT * FROM json_test WHERE data = 'null';

Limitations also work as expected:

the
SELECT * FROM json_test WHERE data @ > '{"a":1}';
SELECT * FROM json_test WHERE data <@ '{"a":1}';

Keys and existing elements are working. Not surprisingly, one request will correspond to the elements in the array and keys in object:

the
SELECT * FROM json_test WHERE data ? 'a';
id | data 
----+--------------------------------------
2 | {"a": 1}
3 | {"a": 2, "b": ["c", "d"]}
4 | {"a": 1, "b": {"c": "d", "e": true}}
7 | [1, 2, "a"]
(4 rows)


the
SELECT * FROM json_test WHERE data ?| array['a', 'b'];
id | data 
----+--------------------------------------
2 | {"a": 1}
3 | {"a": 2, "b": ["c", "d"]}
4 | {"a": 1, "b": {"c": "d", "e": true}}
5 | {"b": 2}
7 | [1, 2, "a"]
(5 rows)


the
SELECT * FROM json_test WHERE data ?& array['a', 'b'];
id | data 
----+--------------------------------------
3 | {"a": 2, "b": ["c", "d"]}
4 | {"a": 1, "b": {"c": "d", "e": true}}
(2 rows)

But as soon as we started to do the keys or elements 'get' getting in trouble;

(Apparently the author of the article at the time of writing it was installed PotgreSQL 9.4 betta version, so some queries were thrown errors, checked for 9.4.1 all requests are processed):

the
SELECT * FROM json_test WHERE data ->> 'a' > '1';

ERROR: cannot call jsonb_object_field_text 
(jsonb ->> operator text) on an array

You can still use the bypass key-path if you do not have scalar values:

the
SELECT * FROM json_test WHERE data # > '{b,c}' = '"d"';
ERROR: cannot call extract path from a scalar
SELECT * FROM json_test WHERE data # > '{b,c}' = '"d"' AND id < 8;
id | data 
----+--------------------------------------
4 | {"a": 1, "b": {"c": "d", "e": true}}
(1 row)

Note the syntax for key is the path for strings (must be json keys) or an integer (the array indexes).
This imposes very strict limitations. I don't know how these things work in MondgoDB.

But in the long term, if you store your data in arrays and json objects in one column, then in future there can be some problems. But all is not lost. You can get rows on the basis of base objects:

the
SELECT * FROM json_test WHERE data @>' {}';
id | data 
----+--------------------------------------
1 | {}
2 | {"a": 1}
3 | {"a": 2, "b": ["c", "d"]}
4 | {"a": 1, "b": {"c": "d", "e": true}}
5 | {"b": 2}
(5 rows)

Then can this request be combined with the above query:

the
SELECT * FROM json_test WHERE data @ > '{} ' AND data ->> 'a' > '1';
id | data 
----+---------------------------
3 | {"a": 2, "b": ["c", "d"]}
(1 row)

Indeed, in Postgres you don't even need to be sure that the data @ > ' {} comes first.
But what if we need only array data types? It turns out you can use the same trick:

the
SELECT * FROM json_test WHERE data @>' []';
id | data 
----+-------------
6 | []
7 | [1, 2, "a"]
(2 rows)

And it still can be combined with other operators:

the
SELECT * FROM json_test WHERE data @ > '[] ' AND data ->> 1 = '2';
id | data 
----+-------------
7 | [1, 2, "a"]
(1 row)

Well, the entry @> operator is available only for jsonb columns, so you can't request for mixed data simple json columns.

What next?

Consideration of jsonb in Postgres was a side project, I am working on json(b) queries in the django ORM. With Django 1.7 in the search function you can write something like:

the
# Exact
MyModel.objects.filter(data={'a': 1})

# Key/element existence
MyModel.objects.filter(data__has='a')
MyModel.objects.filter(data__has_any=['a', 'b'])
MyModel.objects.filter(data__has_all=['a', 'b'])
# Sub/superset of key/value pairs testing
MyModel.objects.filter(data__contains={'a': 1})
MyModel.objects.filter(data__in={'a': 1, 'b': 2})
# Get element/field (compare with json)
MyModel.objects.filter(data__get=(2, {'a': 1}))
# Get element/field (compare with scalar, including gt/lt comparisons)
MyModel.objects.filter(data__get=(2, 'a'))
MyModel.objects.filter(data__get__gt=('a', 1))
# key-path traversal, compare with json or scalar.
MyModel.objects.filter(data__get=('{a,2}', {'foo': 'bar'}))
MyModel.objects.filter(data__get=('{a,2}', 2))
MyModel.objects.filter(data__get__lte=('{a,2}', 2))

But I'm not sure that will work the names of the last set. The name of "get" seems a bit generic, and maybe we could use different names for the lookup of the input type, although only integer and string are valid.
Article based on information from habrahabr.ru

Комментарии

Популярные сообщения из этого блога

Briefly on how to make your Qt geoservice plugin

Database replication PostgreSQL-based SymmetricDS

Yandex.Widget + adjustIFrameHeight + MooTools