Story #4019

[API] Support query of "properties" field on objects

Added by Peter Amstutz almost 5 years ago. Updated over 1 year ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Category:
API
Target version:
Start date:
12/12/2017
Due date:
% Done:

100%

Estimated time:
(Total: 0.00 h)
Story points:
2.0

Description

A major drawback of using "link" objects for metadata (as opposed to the "properties" field on the object itself) is that link objects have permissions separate from the item they are describing. While sometimes this is desirable, in other situations (such as associating repo/tag to docker images) this ends introducing significant unnecessary complexity. The ability to query the "properties" field enables us to store metadata directly on the object and search on it.

Upgrade apiserver to Postgres 9.4 and Rails 4.2 (when it is released) to take advantage of searchable jsonb column data type.


Subtasks

Task #12771: Review 4019-query-propertiesResolvedPeter Amstutz


Related issues

Related to Arvados - Feature #9665: Arvados' PostgreSQL jsonb supportDuplicate07/26/2016

Related to Arvados - Bug #11168: [API] Use JSON instead of YAML for serialized fields in databaseResolved02/24/2017

Blocked by Arvados - Feature #7709: [API] Upgrade API server to Rails 4.2Resolved03/19/2017

Blocked by Arvados - Story #11908: Migrate Collections.properties to JSONBResolved06/27/2017

Associated revisions

Revision e768a05d
Added by Peter Amstutz over 1 year ago

Merge branch '4019-query-properties' closes #4019

Arvados-DCO-1.1-Signed-off-by: Peter Amstutz <>

History

#1 Updated by Peter Amstutz almost 5 years ago

  • Subject changed from Upgrade apiserver to Postgres 9.4 and Rails 4.2 (when it is released) to take advantage of queriable jsonb column data type. to Support query of "properties" field on objects
  • Description updated (diff)
  • Target version set to Arvados Future Sprints

#2 Updated by Peter Amstutz over 4 years ago

  • Subject changed from Support query of "properties" field on objects to [API] Support query of "properties" field on objects
  • Category set to API
  • Story points set to 5.0

#3 Updated by Tom Morris about 2 years ago

  • Assigned To set to Peter Amstutz

#4 Updated by Peter Amstutz about 2 years ago

https://www.postgresql.org/docs/9.4/static/datatype-json.html

jsonb also supports indexing

In general, most applications should prefer to store JSON data as jsonb, unless there are quite specialized needs, such as legacy assumptions about ordering of object keys.

JSON "null" is allowed, but not the same as the SQL NULL.

JSON data is written out as a string with tagged with ::json or ::jsonb

SELECT '[1, 2, "foo", null]'::json;

Testing containment is an important capability of jsonb. There is no parallel set of facilities for the json type. Containment tests whether one jsonb document has contained within it another one.

The default GIN operator class for jsonb supports queries with the @>, ?, ?& and ?| operators.
The non-default GIN operator class jsonb_path_ops supports indexing the @> operator only.

https://www.postgresql.org/docs/9.4/static/functions-json.html

peterjsontest=# select * from jsonbtest;
       j1       
----------------
 "foo" 
 {"foo": "bar"}
 null
 true
h2. (4 rows)

The -> operator is the basic indexing operator:

select * from jsonbtest where j1->'foo' = '"bar"'::jsonb
       j1       
----------------
 {"foo": "bar"}
(1 row)

The ->> operator is also an indexing operator, but returns the value as a text type which is directly comparable to SQL text values:

select * from jsonbtest where j1->>'foo' = 'bar';

The #> and #>> operators are similar but follow a path:

select * from jsonbtest where j1#>'{"foo", 1}' = '4'::jsonb;
         j1         
--------------------
 {"foo": [2, 4, 6]}
(1 row)

The ? operator determines if a key or value is present.

peterjsontest=# select * from jsonbtest where j1 ? 'foo';
         j1         
--------------------
 "foo" 
 {"foo": "bar"}
 {"foo": [2, 4, 6]}
 {"foo": ["3", "5", "7"]}
(3 rows)

Operators can be chained together.

select * from jsonbtest where j1->'foo' ? '3';
            j1            
--------------------------
 {"foo": ["3", "5", "7"]}
(1 row)

The ? operator only works for strings:

select * from jsonbtest where j1->'foo' ? '2';
 j1 
----
(0 rows)

Note: the following query won't use the index (except maybe to filter on rows which have 'foo' as a key, but that's unhelpful if 'foo' appears the majority of rows). It requires a special expression index:

select * from jsonbtest where j1->'foo' ? '3';
CREATE INDEX idxgintags ON jsonbtest USING gin ((j1 -> 'foo'));

We can do pattern matching on values:

select * from jsonbtest where j1->>'foo' like 'b%';
       j1       
----------------
 {"foo": "bar"}
(1 row)

The @> operator is the "containment" search operator, which allows testing for presence of key-value pairs directly. This produces the same answer as select * from jsonbtest where j1->'foo' = '"bar"'::jsonb; but @> is able to use an index:

select * from jsonbtest where j1 @> '{"foo": "bar"}'::jsonb;
       j1       
----------------
 {"foo": "bar"}
(1 row)

We can use containment search with numbers:

select * from jsonbtest where j1 @> '{"foo": [2]}'::jsonb
         j1         
--------------------
 {"foo": [2, 4, 6]}
(1 row)

#6 Updated by Peter Amstutz about 2 years ago

Design sketch for extending filters.

Syntax for filters.

Extend the 1st portion of the query to support operations on embedded json fields.

[["properties", "foo"], "=", "bar"]

(other options considered: "properties.foo", "properties->foo", "properties[foo]" but these alls require parsing and raise the issue of special characters ".->[]" appearing in the key).

Straight key-value tests can be efficiently implemented using @> operator on the index:

select * from table where properties @> '{"foo": "bar"}'::jsonb

Other possible queries:

[["properties", "foo"], "like", "b%"]

Implemented as:

select * from table where properties ? 'foo' and properties->>'foo' like 'b%'

(note that properties->>'foo' evaluates to SQL NULL for rows that don't have a "foo" entry; using properties ? 'foo' uses the index to filter rows that don't have a "foo" entry).

Somewhat confusingly, although the documentation describes correspondences between JSON types and Postgres types, they are not directly interoperable and comparisons require coercion to jsonb types:

select * from jsonbtest where properties ? 'foo' and j1->'foo' <> 'null'::jsonb
select * from jsonbtest where properties ? 'foo' and j1->'foo' > '3'::jsonb
select * from jsonbtest where properties ? 'foo' and j1->'foo' < '3'::jsonb

"in" queries are probably best expanded to an OR expression:

[["properties", "foo"], "in", ["bar", "quux"]]
select * from table where properties > '{"foo": "bar"}'::jsonb or properties > '{"foo": "quux"}'::jsonb;

Should "not_in" queries include rows where the key isn't defined?

[["properties", "foo"], "not_in", ["bar", "quux"]]
select * from table where properties ? 'foo' and properties->>'foo' not in ('bar', 'quux');

This raises a question: if the "foo" key is missing, that is an SQL NULL, but if it is present, it can have the value of JSON null. How should the filtering handling this? Maybe expose the '?' operator for JSONB columns to allow existence queries?

#7 Updated by Peter Amstutz about 2 years ago

The minimal operator set I recommend is:

  1. equality matching: [["properties", "foo"], "=", "bar"]
  2. like-matching: [["properties", "foo"], "like", "b%"]
  3. like-matching to get all records with a given key: [["properties", "foo"], "like", "%"]

#8 Updated by Tom Clegg about 2 years ago

Would it be possible to use arrays in serialized fields? Say we had properties like this

{
  "foo": [
    {"bar": "baz"},
    {"qux": "quux"},
    "corge" 
  ]
}
[How] would the following queries work?
  • properties.foo[anything].bar == "baz"
  • properties.foo[anything].bar == anything
  • properties.foo[anything] == "corge"

Would it be possible to distinguish {"foo":["bar"]} from {"foo":{"bar":"bar"}} and {"foo":"bar"}?

#9 Updated by Peter Amstutz about 2 years ago

Tom Clegg wrote:

Would it be possible to use arrays in serialized fields? Say we had properties like this

[...]

[How] would the following queries work?
  • properties.foo[anything].bar "baz"
  • properties.foo[anything].bar anything
  • properties.foo[anything] == "corge"

Since you just made up a syntax, I don't know what these queries are supposed to do?

Would it be possible to distinguish {"foo":["bar"]} from {"foo":{"bar":"bar"}} and {"foo":"bar"}?

It depends?

#10 Updated by Tom Clegg about 2 years ago

Peter Amstutz wrote:

Since you just made up a syntax, I don't know what these queries are supposed to do?

Match the given example based on the contents. For example, when I said

properties.foo[anything].bar == "baz"

I meant "get every record whose properties hash has a key "foo" whose value is an array containing an object with a key "bar" whose value is "baz"."

Would it be possible to distinguish {"foo":["bar"]} from {"foo":{"bar":"bar"}} and {"foo":"bar"}?

It depends?

Depends on..?

#11 Updated by Peter Amstutz about 2 years ago

Here's how I think the 1st and 3rd would be queried in Postgres:

select * from table where properties @> '{"foo": [{"bar": "baz"}]}'::jsonb
select * from table where properties @> '{"foo": ["corge"]}'::jsonb

The 2nd one is kind of hard. I eventually got to this:

select j1 from jsonbtest where j1 @> '{"foo": [{}]}'::jsonb and exists (select value from jsonb_array_elements(j1->'foo') where value @> '{}'::jsonb and value ? 'bar');

This filters on values of "foo" that consist of arrays containing objects, and then constructs a subquery on the elements in the array and tests if each element is an object, and it contains the key 'bar'.

#12 Updated by Peter Amstutz about 2 years ago

Basically, although you can use [] to match any array and {} to match any object, the postgres @> operator lacks a scalar wildcard to match strings/numbers/booleans/null.

Also annoying, this is true:

SELECT '["foo", "bar"]'::jsonb @> '"foo"'::jsonb;

But this doesn't work:

SELECT '{"foo": "bar"}'::jsonb @> '"foo"'::jsonb;

Even though these are both true:

SELECT '["foo", "bar"]'::jsonb ? 'foo';
SELECT '{"foo": "bar"}'::jsonb ? 'foo';

#13 Updated by Peter Amstutz about 2 years ago

Searching and filtering operations:

Get records with {"mytag1": "myvalue1"}:

[["properties", "tags", "mytag1"], "=", "myvalue1"]]

Get records with the value of "mytag1" matching a prefix:

[["properties", "tags", "mytag1"], "like", "myval%"]]

Get records with any value of "mytag1":

[["properties", "tags", "mytag1"], "like", "%"]]

#14 Updated by Peter Amstutz about 2 years ago

Implementations:

select properties from table where properties @> '{"tags": {"mytag1": "myvalue1"}}'::jsonb
select properties from table where properties @> '{"tags": {}}'::jsonb and properties#>>'{"tags","mytag1"}' like 'myval%'
select properties from table where properties @> '{"tags": {}}'::jsonb and properties->'tags' ? 'mytag1'

#15 Updated by Peter Amstutz about 2 years ago

Note: the above assumes 'tags' in a subfield. But it is easier to write queries that utilize the index if tags are in a toplevel object. (If there is a reason to have separate "user visible" and "not user visible" tags, then we should have a separate "tags" column).

Get records with {"mytag1": "myvalue1"}:

[["properties", "mytag1"], "=", "myvalue1"]]
select properties from table where properties @> '{"mytag1": "myvalue1"}'::jsonb

Get records with the value of "mytag1" matching a prefix:

[["properties", "tags", "mytag1"], "like", "myval%"]]
select properties from table where properties ? 'mytag1' and properties->>'mytag1' like 'myval%'

Get records with any value of "mytag1":

[["properties", "mytag1"], "like", "%"]]
select properties from table where properties ? 'mytag1'

#16 Updated by Tom Morris almost 2 years ago

  • Target version changed from Arvados Future Sprints to To Be Groomed

#17 Updated by Peter Amstutz almost 2 years ago

Test for exact key-value:

["properties.mytag1", "=", "myvalue1"]

Test for existence of 'mytag1', any value, which ever is easier:

["properties", "?", "mytag1"]
["properties.mytag1", "?", ""]

#18 Updated by Tom Morris almost 2 years ago

  • Target version changed from To Be Groomed to Arvados Future Sprints
  • Story points changed from 5.0 to 2.0

#20 Updated by Tom Morris over 1 year ago

  • Target version changed from Arvados Future Sprints to 2017-12-20 Sprint

#21 Updated by Peter Amstutz over 1 year ago

  • Related to deleted (Story #11908: Migrate Collections.properties to JSONB)

#22 Updated by Peter Amstutz over 1 year ago

  • Blocked by Story #11908: Migrate Collections.properties to JSONB added

#23 Updated by Peter Amstutz over 1 year ago

4019-query-properties @ 79a2d819d596e610f26c08beee53f5432bfbb360

Supports almost all operators on jsonb subproperties: =, !=, <, <=, >, >=, like, ilike, in, not in. Also adds a new operator "exists".

Documentation updated.

Adds a new index on collection properties.

#24 Updated by Lucas Di Pentima over 1 year ago

  • File services/api/lib/record_filters.rb
    • Line 112: Wouldn’t be convenient to validate the operand type to only allow boolean values when filtering subproperties with 'exists'? I found confusing the use of operands like ‘’, ‘none’, ‘false’ and 0 with ‘exists’ operator and getting results like if I used an operand == true.
    • If the above comment is valid, maybe adding some tests on ‘exists’ with invalid operand types will be useful.
  • File services/api/test/unit/arvados_model_test.rb
    • Line 150: Is the addition on this line superfluous as :jsonb was removed from searchable_columns()?

#25 Updated by Peter Amstutz over 1 year ago

Lucas Di Pentima wrote:

  • File services/api/lib/record_filters.rb
    • Line 112: Wouldn’t be convenient to validate the operand type to only allow boolean values when filtering subproperties with 'exists'? I found confusing the use of operands like ‘’, ‘none’, ‘false’ and 0 with ‘exists’ operator and getting results like if I used an operand == true.
    • If the above comment is valid, maybe adding some tests on ‘exists’ with invalid operand types will be useful.

Good idea. Improved error checking & added some tests for error cases.

  • File services/api/test/unit/arvados_model_test.rb
    • Line 150: Is the addition on this line superfluous as :jsonb was removed from searchable_columns()?

That's right, reverted.

now 4019-query-properties @ a879823f631381cefc4458c28f06c36803e30530

#26 Updated by Lucas Di Pentima over 1 year ago

This LGTM, thanks!

#27 Updated by Anonymous over 1 year ago

  • Status changed from New to Resolved

Also available in: Atom PDF