Feature #17994

[api] storage class fields should be supported in filters

Added by Ward Vandewege 3 months ago. Updated about 2 months ago.

Status: Resolved
Priority: Normal
Assigned To:
Category: -
Target version:
Start date: 08/27/2021
Due date:
% Done: 100%
Estimated time: (Total: 0.00 h)
Story points: -

Description

The storage class fields `storage_classes_confirmed` and `storage_classes_desired` are currently not supported as filter attributes (e.g. for use with the CLI tools).

It would be useful to change that. This would allow an admin to get a list of collections that are confirmed (or desired) for a particular storage class. Such a list can be used as input to the `deduplication-report`, so that report could then be generated for a particular (set of) storage class(es).

This would also make it possible to create a filter group for a specific (set of) storage class(es).


Subtasks

Task #18043: Review 17994-filter-by-storage-classes (Resolved, Tom Clegg)


Related issues

Related to Arvados - Feature #17993: [deduplication-report] supports storage classes (New)

Related to Arvados - Story #17697: Design for reporting tools to determine what data is on multiple storage classes. (Resolved)

Related to Arvados - Feature #17995: [api] add method to get collections where replication_confirmed < replication_desired (Resolved, 08/27/2021)

Blocks Arvados Epics - Story #16107: Storage classes (Resolved, 03/01/2021, 09/30/2021)

Associated revisions

Revision b1daec9a
Added by Tom Clegg about 2 months ago

Merge branch '17994-filter-by-storage-classes' into main

closes #17994

Arvados-DCO-1.1-Signed-off-by: Tom Clegg <>

History

#1 Updated by Ward Vandewege 3 months ago

  • Description updated (diff)
  • Subject changed from [api] storage class fields should be supported in our filters to [api] storage class fields should be supported in filters

#2 Updated by Ward Vandewege 3 months ago

  • Related to Feature #17993: [deduplication-report] supports storage classes added

#3 Updated by Ward Vandewege 2 months ago

  • Related to Story #17697: Design for reporting tools to determine what data is on multiple storage classes. added

#4 Updated by Ward Vandewege 2 months ago

#5 Updated by Peter Amstutz 2 months ago

  • Target version set to 2021-09-01 sprint
  • Assigned To set to Tom Clegg

#6 Updated by Tom Clegg 2 months ago

  • Status changed from New to In Progress

#7 Updated by Tom Clegg 2 months ago

17994-filter-by-storage-classes @ 902f8cd258a8dfec749a7f94d478a4027e319750 -- https://ci.arvados.org/view/Developer/job/developer-run-tests/2649/

So far this is a minimal implementation. It accepts filters like [["storage_classes_desired", "=", "[\"default\"]"]]. Note that the operand is the JSON representation, as it's stored in the database.

But we probably want these, too:
  • [["storage_classes_desired", "=", ["default"]]] (alternative syntax equivalent to "[\"default\"]")
  • [["storage_classes_desired", "contains", ["default"]]] (matches ["foo","default"] as well as exact match ["default"])
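The two "=" spellings above differ only in whether the client JSON-encodes the array before putting it in the filter. The following is a rough Python sketch of that equivalence, not the server's code. `normalize_operand` is a hypothetical helper made up for illustration, and it assumes a string operand always holds a JSON document.

```python
import json


def normalize_operand(operand):
    """Return the operand as a native JSON value.

    Hypothetical helper: a string operand is assumed to hold the JSON
    representation (the form the minimal implementation accepts); any
    other value is taken to be already decoded (the proposed alternative).
    """
    if isinstance(operand, str):
        return json.loads(operand)
    return operand


# Operand given as the JSON representation stored in the database.
encoded = ["storage_classes_desired", "=", json.dumps(["default"])]
# Proposed alternative syntax: operand given as a plain array.
native = ["storage_classes_desired", "=", ["default"]]

# The filter list is itself sent as JSON, which is why the first form
# shows escaped quotes around "default".
print(json.dumps([encoded]))
print(json.dumps([native]))
```

Both operands decode to the same one-element list, so a server that accepted both could treat them identically.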

Currently, https://doc.arvados.org/main/api/methods.html uses ["foo", "contains", "bar"] as the example for "contains", which is a bit misleading since "contains" only works if the first element is "attr.key" where "attr" is a jsonb object column and "key" is a top-level key in the json object. (Should change "foo" to "properties.foo" to make it more clear, I think.)

I'm thinking we could extend that to match json objects/arrays at the top level too, so
  • ["properties", "contains", ["foo", "bar"]] matches a record with {"foo": 1, "bar": 2, "baz": 3}
  • ["storage_classes_desired", "contains", ["foo", "bar"]] matches a record with ["bar", "foo", "default"].
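To make the proposed top-level matching concrete, here is a small Python model of the rule as described above: every requested string must be a top-level key of an object or an element of an array. This is only a sketch. `contains_all` is a made-up name, and the rule resembles PostgreSQL's jsonb `?&` operator, though nothing here says the branch uses it.

```python
def contains_all(column_value, wanted):
    """Model of the proposed top-level "contains" match.

    column_value: decoded JSON from a jsonb column (dict or list).
    wanted: list of strings that must all be present.
    Dicts are checked by top-level key, lists by element.
    """
    if isinstance(column_value, dict):
        present = set(column_value.keys())
    elif isinstance(column_value, list):
        # Only string elements can match a requested string.
        present = {item for item in column_value if isinstance(item, str)}
    else:
        return False
    return all(key in present for key in wanted)


# The two examples from the comment above, plus one non-match.
print(contains_all({"foo": 1, "bar": 2, "baz": 3}, ["foo", "bar"]))  # True
print(contains_all(["bar", "foo", "default"], ["foo", "bar"]))       # True
print(contains_all(["default"], ["foo", "bar"]))                     # False
```

Element order in the array does not matter for this kind of match, which is what makes it a better fit than "=" for multi-class collections.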

The storage_classes_* fields aren't indexed. Practically speaking this might be okay -- there are typically very few classes with lots of collections in each, and if a condition matches a large portion of the table, an index doesn't save much time.

#8 Updated by Peter Amstutz 2 months ago

  • Related to Feature #17995: [api] add method to get collections where replication_confirmed < replication_desired added

#9 Updated by Tom Clegg 2 months ago

17994-filter-by-storage-classes @ 402e69f6e55dce4e11d354c3ca708b8e536c124b -- https://ci.arvados.org/view/Developer/job/developer-run-tests/2651/

  • accepts ["storage_classes_confirmed", "contains", ["key1", "key2", ...]] (works on any jsonb column)
  • accepts ["storage_classes_confirmed", "contains", "key1"]
  • reverts adding storage_classes_* to searchable_attributes on collection model (this caused "any" to try to match those columns, which seems undesirable and would require migrating the huge multi-column table index)
  • accepts "=", "<>", "!=" operators on jsonb columns even if they aren't in searchable_attributes. This makes it possible to do exact matches on storage_classes_*, which could be useful for degenerate cases like a single-element array or an empty properties object.
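The remark about degenerate cases can be illustrated with a Python model of whole-value comparison. This is a sketch under assumptions, not the API server's logic: `exact_match` is a hypothetical function, "archive" is a made-up storage class name, and null column values are ignored. Python list and dict equality stand in for jsonb equality here, since both treat arrays as ordered and objects as unordered.

```python
def exact_match(column_value, operator, operand):
    """Whole-value comparison, as a model of "=", "<>" and "!="."""
    if operator == "=":
        return column_value == operand
    if operator in ("<>", "!="):
        return column_value != operand
    raise ValueError(f"unsupported operator: {operator!r}")


# Single-element array: exact match is a practical test.
print(exact_match(["default"], "=", ["default"]))                        # True
# An extra class or a different order defeats an exact match.
print(exact_match(["default", "archive"], "=", ["default"]))             # False
print(exact_match(["archive", "default"], "=", ["default", "archive"]))  # False
# Empty properties object.
print(exact_match({}, "=", {}))                                          # True
print(exact_match({"foo": 1}, "!=", {}))                                 # True
```

The second and third calls show why "contains" is the more natural operator once a collection has more than one storage class.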

If this seems like the right behavior I'll need to update the API methods docs.

#11 Updated by Ward Vandewege about 2 months ago

Tom Clegg wrote:

17994-filter-by-storage-classes @ be900941bb4ab286cbeb02f65509be938726d67e -- https://ci.arvados.org/view/Developer/job/developer-run-tests/2662/

The developer-run-tests-apps-workbench-integration tests failed so I kicked those off again at https://ci.arvados.org/job/developer-run-tests-apps-workbench-integration/2823/console. That failed again, so once more at https://ci.arvados.org/job/developer-run-tests-apps-workbench-integration/2824/console, which finally passed.

LGTM, thanks!

#12 Updated by Tom Clegg about 2 months ago

  • Status changed from In Progress to Resolved
