Feature #17995

[api] add method to get collections where replication_confirmed < replication_desired

Added by Ward Vandewege 2 months ago. Updated about 1 month ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Category:
-
Target version:
Start date:
08/27/2021
Due date:
% Done:

100%

Estimated time:
(Total: 0.00 h)
Story points:
-

Description

This could be used by a new "storage classes" report built into `arvados-client`, but it should also be usable from the `arv` cli tools and the sdk, to make custom reporting easy.

From discussion: should this be a generalized feature to be able to compare two columns


Subtasks

Task #18042: Review 17995-filter-by-comparing-attrsResolvedTom Clegg


Related issues

Related to Arvados - Story #17697: Design for reporting tools to determine what data is on multiple storage classes.Resolved

Related to Arvados - Feature #17994: [api] storage class fields should be supported in filtersResolved08/27/2021

Blocks Arvados Epics - Story #16107: Storage classesIn Progress03/01/202109/30/2021

Associated revisions

Revision 7aaf9f22
Added by Tom Clegg about 1 month ago

Merge branch '17995-filter-by-comparing-attrs'

closes #17995

Arvados-DCO-1.1-Signed-off-by: Tom Clegg <>

History

#1 Updated by Ward Vandewege 2 months ago

  • Description updated (diff)

#2 Updated by Ward Vandewege 2 months ago

  • Related to Story #17697: Design for reporting tools to determine what data is on multiple storage classes. added

#3 Updated by Ward Vandewege 2 months ago

#4 Updated by Peter Amstutz 2 months ago

  • Target version set to 2021-09-01 sprint
  • Assigned To set to Tom Clegg

#5 Updated by Peter Amstutz 2 months ago

  • Description updated (diff)

#6 Updated by Peter Amstutz about 2 months ago

  • Related to Feature #17994: [api] storage class fields should be supported in filters added

#7 Updated by Tom Clegg about 2 months ago

Currently, this is what keep-balance does:
  • If storage_classes_desired + replication_desired are achieved, set _confirmed fields equal to _desired fields
  • If any of the desired classes have lower replication than desired, set storage_classes_confirmed=storage_classes_desired and replication_confirmed=lowest replication among the desired classes
  • If any of the desired classes have zero replication, set storage_classes_confirmed=[], replication_confirmed=0

To get a list of collections with insufficient replication/classes, you need to filter on replication_confirmed < replication_desired (not on the storage_classes fields)

Some ideas about how to spell that:
  • filters=[["replication_desired - replication_confirmed", ">", 0]] (I'm thinking this is the least bad)
  • filters=[["replication_desired > replication_confirmed", "=", true]]
  • filters=[["replication_desired", ">", "replication_confirmed"]] (treat replication_confirmed as a column name instead of a value because replication_desired is a numeric column) (this does not feel like a good pattern to explore)

#8 Updated by Tom Clegg about 2 months ago

  • Status changed from New to In Progress

#9 Updated by Peter Amstutz about 2 months ago

Tom Clegg wrote:

Currently, this is what keep-balance does:
  • If storage_classes_desired + replication_desired are achieved, set _confirmed fields equal to _desired fields
  • If any of the desired classes have lower replication than desired, set storage_classes_confirmed=storage_classes_desired and replication_confirmed=lowest replication among the desired classes
  • If any of the desired classes have zero replication, set storage_classes_confirmed=[], replication_confirmed=0

To get a list of collections with insufficient replication/classes, you need to filter on replication_confirmed < replication_desired (not on the storage_classes fields)

Some ideas about how to spell that:
  • filters=[["replication_desired - replication_confirmed", ">", 0]] (I'm thinking this is the least bad)
  • filters=[["replication_desired > replication_confirmed", "=", true]]
  • filters=[["replication_desired", ">", "replication_confirmed"]] (treat replication_confirmed as a column name instead of a value because replication_desired is a numeric column) (this does not feel like a good pattern to explore)

Consider surrounding the left side with () to indicate it is an expression and not just a column name?

filters=[["(replication_desired - replication_confirmed)", ">", 0]]

How much do we generalize this?

#10 Updated by Tom Clegg about 2 months ago

Discussed offline that the second form

  • filters=[["replication_desired > replication_confirmed", "=", true]]

seems like a good step toward someday accepting arbitrary expressions like "(replication_desired > replication_confirmed && replication_confirmed > 2)".

For now we only need to accept that specific formula. More generic is good, but not a priority.

#12 Updated by Tom Clegg about 2 months ago

  • Subject changed from [api] add method to get collections where storage_classes_confirmed != storage_classes_desired to [api] add method to get collections where replication_confirmed < replication_desired

#13 Updated by Ward Vandewege about 2 months ago

Tom Clegg wrote:

17995-filter-by-comparing-attrs @ f68961a64f84de3987f99fe3dd79ca35abfab3aa -- https://ci.arvados.org/view/Developer/job/developer-run-tests/2661/

In `doc/api/methods.html.textile.liquid`:

+h4(#filterexpression). Filtering by comparing two attributes
+
+If @a@ and @b@ are attributes that have numeric values, they can be compared in a filter condition using the form @["(a < b)", "=", true]@. Supported operators are @=@, @<@, @<=@, @>@, and @>=@. Examples:
+* @["(replication_desired < replication_confirmed)", "=", true]@
+* @["(replication_desired = replication_confirmed)", "=", true]@
+
+Note that only a very specific subset of boolean expressions is supported. For example:
+* @["replication_desired < replication_confirmed", "=", true]@ is not accepted. Outer parentheses are mandatory.
+* @["(replication_desired < 1)", "=", true]@ is not accepted. Operands must be attribute names.

Can we describe the all the current constraints for this feature in a list, then give some examples like you did? Intermixing the rules and examples makes it a bit hard to parse this documentation.

Perhaps something like

Rules for boolean expressions:
  • supported operators are ...
  • expression must be surrounded by parentheses
  • both operands must be attribute names with integer values (both operands must be in the searchable_columns list, right?)
  • operators may not be not constants
  • outer operand must be 'true'
  • outer operator must be '='
  • anything else?
Examples:
  • ...

Otherwise, I can't say I like this syntax, it feels... ugh. If/when we want to expand to more arbitrary expressions, is the plan to remove some of the constraints, adopt a new syntax, or something else?

LGTM, otherwise.

#14 Updated by Tom Clegg about 2 months ago

I figure if/when we do arbitrary expressions the ,"=",true part would become optional/implicit, and there wouldn't really be any need for other kinds of filter, or multiple filters... so we could accept something like filter="(foo && bar)" instead of filters=[["(foo && bar)","=",true]]. But the main thing is that the current highly-restricted format ensures that anything we accept now will also be accepted by a more generic expression parser in future.

Another option is to allow an expression string in place of a 3-element filter array, like

filters=[["is_trashed","=",false], "(replication_desired > replication_confirmed)"]

We could transform this to [...,=,true] when accepting the request at controller, so we wouldn't need to change the rails code at all.

#15 Updated by Tom Clegg about 2 months ago

Rearranged docs as suggested, and merged main.

17995-filter-by-comparing-attrs @ 2115f4609adcc86156e8e0aa59ea38ba5808378e -- https://ci.arvados.org/view/Developer/job/developer-run-tests/2665/

#16 Updated by Ward Vandewege about 2 months ago

Tom Clegg wrote:

Rearranged docs as suggested, and merged main.

17995-filter-by-comparing-attrs @ 2115f4609adcc86156e8e0aa59ea38ba5808378e -- https://ci.arvados.org/view/Developer/job/developer-run-tests/2665/

Thanks, the doc update looks good. The new syntax from note 14 seems a lot nicer, that would definitely be my preference.

#17 Updated by Tom Clegg about 2 months ago

  • Target version changed from 2021-09-01 sprint to 2021-09-15 sprint

#18 Updated by Tom Clegg about 2 months ago

Good, that was even easier than I thought it would be (we already had a custom JSON-unmarshal method on arvados.Filter).

17995-filter-by-comparing-attrs @ f2ce99ebb4c721de5b22da4bd5bd6565d5f08d2e -- https://ci.arvados.org/view/Developer/job/developer-run-tests/2668/

#19 Updated by Ward Vandewege about 1 month ago

Tom Clegg wrote:

Good, that was even easier than I thought it would be (we already had a custom JSON-unmarshal method on arvados.Filter).

17995-filter-by-comparing-attrs @ f2ce99ebb4c721de5b22da4bd5bd6565d5f08d2e -- https://ci.arvados.org/view/Developer/job/developer-run-tests/2668/

LGTM, thanks!

#20 Updated by Tom Clegg about 1 month ago

  • Status changed from In Progress to Resolved

Also available in: Atom PDF