Project

General

Profile

Actions

Feature #17995

closed

[api] add method to get collections where replication_confirmed < replication_desired

Added by Ward Vandewege about 3 years ago. Updated almost 3 years ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Category:
-
Target version:
Story points:
-
Release relationship:
Auto

Description

This could be used by a new "storage classes" report built into `arvados-client`, but it should also be usable from the `arv` cli tools and the sdk, to make custom reporting easy.

From discussion: should this be a generalized feature to be able to compare two columns


Subtasks 1 (0 open1 closed)

Task #18042: Review 17995-filter-by-comparing-attrsResolvedTom Clegg08/27/2021Actions

Related issues

Related to Arvados - Idea #17697: Design for reporting tools to determine what data is on multiple storage classes.ResolvedWard VandewegeActions
Related to Arvados - Feature #17994: [api] storage class fields should be supported in filtersResolvedTom Clegg08/27/2021Actions
Blocks Arvados Epics - Idea #16107: Storage classesResolved03/01/202109/30/2021Actions
Actions #1

Updated by Ward Vandewege about 3 years ago

  • Description updated (diff)
Actions #2

Updated by Ward Vandewege about 3 years ago

  • Related to Idea #17697: Design for reporting tools to determine what data is on multiple storage classes. added
Actions #3

Updated by Ward Vandewege about 3 years ago

Actions #4

Updated by Peter Amstutz about 3 years ago

  • Target version set to 2021-09-01 sprint
  • Assigned To set to Tom Clegg
Actions #5

Updated by Peter Amstutz about 3 years ago

  • Description updated (diff)
Actions #6

Updated by Peter Amstutz about 3 years ago

  • Related to Feature #17994: [api] storage class fields should be supported in filters added
Actions #7

Updated by Tom Clegg about 3 years ago

Currently, this is what keep-balance does:
  • If storage_classes_desired + replication_desired are achieved, set _confirmed fields equal to _desired fields
  • If any of the desired classes have lower replication than desired, set storage_classes_confirmed=storage_classes_desired and replication_confirmed=lowest replication among the desired classes
  • If any of the desired classes have zero replication, set storage_classes_confirmed=[], replication_confirmed=0

To get a list of collections with insufficient replication/classes, you need to filter on replication_confirmed < replication_desired (not on the storage_classes fields)

Some ideas about how to spell that:
  • filters=[["replication_desired - replication_confirmed", ">", 0]] (I'm thinking this is the least bad)
  • filters=[["replication_desired > replication_confirmed", "=", true]]
  • filters=[["replication_desired", ">", "replication_confirmed"]] (treat replication_confirmed as a column name instead of a value because replication_desired is a numeric column) (this does not feel like a good pattern to explore)
Actions #8

Updated by Tom Clegg about 3 years ago

  • Status changed from New to In Progress
Actions #9

Updated by Peter Amstutz about 3 years ago

Tom Clegg wrote:

Currently, this is what keep-balance does:
  • If storage_classes_desired + replication_desired are achieved, set _confirmed fields equal to _desired fields
  • If any of the desired classes have lower replication than desired, set storage_classes_confirmed=storage_classes_desired and replication_confirmed=lowest replication among the desired classes
  • If any of the desired classes have zero replication, set storage_classes_confirmed=[], replication_confirmed=0

To get a list of collections with insufficient replication/classes, you need to filter on replication_confirmed < replication_desired (not on the storage_classes fields)

Some ideas about how to spell that:
  • filters=[["replication_desired - replication_confirmed", ">", 0]] (I'm thinking this is the least bad)
  • filters=[["replication_desired > replication_confirmed", "=", true]]
  • filters=[["replication_desired", ">", "replication_confirmed"]] (treat replication_confirmed as a column name instead of a value because replication_desired is a numeric column) (this does not feel like a good pattern to explore)

Consider surrounding the left side with () to indicate it is an expression and not just a column name?

filters=[["(replication_desired - replication_confirmed)", ">", 0]]

How much do we generalize this?

Actions #10

Updated by Tom Clegg about 3 years ago

Discussed offline that the second form

  • filters=[["replication_desired > replication_confirmed", "=", true]]

seems like a good step toward someday accepting arbitrary expressions like "(replication_desired > replication_confirmed && replication_confirmed > 2)".

For now we only need to accept that specific formula. More generic is good, but not a priority.

Actions #12

Updated by Tom Clegg about 3 years ago

  • Subject changed from [api] add method to get collections where storage_classes_confirmed != storage_classes_desired to [api] add method to get collections where replication_confirmed < replication_desired
Actions #13

Updated by Ward Vandewege about 3 years ago

Tom Clegg wrote:

17995-filter-by-comparing-attrs @ f68961a64f84de3987f99fe3dd79ca35abfab3aa -- developer-run-tests: #2661

In `doc/api/methods.html.textile.liquid`:

+h4(#filterexpression). Filtering by comparing two attributes
+
+If @a@ and @b@ are attributes that have numeric values, they can be compared in a filter condition using the form @["(a < b)", "=", true]@. Supported operators are @=@, @<@, @<=@, @>@, and @>=@. Examples:
+* @["(replication_desired < replication_confirmed)", "=", true]@
+* @["(replication_desired = replication_confirmed)", "=", true]@
+
+Note that only a very specific subset of boolean expressions is supported. For example:
+* @["replication_desired < replication_confirmed", "=", true]@ is not accepted. Outer parentheses are mandatory.
+* @["(replication_desired < 1)", "=", true]@ is not accepted. Operands must be attribute names.

Can we describe the all the current constraints for this feature in a list, then give some examples like you did? Intermixing the rules and examples makes it a bit hard to parse this documentation.

Perhaps something like

Rules for boolean expressions:
  • supported operators are ...
  • expression must be surrounded by parentheses
  • both operands must be attribute names with integer values (both operands must be in the searchable_columns list, right?)
  • operators may not be not constants
  • outer operand must be 'true'
  • outer operator must be '='
  • anything else?
Examples:
  • ...

Otherwise, I can't say I like this syntax, it feels... ugh. If/when we want to expand to more arbitrary expressions, is the plan to remove some of the constraints, adopt a new syntax, or something else?

LGTM, otherwise.

Actions #14

Updated by Tom Clegg about 3 years ago

I figure if/when we do arbitrary expressions the ,"=",true part would become optional/implicit, and there wouldn't really be any need for other kinds of filter, or multiple filters... so we could accept something like filter="(foo && bar)" instead of filters=[["(foo && bar)","=",true]]. But the main thing is that the current highly-restricted format ensures that anything we accept now will also be accepted by a more generic expression parser in future.

Another option is to allow an expression string in place of a 3-element filter array, like

filters=[["is_trashed","=",false], "(replication_desired > replication_confirmed)"]

We could transform this to [...,=,true] when accepting the request at controller, so we wouldn't need to change the rails code at all.

Actions #15

Updated by Tom Clegg about 3 years ago

Rearranged docs as suggested, and merged main.

17995-filter-by-comparing-attrs @ 2115f4609adcc86156e8e0aa59ea38ba5808378e -- developer-run-tests: #2665

Actions #16

Updated by Ward Vandewege about 3 years ago

Tom Clegg wrote:

Rearranged docs as suggested, and merged main.

17995-filter-by-comparing-attrs @ 2115f4609adcc86156e8e0aa59ea38ba5808378e -- developer-run-tests: #2665

Thanks, the doc update looks good. The new syntax from note 14 seems a lot nicer, that would definitely be my preference.

Actions #17

Updated by Tom Clegg about 3 years ago

  • Target version changed from 2021-09-01 sprint to 2021-09-15 sprint
Actions #18

Updated by Tom Clegg about 3 years ago

Good, that was even easier than I thought it would be (we already had a custom JSON-unmarshal method on arvados.Filter).

17995-filter-by-comparing-attrs @ f2ce99ebb4c721de5b22da4bd5bd6565d5f08d2e -- developer-run-tests: #2668

Actions #19

Updated by Ward Vandewege about 3 years ago

Tom Clegg wrote:

Good, that was even easier than I thought it would be (we already had a custom JSON-unmarshal method on arvados.Filter).

17995-filter-by-comparing-attrs @ f2ce99ebb4c721de5b22da4bd5bd6565d5f08d2e -- developer-run-tests: #2668

LGTM, thanks!

Actions #20

Updated by Tom Clegg about 3 years ago

  • Status changed from In Progress to Resolved
Actions #21

Updated by Peter Amstutz almost 3 years ago

  • Release set to 42
Actions

Also available in: Atom PDF