Feature #17995
closed[api] add method to get collections where replication_confirmed < replication_desired
Description
This could be used by a new "storage classes" report built into `arvados-client`, but it should also be usable from the `arv` cli tools and the sdk, to make custom reporting easy.
From discussion: should this be a generalized feature to be able to compare two columns
Related issues
Updated by Ward Vandewege about 3 years ago
- Related to Idea #17697: Design for reporting tools to determine what data is on multiple storage classes. added
Updated by Ward Vandewege about 3 years ago
- Blocks Idea #16107: Storage classes added
Updated by Peter Amstutz about 3 years ago
- Target version set to 2021-09-01 sprint
- Assigned To set to Tom Clegg
Updated by Peter Amstutz about 3 years ago
- Related to Feature #17994: [api] storage class fields should be supported in filters added
Updated by Tom Clegg about 3 years ago
- If storage_classes_desired + replication_desired are achieved, set _confirmed fields equal to _desired fields
- If any of the desired classes have lower replication than desired, set storage_classes_confirmed=storage_classes_desired and replication_confirmed=lowest replication among the desired classes
- If any of the desired classes have zero replication, set storage_classes_confirmed=[], replication_confirmed=0
To get a list of collections with insufficient replication/classes, you need to filter on replication_confirmed < replication_desired (not on the storage_classes fields)
Some ideas about how to spell that:filters=[["replication_desired - replication_confirmed", ">", 0]]
(I'm thinking this is the least bad)filters=[["replication_desired > replication_confirmed", "=", true]]
filters=[["replication_desired", ">", "replication_confirmed"]]
(treat replication_confirmed as a column name instead of a value because replication_desired is a numeric column) (this does not feel like a good pattern to explore)
Updated by Peter Amstutz about 3 years ago
Tom Clegg wrote:
Currently, this is what keep-balance does:
- If storage_classes_desired + replication_desired are achieved, set _confirmed fields equal to _desired fields
- If any of the desired classes have lower replication than desired, set storage_classes_confirmed=storage_classes_desired and replication_confirmed=lowest replication among the desired classes
- If any of the desired classes have zero replication, set storage_classes_confirmed=[], replication_confirmed=0
To get a list of collections with insufficient replication/classes, you need to filter on replication_confirmed < replication_desired (not on the storage_classes fields)
Some ideas about how to spell that:
filters=[["replication_desired - replication_confirmed", ">", 0]]
(I'm thinking this is the least bad)filters=[["replication_desired > replication_confirmed", "=", true]]
filters=[["replication_desired", ">", "replication_confirmed"]]
(treat replication_confirmed as a column name instead of a value because replication_desired is a numeric column) (this does not feel like a good pattern to explore)
Consider surrounding the left side with () to indicate it is an expression and not just a column name?
filters=[["(replication_desired - replication_confirmed)", ">", 0]]
How much do we generalize this?
Updated by Tom Clegg about 3 years ago
Discussed offline that the second form
filters=[["replication_desired > replication_confirmed", "=", true]]
seems like a good step toward someday accepting arbitrary expressions like "(replication_desired > replication_confirmed && replication_confirmed > 2)".
For now we only need to accept that specific formula. More generic is good, but not a priority.
Updated by Tom Clegg about 3 years ago
17995-filter-by-comparing-attrs @ f68961a64f84de3987f99fe3dd79ca35abfab3aa -- developer-run-tests: #2661
Updated by Tom Clegg about 3 years ago
- Subject changed from [api] add method to get collections where storage_classes_confirmed != storage_classes_desired to [api] add method to get collections where replication_confirmed < replication_desired
Updated by Ward Vandewege about 3 years ago
Tom Clegg wrote:
17995-filter-by-comparing-attrs @ f68961a64f84de3987f99fe3dd79ca35abfab3aa -- developer-run-tests: #2661
In `doc/api/methods.html.textile.liquid`:
+h4(#filterexpression). Filtering by comparing two attributes + +If @a@ and @b@ are attributes that have numeric values, they can be compared in a filter condition using the form @["(a < b)", "=", true]@. Supported operators are @=@, @<@, @<=@, @>@, and @>=@. Examples: +* @["(replication_desired < replication_confirmed)", "=", true]@ +* @["(replication_desired = replication_confirmed)", "=", true]@ + +Note that only a very specific subset of boolean expressions is supported. For example: +* @["replication_desired < replication_confirmed", "=", true]@ is not accepted. Outer parentheses are mandatory. +* @["(replication_desired < 1)", "=", true]@ is not accepted. Operands must be attribute names.
Can we describe the all the current constraints for this feature in a list, then give some examples like you did? Intermixing the rules and examples makes it a bit hard to parse this documentation.
Perhaps something like
Rules for boolean expressions:- supported operators are ...
- expression must be surrounded by parentheses
- both operands must be attribute names with integer values (both operands must be in the searchable_columns list, right?)
- operators may not be not constants
- outer operand must be 'true'
- outer operator must be '='
- anything else?
- ...
Otherwise, I can't say I like this syntax, it feels... ugh. If/when we want to expand to more arbitrary expressions, is the plan to remove some of the constraints, adopt a new syntax, or something else?
LGTM, otherwise.
Updated by Tom Clegg about 3 years ago
I figure if/when we do arbitrary expressions the ,"=",true
part would become optional/implicit, and there wouldn't really be any need for other kinds of filter, or multiple filters... so we could accept something like filter="(foo && bar)"
instead of filters=[["(foo && bar)","=",true]]
. But the main thing is that the current highly-restricted format ensures that anything we accept now will also be accepted by a more generic expression parser in future.
Another option is to allow an expression string in place of a 3-element filter array, like
filters=[["is_trashed","=",false], "(replication_desired > replication_confirmed)"]
We could transform this to [...,=,true]
when accepting the request at controller, so we wouldn't need to change the rails code at all.
Updated by Tom Clegg about 3 years ago
Rearranged docs as suggested, and merged main.
17995-filter-by-comparing-attrs @ 2115f4609adcc86156e8e0aa59ea38ba5808378e -- developer-run-tests: #2665
Updated by Ward Vandewege about 3 years ago
Tom Clegg wrote:
Rearranged docs as suggested, and merged main.
17995-filter-by-comparing-attrs @ 2115f4609adcc86156e8e0aa59ea38ba5808378e -- developer-run-tests: #2665
Thanks, the doc update looks good. The new syntax from note 14 seems a lot nicer, that would definitely be my preference.
Updated by Tom Clegg about 3 years ago
- Target version changed from 2021-09-01 sprint to 2021-09-15 sprint
Updated by Tom Clegg about 3 years ago
Good, that was even easier than I thought it would be (we already had a custom JSON-unmarshal method on arvados.Filter).
17995-filter-by-comparing-attrs @ f2ce99ebb4c721de5b22da4bd5bd6565d5f08d2e -- developer-run-tests: #2668
Updated by Ward Vandewege about 3 years ago
Tom Clegg wrote:
Good, that was even easier than I thought it would be (we already had a custom JSON-unmarshal method on arvados.Filter).
17995-filter-by-comparing-attrs @ f2ce99ebb4c721de5b22da4bd5bd6565d5f08d2e -- developer-run-tests: #2668
LGTM, thanks!
Updated by Tom Clegg about 3 years ago
- Status changed from In Progress to Resolved
Applied in changeset arvados|7aaf9f22aa646077b4b7fd961d6b731185b88137.