Feature #17944

Updated by Peter Amstutz 7 months ago


Cf. https://doc.arvados.org/admin/workbench2-vocabulary.html

* When an Arvados object that has properties (collections, container_requests, groups, links) is created or updated, the API server will validate the contents of its properties. Properties are key-value pairs (property definitions are called “tags” in the vocabulary file).
* Property keys are checked against the standardized key identifiers defined in the vocabulary file. The key is also checked against the aliases (labels) for each tag. If a property key matches one of the aliases, the API server returns an error indicating that the client is required to use the standardized identifier for the key.
* The property value is checked against the range of values defined for the tag in the vocabulary file.
* When “strict” is true, the value must be one of the standardized value identifiers listed for that tag. If it is not a standardized value identifier, the API server returns an error. It does not accept aliases, but if the provided value matches an alias, the error message should say so.
* When “strict” is false or undefined, the value must either be one of the standardized values listed for that tag, or it must be a value that is not listed in aliases. If the value is listed in aliases, the API server returns an error stating that the client is required to use the standardized identifier.
* When a value is rejected due to use of an alias and not the standardized value identifier, the error message should include what standardized value identifier was expected.
* Use a case-insensitive match to check whether a key or value matches an alias.
* A configuration option controls how unknown property keys are handled. It can specify either:
** Property keys which are not defined in the vocabulary are not checked
** Property keys which are not defined in the vocabulary are rejected
* Property validation is applied to all users, including admins
* The vocabulary file will be stored somewhere on the filesystem of the host that runs Arvados controller. The controller will have an API endpoint that Workbench 2 or other applications can use to fetch the vocabulary file.
* If a vocabulary file is configured but cannot be read at startup, Arvados controller will fail with an error.
* If the same alias is associated with more than one standardized identifier, fail with an error.
* The config-check subcommand will detect and report configuration and vocabulary file errors.
* To ease migration, if a record is updated but the update does not change the properties, it should not reject the update of unrelated fields even if the current properties are invalid.
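
The key and value rules above can be sketched as follows. This is a minimal illustration in Python, not the controller's code: the names `build_alias_index`, `validate_properties`, `PropertyError`, `VocabularyError` and the `reject_unknown_keys` flag are hypothetical, and the `tags` / `labels` / `values` / `strict` layout of `EXAMPLE_VOCAB` is an assumption modeled on the linked vocabulary documentation.

```python
class VocabularyError(ValueError):
    """The vocabulary itself is inconsistent (hypothetical name)."""


class PropertyError(ValueError):
    """A property key or value failed validation (hypothetical name)."""


# Assumed layout: identifier -> {"labels": [{"label": alias}, ...], ...}
EXAMPLE_VOCAB = {
    "tags": {
        "IDTAGANIMALS": {
            "strict": False,
            "labels": [{"label": "Animal"}, {"label": "Creature"}],
            "values": {"IDVALANIMALS1": {"labels": [{"label": "Human"}]}},
        },
        "IDTAGIMPORTANCE": {
            "strict": True,
            "labels": [{"label": "Importance"}],
            "values": {"IDVALIMPORTANCE1": {"labels": [{"label": "High"}]}},
        },
    },
}


def build_alias_index(entries):
    """Map each lowercased alias to its standardized identifier.

    Raises VocabularyError if one alias belongs to two identifiers.
    """
    index = {}
    for ident, entry in entries.items():
        for item in entry.get("labels", []):
            alias = item["label"].lower()
            if index.setdefault(alias, ident) != ident:
                raise VocabularyError(
                    f"alias {item['label']!r} is used by both "
                    f"{index[alias]!r} and {ident!r}")
    return index


def validate_properties(props, vocab, reject_unknown_keys=False):
    """Raise PropertyError if any key or value breaks the rules above."""
    tags = vocab.get("tags", {})
    key_aliases = build_alias_index(tags)
    for key, value in props.items():
        if key not in tags:
            # Case-insensitive alias match: point at the standardized key.
            expected = key_aliases.get(str(key).lower())
            if expected is not None:
                raise PropertyError(f"key {key!r} is an alias; use {expected!r}")
            if reject_unknown_keys:
                raise PropertyError(f"key {key!r} is not in the vocabulary")
            continue  # unknown keys are not checked in this mode
        tag = tags[key]
        values = tag.get("values", {})
        if isinstance(value, str) and value in values:
            continue  # standardized value identifier
        expected = build_alias_index(values).get(str(value).lower())
        if expected is not None:
            raise PropertyError(
                f"value {value!r} for {key!r} is an alias; use {expected!r}")
        if tag.get("strict", False):
            raise PropertyError(f"value {value!r} is not allowed for {key!r}")
```

With this sketch, `validate_properties({"IDTAGANIMALS": "human"}, EXAMPLE_VOCAB)` raises an error that names `IDVALANIMALS1`, while `{"IDTAGANIMALS": "unlisted"}` passes because that tag is not strict.
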

Implementation:

* Validation happens in controller for create and update calls
* Add config parameter API/VocabularyPath, expected to be local to the machine the controller runs on.
* The vocabulary file will be loaded and cached by controller; the file timestamp will be checked on every request. If the vocabulary file can't be read (e.g. permissions, invalid JSON, etc.), the existing cached version will be used and a health warning/Prometheus alert should be raised.
* If the file can't be read on startup, that's an error. config-check should also check this, and will need to take into account that this is only an error if the context is the controller.
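
A rough sketch of the load-and-cache behaviour described above, in Python for illustration only. The class name `VocabularyCache` and the `healthy` flag (standing in for the health warning/alert) are hypothetical; the real controller would hook into its own health reporting.

```python
import json
import logging
import os

log = logging.getLogger("vocabulary")


class VocabularyCache:
    """Cache a JSON vocabulary file and reload it when its mtime changes."""

    def __init__(self, path):
        self.path = path
        self.healthy = True
        # No try/except here: an unreadable file at startup is an error.
        self._mtime, self._vocab = self._read()

    def _read(self):
        mtime = os.stat(self.path).st_mtime_ns
        with open(self.path, encoding="utf-8") as f:
            return mtime, json.load(f)

    def get(self):
        """Return the vocabulary, reloading it if the file timestamp changed."""
        try:
            if os.stat(self.path).st_mtime_ns != self._mtime:
                self._mtime, self._vocab = self._read()
            self.healthy = True
        except (OSError, ValueError) as err:
            # json.JSONDecodeError is a ValueError. Keep serving the cached
            # copy and flag the problem for the health check.
            self.healthy = False
            log.warning("reload failed, keeping cached vocabulary: %s", err)
        return self._vocab
```

Because the stored timestamp is only updated after a successful read, a broken file is retried on each request until it is fixed.
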


* Apply validation before all save/update requests. Admin users do not get special treatment.


* The validation code should handle existing data gracefully: if a record has existing properties that are invalid, but the update does not include properties, updates to other fields in the record should still be permitted.
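
One way to express that rule, as a small Python sketch. The helper name `properties_need_validation` and the dict-shaped `existing` / `update` arguments are assumptions for illustration; `existing` is `None` for a create call.

```python
def properties_need_validation(existing, update):
    """Decide whether a create/update call should validate properties.

    existing: the stored record as a dict, or None when creating.
    update:   the fields supplied by the client in this request.
    """
    if "properties" not in update:
        return False  # unrelated fields only: leave existing data alone
    if existing is None:
        return True  # create call that supplies properties
    # Update call: validate only if the properties actually change.
    return update["properties"] != existing.get("properties")
```

This lets a record with invalid legacy properties still be renamed, while any request that changes the properties goes through validation.
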


* Update wb2 to get the file from controller (controller will need to expose its cached copy at a URL that returns the JSON)
