Feature #17944

Updated by Peter Amstutz 7 months ago


Cf. https://doc.arvados.org/admin/workbench2-vocabulary.html

* When an Arvados object that has properties (collections, container_requests, groups, links) is created or updated, the API server will validate the contents of its properties. Properties are key-value pairs (property definitions are called “tags” in the vocabulary file).
* Property keys are checked against the standardized key identifiers defined in the vocabulary file. The key is also checked against the aliases (labels) for each tag. If a property key matches one of the aliases, the API server returns an error indicating that the client is required to use the standardized identifier for the key.
* The property value is checked to be within the range of values defined for the tag in the vocabulary file.
** When “strict” is true, the value must be one of the standardized value identifiers listed for that tag. If it is not a standardized value identifier, the API server returns an error. Aliases are not accepted, but if the provided value matches an alias, the error message should say so.
** When “strict” is false or undefined, the value must either be one of the standardized values listed for that tag, or it must be a value that is not listed in aliases. If the value is listed in aliases, the API server should return an error stating that the client is required to use the standardized identifier.
* When a value is rejected due to use of an alias and not the standardized value identifier, the error message should include what standardized value identifier was expected.
* Use a case-insensitive match to check whether a key or value matches an alias.
* Respect the value of "strict_tags" in the vocabulary file. This configuration option controls the handling of unknown property keys and can be set to either:
** strict_tags: false -- Property keys which are not defined in the vocabulary are not checked
** strict_tags: true -- Property keys which are not defined in the vocabulary are rejected
* Property validation is applied to all users, including admins
* The configuration file will be stored somewhere on the filesystem of the host that runs Arvados controller. The controller will have an API endpoint that Workbench 2 or other applications can use to fetch the vocabulary file.
* If a vocabulary file is configured but cannot be read at startup, Arvados controller will fail with an error.
* If the same alias is associated with more than one standardized identifier, fail with an error.
* The config-check subcommand will detect and report configuration and vocabulary file errors.
* To ease migration, if a record is updated but the update does not change the properties, it should not reject the update of unrelated fields even if the current properties are invalid
* When strict_tags is enabled, properties already in use by Arvados tools need to be recognized and special-cased. Some of these properties (the list is likely incomplete):
** type
** template_uuid
** groups
** username
** image_timestamp
** docker-image-repo-tag
** filters
** container_request
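
The key and value rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the controller implementation: the in-memory vocabulary layout (tags with "labels" and "values"), the identifiers in it, and the helper names @alias_index@, @validate_properties@ and @needs_validation@ are all hypothetical. The sketch only checks string values, and it treats an update that omits the properties the same as one that leaves them unchanged.

```python
# Hypothetical vocabulary; the layout (tags -> labels/values) is an assumption.
VOCAB = {
    "strict_tags": True,
    "tags": {
        "IDTAGANIMALS": {
            "strict": False,
            "labels": [{"label": "Animal"}, {"label": "Creature"}],
            "values": {
                "IDVALANIMAL1": {"labels": [{"label": "Human"}]},
                "IDVALANIMAL2": {"labels": [{"label": "Elephant"}]},
            },
        },
        "IDTAGIMPORTANCE": {
            "strict": True,
            "labels": [{"label": "Importance"}],
            "values": {"IDVALIMPORTANCE1": {"labels": [{"label": "High"}]}},
        },
    },
}

# Keys already used by Arvados tools (copied from the list above, likely incomplete).
RESERVED_KEYS = {
    "type", "template_uuid", "groups", "username", "image_timestamp",
    "docker-image-repo-tag", "filters", "container_request",
}


def alias_index(entries):
    """Map each lowercased alias (label) to its standardized identifier."""
    return {
        label["label"].lower(): ident
        for ident, entry in entries.items()
        for label in entry.get("labels", [])
    }


def validate_properties(props, vocab):
    """Return a list of error messages; an empty list means the properties pass."""
    errors = []
    tags = vocab.get("tags", {})
    key_aliases = alias_index(tags)
    for key, value in props.items():
        if key not in tags:
            if key in RESERVED_KEYS:
                continue  # special-cased: already in use by Arvados tools
            expected = key_aliases.get(key.lower())
            if expected is not None:
                errors.append(f"key {key!r} is an alias; use {expected!r}")
            elif vocab.get("strict_tags"):
                errors.append(f"key {key!r} is not defined in the vocabulary")
            continue  # values of unknown keys are not checked
        tag = tags[key]
        values = tag.get("values", {})
        if not isinstance(value, str) or value in values:
            continue  # sketch limitation: only string values are checked
        expected = alias_index(values).get(value.lower())
        if expected is not None:
            errors.append(f"value {value!r} of {key!r} is an alias; use {expected!r}")
        elif tag.get("strict"):
            errors.append(f"value {value!r} is not allowed for {key!r}")
    return errors


def needs_validation(old_props, new_props):
    """Skip validation when an update omits or does not change the properties."""
    return new_props is not None and new_props != old_props
```

Returning a list of messages instead of raising on the first problem is a design choice of this sketch; it lets a caller report every offending key and the expected standardized identifier in one response.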

Also: arvados-cwl-runner has a 'cache http download' feature that records provenance by using the source URL as a property key whose value is an object containing the cache headers. This usage is incompatible with "strict_tags".
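
To make the conflict concrete, here is a small Python illustration. The URL, the header values, the vocabulary keys, and the helper @rejected_keys@ are all made up for this example; only the general shape (source URL as key, cache headers as value) comes from the description above.

```python
# Illustrative property set: the key is the source URL and the value holds
# cache headers. The URL and header values are invented for this example.
cached_download_properties = {
    "http://example.com/reference.fasta": {
        "Date": "Tue, 01 Jun 2021 12:00:00 GMT",
        "ETag": '"abc123"',
    }
}

# Hypothetical set of standardized key identifiers from a vocabulary file.
vocabulary_keys = {"IDTAGANIMALS", "IDTAGIMPORTANCE"}


def rejected_keys(properties, known_keys, strict_tags):
    """Return the property keys that strict_tags would reject as unknown."""
    if not strict_tags:
        return []
    return sorted(key for key in properties if key not in known_keys)
```

Because the key is an arbitrary URL, it can never be listed in the vocabulary ahead of time, so with strict_tags enabled @rejected_keys@ reports it, while with strict_tags disabled the list is empty.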


Implementation:

* Validation happens in controller for create and update calls
* Add a config parameter, API/VocabularyPath, for the location of the vocabulary file; the path is expected to be local to the machine the controller runs on.
* The vocabulary file will be loaded and cached by controller; the file timestamp will be checked on each request. If the vocabulary file can't be read (e.g. permissions, invalid JSON, etc.), the existing cached version will be used and a health warning/Prometheus alert should be raised.
* If the file can't be read on startup, that's an error. config-check should also check this, and will need to take into account that this is only an error if the context is the controller.
* Apply validation before all save/update requests. Admin users do not get special treatment.
* The validation code should handle existing data gracefully: if a record's existing properties are invalid, but the update does not include properties, updates to other fields in the record should still be permitted.
* Update Workbench 2 to get the file from controller (controller will need to expose the cached copy at a valid URL that serves the JSON)
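
The load-and-cache behavior described above could look roughly like the following Python sketch. It is an assumption-laden illustration, not the controller code: the class @VocabularyCache@, the function @check_vocabulary@, and the @warning@ attribute (a stand-in for the health warning/Prometheus alert) are hypothetical names, and comparing aliases case-insensitively when looking for duplicates is an assumption carried over from the matching rule.

```python
import json
import os


def check_vocabulary(vocab):
    """Raise ValueError if one alias is associated with more than one identifier."""
    if not isinstance(vocab, dict):
        raise ValueError("vocabulary must be a JSON object")
    tags = vocab.get("tags", {})
    _check_aliases(tags, "tag")
    for tag in tags.values():
        _check_aliases(tag.get("values", {}), "value")


def _check_aliases(entries, kind):
    seen = {}  # lowercased alias -> first identifier that used it
    for ident, entry in entries.items():
        for label in entry.get("labels", []):
            alias = label["label"].lower()
            if seen.setdefault(alias, ident) != ident:
                raise ValueError(
                    f"{kind} alias {alias!r} maps to both {seen[alias]!r} and {ident!r}"
                )


class VocabularyCache:
    """Reload the file when its timestamp changes; keep the last good copy."""

    def __init__(self, path):
        self.path = path
        self.vocab = None
        self.mtime = None
        self.warning = None  # stand-in for a health warning / alert
        self._load()  # at startup, a failure propagates as an error

    def _load(self):
        mtime = os.stat(self.path).st_mtime_ns
        with open(self.path, encoding="utf-8") as f:
            vocab = json.load(f)
        check_vocabulary(vocab)
        self.vocab, self.mtime, self.warning = vocab, mtime, None

    def get(self):
        """Called on each request; falls back to the cached copy on errors."""
        try:
            if os.stat(self.path).st_mtime_ns != self.mtime:
                self._load()
        except (OSError, ValueError) as err:  # unreadable file or invalid JSON
            self.warning = f"vocabulary reload failed: {err}"
        return self.vocab
```

After a failed reload the stored timestamp is left unchanged, so the next request retries the load, and the warning clears once a valid file is read again.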
