Idea #14151
closed
Extend vocabulary support for properties to support strong identifiers and multiple labels
Description
In #12479, support was added for controlled vocabularies of tags and values when setting and updating properties on collections. On the backend, however, that feature just used the existing string-based properties, which means there is no easy way to track vocabulary label changes, alternate labels, multiple languages, etc.
As a system administrator, I would like the ability to configure the vocabulary that the users are allowed to use when editing properties and be able to update the labels while retaining the same concept identifiers.
As a user, I would like the ability to search on any of a number of alternate terms for a common concept. For example, I'd like to be able to search for either "human" or "homo sapiens" in the context of "species" and have it return the same concept.
I would also like the option of viewing a definition of the concept, preferred label, and other related information to help make sure I'm choosing the correct term. For hierarchical vocabularies, it'd be desirable to match on all child concepts of a parent concept when searching. The need for the additional capabilities in this paragraph is TBD.
Design:
- UI will be implemented in Workbench2
- Vocabulary terms (predicates/property keys and values) have an id and one or more labels. There should be a way of deciding on a preferred or primary label for display.
- Properties have a range of valid terms (identifier + labels) and/or can be non-strict (allows free text)
- Restrict label search to the property's range and language
- When user starts typing, search identifiers, labels (possibly also description/definition text) and present an autocomplete list
- Alternately, user may start with a full list (with most commonly used terms at the top? how to rank?)
- When user selects desired label, store the underlying identifier for the property field/value
- When viewing properties, display the primary label associated with an id for the current language
- Permissible to have duplicate labels (???) provided they can be differentiated otherwise (different languages? map to ids in different property ranges?)
- Extend the vocabulary JSON to support identifiers, multiple labels, description/definition text for terms, language tags
- To start with, assume that the entire vocabulary JSON can be loaded by the browser and searched with JavaScript. The module for working with the vocabulary should be implemented using asynchronous methods so that an API backend can be plugged in later if loading static JSON becomes impracticable.
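The design bullets above could be sketched roughly as follows. This is only an illustration under stated assumptions: the field names (`id`, `labels`, `lang`, `preferred`, `definition`, `strict`), the `VocabularyService` interface, and the sample identifiers are all hypothetical, since the actual vocabulary file format is still to be designed (see #15071). The matching rule (case-insensitive substring over identifiers and labels in one language) is likewise just one possible choice.

```typescript
// Hypothetical shapes; none of these field names come from an agreed file format.
interface Label {
  lang: string;        // language tag, e.g. "en"
  text: string;
  preferred?: boolean; // marks the primary label for display in that language
}

interface Term {
  id: string;          // stable concept identifier
  labels: Label[];
  definition?: string;
}

interface Vocabulary {
  strict: boolean;     // false would mean free text is also accepted
  terms: Term[];
}

// Case-insensitive substring match over the identifier and the labels of one language.
function searchTerms(vocab: Vocabulary, query: string, lang: string): Term[] {
  const q = query.trim().toLowerCase();
  if (q === "") {
    return [];
  }
  return vocab.terms.filter(term =>
    term.id.toLowerCase().indexOf(q) !== -1 ||
    term.labels.some(l => l.lang === lang && l.text.toLowerCase().indexOf(q) !== -1));
}

// Preferred label in a language, else the first label in that language, else the bare id.
function preferredLabel(term: Term, lang: string): string {
  const inLang = term.labels.filter(l => l.lang === lang);
  const preferred = inLang.filter(l => l.preferred === true);
  if (preferred.length > 0) {
    return preferred[0].text;
  }
  return inLang.length > 0 ? inLang[0].text : term.id;
}

// Async facade so that a static JSON source can later be replaced by an API backend.
interface VocabularyService {
  search(query: string, lang: string): Promise<Term[]>;
}

class StaticVocabularyService implements VocabularyService {
  private readonly vocab: Vocabulary;
  constructor(vocab: Vocabulary) {
    this.vocab = vocab;
  }
  search(query: string, lang: string): Promise<Term[]> {
    return Promise.resolve(searchTerms(this.vocab, query, lang));
  }
}

// Made-up sample data for the "species" range.
const exampleVocab: Vocabulary = {
  strict: true,
  terms: [
    { id: "ROX456", labels: [{ lang: "en", text: "Human", preferred: true }, { lang: "en", text: "Homo sapiens" }] },
    { id: "ROX789", labels: [{ lang: "en", text: "Mouse", preferred: true }, { lang: "en", text: "Mus musculus" }] },
  ],
};
```

With this sample data, searching for either "human" or "homo sapiens" yields the same term, and the caller would store that term's `id` rather than the label the user picked.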
Updated by Peter Amstutz over 6 years ago
This seems like an obvious application for an ontology.
Updated by Tom Morris over 6 years ago
- Target version changed from Arvados Future Sprints to To Be Groomed
Updated by Peter Amstutz about 6 years ago
- Related to Idea #12479: [Workbench] Extend tag/property editing to support a structured vocabulary added
Updated by Tom Morris over 5 years ago
- Blocked by Idea #15067: [Workbench 2] Update property editing to use IDs added
Updated by Tom Morris over 5 years ago
- Blocked by Idea #15069: [Workbench 2] Extend search UI to support vocabulary IDs as well as text added
Updated by Tom Morris over 5 years ago
- Blocked by Idea #15070: Update search API to support OR queries across text and vocabulary IDs added
Updated by Tom Morris over 5 years ago
- Related to Idea #15071: Design new vocabulary file format added
Updated by Peter Amstutz over 5 years ago
Distinguishing labels / free text / identifiers
Suggestion was to use {"id": "xxx"} or {"text": "xxx"} but that doesn't work well:
- Can't be used for keys
- Matching values is awkward: comparing jsonb values needs special handling, unlike comparing plain string values
Proposal: use a prefix to identify vocabulary terms.
term:ROX12345
Migration. Replace labels with terms.
"species": "human"
becomes
"term:ROX123": "term:ROX456"
Can easily identify/convert between terms and labels by looking for the leading "term:". Anything else can be assumed to be a label or free text.
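The prefix check and the label-to-term migration above might look something like the following sketch. The lookup tables (`keyIndex`, `valueIndex`) and the function names are hypothetical, labels are matched exactly, and property values are assumed to be plain strings for simplicity.

```typescript
const TERM_PREFIX = "term:";

// A string is treated as a vocabulary term if it starts with the "term:" prefix.
function isTerm(s: string): boolean {
  return s.slice(0, TERM_PREFIX.length) === TERM_PREFIX;
}

// Hypothetical lookup table from an exact label to its term identifier.
type LabelIndex = Map<string, string>;

// Replace a label with its term; terms and unknown text pass through unchanged.
function toTerm(value: string, index: LabelIndex): string {
  if (isTerm(value)) {
    return value;
  }
  const id = index.get(value);
  return id !== undefined ? id : value;
}

// Migrate both keys and values of a flat, string-valued properties hash.
function migrateProperties(
  props: { [key: string]: string },
  keyIndex: LabelIndex,
  valueIndex: LabelIndex,
): { [key: string]: string } {
  const out: { [key: string]: string } = {};
  for (const key of Object.keys(props)) {
    out[toTerm(key, keyIndex)] = toTerm(props[key], valueIndex);
  }
  return out;
}

// Sample tables mirroring the example above.
const keyIndex: LabelIndex = new Map([["species", "term:ROX123"]]);
const valueIndex: LabelIndex = new Map([["human", "term:ROX456"]]);
const migrated = migrateProperties({ species: "human", note: "free text" }, keyIndex, valueIndex);
```

Here `migrated` maps "term:ROX123" to "term:ROX456", while the unrecognized "note" entry is left as free text.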
The contents of the properties hash will not distinguish between identifiers and strings. When a client such as workbench2 displays/edits properties, it should look up each key and value in the vocabulary, and use the corresponding preferred label for display. If no label is found, display the bare value. The vocabulary is assumed to use identifiers that are unlikely to conflict with normal text input.
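As a rough illustration of that display rule, a client could resolve each key and value against a map of preferred labels and fall back to the bare string. The `PreferredLabels` map and the function names are assumptions made for this sketch, not an existing workbench2 API, and values are again assumed to be plain strings.

```typescript
// Hypothetical map from identifier to the preferred label in the current language.
type PreferredLabels = Map<string, string>;

// Show the preferred label if the vocabulary knows the string, else the bare value.
function displayText(raw: string, labels: PreferredLabels): string {
  const label = labels.get(raw);
  return label !== undefined ? label : raw;
}

// Resolve every key and value of a flat, string-valued properties hash for display.
function displayProperties(
  props: { [key: string]: string },
  labels: PreferredLabels,
): Array<[string, string]> {
  const rows: Array<[string, string]> = [];
  for (const key of Object.keys(props)) {
    rows.push([displayText(key, labels), displayText(props[key], labels)]);
  }
  return rows;
}

// Sample data: one vocabulary-backed property and one free-text property.
const knownLabels: PreferredLabels = new Map([
  ["term:ROX123", "species"],
  ["term:ROX456", "human"],
]);
const displayRows = displayProperties(
  { "term:ROX123": "term:ROX456", note: "free text" },
  knownLabels,
);
```

Because no tagging of identifiers versus strings is stored, the lookup itself is what decides how a value is shown.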
Updated by Tom Morris over 5 years ago
- Blocked by deleted (Idea #15070: Update search API to support OR queries across text and vocabulary IDs)
Updated by Tom Morris over 5 years ago
- Related to Idea #15070: Update search API to support OR queries across text and vocabulary IDs added
Updated by Peter Amstutz about 5 years ago
- Related to deleted (Idea #15070: Update search API to support OR queries across text and vocabulary IDs)
Updated by Peter Amstutz about 5 years ago
- Target version deleted (To Be Groomed)