Story #14151

Extend vocabulary support for properties to support strong identifiers and multiple labels

Added by Tom Morris 12 months ago. Updated about 1 month ago.

Status: New
Priority: Normal
Assigned To: -
Category: -
Target version:
Start date:
Due date:
% Done: 0%
Estimated time:
Story points: -
Release relationship: Auto

Description

In #12479, support was added for controlled vocabularies of tags and values when setting and updating properties on collections. On the backend, however, that feature just uses the existing string-based properties, so there is no easy way to track vocabulary label changes, alternate labels, multiple languages, etc.

As a system administrator, I would like the ability to configure the vocabulary that the users are allowed to use when editing properties and be able to update the labels while retaining the same concept identifiers.

As a user, I would like the ability to search on any of a number of alternate terms for a common concept. For example, I'd like to be able to search for either "human" or "homo sapiens" in the context of "species" and have it return the same concept.

I would also like the option of viewing a definition of the concept, its preferred label, and other related information to help make sure I'm choosing the correct term. For hierarchical vocabularies, it would be desirable to match on all child concepts of a parent concept when searching. The need for the additional capabilities in this paragraph is TBD.

Design:

  • UI will be implemented in Workbench2
  • Vocabulary terms (predicates/property keys and values) have an id and one or more labels. There should be a way of deciding on a preferred or primary label for display.
  • Properties have a range of valid terms (identifier + labels) and/or can be non-strict (allowing free text)
  • Restrict label search by property range and language
  • When the user starts typing, search identifiers and labels (possibly also description/definition text) and present an autocomplete list
  • Alternatively, the user may start with a full list (with the most commonly used terms at the top? how to rank?)
  • When the user selects the desired label, store the underlying identifier for the property field/value
  • When viewing properties, display the primary label associated with an id for the current language
  • Duplicate labels are permissible (???) provided they can be differentiated otherwise (different languages? mapping to ids in different property ranges?)
  • Extend the vocabulary JSON to support identifiers, multiple labels, description/definition text for terms, and language tags
  • To start with, assume that the entire vocabulary JSON can be loaded by the browser and searched with JavaScript. The module for working with the vocabulary should be implemented using asynchronous methods so that an API backend can be plugged in later if loading static JSON becomes impracticable.
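
The design points above could be sketched roughly as follows. This is only an illustration under stated assumptions: the type names, field names, and the createStaticVocabulary function are hypothetical, the real file format is the subject of #15071, and the term "ROX789" / "mouse" is invented for the example. Both methods return promises so that an API-backed implementation could later replace the in-memory one without changing callers.

```typescript
// Hypothetical shapes: the real vocabulary file format is not defined here.
interface Label {
  text: string;
  lang: string;        // language tag, e.g. "en"
  preferred?: boolean; // marks the primary label for display
}

interface Term {
  id: string;
  labels: Label[];
  definition?: string;
}

interface Property extends Term {
  strict: boolean; // false means free text is also allowed
  values: Term[];  // the property's range of valid terms
}

// Asynchronous interface, so a static JSON file or an API backend can sit behind it.
interface VocabularyService {
  search(propertyId: string, query: string, lang: string): Promise<Term[]>;
  preferredLabel(propertyId: string, termId: string, lang: string): Promise<string>;
}

function createStaticVocabulary(properties: Property[]): VocabularyService {
  const findProperty = (id: string) => properties.find(p => p.id === id);
  return {
    // Autocomplete: match identifiers and labels, limited to one property range and language.
    async search(propertyId: string, query: string, lang: string): Promise<Term[]> {
      const property = findProperty(propertyId);
      const needle = query.trim().toLowerCase();
      if (!property || needle === "") {
        return [];
      }
      return property.values.filter(term =>
        term.id.toLowerCase().includes(needle) ||
        term.labels.some(l => l.lang === lang && l.text.toLowerCase().includes(needle)));
    },
    // Display: preferred label in the given language, else any label in that
    // language, else the bare identifier.
    async preferredLabel(propertyId: string, termId: string, lang: string): Promise<string> {
      const property = findProperty(propertyId);
      const term = property ? property.values.find(t => t.id === termId) : undefined;
      if (!term) {
        return termId;
      }
      const inLang = term.labels.filter(l => l.lang === lang);
      const label = inLang.find(l => l.preferred) || inLang[0];
      return label ? label.text : termId;
    },
  };
}

// Example data: identifiers and labels are illustrative only.
const exampleProperties: Property[] = [{
  id: "ROX123",
  strict: true,
  labels: [{ text: "species", lang: "en", preferred: true }],
  values: [
    { id: "ROX456", labels: [
      { text: "human", lang: "en", preferred: true },
      { text: "homo sapiens", lang: "en" },
    ] },
    { id: "ROX789", labels: [{ text: "mouse", lang: "en", preferred: true }] },
  ],
}];
```

With this data, searching the "ROX123" range for either "human" or "homo sapiens" resolves to the same term, and only its identifier would be stored in the property.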

Related issues

Related to Arvados - Story #12479: [Workbench] Extend tag/property editing to support a structured vocabulary (Resolved, 10/24/2017)

Related to Arvados - Story #15071: Design new vocabulary file format (New)

Related to Arvados - Story #15070: Update search API to support OR queries across text and vocabulary IDs (New)

Blocked by Arvados - Story #15067: [Workbench 2] Update property editing to use IDs (New)

Blocked by Arvados - Story #15069: [Workbench 2] Extend search UI to support vocabulary IDs as well as text (New)

History

#1 Updated by Peter Amstutz 11 months ago

This seems like an obvious application for an ontology.

#2 Updated by Tom Morris 11 months ago

  • Target version changed from Arvados Future Sprints to To Be Groomed

#3 Updated by Peter Amstutz 11 months ago

  • Description updated (diff)

#4 Updated by Peter Amstutz 11 months ago

  • Description updated (diff)

#5 Updated by Peter Amstutz 11 months ago

  • Description updated (diff)

#7 Updated by Peter Amstutz 10 months ago

  • Related to Story #12479: [Workbench] Extend tag/property editing to support a structured vocabulary added

#8 Updated by Peter Amstutz 5 months ago

  • Description updated (diff)

#9 Updated by Tom Morris 5 months ago

  • Blocked by Story #15067: [Workbench 2] Update property editing to use IDs added

#10 Updated by Tom Morris 5 months ago

  • Blocked by Story #15069: [Workbench 2] Extend search UI to support vocabulary IDs as well as text added

#11 Updated by Tom Morris 3 months ago

  • Blocked by Story #15070: Update search API to support OR queries across text and vocabulary IDs added

#12 Updated by Tom Morris 3 months ago

  • Related to Story #15071: Design new vocabulary file format added

#13 Updated by Peter Amstutz 3 months ago

Distinguishing labels / free text / identifiers

The suggestion was to use {"id": "xxx"} or {"text": "xxx"}, but that doesn't work well:

  • Can't be used for keys
  • Matching values is awkward: comparing jsonb values needs special handling that plain string values do not

Proposal: use a prefix to identify vocabulary terms.

term:ROX12345

Migration: replace labels with terms.

"species": "human"

becomes

"term:ROX123": "term:ROX456"

It is easy to identify terms, and to convert between terms and labels, by looking for the leading "term:". Anything else can be assumed to be a label or free text.
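
A minimal sketch of that migration, assuming string-valued properties and a label-to-identifier index built from the vocabulary. The LabelIndex shape and the function names are hypothetical; only the "term:" prefix and the example identifiers come from the proposal above.

```typescript
const TERM_PREFIX = "term:";

// A string is treated as a vocabulary term if it carries the leading prefix.
function isTerm(s: string): boolean {
  return s.startsWith(TERM_PREFIX);
}

type Properties = { [key: string]: string };

// Hypothetical lookup tables derived from the vocabulary.
interface LabelIndex {
  keys: Map<string, string>;                // key label -> key term
  values: Map<string, Map<string, string>>; // key term -> (value label -> value term)
}

// Replace known labels with terms; leave existing terms and free text untouched.
function migrateProperties(props: Properties, index: LabelIndex): Properties {
  const migrated: Properties = {};
  for (const key of Object.keys(props)) {
    const value = props[key];
    const keyTerm = isTerm(key) ? key : index.keys.get(key);
    if (keyTerm === undefined) {
      migrated[key] = value; // key is not in the vocabulary: keep as free text
      continue;
    }
    const valueLabels = index.values.get(keyTerm);
    const valueTerm = isTerm(value) ? value : (valueLabels ? valueLabels.get(value) : undefined);
    migrated[keyTerm] = valueTerm !== undefined ? valueTerm : value;
  }
  return migrated;
}

// Example index matching the "species": "human" example.
const exampleIndex: LabelIndex = {
  keys: new Map([["species", "term:ROX123"]]),
  values: new Map([["term:ROX123", new Map([["human", "term:ROX456"]])]]),
};
```

Calling migrateProperties on {"species": "human"} with this index yields {"term:ROX123": "term:ROX456"}, and running it again on the result changes nothing, since prefixed strings are passed through.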

The contents of the properties hash will not distinguish between identifiers and strings. When a client such as Workbench2 displays or edits properties, it should look up each key and value in the vocabulary and use the corresponding preferred label for display. If no label is found, it should display the bare value. The vocabulary is assumed to use identifiers that are unlikely to conflict with normal text input.
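
The display rule could look something like the sketch below. It assumes a flat identifier-to-preferred-label map has already been built from the vocabulary for the current language; that map and the function names are hypothetical. The lookup does not depend on any prefix: whatever is stored is looked up as-is, and anything not found is shown unchanged.

```typescript
type Properties = { [key: string]: string };

// Show the preferred label when the stored string is a known identifier,
// otherwise show the bare stored value.
function displayText(stored: string, preferredLabels: Map<string, string>): string {
  const label = preferredLabels.get(stored);
  return label !== undefined ? label : stored;
}

// Resolve every key and value of a properties hash into display strings.
function displayProperties(
  props: Properties,
  preferredLabels: Map<string, string>,
): Array<[string, string]> {
  return Object.keys(props).map((key): [string, string] =>
    [displayText(key, preferredLabels), displayText(props[key], preferredLabels)]);
}

// Example map: identifiers taken from the example above, labels illustrative.
const examplePreferredLabels = new Map<string, string>([
  ["term:ROX123", "species"],
  ["term:ROX456", "human"],
]);
```

With this map, a stored pair "term:ROX123": "term:ROX456" is displayed as species / human, while a free-text pair is displayed exactly as stored.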

#14 Updated by Tom Morris about 1 month ago

  • Release set to 22

#15 Updated by Tom Morris 13 days ago

  • Blocked by deleted (Story #15070: Update search API to support OR queries across text and vocabulary IDs)

#16 Updated by Tom Morris 13 days ago

  • Related to Story #15070: Update search API to support OR queries across text and vocabulary IDs added
