Idea #9148
open
[API] Finalize and document the collections/provenance and collections/used_by API calls
Added by Brett Smith over 8 years ago.
Updated 9 months ago.
Release relationship: Auto
Description
If we're happy with the API as-is, we just need to document it.
If there are any changes we want to make before then, now is probably the time. The response format doesn't work the way you would expect based on other calls that return multiple objects: rather than an "items" field containing a list, you get a plain hash whose keys are identifiers and whose values are objects. But I personally don't see any reason that would be a dealbreaker.
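If we did decide to move toward the usual "items" convention, the conversion itself is mechanical. Here is a minimal sketch that works on plain dicts, not on the real SDK. It assumes the current response is a hash keyed by identifier (a UUID or portable data hash). The `graph_key` field and the sample records are hypothetical, made up for illustration.

```python
from typing import Any, Dict, List

# Hypothetical sample of the current response shape: a plain hash keyed by
# identifier (UUID or portable data hash) instead of an "items" list.
SAMPLE_GRAPH: Dict[str, Dict[str, Any]] = {
    "zzzzz-8i9sb-000000000000000": {
        "kind": "arvados#job",
        "uuid": "zzzzz-8i9sb-000000000000000",
    },
    "d41d8cd98f00b204e9800998ecf8427e+0": {
        "portable_data_hash": "d41d8cd98f00b204e9800998ecf8427e+0",
        "name": "example",
    },
}


def graph_to_items(graph: Dict[str, Dict[str, Any]]) -> Dict[str, List[Dict[str, Any]]]:
    """Convert the identifier-keyed hash into an "items"-style response.

    Each item keeps its original key under a hypothetical "graph_key" field,
    so nothing is lost when the hash becomes a list. Items are sorted by key
    so the output order is deterministic.
    """
    items = []
    for key in sorted(graph):
        item = dict(graph[key])  # copy so the input graph is not mutated
        item["graph_key"] = key
        items.append(item)
    return {"items": items}
```

One practical argument for keeping the identifier on each item: the graph edges refer to records by identifier, so a client rebuilding the graph from a list still needs it.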
- Target version set to Arvados Future Sprints
We should figure out how to do paging here. The current API doesn't seem to give us any alternative to returning a single response with the entire graph, which can be arbitrarily large.
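One possible approach to paging is to walk the graph in a deterministic breadth-first order and return only an offset/limit slice of it. This bounds the response size, but not the server-side traversal cost, because each page recomputes the walk. The sketch below is hypothetical: it operates on an in-memory adjacency dict, and `page`, `offset` and `limit` are illustrative names only, not existing API parameters for these calls.

```python
from collections import deque
from typing import Dict, List, Tuple


def bfs_order(edges: Dict[str, List[str]], start: str) -> List[str]:
    """Return every node reachable from start, in a deterministic BFS order.

    Neighbours are visited in sorted order so that repeated calls produce the
    same sequence, which is what makes offset-based paging stable.
    """
    seen = {start}
    order = [start]
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in sorted(edges.get(node, [])):
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order


def page(edges: Dict[str, List[str]], start: str, offset: int, limit: int) -> Tuple[List[str], int]:
    """Return one page of the traversal plus the total number of nodes."""
    order = bfs_order(edges, start)
    return order[offset:offset + limit], len(order)
```

A cursor-based design (returning the unexplored frontier as a continuation token) would avoid recomputing earlier pages. However, it would have to carry the visited set along, and that set grows with the graph.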
Another oddity is that there's no distinction between these two kinds of provenance:
- job A output just the word "yes", and job B used an input collection that contained just the word "yes".
- job A output the word "yes", and that collection was used as an input to job B (e.g., there was a pipeline like "job A | job B").
(This is by no means the only place we fail to make that distinction, but we should probably consider it when naming and specifying APIs in this area.)
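To make the distinction concrete, provenance edges could record *why* two jobs are linked. Below is a minimal sketch under stated assumptions. Job outputs and inputs are given as portable data hashes. The separate set of "recorded" (producer, consumer) pairs, and the `basis` labels `recorded_use` and `content_match`, are hypothetical and stand in for whatever the system actually knows about direct use.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class Edge:
    producer: str  # job whose output has this content
    consumer: str  # job that used an input with the same content
    pdh: str       # portable data hash shared by that output and input
    basis: str     # hypothetical label: "recorded_use" or "content_match"


def link_jobs(
    outputs: Dict[str, str],
    inputs: Dict[str, List[str]],
    recorded: Set[Tuple[str, str]],
) -> List[Edge]:
    """Link producers to consumers by matching content.

    Each edge is labelled according to whether the use was actually recorded
    or merely inferred from identical content.
    """
    edges = []
    for producer, out_pdh in sorted(outputs.items()):
        for consumer, in_pdhs in sorted(inputs.items()):
            if consumer != producer and out_pdh in in_pdhs:
                if (producer, consumer) in recorded:
                    basis = "recorded_use"
                else:
                    basis = "content_match"
                edges.append(Edge(producer, consumer, out_pdh, basis))
    return edges
```

With only content matching, both of the "yes" scenarios above produce the same edge. The extra label is what would let a client tell "job A | job B" apart from a coincidence.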
Another thing to consider changing in both the `collections/used_by` and `collections/provenance` calls is that they return "complete" collection records, including `manifest_text` (which can be very long), `file_names` (which AFAIK can't be selected by the list call and may be incomplete?), and "id" (which I think is an internal ID not intended for use outside the database). It would probably be advantageous to implement the same sort of restricted selection functionality that `collections/list` has; the complication is that these calls can return multiple types of records.
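One way around the multiple-record-types problem is to accept a per-kind selection instead of a single attribute list. The sketch below is hypothetical: the `select` mapping keyed by kind, and the rule of always keeping "kind" so clients can still tell record types apart, are assumptions and not the current API.

```python
from typing import Any, Dict, List, Optional


def select_fields(
    graph: Dict[str, Dict[str, Any]],
    select: Dict[str, List[str]],
    default: Optional[List[str]] = None,
) -> Dict[str, Dict[str, Any]]:
    """Trim each record in an identifier-keyed graph to the requested fields.

    `select` maps a kind (e.g. "arvados#collection") to the attributes wanted
    for records of that kind. Records whose kind is not listed use `default`;
    if that is None, they are returned unchanged. "kind" itself is always kept.
    """
    result = {}
    for key, record in graph.items():
        wanted = select.get(record.get("kind", ""), default)
        if wanted is None:
            result[key] = dict(record)
            continue
        trimmed = {f: record[f] for f in wanted if f in record}
        if "kind" in record:
            trimmed["kind"] = record["kind"]
        result[key] = trimmed
    return result
```

For example, selecting only `portable_data_hash` and `name` for collections would drop `manifest_text`, `file_names` and "id" from the response, while job records could be trimmed independently.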
Other issues:
- Collection records in the list returned by `collections/used_by` are missing the "kind" column (whereas job records have it)
- "fragment" collections only have a portable_data_hash and a name; they could also use a "kind" (perhaps "arvados#fragment"?) so that it's easy to tell what one is looking at in the dictionary
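Until the server fills these in, a client could patch the missing "kind" values itself. This is a minimal sketch with a hypothetical heuristic. A record with `manifest_text` is treated as a collection. A record carrying only `portable_data_hash` and, optionally, `name` is treated as a fragment, using the "arvados#fragment" kind proposed above, which does not exist today.

```python
from typing import Any, Dict


def add_missing_kinds(graph: Dict[str, Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Return a copy of the graph where records lacking "kind" get one guessed.

    The heuristic is hypothetical:
    - records with manifest_text are assumed to be full collections;
    - records with only portable_data_hash (and optionally name) are assumed
      to be fragments, tagged with the proposed "arvados#fragment" kind.
    Anything else is left unchanged.
    """
    fixed = {}
    for key, record in graph.items():
        record = dict(record)  # copy so the caller's data is not mutated
        if "kind" not in record:
            if "manifest_text" in record:
                record["kind"] = "arvados#collection"
            elif "portable_data_hash" in record and set(record) <= {"portable_data_hash", "name"}:
                record["kind"] = "arvados#fragment"
        fixed[key] = record
    return fixed
```

This kind of guessing is fragile, which is an argument for having the server always emit "kind" itself.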
- Target version deleted (Arvados Future Sprints)
- Target version set to Future