Story #9148

[API] Finalize and document the collections/provenance and collections/used_by API calls

Added by Brett Smith about 3 years ago. Updated almost 3 years ago.

If we're happy with the API as-is, we just need to document it.

If there are any changes we want to make before then, now is probably the time. The response doesn't work the way you would expect based on other calls that return multiple objects: rather than an "items" field containing a list, you get a plain hash whose keys are identifiers and whose values are objects. But I personally don't see any reason that would be a dealbreaker.
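To make the difference in response shape concrete, here is a minimal client-side sketch that turns the plain hash into a list-style body. It works on ordinary dictionaries only and makes no API calls; the helper name `hash_to_items`, the `key` field it adds, and the sample identifiers are all made up for illustration.

```python
def hash_to_items(response):
    """Convert a {identifier: object} hash into a list-style {"items": [...]} body.

    The identifier is copied into a hypothetical "key" field so it is not
    lost when the hash keys are dropped. The input is left unmodified.
    """
    items = []
    for identifier, obj in response.items():
        record = dict(obj)           # shallow copy, so the caller's data is untouched
        record["key"] = identifier   # "key" is an illustrative field name, not an API field
        items.append(record)
    return {"items": items}


# Hypothetical sample shaped like the hash described above.
sample = {
    "zzzzz-8i9sb-aaaaaaaaaaaaaaa": {"kind": "arvados#job"},
    "zzzzz-4zz18-bbbbbbbbbbbbbbb": {"name": "example collection"},
}
listing = hash_to_items(sample)
print(len(listing["items"]))
```

A caller that already iterates over "items" for other calls could then reuse that loop unchanged, at the cost of losing direct lookup by identifier.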


#1 Updated by Brett Smith about 3 years ago

  • Target version set to Arvados Future Sprints

#2 Updated by Tom Clegg about 3 years ago

We should figure out how to do paging here. The current API doesn't seem to give us any alternative to returning a single response with the entire graph, which can be arbitrarily large.

Another oddity is that there's no distinction between these two kinds of provenance:
  • job A output just the word "yes", and job B used an input collection that contained just the word "yes".
  • job A output the word "yes", and that collection was used as an input to job B (e.g., there was a pipeline like "job A | job B").

(This is by no means the only place we fail to make that distinction, but we should probably consider it when naming and specifying APIs in this area.)

#3 Updated by Joshua Randall almost 3 years ago

Another thing to consider changing regarding both `collections/used_by` and `collections/provenance` calls is that they return "complete" collection records, including `manifest_text` (which can be very long), `file_names` (which AFAIK can't be selected by the list call and may be incomplete?), and "id" (which I think is an internal id not intended for use outside the database). It would probably be advantageous to implement the same sort of restricted selection functionality that `collections/list` has, but the difficulty/complication with that is that these calls can return multiple types of records.
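Until something like server-side selection exists, a client could drop the bulky or internal fields itself. The sketch below is only a client-side workaround on plain dictionaries, not the proposed API change; `strip_fields` and `HEAVY_FIELDS` are hypothetical names, and the field list simply mirrors the three fields mentioned above.

```python
# Fields called out above as long, possibly incomplete, or internal.
HEAVY_FIELDS = ("manifest_text", "file_names", "id")


def strip_fields(graph, drop=HEAVY_FIELDS):
    """Return a copy of a {identifier: record} hash without the fields in `drop`.

    Works the same for every record type, since it only removes fields
    that happen to be present.
    """
    return {
        key: {field: value for field, value in record.items() if field not in drop}
        for key, record in graph.items()
    }


# Hypothetical sample record with made-up values.
sample = {
    "zzzzz-4zz18-bbbbbbbbbbbbbbb": {
        "name": "example collection",
        "manifest_text": ". d41d8cd98f00b204e9800998ecf8427e+0 0:0:empty.txt\n",
        "id": 12345,
    }
}
print(strip_fields(sample))
```

This does nothing for the size of the response on the wire, which is the real cost of returning `manifest_text`; it only keeps downstream code from depending on those fields.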

Other issues:
- Collection records in the list returned by `collections/used_by` are missing the "kind" column (whereas job records have it)
- "fragment" collections only have a portable_data_hash and a name - they could also use a "kind" (perhaps "arvados#fragment"?) in order to be able to easily tell what one is looking at in the dictionary
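As things stand, a client has to guess what each entry in the dictionary is. A rough sketch of that guesswork, assuming the behaviour described in the two points above: `guess_kind` is a hypothetical helper, "arvados#fragment" is only the value floated in this comment and not an existing kind, and treating every other record without a "kind" as a collection is an assumption.

```python
FRAGMENT_FIELDS = {"portable_data_hash", "name"}


def guess_kind(record):
    """Best-effort guess at what a record in the returned dictionary is."""
    if "kind" in record:
        # Job records already carry their kind.
        return record["kind"]
    if "portable_data_hash" in record and set(record) <= FRAGMENT_FIELDS:
        # Nothing but a portable_data_hash (and maybe a name): treat as a fragment.
        return "arvados#fragment"  # suggested name only, not a real kind today
    # Assumption: any other record without a "kind" is a full collection record.
    return "arvados#collection"


# Hypothetical sample records.
print(guess_kind({"kind": "arvados#job"}))
print(guess_kind({"portable_data_hash": "d41d8cd98f00b204e9800998ecf8427e+0", "name": "x"}))
print(guess_kind({"portable_data_hash": "d41d8cd98f00b204e9800998ecf8427e+0", "manifest_text": ""}))
```

If every record carried a "kind", the whole function would reduce to `record["kind"]`.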
