Idea #9148

open

[API] Finalize and document the collections/provenance and collections/used_by API calls

Added by Brett Smith almost 8 years ago. Updated about 2 months ago.

Status: New
Priority: Normal
Assigned To: -
Category: -
Target version:
Story points: -
Release:
Release relationship: Auto

Description

If we're happy with the API as-is, we just need to document it.

If there are any changes we want to make before then, now is probably the time. The response doesn't work the way you would expect based on other calls that return multiple objects: rather than an "items" field containing a list, you get a plain hash whose keys are identifiers and whose values are objects. But I personally don't see any reason that would be a dealbreaker.
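To make the difference concrete, here is a minimal sketch of how a client could convert the keyed hash into the list shape that other calls use. The sample identifiers, the record fields, and the `graph_key` field name are made up for illustration; they are not taken from a real response.

```python
def to_items_envelope(graph):
    """Turn a {identifier: object} hash into an {"items": [...]} envelope."""
    items = []
    for identifier, record in graph.items():
        item = dict(record)  # copy so the caller's hash is not modified
        # "graph_key" is a hypothetical field; it preserves the hash key.
        item["graph_key"] = identifier
        items.append(item)
    return {"items": items}


# Hypothetical response in the current shape: keys are identifiers.
sample = {
    "id-of-job-a": {"kind": "arvados#job"},
    "id-of-collection-b": {"name": "output of job A"},
}
envelope = to_items_envelope(sample)
```

The conversion is lossless as long as the key is carried along, which is one reason the keyed shape may not need to change.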

Actions #1

Updated by Brett Smith almost 8 years ago

  • Target version set to Arvados Future Sprints
Actions #2

Updated by Tom Clegg almost 8 years ago

We should figure out how to do paging here. The current API doesn't seem to give us any alternative to returning a single response with the entire graph, which can be arbitrarily large.
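One possible shape for paging, sketched over an in-memory graph. The parameter and field names (`offset`, `limit`, `items_available`) are assumptions for illustration, not something the current API accepts for these calls. Slicing an already-built graph only bounds the response size; it would not save the server from computing the whole graph.

```python
def page_of_graph(graph, offset=0, limit=100):
    """Return one page of a {identifier: object} hash, ordered by key."""
    keys = sorted(graph)  # a stable order is needed for pages to be consistent
    page_keys = keys[offset:offset + limit]
    return {
        "items": {key: graph[key] for key in page_keys},
        "offset": offset,
        "limit": limit,
        "items_available": len(keys),  # hypothetical total-count field
    }


# Hypothetical graph with three nodes, fetched one node at a time.
sample_graph = {"c": {}, "a": {}, "b": {}}
second_page = page_of_graph(sample_graph, offset=1, limit=1)
```

A client would keep requesting pages, advancing `offset` by `limit`, until `offset` reaches `items_available`.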

Another oddity is that there's no distinction between these two kinds of provenance:
  • job A output just the word "yes", and job B used an input collection that contained just the word "yes".
  • job A output the word "yes", and that collection was used as an input to job B (e.g., there was a pipeline like "job A | job B").

(This is by no means the only place we fail to make that distinction, but we should probably consider it when naming and specifying APIs in this area.)

Actions #3

Updated by Joshua Randall almost 8 years ago

Another thing to consider changing regarding both `collections/used_by` and `collections/provenance` calls is that they return "complete" collection records, including `manifest_text` (which can be very long), `file_names` (which AFAIK can't be selected by the list call and may be incomplete?), and "id" (which I think is an internal id not intended for use outside the database). It would probably be advantageous to implement the same sort of restricted selection functionality that `collections/list` has, but the difficulty/complication with that is that these calls can return multiple types of records.
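As a rough sketch of what restricted selection could look like when the result mixes record types: apply a single list of field names to every record and skip any field a given record doesn't have. The function name, the `select` argument, and the sample records are hypothetical; this runs client-side on a plain hash and says nothing about how the server would implement it.

```python
def select_fields(graph, select):
    """Keep only the named fields in each record of a mixed-type graph."""
    wanted = set(select)
    return {
        key: {field: value for field, value in record.items() if field in wanted}
        for key, record in graph.items()
    }


# Hypothetical mixed result: one job record and one collection record.
sample = {
    "job-1": {"kind": "arvados#job", "script": "example"},
    "collection-1": {"name": "output", "manifest_text": "(very long)"},
}
trimmed = select_fields(sample, ["kind", "name", "script"])
```

Here `manifest_text` is dropped from the collection, and the job record is unaffected by the `name` entry because it has no such field.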

Other issues:
- Collection records in the list returned by `collections/used_by` are missing the "kind" column (whereas job records have it)
- "fragment" collections only have a portable_data_hash and a name - they could also use a "kind" (perhaps "arvados#fragment"?) in order to be able to easily tell what one is looking at in the dictionary
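Until every record carries a "kind", a client has to guess. The sketch below shows the sort of heuristic that implies, based only on the observations above: records with "kind" are trusted, records with nothing but `portable_data_hash` and `name` are treated as fragments, and anything else without "kind" is assumed to be a collection. The `arvados#fragment` value is the suggestion from this comment, not an existing kind, and the sample records are made up.

```python
FRAGMENT_FIELDS = {"portable_data_hash", "name"}


def infer_kind(record):
    """Guess a record's kind when the response omits it."""
    if "kind" in record:
        return record["kind"]
    if set(record) <= FRAGMENT_FIELDS:
        return "arvados#fragment"  # proposed value, not an existing kind
    # Assumption: any other record lacking "kind" is a full collection.
    return "arvados#collection"


job = {"kind": "arvados#job"}
fragment = {"portable_data_hash": "(hash)", "name": "part of an input"}
collection = {"portable_data_hash": "(hash)", "name": "x", "manifest_text": "(text)"}
kinds = [infer_kind(r) for r in (job, fragment, collection)]
```

A guess like this is fragile, which is the argument for having the API set "kind" on every record it returns.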

Actions #4

Updated by Ward Vandewege almost 3 years ago

  • Target version deleted (Arvados Future Sprints)
Actions #5

Updated by Peter Amstutz about 1 year ago

  • Release set to 60
Actions #6

Updated by Peter Amstutz about 2 months ago

  • Target version set to Future