Story #10388

Request collections that don't (yet) exist via fuse interface

Added by Joshua Randall over 4 years ago. Updated 20 days ago.

The fuse interface already supports several ways to access collections besides the low-level portable data hash (pdh). Accessing a collection by uuid rather than by pdh requires the fuse client to contact the API server to get the collection's current pdh, and to keep up to date on changes to that pdh over time.

As a user, it would be helpful to have a mechanism within the fuse mount by which I can request a collection that does not yet exist but that can be generated by some entity in the system (pipeline instance, job, or possibly even pipeline template + configuration).

This mechanism could be useful in a number of contexts, but it would be particularly useful for "just-in-time" transcoding between formats, or for performing relatively simple operations on existing data.

For example, I could imagine storing variant data in Keep in a compact format such as BCF, or even in a structured relational system such as lightning db. However, users may want to access this data in VCF format. Rather than having to manually create a pipeline to convert the data from its stored format to VCF, it would be useful to be able to access it by means of a fuse path such as:


My expectation would be that attempting to access the above file in the fuse mount would:
- create and run a pipeline_instance using the "convert_variants" pipeline_template as a template with the parameters "input" and "output_format" set as given
- block any reads on the file until data becomes available (ideally, at some point in the future, it would also be possible to stream a partially completed output collection as each block is committed, but that should probably be out of scope for the initial implementation)
- make the output collection garbage collected by default (i.e. mark the collection as intermediate or ephemeral, set replication_desired to 0, or don't even save the output pdh to a collection at all)
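
The first two expectations above could be sketched roughly as follows. This is only an illustration under stated assumptions, not a proposal for the actual arv-mount code: the path layout (`<template_name>/<key>=<value>/...`), the function `parse_request_path`, and the class `PendingOutput` are all hypothetical names invented for this sketch, and the step that would create and run the pipeline_instance through the API server is left out entirely.

```python
import threading


def parse_request_path(path):
    """Split a request path into (template_name, params).

    Assumed layout (hypothetical, not defined by this story):
        <template_name>/<key>=<value>/<key>=<value>/...
    Values containing "/" are not handled by this sketch.
    """
    parts = [p for p in path.strip("/").split("/") if p]
    if not parts:
        raise ValueError("empty request path")
    template_name, params = parts[0], {}
    for part in parts[1:]:
        key, sep, value = part.partition("=")
        if not sep or not key:
            raise ValueError(f"expected key=value, got {part!r}")
        params[key] = value
    return template_name, params


class PendingOutput:
    """Stand-in for a file whose content is produced by a running pipeline."""

    def __init__(self):
        self._ready = threading.Event()
        self._data = b""

    def complete(self, data):
        # Would be called once the pipeline instance has finished
        # and its output is available.
        self._data = data
        self._ready.set()

    def read(self, size, offset, timeout=None):
        # Block the reader until output exists (second bullet above).
        if not self._ready.wait(timeout):
            raise TimeoutError("output not available yet")
        return self._data[offset:offset + size]
```

In a real fuse handler, a lookup of such a path would presumably call something like `parse_request_path`, start the pipeline, and return a file object whose `read` blocks in this way. Streaming partial output would need a more elaborate structure than a single event.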


#1 Updated by Tom Morris almost 4 years ago

  • Target version set to Arvados Future Sprints

#2 Updated by Ward Vandewege 20 days ago

  • Target version deleted (Arvados Future Sprints)
