Idea #10388
Request collections that don't (yet) exist via fuse interface
Description
The fuse interface already supports a number of different ways to access collections aside from the low-level portable data hash. Accessing a collection by uuid rather than pdh requires the fuse client to contact the API server to get the current pdh for a collection, and to keep up to date on changes to that pdh over time.
As a user, it would be helpful to have a mechanism within the fuse mount by which I can request a collection that does not yet exist but that can be generated by some entity in the system (pipeline instance, job, or possibly even pipeline template + configuration).
This mechanism could be useful in a number of contexts. A particularly useful one would be "just-in-time" transcoding between formats, or performing relatively simple operations on existing data.
For example, I could imagine storing variant data in Keep in a compact format such as BCF or even in a structured relational system such as lightning db. However, users may want to access this data in VCF format. Rather than having to manually create a pipeline to convert the data from its stored format to VCF, it would be useful to be able to access it by means of a fuse path such as:
/keep/pipeline_template/convert_variants/input=5a0e057e83846a5ea9a6d8eebe3c1508+875474:input.bcf/output_format=VCF/output.vcf.gz
My expectation would be that attempting to access the above file in the fuse mount would:
- create and run a pipeline_instance using the "convert_variants" pipeline_template as a template with the parameters "input" and "output_format" set as given
- block on any reads on the file until data becomes available (ideally at some point in the future streaming of a partially completed output collection would also be possible as each block is committed, but that should probably be out of scope for the initial implementation)
- by default, make the output collection eligible for garbage collection (i.e. mark the collection as intermediate or ephemeral, set replication_desired to 0, or don't even save the output pdh to a collection at all)
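To make the expectations above more concrete, here is a rough Python sketch of two pieces: splitting a request path like the one above into a template name, parameters, and an output file, and blocking reads until the output exists. Everything in it is hypothetical and for illustration only: `parse_request_path`, `TemplateRequest`, and `PendingFile` are not part of the existing fuse client, and the rule "leading name=value components are parameters" is an assumption about how such paths might be interpreted (it would be ambiguous for output file names containing "="). Creating and running the pipeline_instance is deliberately left out.

```python
import threading
from typing import NamedTuple


class TemplateRequest(NamedTuple):
    template: str     # pipeline template name, e.g. "convert_variants"
    params: dict      # parameter name -> value, from "name=value" components
    output_path: str  # file requested inside the (future) output collection


def parse_request_path(path, prefix="/keep/pipeline_template/"):
    """Split a request path into template name, parameters, and output file.

    Assumption: every leading "name=value" component after the template
    name is a parameter; the remaining components name the output file.
    """
    if not path.startswith(prefix):
        raise ValueError("not a pipeline_template request path: %r" % path)
    parts = [p for p in path[len(prefix):].split("/") if p]
    if not parts:
        raise ValueError("missing template name")
    template, rest = parts[0], parts[1:]
    params = {}
    i = 0
    while i < len(rest) and "=" in rest[i]:
        name, value = rest[i].split("=", 1)
        params[name] = value
        i += 1
    output_path = "/".join(rest[i:])
    if not output_path:
        raise ValueError("missing output file name")
    return TemplateRequest(template, params, output_path)


class PendingFile:
    """Hypothetical stand-in for a file whose contents are produced later."""

    def __init__(self):
        self._ready = threading.Event()
        self._data = b""

    def complete(self, data):
        # Called once the pipeline has produced its output.
        self._data = data
        self._ready.set()

    def read(self, size, offset, timeout=None):
        # Block until the output is available (or the timeout expires).
        if not self._ready.wait(timeout):
            raise TimeoutError("output not available yet")
        return self._data[offset:offset + size]
```

In this sketch a read on a `PendingFile` simply waits until `complete()` is called, which corresponds to the "block on any reads" expectation. Streaming partially completed output would need a different structure.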
Updated by Tom Morris over 7 years ago
- Target version set to Arvados Future Sprints
Updated by Ward Vandewege over 3 years ago
- Target version deleted (Arvados Future Sprints)