Feature #7751

Updated by Tom Clegg over 8 years ago

It is already _possible_ for a crunch program to do this: start arv-mount in writable mode, write files into a new directory, and use the resulting portable data hash (PDH) as the task output. 

 This story makes that convenient: the crunch script itself shouldn't need to do anything more complicated than the following: 

 <pre><code class="python"> 
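 import os 
 import arvados 

 # Proposed helper: returns the task's writable output directory. 
 outputdir = arvados.crunch.task_output_dir() 

 with open(os.path.join(outputdir.path, 'foo'), 'w') as f: 
     f.write('foo') 

 # Record the directory's contents as the task output. 
 arvados.current_task().set_output(outputdir) 
 # or perhaps just: outputdir.save() 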
 </code></pre> 

 Possible implementation approach: 
 * crunch-job sets up a writable FUSE mount for every job task. If the job doesn't use the mount, nothing gets written, and the mount grants no read or write access to existing collections beyond the by-PDH access that jobs already need. 
 * Add SDK functions that work out (from environment variables, etc.) where the output directory is supposed to go, push arv-mount's magic buttons[1] to get the PDH of the finished collection, and set the task output to that PDH. 

 fn1. Read JSON from @{dir}/.arvados#collection@ 
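
 The SDK side of this approach could look roughly like the sketch below. It is only an illustration under stated assumptions: the environment variable name @TASK_OUTPUT_DIR@ and the helper name @finished_collection_pdh@ are hypothetical, and the sketch assumes the JSON record read from @{dir}/.arvados#collection@ carries a @portable_data_hash@ key.

```python
import json
import os


def task_output_dir(environ=os.environ):
    """Return the task's writable output directory.

    TASK_OUTPUT_DIR is a hypothetical variable name; the story does not
    say which environment variable crunch-job would set.
    """
    return environ["TASK_OUTPUT_DIR"]


def finished_collection_pdh(outputdir):
    """Return the PDH of the collection behind a writable mount directory.

    Reads the JSON record from {dir}/.arvados#collection and assumes it
    contains a "portable_data_hash" key.
    """
    with open(os.path.join(outputdir, ".arvados#collection")) as f:
        record = json.load(f)
    return record["portable_data_hash"]
```

 A task would then write its files under @task_output_dir()@ and pass the value returned by @finished_collection_pdh()@ to @arvados.current_task().set_output()@.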
