Idea #7661: [FUSE] Add --by-pdh switch for Crunch's use - Arvados


Problem: Right now the FUSE driver opens a Websockets client. This is great for interactive use, but can cause scalability and stability issues with long-running jobs, where the functionality isn't necessary.

 Fix: 

 * Add a --by-pdh switch to FUSE. This is like --by-id, except it *only* allows accessing collections by their PDH.
 ** The --by-id option is defined in /services/fuse/bin/arv-mount. --by-pdh will be defined there as well.
 ** When --by-id is specified, arv-mount sets up the FUSE driver with a MagicDirectory at the root, which automatically sets up subdirectories for collections found by UUID or PDH. --by-pdh will work very similarly: it will put a class at the root that automatically sets up subdirectories for collections found by PDH only. There are a few different ways you might implement this:
 *** You might move the lookup-by-PDH logic out of MagicDirectory into a new class (MagicPDHDirectory, or some name like that), and then make MagicDirectory a subclass of it that adds logic to look up by UUID.
 *** You might make a new subclass of MagicDirectory that just prohibits UUID lookups, returning ENOENT as soon as the user tries one, before calling the corresponding methods on the MagicDirectory superclass.
 *** You might instantiate MagicDirectory with some kind of "PDH-only" argument, and then guard UUID lookups inside the MagicDirectory code based on that setting.
    Tom and Peter might have thoughts about which implementation would be best. 
 * When arv-mount is invoked with --by-pdh, it must not start a Websockets client. 
 ** bin/arv-mount always calls operations.listen_for_events. It should skip that call when --by-pdh is given.
 * Modify crunch-job to use --by-pdh instead of --by-id. 
 ** This is a simple string replacement in /sdk/cli/bin/crunch-job. There's only one place where it says --by-id. Make it say --by-pdh instead.
 * Documentation: 
 ** Make sure --by-pdh has appropriate information in --help. 
 ** Make sure the README text inside the new directory (and the original --by-id directory) is correct. Currently this is contained in README_TEXT in the MagicDirectory class definition, for the --by-id directory.
 ** If the user guide (for either FUSE or Crunch jobs) mentions that Crunch jobs can only access collections by UUID or PDH, update those references to be clear that collections are accessible by PDH only. 
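
 As a rough illustration of the third implementation option (a "PDH-only" argument) together with the rule about skipping the Websockets client, here is a minimal, self-contained sketch. It does not use the real arv-mount code: SketchMagicDirectory, fetch_collection, pdh_only, build_parser and should_listen_for_events are all hypothetical names, and the identifier patterns are assumptions about what PDHs and collection UUIDs look like. The real MagicDirectory and option parsing may be structured quite differently.

```python
import argparse
import errno
import re

# Assumed identifier formats (not taken from the arv-mount source):
# a PDH is a 32-digit hex hash plus "+<size>", and a collection UUID
# looks like "zzzzz-4zz18-0123456789abcde".
PDH_RE = re.compile(r'^[0-9a-f]{32}\+\d+$')
UUID_RE = re.compile(r'^[0-9a-z]{5}-4zz18-[0-9a-z]{15}$')


class SketchMagicDirectory:
    """Hypothetical stand-in for MagicDirectory with a PDH-only setting."""

    def __init__(self, fetch_collection, pdh_only=False):
        # fetch_collection is a hypothetical callable: identifier -> collection
        self.fetch_collection = fetch_collection
        self.pdh_only = pdh_only
        self.entries = {}

    def lookup(self, name):
        """Return the collection for name, creating the entry on first use."""
        if name in self.entries:
            return self.entries[name]
        is_pdh = PDH_RE.match(name) is not None
        is_uuid = UUID_RE.match(name) is not None
        # Guard: in PDH-only mode a UUID is rejected before any fetch happens.
        if not (is_pdh or (is_uuid and not self.pdh_only)):
            raise OSError(errno.ENOENT, "No such file or directory", name)
        self.entries[name] = self.fetch_collection(name)
        return self.entries[name]


def build_parser():
    """Hypothetical option parser exposing both switches with help text."""
    parser = argparse.ArgumentParser(prog='arv-mount-sketch')
    parser.add_argument('--by-id', action='store_true',
                        help='access collections by UUID or PDH')
    parser.add_argument('--by-pdh', action='store_true',
                        help='access collections by PDH only; '
                             'no Websockets client is started')
    return parser


def should_listen_for_events(args):
    """Only start the event listener when --by-pdh was not requested."""
    return not args.by_pdh
```

 In this sketch a UUID lookup under pdh_only=True raises an OSError carrying ENOENT without ever calling fetch_collection, which is the behaviour the second and third options both aim for. The first option (a separate MagicPDHDirectory base class) would move the PDH branch into the parent class instead of using a flag.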

 A future story may add a runtime constraint to allow jobs to omit the --by-pdh switch, and access collections more dynamically (since they can do that via the API anyway).
