[FUSE] Add --by-pdh switch for Crunch's use
Problem: Right now the FUSE driver opens a Websockets client. This is great for interactive use, but can cause scalability and stability issues with long-running jobs, where the functionality isn't necessary.
- Add a --by-pdh switch to FUSE. This is like --by-id, except it only allows accessing collections by their PDH.
- The --by-id option is defined in /services/fuse/bin/arv-mount. --by-pdh will be defined there as well.
- When --by-id is specified, arv-mount sets up the FUSE driver with a MagicDirectory at the root, which automatically sets up subdirectories for collections found by UUID or PDH. --by-pdh will work very similarly: it will put a class at the root that automatically sets up subdirectories for collections found by PDH only. There are a few different ways you might implement this:
- You might move the lookup-by-PDH logic out of MagicDirectory into a new class (MagicPDHDirectory, or some name like that), and then make MagicDirectory a subclass of that that adds logic to lookup by UUID.
- You might make a new subclass of MagicDirectory that just prohibits UUID lookups, returning ENOENT as soon as the user tries it, before calling corresponding methods on the MagicDirectory superclass.
- You might instantiate MagicDirectory with some kind of "PDH-only" argument, and then guard UUID lookups inside the MagicDirectory code based on that setting.
Tom and Peter might have thoughts about which implementation would be best.
- When arv-mount is invoked with --by-pdh, it must not start a Websockets client.
- bin/arv-mount always calls operations.listen_for_events. It should skip that when --by-pdh is an argument.
- Modify crunch-job to use --by-pdh instead of --by-id.
- This is a simple string replacement in /sdk/cli/bin/crunch-job. There's only one place where it says --by-id. Make it say --by-pdh instead.
- Make sure --by-pdh has appropriate information in --help.
- Make sure the README text inside the new directory (and the original --by-id directory) is correct. Currently this is contained in README_TEXT in the MagicDirectory class definition, for the --by-id directory.
- If the user guide (for either FUSE or Crunch jobs) mentions that Crunch jobs can only access collections by UUID or PDH, update those references to be clear that collections are accessible by PDH only.
A future story may add a runtime constraint to allow jobs to omit the --by-pdh switch, and access collections more dynamically (since they can do that via the API anyway).
#3 Updated by Brett Smith almost 6 years ago
- Description updated (diff)
I've updated the description with implementation ideas, and points about where documentation is likely to need updating. I have not changed any functional requirements, so I don't think this requires a change in story points. I hope these are useful pointers to help start the ball rolling on implementation work.
#5 Updated by Radhika Chippada almost 6 years ago
Implemented the items listed in the description. I chose option three and added a by_pdh optional argument to MagicDirectory since method overloading seemed to be the most straightforward and non-intrusive.
I spent a couple hours trying to see how I may write a test to pass by_pdh instead of by_id to the MagicDirectory based tests. I could not figure out where to improve existing test implementation to do this. So, did not add any tests for this.
#7 Updated by Peter Amstutz almost 6 years ago
MagicDirectory, suggest calling it
pdh_only instead of
if self.by_pdh and uuid_pattern.match(k): raise llfuse.FUSEError(errno.ENOENT)
should be folded into the existing conditional. Also it should return False, not raise an exception:
if not portable_data_hash_pattern.match(k) and (self.pdh_only or not uuid_pattern.match(k)): return False
I don't think it's necessary or desirable to modify the behavior of arv-mount --all based on --by-pdh. This means you can revert the changes to the text of
--all (line 35) and behavior of
e._entries['by_id'] (line 135).
For testing, suggest starting from
make_mounttakes keyword arguments, which are passed through, so you can write tests for each of
self.api.collections().create()...should return an API object with a UUID of the collection
- When by_pdh is False, test that the collection is accessible by both the PDH and the UUID
- When by_pdh is True, perform the same test except check that the collection is only accessable by PDH and fails when accessing by UUID
#8 Updated by Peter Amstutz almost 6 years ago
- I figured out why
Line 514 of fusedir.py:
self._entries['by_id'] = self.inodes.add_entry(MagicDirectory( self.inode, self.inodes, self.api, self.num_retries))
You need to add
self._entries['by_id'] = self.inodes.add_entry(MagicDirectory( self.inode, self.inodes, self.api, self.num_retries, self.pdh_only))
- Can you add the same access-by-UUID test to FuseMagicTest? Except it should succeed, of course.
- tests.test_mount.MagicDirApiError is failing for me:
ERROR: runTest (tests.test_mount.MagicDirApiError) ---------------------------------------------------------------------- Traceback (most recent call last): File "/home/peter/work/arvados/services/fuse/tests/test_mount.py", line 1030, in runTest llfuse.listdir(os.path.join(self.mounttmp, self.testcollection)) File "llfuse/fuse_api.pxi", line 43, in llfuse.capi.listdir (src/llfuse/capi_linux.c:22621) OSError: [Errno 2] No such file or directory: '/tmp/tmpBpoluv/97d180c4f916faf61fb3d64aa2263961+52'
#13 Updated by Joshua Randall almost 6 years ago
This change appears to be causing a problem for me - I appear to have a crunch-job that passes `--by-pdh` to arv-mount, but my arv-mount does not support that option so it exits with an error. I guess this change requires that crunch-job and arv-mount be upgraded in sync. I think in my case arv-mount is coming from the Ubuntu apt packages, and I can confirm that now that I've updated python-arvados-fuse on all nodes, this issue appears to have been resolved.