Story #7661

[FUSE] Add --by-pdh switch for Crunch's use

Added by Brett Smith almost 6 years ago. Updated almost 6 years ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Radhika Chippada
Category:
FUSE
Target version:
Start date:
11/11/2015
Due date:
% Done:

100%

Estimated time:
(Total: 0.00 h)
Story points:
1.0

Description

Problem: Right now the FUSE driver opens a Websockets client. This is great for interactive use, but can cause scalability and stability issues with long-running jobs, where the functionality isn't necessary.

Fix:

  • Add a --by-pdh switch to FUSE. This is like --by-id, except it only allows accessing collections by their PDH.
    • The --by-id option is defined in /services/fuse/bin/arv-mount. --by-pdh will be defined there as well.
    • When --by-id is specified, arv-mount sets up the FUSE driver with a MagicDirectory at the root, which automatically sets up subdirectories for collections found by UUID or PDH. --by-pdh will work very similarly: it will put a class at the root that automatically sets up subdirectories for collections found by PDH only. There are a few different ways you might implement this:
      • You might move the lookup-by-PDH logic out of MagicDirectory into a new class (MagicPDHDirectory, or some name like that), and then make MagicDirectory a subclass of that that adds logic to lookup by UUID.
      • You might make a new subclass of MagicDirectory that just prohibits UUID lookups, returning ENOENT as soon as the user tries it, before calling corresponding methods on the MagicDirectory superclass.
      • You might instantiate MagicDirectory with some kind of "PDH-only" argument, and then guard UUID lookups inside the MagicDirectory code based on that setting.
        Tom and Peter might have thoughts about which implementation would be best.
  • When arv-mount is invoked with --by-pdh, it must not start a Websockets client.
    • bin/arv-mount always calls operations.listen_for_events. It should skip that when --by-pdh is an argument.
  • Modify crunch-job to use --by-pdh instead of --by-id.
    • This is a simple string replacement in /sdk/cli/bin/crunch-job. There's only one place where it says --by-id. Make it say --by-pdh instead.
  • Documentation:
    • Make sure --by-pdh has appropriate information in --help.
    • Make sure the README text inside the new directory (and the original --by-id directory) is correct. Currently this is contained in README_TEXT in the MagicDirectory class definition, for the --by-id directory.
    • If the user guide (for either FUSE or Crunch jobs) mentions that Crunch jobs can only access collections by UUID or PDH, update those references to be clear that collections are accessible by PDH only.

A future story may add a runtime constraint to allow jobs to omit the --by-pdh switch, and access collections more dynamically (since they can do that via the API anyway).


Subtasks

Task #7692: DocumentationResolvedRadhika Chippada

Task #7691: Review branch: 7661-fuse-by-pdhResolvedRadhika Chippada

Associated revisions

Revision 3cdc4e8e
Added by Radhika Chippada almost 6 years ago

closes #7661
Merge branch '7661-fuse-by-pdh'

History

#1 Updated by Brett Smith almost 6 years ago

  • Description updated (diff)
  • Category set to FUSE
  • Story points set to 1.0

#2 Updated by Radhika Chippada almost 6 years ago

  • Assigned To set to Radhika Chippada

#3 Updated by Brett Smith almost 6 years ago

  • Description updated (diff)

I've updated the description with implementation ideas, and points about where documentation is likely to need updating. I have not changed any functional requirements, so I don't think this requires a change in story points. I hope these are useful pointers to help start the ball rolling on implementation work.

#4 Updated by Brett Smith almost 6 years ago

  • Description updated (diff)

#5 Updated by Radhika Chippada almost 6 years ago

commit 64027668d7b5bd47a863c86277c83961d8a29914

Implemented the items listed in the description. I chose option three and added a by_pdh optional argument to MagicDirectory since method overloading seemed to be the most straightforward and non-intrusive.

I spent a couple hours trying to see how I may write a test to pass by_pdh instead of by_id to the MagicDirectory based tests. I could not figure out where to improve existing test implementation to do this. So, did not add any tests for this.

#6 Updated by Radhika Chippada almost 6 years ago

  • Status changed from New to In Progress

#7 Updated by Peter Amstutz almost 6 years ago

In MagicDirectory, suggest calling it pdh_only instead of by_pdh.

This statement:

        if self.by_pdh and uuid_pattern.match(k):
            raise llfuse.FUSEError(errno.ENOENT)

should be folded into the existing conditional. Also it should return False, not raise an exception:

         if not portable_data_hash_pattern.match(k) and (self.pdh_only or not uuid_pattern.match(k)):
             return False

I don't think it's necessary or desirable to modify the behavior of arv-mount --all based on --by-pdh. This means you can revert the changes to the text of --all (line 35) and behavior of e._entries['by_id'] (line 135).

For testing, suggest starting from FuseMagicTest

  1. make_mount takes keyword arguments, which are passed through, so you can write tests for each of make_mount(fuse.MagicDirectory, by_pdh=True) and make_mount(fuse.MagicDirectory, by_pdh=False)
  2. self.api.collections().create()... should return an API object with a UUID of the collection
  3. When by_pdh is False, test that the collection is accessible by both the PDH and the UUID
  4. When by_pdh is True, perform the same test except check that the collection is only accessable by PDH and fails when accessing by UUID

#8 Updated by Peter Amstutz almost 6 years ago

  • I figured out why FuseMagicTestPDHOnly is failing.

Line 514 of fusedir.py:

                self._entries['by_id'] = self.inodes.add_entry(MagicDirectory(
                        self.inode, self.inodes, self.api, self.num_retries))

You need to add self.pdh_only:

                self._entries['by_id'] = self.inodes.add_entry(MagicDirectory(
                        self.inode, self.inodes, self.api, self.num_retries, self.pdh_only))
  • Can you add the same access-by-UUID test to FuseMagicTest? Except it should succeed, of course.
  • tests.test_mount.MagicDirApiError is failing for me:
    ERROR: runTest (tests.test_mount.MagicDirApiError)
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "/home/peter/work/arvados/services/fuse/tests/test_mount.py", line 1030, in runTest
        llfuse.listdir(os.path.join(self.mounttmp, self.testcollection))
      File "llfuse/fuse_api.pxi", line 43, in llfuse.capi.listdir (src/llfuse/capi_linux.c:22621)
    OSError: [Errno 2] No such file or directory: '/tmp/tmpBpoluv/97d180c4f916faf61fb3d64aa2263961+52'
    

#9 Updated by Peter Amstutz almost 6 years ago

tests.test_mount.MagicDirApiError seems to be failing in master as well for me, so it is likely unrelated to this change.

#10 Updated by Radhika Chippada almost 6 years ago

Thanks for the help with line 514 of fusedir.py. The test is now working.

I also added the other test variations (by_id, pdh_only=False) etc that I was holding back until the main test passed.

All fuse tests are passing for me.

#11 Updated by Peter Amstutz almost 6 years ago

768f114b1746dddec0f0224dee92a58c8ddc5499 Looks good to me. Please merge.

#12 Updated by Radhika Chippada almost 6 years ago

  • Status changed from In Progress to Resolved
  • % Done changed from 50 to 100

Applied in changeset arvados|commit:3cdc4e8e3668b783f42f887f6a8662b66ecb7f9a.

#13 Updated by Joshua Randall almost 6 years ago

This change appears to be causing a problem for me - I appear to have a crunch-job that passes `--by-pdh` to arv-mount, but my arv-mount does not support that option so it exits with an error. I guess this change requires that crunch-job and arv-mount be upgraded in sync. I think in my case arv-mount is coming from the Ubuntu apt packages, and I can confirm that now that I've updated python-arvados-fuse on all nodes, this issue appears to have been resolved.

#14 Updated by Brett Smith almost 6 years ago

Sorry for the trouble, Josh. Getting cross-component dependencies declared correctly is always a challenge. I should've given you a heads-up about that when we discussed upgrading yesterday.

Also available in: Atom PDF