Feature #7939

[FUSE] If arv-mount cannot mount anything, report that on stderr, exit nonzero

Added by Brett Smith over 3 years ago. Updated over 3 years ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Category:
FUSE
Target version:
Start date:
12/03/2015
Due date:
% Done:

100%

Estimated time:
(Total: 0.00 h)
Story points:
1.0

Description

Today, arv-mount daemonizes very early, before it initializes FUSE or makes any API calls. If those later steps fail, arv-mount will not give the user any indication of that. They'll just see that nothing is actually mounted. Examples of things that can go wrong:

  • The user doesn't have permission to use FUSE
  • The user specified --allow-other but user_allow_other is not set in /etc/fuse.conf
  • The user has a bad Arvados client configuration (wrong API server; invalid API token)
  • The user tried to mount a collection that they can't read

When nothing can be mounted as a consequence of user or administrator error, that error should be reported on stderr, and arv-mount should exit nonzero, so the user immediately knows that there's been a problem, and ideally gets details about what that problem is. For the scope of this story, errors from uncaught exceptions are fine. Obviously we want better error reporting, but that can happen in a later story.

We've tried to fix this in the past, but it's non-trivial because our daemonization code closes all file descriptors by default, and FUSE initialization opens /dev/fuse, with no apparent API to find the file descriptor it's using. See f573c35a8f.

Fix:

  • Never daemonize (call self.daemon_ctx.open) until we're about to call llfuse.main().
  • When we construct the DaemonContext, pass preserve_files=range(3, number of inodes limit). Anything arv-mount opens, we trust it to need after daemonization.
  • Make a users/current API call immediately after parsing command-line arguments. This ensures that we can make API calls generally.
  • Write integration tests to simulate each of the above error conditions. It should be possible to just start arv-mount normally and test for a nonzero exit code. Ensure that you always unmount no matter what exit code you got.

Subtasks

Task #7999: Review 7939-arv-mount-errorsResolvedPeter Amstutz

Associated revisions

Revision 081e0ce2
Added by Peter Amstutz over 3 years ago

Merge branch '7939-arv-mount-errors' closes #7939

History

#1 Updated by Brett Smith over 3 years ago

  • Target version set to Arvados Future Sprints

#2 Updated by Brett Smith over 3 years ago

  • Subject changed from [FUSE] If FUSE initialization fails, report that on stderr, exit nonzero to [FUSE] If arv-mount cannot mount anything, report that on stderr, exit nonzero
  • Description updated (diff)
  • Target version deleted (Arvados Future Sprints)

#3 Updated by Brett Smith over 3 years ago

  • Target version set to Arvados Future Sprints

#4 Updated by Brett Smith over 3 years ago

  • Target version changed from Arvados Future Sprints to 2016-01-06 sprint

#5 Updated by Brett Smith over 3 years ago

  • Description updated (diff)

#6 Updated by Brett Smith over 3 years ago

  • Description updated (diff)
  • Story points set to 2.0

If the planned implementation fundamentally does not work, discuss that with the team immediately before embarking on an alternative implementation.

#7 Updated by Peter Amstutz over 3 years ago

  • Target version changed from 2016-01-06 sprint to 2015-12-16 sprint

#8 Updated by Peter Amstutz over 3 years ago

  • Status changed from New to In Progress
  • Assigned To set to Peter Amstutz
  • Story points changed from 2.0 to 1.0

#9 Updated by Brett Smith over 3 years ago

Reviewing 98acd11. Unfortunately, this branch does not implement the story as specified.

Please add the specified tests. In case you didn't know, exit and sys.exit just raise a SystemExit exception. Tests could mock or reconfigure things to intentionally fail, then call the main method and assert it raises SystemExit. The exception's code attribute will be the exit code.

Please construct the range as specified, and use the system's rlimit as the end, rather than an arbitrary constant. You want resource.getrlimit(resource.RLIMIT_NOFILE).

Please always pass the range as preserve_files, and don't go searching for the FUSE file descriptor. The current branch breaks --logfile when daemonizing, because it doesn't pass the logfile file descriptor in preserve_files. You could manually add that to the current code, of course, but that just means that someone else is likely to trip up on this if they add a feature that opens a file descriptor in the future. Like the description says, "Anything arv-mount opens, we trust it to need after daemonization." Doing the naïve thing is less code, meaning less bugs, and more future-proof.

Thanks.

#10 Updated by Peter Amstutz over 3 years ago

Brett Smith wrote:

Reviewing 98acd11. Unfortunately, this branch does not implement the story as specified.

Please add the specified tests. In case you didn't know, exit and sys.exit just raise a SystemExit exception. Tests could mock or reconfigure things to intentionally fail, then call the main method and assert it raises SystemExit. The exception's code attribute will be the exit code.

Done.

Please construct the range as specified, and use the system's rlimit as the end, rather than an arbitrary constant. You want resource.getrlimit(resource.RLIMIT_NOFILE).

Please always pass the range as preserve_files, and don't go searching for the FUSE file descriptor. The current branch breaks --logfile when daemonizing, because it doesn't pass the logfile file descriptor in preserve_files. You could manually add that to the current code, of course, but that just means that someone else is likely to trip up on this if they add a feature that opens a file descriptor in the future. Like the description says, "Anything arv-mount opens, we trust it to need after daemonization." Doing the naïve thing is less code, meaning less bugs, and more future-proof.

Done.

Now at d7d291b

#11 Updated by Brett Smith over 3 years ago

Reviewing d7d291b, and this branch is good. Thanks for all the follow-ups. The tests especially look nice.

There's one functional change I'm wondering about before merge: master uses self.args.mountpoint as the working directory for the daemon. This branch makes it dirname(self.args.mountpoint). Was this an intentional change? If so, what's the rationale behind it?

Beyond that, just a couple of small suggestions for niceties:

It would be nice to add a comment above the new users/current API call to explain why we're doing it. Otherwise it can look a lot like a noop to future readers.

In test_bogus_host, '100::' would be a better bogus host. The RFC says that traffic to 100::/64 must be immediately refused, so this way the test can't pass in case a local resolver gives an answer for 'example.null' or anything like that.

Thanks.

#12 Updated by Peter Amstutz over 3 years ago

Brett Smith wrote:

Reviewing d7d291b, and this branch is good. Thanks for all the follow-ups. The tests especially look nice.

There's one functional change I'm wondering about before merge: master uses self.args.mountpoint as the working directory for the daemon. This branch makes it dirname(self.args.mountpoint). Was this an intentional change? If so, what's the rationale behind it?

Yes. What happens is that before it would change into the mount directory prior to mounting meant the CWD of the daemon was associated with the inode of empty mount point on the host filesystem; changing to that directory after mounting causes it to be associated with the inode of the root of the FUSE filesystem, which then means it can't ever be unmounted because the directory is busy.

Beyond that, just a couple of small suggestions for niceties:

It would be nice to add a comment above the new users/current API call to explain why we're doing it. Otherwise it can look a lot like a noop to future readers.

Done.

In test_bogus_host, '100::' would be a better bogus host. The RFC says that traffic to 100::/64 must be immediately refused, so this way the test can't pass in case a local resolver gives an answer for 'example.null' or anything like that.

Done.

Thanks.

#13 Updated by Peter Amstutz over 3 years ago

  • Status changed from In Progress to Resolved

Applied in changeset arvados|commit:081e0ce2c669068f5684a5e0a30935294a90147f.

Also available in: Atom PDF