Bug #12306

closed

[arv-mount] --unmount should work on an unresponsive mount

Added by Tom Clegg over 6 years ago. Updated over 6 years ago.

Status: Resolved
Priority: Normal
Assigned To:
Category: -
Target version:
Story points: -

Description

Currently, if an arv-mount process is in some deadlocked/stuck state, running arv-mount --unmount PATH just hangs instead of unmounting.

When this happens, echo 1 > /sys/fs/fuse/connections/NNN/abort revives the stuck unmount command.

It looks like arv-mount --unmount attempts to lstat() all mount points in /proc/self/mounts and lstat(stuck_mount_path) hangs.

This seems to be the fault of realpath() in source:services/fuse/arvados_fuse/unmount.py:

    while True:
        mounted = False
        for m in mountinfo():
            if m.is_fuse and (mnttype is None or mnttype == m.mnttype):
                try:
                    if os.path.realpath(m.path) == path:

On the shell node where this happened, where /home and /home/foo are both symlinks, arv-mount /home/foo/keep results in /data-sdd/foo/keep appearing in /proc/self/mountinfo, which means realpath() is superfluous here. (Is that true on all systems?)
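
As a rough illustration of the idea above, the mount table can be matched by string comparison alone, without calling lstat() or realpath() on any mount point. This is a minimal sketch, not the code in unmount.py: the helper names parse_mountinfo_line and is_fuse_mounted are hypothetical, and it assumes the path passed in is already in the form the kernel reports in /proc/self/mountinfo.

```python
import re


def parse_mountinfo_line(line):
    """Return (mount_point, fstype, minor) for one /proc/self/mountinfo line.

    Hypothetical helper for illustration; only string operations are used,
    so a stuck mount cannot make it hang.
    """
    # Fields before " - " are per-mount; the filesystem type follows it.
    left, right = line.rstrip("\n").split(" - ", 1)
    fields = left.split(" ")
    _major, minor = fields[2].split(":")
    # mountinfo escapes space, tab, newline and backslash as \ooo (octal).
    mount_point = re.sub(
        r"\\([0-7]{3})", lambda m: chr(int(m.group(1), 8)), fields[4])
    fstype = right.split(" ")[0]
    return mount_point, fstype, int(minor)


def is_fuse_mounted(path, lines):
    """True if `path` is listed as a FUSE mount point in `lines`."""
    for line in lines:
        mount_point, fstype, _minor = parse_mountinfo_line(line)
        is_fuse = fstype == "fuse" or fstype.startswith("fuse.")
        if is_fuse and mount_point == path:
            return True
    return False
```

On a live system the lines would come from open("/proc/self/mountinfo"), which reads the kernel's table and does not touch the mounted filesystems themselves.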


Subtasks 1 (0 open, 1 closed)

Task #12564: Review 12306-dont-stat-mounts (Resolved, Peter Amstutz, 09/22/2017)

Related issues

Related to Arvados - Bug #11994: [arv-mount] Do not crash if /sys/fs/fuse/connections is empty (Resolved, Tom Clegg, 07/19/2017)
Related to Arvados - Bug #12538: crunch-run failing to terminate after complete (Resolved, Peter Amstutz, 11/06/2017)

Actions #1

Updated by Tom Clegg over 6 years ago

These stuck mounts come up occasionally on Jenkins. When they do, all builds get stuck ("UnmountTest" -- presumably because of this bug), until someone clears the stuck mounts manually using ".../connections/NNN/abort" or "fusermount -u -z".

Actions #2

Updated by Tom Morris over 6 years ago

  • Target version set to 2017-11-08 Sprint
Actions #3

Updated by Tom Morris over 6 years ago

  • Assigned To set to Tom Morris
Actions #4

Updated by Tom Morris over 6 years ago

  • Status changed from New to In Progress
  • Assigned To changed from Tom Morris to Tom Clegg
Actions #5

Updated by Tom Clegg over 6 years ago

Actions #6

Updated by Peter Amstutz over 6 years ago

So following symlinks to mounts seems weird and not something you would normally do. However, the other thing that realpath() does is turn a relative path into an absolute path, which is probably what we were really trying to use it for. So how about adding this back in?

    path = os.path.abspath(path)

(abspath doesn't use stat(); it only calls os.getcwd().)
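
The difference can be seen with a throwaway symlink. This is a small sketch for illustration only: the function name and the temporary directory layout ("data", "home", "keep") are made up, and it assumes a POSIX system where symlinks can be created.

```python
import os
import tempfile


def compare_abspath_and_realpath():
    """Return (abspath result, realpath result), relative to a temp dir.

    Layout (hypothetical): <tmp>/home is a symlink to <tmp>/data, and
    <tmp>/home/keep does not exist.
    """
    with tempfile.TemporaryDirectory() as tmp:
        tmp = os.path.realpath(tmp)  # canonical base, so "home" is the only symlink
        os.mkdir(os.path.join(tmp, "data"))
        os.symlink(os.path.join(tmp, "data"), os.path.join(tmp, "home"))
        candidate = os.path.join(tmp, "home", "keep")
        # abspath only normalizes the string (joining os.getcwd() if relative).
        lexical = os.path.relpath(os.path.abspath(candidate), tmp)
        # realpath inspects each component on disk and follows the symlink.
        resolved = os.path.relpath(os.path.realpath(candidate), tmp)
        return lexical, resolved
```

Here abspath leaves "home/keep" untouched while realpath rewrites it to "data/keep", which is why only realpath can block on a stuck mount.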

Actions #7

Updated by Tom Clegg over 6 years ago

Peter Amstutz wrote:

So following symlinks to mounts seems weird and not something you would normally do

On our shell nodes $HOME is typically /home/username where /home is a symlink, so ~/keep doesn't appear in mountinfo but realpath(~/keep) does.

I wonder if it's worth implementing a more careful realpath() that can resolve ~/keep in such situations without calling lstat() on ~/keep itself. Seems like a bit of a rabbit hole, though.

(abspath doesn't use stat(); it only calls os.getcwd().)

Indeed, one less opportunity to fall into the realpath() hole. Added.

12306-dont-stat-mounts @ aabf1ca0e99701550f9af785e9f1fee098b0020a

Actions #8

Updated by Peter Amstutz over 6 years ago

Tom Clegg wrote:

Peter Amstutz wrote:

So following symlinks to mounts seems weird and not something you would normally do

On our shell nodes $HOME is typically /home/username where /home is a symlink, so ~/keep doesn't appear in mountinfo but realpath(~/keep) does.

Got it. But does that mean arv-mount --unmount won't actually work in this case, when you have a stuck mount which you are trying to unmount on a symlink path?

I wonder if it's worth implementing a more careful realpath() that can resolve ~/keep in such situations without calling lstat() on ~/keep itself. Seems like a bit of a rabbit hole, though.

How about calling realpath() on the parent directory and then joining it with the mount point?
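
A minimal sketch of that suggestion, for illustration only (the name realpath_via_parent is hypothetical and this is not code from the branch). It assumes the parent directory is not itself inside a stuck mount, and it deliberately does not resolve the final component even if that component is a symlink.

```python
import os


def realpath_via_parent(path):
    """Resolve symlinks in the parent directory only, then re-attach the
    last component, so the mount point itself is never stat()ed."""
    path = os.path.abspath(path)  # lexical only; also strips a trailing slash
    parent, name = os.path.split(path)
    if not name:  # path is the filesystem root
        return path
    return os.path.join(os.path.realpath(parent), name)
```

With /home as a symlink, realpath_via_parent("/home/foo/keep") would resolve /home/foo and append "keep", giving a string that can be compared against the mount table.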

Actions #9

Updated by Tom Clegg over 6 years ago

Indeed, the previous version would have ended up calling realpath() on ~/keep on a system where $HOME contains symlinks.

I think I made it back from the rabbit hole with a version that avoids calling realpath in those cases.

12306-dont-stat-mounts @ 08a4ebba0e5bfbc179103ac5e6916164bc8083fa

Actions #10

Updated by Peter Amstutz over 6 years ago

Tom Clegg wrote:

Indeed, the previous version would have ended up calling realpath() on ~/keep on a system where $HOME contains symlinks.

I think I made it back from the rabbit hole with a version that avoids calling realpath in those cases.

12306-dont-stat-mounts @ 08a4ebba0e5bfbc179103ac5e6916164bc8083fa

Tentatively, safer_realpath seems to work.

I just noticed that arv-mount --unmount requires an unnecessary API token:

$ arv-mount --unmount keep/
2017-11-08 09:49:38 arvados.arv-mount[7740] ERROR: Missing environment: 'ARVADOS_API_TOKEN'

Unmounting an arv-mount which is stuck with SIGSTOP does remove the mount but doesn't kill the daemon:

  1. arv-mount
  2. SIGSTOP
  3. arv-mount --unmount (works)
  4. SIGCONT
  5. arv-mount is still there

Could be a problem if it is occupying a lot of memory and refusing to go away on its own.

Actions #11

Updated by Peter Amstutz over 6 years ago

My preferred method to bring the hammer down:

  1. abort if available
  2. sigkill
  3. fusermount -u -z
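
The three steps above could be sketched roughly as follows. This is an assumption-laden illustration, not arv-mount's implementation: force_unmount and its parameters are hypothetical, the caller is expected to supply the daemon's pid, and the connection directory is assumed to be named after the mount's minor device number, which should hold for ordinary (non-block) FUSE mounts but is not verified here.

```python
import os
import signal
import subprocess


def force_unmount(path, minor=None, pid=None,
                  connections_dir="/sys/fs/fuse/connections",
                  run=subprocess.call):
    """Try abort, SIGKILL, then a lazy unmount; return the steps attempted."""
    attempted = []
    # 1. Abort the FUSE connection if its control file is available.
    if minor is not None:
        abort_file = os.path.join(connections_dir, str(minor), "abort")
        if os.path.exists(abort_file):
            with open(abort_file, "w") as f:
                f.write("1\n")
            attempted.append("abort")
    # 2. SIGKILL the daemon, if the caller knows its pid.
    if pid is not None:
        try:
            os.kill(pid, signal.SIGKILL)
            attempted.append("sigkill")
        except ProcessLookupError:
            pass  # the process is already gone
    # 3. Lazy unmount, so the mount point is detached even if still busy.
    run(["fusermount", "-u", "-z", path])
    attempted.append("fusermount")
    return attempted
```

The run parameter exists only so the command can be replaced in a dry run or a test; by default it invokes fusermount through subprocess.call.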
Actions #12

Updated by Peter Amstutz over 6 years ago

Otherwise, the main goal of this bugfix (don't get stuck on realpath()) seems to be accomplished, so declare victory and merge.

LGTM

Actions #13

Updated by Anonymous over 6 years ago

  • Status changed from In Progress to Resolved
  • % Done changed from 0 to 100

Applied in changeset arvados|commit:0af053088c83d1107866cb06fd6c5736d9065eee.
