Bug #11209

Stuck Keep fuse mounts not cleared by crunch-job

Added by Joshua Randall 21 days ago. Updated 7 minutes ago.

Status: In Progress
Priority: Normal
Assignee: Tom Clegg
Category: FUSE
Target version: 2017-03-29 sprint
Start date: 03/02/2017
Due date:
% Done: 0%
Story points: -
Remaining (hours): 0.00 hour
Velocity based estimate: -

Description

crunch-job attempts to unmount any fuse filesystems mounted under $CRUNCH_TMP, but it does so using only fusermount. On our system this often fails, and a "umount -f <mount_point>" is required to make the node work again.

In addition, this often happens on multiple nodes at the same time, and by the time we have three nodes with wedged fuse mounts, they rapidly fail all pending jobs. There seems to be no mechanism by which crunch-dispatch can decide to stop dispatching to a node that is broken.
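
For reference, the manual recovery is a force-unmount of each wedged mount point. A sketch of that workaround (run as root; the mount selection mirrors the cleanup script in the logs below):

    # Force-unmount every keep mount still listed under the crunch temp dir.
    mount -t fuse,fuse.keep | awk '{print $3}' | while read -r mp; do
        umount -f "$mp"    # fusermount -u -z has already failed on these
    done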

Here is the log from a job that suffered from this issue.

dispatching job z8ta6-8i9sb-8mp2qww92moa644 {"docker_image"=>"mercury/gatk-3.5", "min_nodes"=>1, "max_tasks_per_node"=>10, "keep_cache_mb_per_task"=>1280} to humgen-05-07 z8ta6-7ekkf-sa1q59632vhxov6 {"total_cpu_cores":32,"total_ram_mb":257867,"total_scratch_mb":788561}
2017-02-28_17:23:33 salloc: Granted job allocation 17536
2017-02-28_17:23:33 58397  Sanity check is `/usr/bin/docker ps -q`
2017-02-28_17:23:33 58397  sanity check: start
2017-02-28_17:23:33 58397  stderr starting: ['srun','--nodes=1','--ntasks-per-node=1','/usr/bin/docker','ps','-q']
2017-02-28_17:23:33 58397  sanity check: exit 0
2017-02-28_17:23:33 58397  Sanity check OK
2017-02-28_17:23:33 z8ta6-8i9sb-8mp2qww92moa644 58397  running from /var/www/arvados-api/shared/vendor_bundle/ruby/2.1.0/gems/arvados-cli-0.1.20170217221854/bin/crunch-job with arvados-cli Gem version(s) 0.1.20170217221854, 0.1.20161017193526, 0.1.20160503204200, 0.1.20151207150126, 0.1.20151023190001
2017-02-28_17:23:33 z8ta6-8i9sb-8mp2qww92moa644 58397  check slurm allocation
2017-02-28_17:23:33 z8ta6-8i9sb-8mp2qww92moa644 58397  node humgen-05-07 - 10 slots
2017-02-28_17:23:33 z8ta6-8i9sb-8mp2qww92moa644 58397  start
2017-02-28_17:23:34 z8ta6-8i9sb-8mp2qww92moa644 58397  clean work dirs: start
2017-02-28_17:23:34 z8ta6-8i9sb-8mp2qww92moa644 58397  stderr starting: ['srun','--nodelist=humgen-05-07','-D','/data/crunch-tmp','bash','-ec','-o','pipefail','mount -t fuse,fuse.keep | awk "(index(\\$3, \\"$CRUNCH_TMP\\") == 1){print \\$3}" | xargs -r -n 1 fusermount -u -z; sleep 1; rm -rf $JOB_WORK $CRUNCH_INSTALL $CRUNCH_TMP/task $CRUNCH_TMP/src* $CRUNCH_TMP/*.cid']
2017-02-28_17:23:34 z8ta6-8i9sb-8mp2qww92moa644 58397  stderr fusermount: failed to unmount /data/crunch-tmp/crunch-job/task/humgen-05-07.10.keep: Invalid argument
2017-02-28_17:23:34 z8ta6-8i9sb-8mp2qww92moa644 58397  stderr srun: error: humgen-05-07: task 0: Exited with exit code 123
2017-02-28_17:23:34 z8ta6-8i9sb-8mp2qww92moa644 58397  clean work dirs: exit 123
2017-02-28_17:23:34 salloc: Relinquishing job allocation 17536
dispatching job z8ta6-8i9sb-8mp2qww92moa644 {"docker_image"=>"mercury/gatk-3.5", "min_nodes"=>1, "max_tasks_per_node"=>10, "keep_cache_mb_per_task"=>1280} to humgen-04-02 z8ta6-7ekkf-ekzlxvozts92sqm {"total_cpu_cores":40,"total_ram_mb":193289,"total_scratch_mb":68302106}
2017-02-28_17:23:35 salloc: error: Unable to allocate resources: Requested nodes are busy
2017-02-28_17:23:35 salloc: Job allocation 17539 has been revoked.
dispatching job z8ta6-8i9sb-8mp2qww92moa644 {"docker_image"=>"mercury/gatk-3.5", "min_nodes"=>1, "max_tasks_per_node"=>10, "keep_cache_mb_per_task"=>1280} to humgen-05-03 z8ta6-7ekkf-1i1v5zotflg26jn {"total_cpu_cores":32,"total_ram_mb":257867,"total_scratch_mb":788561}
2017-02-28_17:23:36 salloc: Granted job allocation 17540
2017-02-28_17:23:36 58715  Sanity check is `/usr/bin/docker ps -q`
2017-02-28_17:23:36 58715  sanity check: start
2017-02-28_17:23:36 58715  stderr starting: ['srun','--nodes=1','--ntasks-per-node=1','/usr/bin/docker','ps','-q']
2017-02-28_17:23:36 58715  sanity check: exit 0
2017-02-28_17:23:36 58715  Sanity check OK
2017-02-28_17:23:38 z8ta6-8i9sb-8mp2qww92moa644 58715  running from /var/www/arvados-api/shared/vendor_bundle/ruby/2.1.0/gems/arvados-cli-0.1.20170217221854/bin/crunch-job with arvados-cli Gem version(s) 0.1.20170217221854, 0.1.20161017193526, 0.1.20160503204200, 0.1.20151207150126, 0.1.20151023190001
2017-02-28_17:23:38 z8ta6-8i9sb-8mp2qww92moa644 58715  check slurm allocation
2017-02-28_17:23:38 z8ta6-8i9sb-8mp2qww92moa644 58715  node humgen-05-03 - 10 slots
2017-02-28_17:23:38 z8ta6-8i9sb-8mp2qww92moa644 58715  start
2017-02-28_17:23:38 z8ta6-8i9sb-8mp2qww92moa644 58715  clean work dirs: start
2017-02-28_17:23:38 z8ta6-8i9sb-8mp2qww92moa644 58715  stderr starting: ['srun','--nodelist=humgen-05-03','-D','/data/crunch-tmp','bash','-ec','-o','pipefail','mount -t fuse,fuse.keep | awk "(index(\\$3, \\"$CRUNCH_TMP\\") == 1){print \\$3}" | xargs -r -n 1 fusermount -u -z; sleep 1; rm -rf $JOB_WORK $CRUNCH_INSTALL $CRUNCH_TMP/task $CRUNCH_TMP/src* $CRUNCH_TMP/*.cid']
2017-02-28_17:23:38 z8ta6-8i9sb-8mp2qww92moa644 58715  stderr fusermount: failed to unmount /data/crunch-tmp/crunch-job/task/humgen-05-03.4.keep: Invalid argument
2017-02-28_17:23:38 z8ta6-8i9sb-8mp2qww92moa644 58715  stderr srun: error: humgen-05-03: task 0: Exited with exit code 123
2017-02-28_17:23:38 z8ta6-8i9sb-8mp2qww92moa644 58715  clean work dirs: exit 123
2017-02-28_17:23:38 salloc: Relinquishing job allocation 17540
2017-02-28_17:23:38 close failed in file object destructor:
2017-02-28_17:23:38 sys.excepthook is missing
2017-02-28_17:23:38 lost sys.stderr
dispatching job z8ta6-8i9sb-8mp2qww92moa644 {"docker_image"=>"mercury/gatk-3.5", "min_nodes"=>1, "max_tasks_per_node"=>10, "keep_cache_mb_per_task"=>1280} to humgen-04-02 z8ta6-7ekkf-ekzlxvozts92sqm {"total_cpu_cores":40,"total_ram_mb":193289,"total_scratch_mb":68302106}
2017-02-28_17:23:40 salloc: Granted job allocation 17544
2017-02-28_17:23:40 58985  Sanity check is `/usr/bin/docker ps -q`
2017-02-28_17:23:40 58985  sanity check: start
2017-02-28_17:23:40 58985  stderr starting: ['srun','--nodes=1','--ntasks-per-node=1','/usr/bin/docker','ps','-q']
2017-02-28_17:23:40 58985  sanity check: exit 0
2017-02-28_17:23:40 58985  Sanity check OK
2017-02-28_17:23:41 z8ta6-8i9sb-8mp2qww92moa644 58985  running from /var/www/arvados-api/shared/vendor_bundle/ruby/2.1.0/gems/arvados-cli-0.1.20170217221854/bin/crunch-job with arvados-cli Gem version(s) 0.1.20170217221854, 0.1.20161017193526, 0.1.20160503204200, 0.1.20151207150126, 0.1.20151023190001
2017-02-28_17:23:41 z8ta6-8i9sb-8mp2qww92moa644 58985  check slurm allocation
2017-02-28_17:23:41 z8ta6-8i9sb-8mp2qww92moa644 58985  node humgen-04-02 - 10 slots
2017-02-28_17:23:41 z8ta6-8i9sb-8mp2qww92moa644 58985  start
2017-02-28_17:23:41 z8ta6-8i9sb-8mp2qww92moa644 58985  clean work dirs: start
2017-02-28_17:23:41 z8ta6-8i9sb-8mp2qww92moa644 58985  stderr starting: ['srun','--nodelist=humgen-04-02','-D','/data/crunch-tmp','bash','-ec','-o','pipefail','mount -t fuse,fuse.keep | awk "(index(\\$3, \\"$CRUNCH_TMP\\") == 1){print \\$3}" | xargs -r -n 1 fusermount -u -z; sleep 1; rm -rf $JOB_WORK $CRUNCH_INSTALL $CRUNCH_TMP/task $CRUNCH_TMP/src* $CRUNCH_TMP/*.cid']
2017-02-28_17:23:41 z8ta6-8i9sb-8mp2qww92moa644 58985  stderr fusermount: failed to unmount /data/crunch-tmp/crunch-job/task/humgen-04-02.9.keep: Invalid argument
2017-02-28_17:23:41 z8ta6-8i9sb-8mp2qww92moa644 58985  stderr srun: error: humgen-04-02: task 0: Exited with exit code 123
2017-02-28_17:23:41 z8ta6-8i9sb-8mp2qww92moa644 58985  clean work dirs: exit 123
2017-02-28_17:23:41 salloc: Relinquishing job allocation 17544
2017-02-28_17:23:41 close failed in file object destructor:
2017-02-28_17:23:41 sys.excepthook is missing
2017-02-28_17:23:41 lost sys.stderr


Subtasks

Task #11292: Review 11209-unmount-replace (In Progress, assigned to Tom Clegg)

History

#1 Updated by Tom Clegg 10 days ago

Normally crunch-job frees up mount points using "fusermount -u -z", but for some reason that isn't working here:

2017-02-28_17:23:34 z8ta6-8i9sb-8mp2qww92moa644 58397  clean work dirs: start
2017-02-28_17:23:34 z8ta6-8i9sb-8mp2qww92moa644 58397  stderr starting: ['srun','--nodelist=humgen-05-07','-D','/data/crunch-tmp','bash','-ec','-o','pipefail','mount -t fuse,fuse.keep | awk "(index(\\$3, \\"$CRUNCH_TMP\\") == 1){print \\$3}" | xargs -r -n 1 fusermount -u -z; sleep 1; rm -rf $JOB_WORK $CRUNCH_INSTALL $CRUNCH_TMP/task $CRUNCH_TMP/src* $CRUNCH_TMP/*.cid']
2017-02-28_17:23:34 z8ta6-8i9sb-8mp2qww92moa644 58397  stderr fusermount: failed to unmount /data/crunch-tmp/crunch-job/task/humgen-05-07.10.keep: Invalid argument

Could this be https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=632258 ? (Looks similar, seems to have been fixed by upgrading fuse from 2.8.5-3 to 2.9.2-4.)

On Debian jessie and Ubuntu xenial test systems:
  • Writing 1 to /sys/fs/fuse/connections/ZZZ/abort (where ZZZ is the device minor number from /proc/self/mountinfo) kills arv-mount and puts the mountpoint in "transport endpoint is not connected" state, but it has no effect at all on a mountpoint that is already in that state. (The fuse docs claim this is the way to kill a mount that "always works"; see the sketch after this list.)
  • "umount", "umount -l", and "umount -f" all fail with EPERM.
  • "fusermount -z -u" always works.

If "umount" needs root and "fusermount" doesn't work, I'm not sure what we should do. We could use a different mount point, but that would cause zombie mountpoints to accumulate over time, which could eventually put the system in an even worse state (although at least it would take longer to get there).

#2 Updated by Tom Clegg 10 days ago

  • Category set to FUSE
  • Status changed from New to In Progress
  • Assignee set to Tom Clegg

#3 Updated by Joshua Randall 8 days ago

When I run `umount -f` to clear the problem, I have always run it as root; I've never tried running it as any other user.

#4 Updated by Tom Clegg 8 days ago

The fuse bug seems to be related to a double-mounted mount point. Perhaps the trick is to avoid getting into this state in the first place by waiting for the mount to detach (perhaps by calling stat on the mountpoint until it succeeds) after calling "fusermount -u -z".
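
A minimal sketch of that approach (the mount point path and the 20-second cap are illustrative):

    # Lazily unmount, then poll stat until the old fuse mount is actually
    # gone: stat fails (e.g. ENOTCONN) while the wedged mount is still
    # attached, and succeeds once the underlying directory is visible again.
    mp=/data/crunch-tmp/crunch-job/task/humgen-05-07.10.keep
    fusermount -u -z "$mp"
    for _ in $(seq 1 20); do
        stat "$mp" >/dev/null 2>&1 && break
        sleep 1
    done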

(This problem is occurring on systems with fuse≥2.9.2-4, where that bug is supposedly fixed -- but waiting for the detach seems like good race-prevention behavior anyway.)

#5 Updated by Tom Clegg 6 days ago

  • Target version set to 2017-03-29 sprint

#6 Updated by Tom Clegg 6 days ago

11209-unmount-replace @ 5752685c137c5e37e13845f5328e9a3930fa3100

This should let us replace the "mount|awk|grep|xargs fusermount;sleep" script in crunch-job with "arv-mount --unmount $CRUNCH_TMP/..." and ensure we don't try to proceed any further until all fuse mounts are detached.
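
The crunch-job cleanup step might then look something like this (a sketch; the elided path is as written above, and the surrounding srun invocation stays the same):

    # arv-mount --unmount replaces the fusermount pipeline, and doesn't
    # return until the fuse mounts under the given path are detached.
    arv-mount --unmount "$CRUNCH_TMP/..."
    rm -rf $JOB_WORK $CRUNCH_INSTALL $CRUNCH_TMP/task $CRUNCH_TMP/src* $CRUNCH_TMP/*.cid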

#7 Updated by Lucas Di Pentima 7 minutes ago

  • File services/fuse/arvados_fuse/command.py
    • Line 14: Can this line be eliminated, given line 15?
    • Shouldn't self.args.replace have the same semantics as self.args.unmount with respect to the unmount_all() feature?
  • Reusing self.args.unmount_timeout in unmount()/unmount_all() may be problematic, because it seems to mean something different there than in __exit__: if the user specifies unmount_timeout=0, the unmount gets no timeout at all, whereas the rest of the code seems to use unmount_timeout=0 to mean "don't wait". Is that right?
  • An "unmount_timeout < 0" would always produce a timeout exception without trying even once to unmount.
  • Should these new flags have their own tests?
