Bug #11209
stuck keep fuse mounts not cleared by crunch-job (Closed)
Description
crunch-job attempts to unmount any fuse filesystems mounted under $CRUNCH_TMP, but it does so using only fusermount. On our system this often fails, and a "umount -f <mount_point>" is required to make the node work again.
In addition, this often happens on multiple nodes at the same time, and by the time we have three nodes with wedged fuse mounts they rapidly fail all pending jobs. There seems to be no mechanism by which crunch-dispatch can decide to stop dispatching to a node that is broken.
Here is the log from a job that suffered from this issue.
dispatching job z8ta6-8i9sb-8mp2qww92moa644 {"docker_image"=>"mercury/gatk-3.5", "min_nodes"=>1, "max_tasks_per_node"=>10, "keep_cache_mb_per_task"=>1280} to humgen-05-07 z8ta6-7ekkf-sa1q59632vhxov6 {"total_cpu_cores":32,"total_ram_mb":257867,"total_scratch_mb":788561} 2017-02-28_17:23:33 salloc: Granted job allocation 17536 2017-02-28_17:23:33 58397 Sanity check is `/usr/bin/docker ps -q` 2017-02-28_17:23:33 58397 sanity check: start 2017-02-28_17:23:33 58397 stderr starting: ['srun','--nodes=1','--ntasks-per-node=1','/usr/bin/docker','ps','-q'] 2017-02-28_17:23:33 58397 sanity check: exit 0 2017-02-28_17:23:33 58397 Sanity check OK 2017-02-28_17:23:33 z8ta6-8i9sb-8mp2qww92moa644 58397 running from /var/www/arvados-api/shared/vendor_bundle/ruby/2.1.0/gems/arvados-cli-0.1.20170217221854/bin/crunch-job with arvados-cli Gem version(s) 0.1.20170217221854, 0.1.20161017193526, 0.1.20160503204200, 0.1.20151207150126, 0.1.20151023190001 2017-02-28_17:23:33 z8ta6-8i9sb-8mp2qww92moa644 58397 check slurm allocation 2017-02-28_17:23:33 z8ta6-8i9sb-8mp2qww92moa644 58397 node humgen-05-07 - 10 slots 2017-02-28_17:23:33 z8ta6-8i9sb-8mp2qww92moa644 58397 start 2017-02-28_17:23:34 z8ta6-8i9sb-8mp2qww92moa644 58397 clean work dirs: start 2017-02-28_17:23:34 z8ta6-8i9sb-8mp2qww92moa644 58397 stderr starting: ['srun','--nodelist=humgen-05-07','-D','/data/crunch-tmp','bash','-ec','-o','pipefail','mount -t fuse,fuse.keep | awk "(index(\\$3, \\"$CRUNCH_TMP\\") == 1){print \\$3}" | xargs -r -n 1 fusermount -u -z; sleep 1; rm -rf $JOB_WORK $CRUNCH_INSTALL $CRUNCH_TMP/task $CRUNCH_TMP/src* $CRUNCH_TMP/*.cid'] 2017-02-28_17:23:34 z8ta6-8i9sb-8mp2qww92moa644 58397 stderr fusermount: failed to unmount /data/crunch-tmp/crunch-job/task/humgen-05-07.10.keep: Invalid argument 2017-02-28_17:23:34 z8ta6-8i9sb-8mp2qww92moa644 58397 stderr srun: error: humgen-05-07: task 0: Exited with exit code 123 2017-02-28_17:23:34 z8ta6-8i9sb-8mp2qww92moa644 58397 clean work dirs: exit 123 2017-02-28_17:23:34 salloc: Relinquishing job allocation 17536 dispatching job z8ta6-8i9sb-8mp2qww92moa644 {"docker_image"=>"mercury/gatk-3.5", "min_nodes"=>1, "max_tasks_per_node"=>10, "keep_cache_mb_per_task"=>1280} to humgen-04-02 z8ta6-7ekkf-ekzlxvozts92sqm {"total_cpu_cores":40,"total_ram_mb":193289,"total_scratch_mb":68302106} 2017-02-28_17:23:35 salloc: error: Unable to allocate resources: Requested nodes are busy 2017-02-28_17:23:35 salloc: Job allocation 17539 has been revoked. 
dispatching job z8ta6-8i9sb-8mp2qww92moa644 {"docker_image"=>"mercury/gatk-3.5", "min_nodes"=>1, "max_tasks_per_node"=>10, "keep_cache_mb_per_task"=>1280} to humgen-05-03 z8ta6-7ekkf-1i1v5zotflg26jn {"total_cpu_cores":32,"total_ram_mb":257867,"total_scratch_mb":788561} 2017-02-28_17:23:36 salloc: Granted job allocation 17540 2017-02-28_17:23:36 58715 Sanity check is `/usr/bin/docker ps -q` 2017-02-28_17:23:36 58715 sanity check: start 2017-02-28_17:23:36 58715 stderr starting: ['srun','--nodes=1','--ntasks-per-node=1','/usr/bin/docker','ps','-q'] 2017-02-28_17:23:36 58715 sanity check: exit 0 2017-02-28_17:23:36 58715 Sanity check OK 2017-02-28_17:23:38 z8ta6-8i9sb-8mp2qww92moa644 58715 running from /var/www/arvados-api/shared/vendor_bundle/ruby/2.1.0/gems/arvados-cli-0.1.20170217221854/bin/crunch-job with arvados-cli Gem version(s) 0.1.20170217221854, 0.1.20161017193526, 0.1.20160503204200, 0.1.20151207150126, 0.1.20151023190001 2017-02-28_17:23:38 z8ta6-8i9sb-8mp2qww92moa644 58715 check slurm allocation 2017-02-28_17:23:38 z8ta6-8i9sb-8mp2qww92moa644 58715 node humgen-05-03 - 10 slots 2017-02-28_17:23:38 z8ta6-8i9sb-8mp2qww92moa644 58715 start 2017-02-28_17:23:38 z8ta6-8i9sb-8mp2qww92moa644 58715 clean work dirs: start 2017-02-28_17:23:38 z8ta6-8i9sb-8mp2qww92moa644 58715 stderr starting: ['srun','--nodelist=humgen-05-03','-D','/data/crunch-tmp','bash','-ec','-o','pipefail','mount -t fuse,fuse.keep | awk "(index(\\$3, \\"$CRUNCH_TMP\\") == 1){print \\$3}" | xargs -r -n 1 fusermount -u -z; sleep 1; rm -rf $JOB_WORK $CRUNCH_INSTALL $CRUNCH_TMP/task $CRUNCH_TMP/src* $CRUNCH_TMP/*.cid'] 2017-02-28_17:23:38 z8ta6-8i9sb-8mp2qww92moa644 58715 stderr fusermount: failed to unmount /data/crunch-tmp/crunch-job/task/humgen-05-03.4.keep: Invalid argument 2017-02-28_17:23:38 z8ta6-8i9sb-8mp2qww92moa644 58715 stderr srun: error: humgen-05-03: task 0: Exited with exit code 123 2017-02-28_17:23:38 z8ta6-8i9sb-8mp2qww92moa644 58715 clean work dirs: exit 123 2017-02-28_17:23:38 salloc: Relinquishing job allocation 17540 2017-02-28_17:23:38 close failed in file object destructor: 2017-02-28_17:23:38 sys.excepthook is missing 2017-02-28_17:23:38 lost sys.stderr dispatching job z8ta6-8i9sb-8mp2qww92moa644 {"docker_image"=>"mercury/gatk-3.5", "min_nodes"=>1, "max_tasks_per_node"=>10, "keep_cache_mb_per_task"=>1280} to humgen-04-02 z8ta6-7ekkf-ekzlxvozts92sqm {"total_cpu_cores":40,"total_ram_mb":193289,"total_scratch_mb":68302106} 2017-02-28_17:23:40 salloc: Granted job allocation 17544 2017-02-28_17:23:40 58985 Sanity check is `/usr/bin/docker ps -q` 2017-02-28_17:23:40 58985 sanity check: start 2017-02-28_17:23:40 58985 stderr starting: ['srun','--nodes=1','--ntasks-per-node=1','/usr/bin/docker','ps','-q'] 2017-02-28_17:23:40 58985 sanity check: exit 0 2017-02-28_17:23:40 58985 Sanity check OK 2017-02-28_17:23:41 z8ta6-8i9sb-8mp2qww92moa644 58985 running from /var/www/arvados-api/shared/vendor_bundle/ruby/2.1.0/gems/arvados-cli-0.1.20170217221854/bin/crunch-job with arvados-cli Gem version(s) 0.1.20170217221854, 0.1.20161017193526, 0.1.20160503204200, 0.1.20151207150126, 0.1.20151023190001 2017-02-28_17:23:41 z8ta6-8i9sb-8mp2qww92moa644 58985 check slurm allocation 2017-02-28_17:23:41 z8ta6-8i9sb-8mp2qww92moa644 58985 node humgen-04-02 - 10 slots 2017-02-28_17:23:41 z8ta6-8i9sb-8mp2qww92moa644 58985 start 2017-02-28_17:23:41 z8ta6-8i9sb-8mp2qww92moa644 58985 clean work dirs: start 2017-02-28_17:23:41 z8ta6-8i9sb-8mp2qww92moa644 58985 stderr starting: 
['srun','--nodelist=humgen-04-02','-D','/data/crunch-tmp','bash','-ec','-o','pipefail','mount -t fuse,fuse.keep | awk "(index(\\$3, \\"$CRUNCH_TMP\\") == 1){print \\$3}" | xargs -r -n 1 fusermount -u -z; sleep 1; rm -rf $JOB_WORK $CRUNCH_INSTALL $CRUNCH_TMP/task $CRUNCH_TMP/src* $CRUNCH_TMP/*.cid'] 2017-02-28_17:23:41 z8ta6-8i9sb-8mp2qww92moa644 58985 stderr fusermount: failed to unmount /data/crunch-tmp/crunch-job/task/humgen-04-02.9.keep: Invalid argument 2017-02-28_17:23:41 z8ta6-8i9sb-8mp2qww92moa644 58985 stderr srun: error: humgen-04-02: task 0: Exited with exit code 123 2017-02-28_17:23:41 z8ta6-8i9sb-8mp2qww92moa644 58985 clean work dirs: exit 123 2017-02-28_17:23:41 salloc: Relinquishing job allocation 17544 2017-02-28_17:23:41 close failed in file object destructor: 2017-02-28_17:23:41 sys.excepthook is missing 2017-02-28_17:23:41 lost sys.stderr
Updated by Tom Clegg over 7 years ago
Normally crunch-job frees up mount points using "fusermount -u -z", but for some reason it isn't working here:
2017-02-28_17:23:34 z8ta6-8i9sb-8mp2qww92moa644 58397 clean work dirs: start
2017-02-28_17:23:34 z8ta6-8i9sb-8mp2qww92moa644 58397 stderr starting: ['srun','--nodelist=humgen-05-07','-D','/data/crunch-tmp','bash','-ec','-o','pipefail','mount -t fuse,fuse.keep | awk "(index(\\$3, \\"$CRUNCH_TMP\\") == 1){print \\$3}" | xargs -r -n 1 fusermount -u -z; sleep 1; rm -rf $JOB_WORK $CRUNCH_INSTALL $CRUNCH_TMP/task $CRUNCH_TMP/src* $CRUNCH_TMP/*.cid']
2017-02-28_17:23:34 z8ta6-8i9sb-8mp2qww92moa644 58397 stderr fusermount: failed to unmount /data/crunch-tmp/crunch-job/task/humgen-05-07.10.keep: Invalid argument
Could this be https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=632258 ? (Looks similar, seems to have been fixed by upgrading fuse from 2.8.5-3 to 2.9.2-4.)
On Debian jessie and Ubuntu xenial test systems:
- Writing 1 to /sys/fs/fuse/connections/ZZZ/abort (where ZZZ is the device minor number from /proc/self/mountinfo) kills arv-mount and puts the mountpoint in "transport endpoint is not connected" state, but has no effect at all on a mountpoint that is already in that state. (The fuse docs claim this is the way to kill a mount that "always works".)
- "umount", "umount -l", and "umount -f" all fail with EPERM.
- "fusermount -z -u" always works.
If "umount" needs root and "fusermount" doesn't work, I'm not sure what we should do. We could use a different mount point, but that would cause zombie mountpoints to accumulate over time, which could eventually put the system in an even worse state (although at least it would take longer to get there).
Updated by Tom Clegg over 7 years ago
- Category set to FUSE
- Status changed from New to In Progress
- Assigned To set to Tom Clegg
Updated by Joshua Randall over 7 years ago
When I run `umount -f` to clear the problem, it has always been as root. Never tried running it as any other user.
Updated by Tom Clegg over 7 years ago
The fuse bug seems to be related to a double-mounted mount point. Perhaps the trick is to avoid getting into this state by waiting for the mount to detach (perhaps by calling stat until it works) after calling "fusermount -u -z".
(This problem is occurring on systems with fuse≥2.9.2-4, where supposedly that bug is fixed -- but this seems like good race-prevention behavior anyway.)
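A rough sketch of that race-prevention idea (not the actual crunch-job or arv-mount code): lazily unmount, then poll until the mountpoint disappears from /proc/self/mountinfo; polling stat() as suggested above would be a variant. The function names and the 10-second default are illustrative only.

    import subprocess
    import time

    def is_mounted(mountpoint):
        # Check /proc/self/mountinfo directly; os.path.ismount() can be fooled
        # by a mount in "transport endpoint is not connected" state.
        with open('/proc/self/mountinfo') as mi:
            return any(line.split()[4] == mountpoint for line in mi)

    def unmount_and_wait(mountpoint, timeout=10):
        subprocess.call(['fusermount', '-u', '-z', mountpoint])  # lazy unmount
        deadline = time.time() + timeout
        while is_mounted(mountpoint):
            if time.time() > deadline:
                return False      # still attached: don't reuse the path yet
            time.sleep(0.1)
        return True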
Updated by Tom Clegg over 7 years ago
11209-unmount-replace @ 5752685c137c5e37e13845f5328e9a3930fa3100
This should let us replace the "mount|awk|grep|xargs fusermount;sleep" script in crunch-job with "arv-mount --unmount $CRUNCH_TMP/..." and ensure we don't try to proceed any further until all fuse mounts are detached.
Updated by Lucas Di Pentima over 7 years ago
- File services/fuse/arvados_fuse/command.py
  - Line 14: Can this line be eliminated because of line 15?
- Shouldn't self.args.replace have the same semantics as self.args.unmount regarding the unmount_all() feature?
- Reusing self.args.unmount_timeout in unmount()/unmount_all() may be problematic, since it seems to have a different meaning when used in __exit__: for example, if the user specifies unmount_timeout=0 there, the unmounting won't have a timeout, and OTOH the rest of the code seems to use unmount_timeout=0 as "don't wait", right?
  - Using an unmount_timeout < 0 would always produce a timeout exception without trying to unmount at least once.
- Should these new flags have their own tests?
Updated by Tom Clegg over 7 years ago
Lucas Di Pentima wrote:
- File services/fuse/arvados_fuse/command.py
  - Line 14: Can this line be eliminated because of line 15?

Sure, don't see why not.

- Shouldn't self.args.replace have the same semantics as self.args.unmount regarding the unmount_all() feature?
The only difference is that "/path/..." means "/path and any mountpoint below it" in unmount_all(). So the question is about what should happen if someone runs
arv-mount --replace /path/...
I figure since we'll try to mount at the literal path "/path/..." we have to assume "/path/..." really means just "/path/..." and only unmount whatever we find at that specific path, not "everything under /path".
Does this make sense?
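To illustrate the convention being proposed (a sketch, not the code in the branch): only a literal trailing "/..." selects "this directory and every mountpoint below it"; any other argument names exactly one mountpoint.

    def parse_unmount_target(arg):
        # "/path/..." -> unmount /path and everything mounted below it;
        # anything else -> unmount exactly that path.
        if arg.endswith('/...'):
            return arg[:-len('/...')], True   # (root, recursive)
        return arg, False

    assert parse_unmount_target('/data/crunch-tmp/...') == ('/data/crunch-tmp', True)
    assert parse_unmount_target('/data/crunch-tmp/job.keep') == ('/data/crunch-tmp/job.keep', False)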
- Reusing self.args.unmount_timeout in unmount()/unmount_all() may be problematic, since it seems to have a different meaning when used in __exit__: for example, if the user specifies unmount_timeout=0 there, the unmounting won't have a timeout, and OTOH the rest of the code seems to use unmount_timeout=0 as "don't wait", right?
  - Using an unmount_timeout < 0 would always produce a timeout exception without trying to unmount at least once.
Ah, yes, unmount(timeout=0) means "raise exception" which seems useless. Fixed so it always tries at least once.
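Something like the following captures the behaviour described (a sketch only; attempt_once() is a placeholder, not a real arvados_fuse function): the loop always makes one attempt before checking the deadline, so timeout=0 means "try once, don't wait" rather than "raise immediately".

    import time

    def unmount_with_timeout(path, timeout, attempt_once):
        # attempt_once(path) -> True if the mount is gone, False otherwise.
        deadline = None if timeout is None else time.time() + timeout
        while True:
            if attempt_once(path):
                return True                          # unmounted
            if deadline is not None and time.time() >= deadline:
                return False                         # give up after at least one try
            time.sleep(0.1)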
- Should these new flags have their own tests?
I'm dreading finding new ways for threads/processes to deadlock and leave fuse in weird states ... but yes, it should be possible to make a test case that runs some arv-mount child processes and unmounts them with another.
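For what it's worth, a rough shape such a test could take (assumptions: Arvados API credentials in the environment, arv-mount on $PATH with the new flags, arv-mount daemonizing when run without --exec, and a crude readiness poll; this is not the test that was eventually added):

    import os
    import subprocess
    import tempfile
    import time
    import unittest

    class UnmountSketchTest(unittest.TestCase):
        def test_unmount_from_another_process(self):
            mnt = tempfile.mkdtemp()
            try:
                # Mount in one process (assumed to daemonize and leave the mount up)...
                subprocess.check_call(['arv-mount', '--subtype', 'test', mnt])
                for _ in range(100):              # crude wait in case the mount appears asynchronously
                    if os.path.ismount(mnt):
                        break
                    time.sleep(0.1)
                self.assertTrue(os.path.ismount(mnt))
                # ...then remove it from another process using the new flag.
                subprocess.check_call(['arv-mount', '--unmount-timeout', '10', '--unmount', mnt])
                self.assertFalse(os.path.ismount(mnt))
            finally:
                if os.path.ismount(mnt):
                    subprocess.call(['fusermount', '-u', '-z', mnt])
                os.rmdir(mnt)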
Updated by Tom Clegg over 7 years ago
- remove superfluous crunchstat import
- unmount(timeout=0) tries unmount 1x
- test cases for --unmount and --replace
- fix missing import so --unmount and --replace actually work (thanks, new test cases!)
Updated by Lucas Di Pentima over 7 years ago
Tom Clegg wrote:
I figure since we'll try to mount at the literal path "/path/..." we have to assume "/path/..." really means just "/path/..." and only unmount whatever we find at that specific path, not "everything under /path".
Does this make sense?
It makes sense, and in that case it brings up another doubt: if we use "/path/…" as a literal in the args.replace case, shouldn't we check whether "/path/…" exists when using args.unmount, before assuming we're trying to unmount all mounted dirs below "/path/"? Or maybe, if this convention is too confusing, use an additional flag for the recursive unmount feature?
I'm dreading finding new ways for threads/processes to deadlock and leave fuse in weird states ... but yes, it should be possible to make a test case that runs some arv-mount child processes and unmounts them with another.
I've run them on my local machine, and got some errors, for example:
======================================================================
ERROR: test_replace (tests.test_unmount.UnmountTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/lucas/arvados_local/services/fuse/tests/test_unmount.py", line 29, in test_replace
    '--exec', 'true'])
  File "/usr/lib/python2.7/subprocess.py", line 540, in check_call
    raise CalledProcessError(retcode, cmd)
CalledProcessError: Command '['arv-mount', '--subtype', 'test', '--replace', '--unmount-timeout', '10', '/tmp/tmp1_nFm2', '--exec', 'true']' returned non-zero exit status 1

======================================================================
ERROR: test_replace (tests.test_unmount.UnmountTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/lucas/arvados_local/services/fuse/tests/test_unmount.py", line 15, in tearDown
    super(UnmountTest, self).tearDown()
  File "/home/lucas/arvados_local/services/fuse/tests/integration_test.py", line 66, in tearDown
    os.rmdir(self.mnt)
OSError: [Errno 16] Device or resource busy: '/tmp/tmp1_nFm2'
Updated by Tom Clegg over 7 years ago
Fixed a race condition in the tests, and a problem with the refactored "standalone mode" code (evidently it's critical to do DaemonContext() before subscribing to websocket). That might have caused the unmount tests to fail unreliably in b7a66.
"--unmount /path/..." is now "--unmount-all /path"
11209-unmount-replace @ 8b4d5991f9d5691b9fa2898d6f60eef8dbfdf987
Updated by Tom Clegg over 7 years ago
- Target version changed from 2017-03-29 sprint to 2017-04-12 sprint
Updated by Tom Clegg over 7 years ago
11209-unmount-subtype @ 75184884ed798b474e8b9e254045dc0f4354379e
Updated by Tom Clegg over 7 years ago
- Target version changed from 2017-04-12 sprint to 2017-04-26 sprint
Updated by Tom Clegg over 7 years ago
11209-crunch-unmount-all @ d64ed33e94700f8204ec8089c7b235cff918f9f7
2017-04-14_17:20:15 4xphq-8i9sb-5fhfjo3g28krpw5 1564 clean work dirs: start
2017-04-14_17:20:15 4xphq-8i9sb-5fhfjo3g28krpw5 1564 stderr starting: ['srun','--nodelist=compute1','-D','/tmp','bash','-ec',' arv-mount --unmount-timeout 10 --unmount-all ${CRUNCH_TMP} rm -rf ${JOB_WORK} ${CRUNCH_INSTALL} ${CRUNCH_TMP}/task ${CRUNCH_TMP}/src* ${CRUNCH_TMP}/*.cid ']
2017-04-14_17:20:16 4xphq-8i9sb-5fhfjo3g28krpw5 1564 clean work dirs: exit 0
-- https://workbench.4xphq.arvadosapi.com/jobs/4xphq-8i9sb-5fhfjo3g28krpw5#Log
Updated by Lucas Di Pentima over 7 years ago
Updated by Joshua Randall over 7 years ago
I now have a wedged keep mount on one of the compute nodes that has the new arv-mount installed.
Unfortunately, the new `--unmount-all` option does not appear to clear the stuck mount:
root@humgen-05-13:~# arv-mount --version
/usr/bin/arv-mount 0.1.20170407172413
root@humgen-05-13:~# mount -t fuse
/dev/fuse on /data/crunch-tmp/crunch-job/task/humgen-05-13.1.keep type fuse (rw,nosuid,nodev,allow_other,max_read=131072,user=crunch)
root@humgen-05-13:~# arv-mount --unmount-all /data/crunch-tmp/crunch-job/task/humgen-05-13.1.keep
root@humgen-05-13:~# mount -t fuse
/dev/fuse on /data/crunch-tmp/crunch-job/task/humgen-05-13.1.keep type fuse (rw,nosuid,nodev,allow_other,max_read=131072,user=crunch)
Updated by Joshua Randall over 7 years ago
I've just been looking through the code to try to figure this out and it looks like the issue is that the wedged mount is not showing up in /proc/self/mountinfo (neither for root nor for the crunch user):
root@humgen-05-13:~# mount -t fuse
/dev/fuse on /data/crunch-tmp/crunch-job/task/humgen-05-13.1.keep type fuse (rw,nosuid,nodev,allow_other,max_read=131072,user=crunch)
root@humgen-05-13:~# cat /proc/self/mountinfo | grep fuse
23 17 0:17 / /sys/fs/fuse/connections rw,relatime - fusectl none rw

crunch@humgen-05-13:/$ cat /proc/self/mountinfo | grep fuse
23 17 0:17 / /sys/fs/fuse/connections rw,relatime - fusectl none rw
crunch@humgen-05-13:/$ mount -t fuse
/dev/fuse on /data/crunch-tmp/crunch-job/task/humgen-05-13.1.keep type fuse (rw,nosuid,nodev,allow_other,max_read=131072,user=crunch)
However, running with `--replace` does seem to work:
crunch@humgen-05-13:/$ arv-mount /data/crunch-tmp/crunch-job/task/humgen-05-13.1.keep mount: according to mtab, /dev/fuse is already mounted on /data/crunch-tmp/crunch-job/task/humgen-05-13.1.keep 2017-04-19 00:10:46 arvados.arv-mount[20686] ERROR: arv-mount: exception during mount: fuse_mount failed Traceback (most recent call last): File "/usr/lib/python2.7/dist-packages/arvados_fuse/command.py", line 365, in _run_standalone with self: File "/usr/lib/python2.7/dist-packages/arvados_fuse/command.py", line 133, in __enter__ llfuse.init(self.operations, self.args.mountpoint, self._fuse_options()) File "llfuse/fuse_api.pxi", line 253, in llfuse.capi.init (src/llfuse/capi_linux.c:24362) RuntimeError: fuse_mount failed crunch@humgen-05-13:/$ arv-mount --replace /data/crunch-tmp/crunch-job/task/humgen-05-13.1.keep crunch@humgen-05-13:/$ mount -t fuse /dev/fuse on /data/crunch-tmp/crunch-job/task/humgen-05-13.1.keep type fuse (rw,nosuid,nodev,max_read=131072,user=crunch) crunch@humgen-05-13:/$ ls /data/crunch-tmp/crunch-job/task/humgen-05-13.1.keep by_id by_tag home README shared
Updated by Joshua Randall over 7 years ago
Another wedged node; this time arv-mount --unmount did work:
crunch@humgen-02-02:/$ mount -t fuse /dev/fuse on /data/crunch-tmp/crunch-job/task/humgen-02-02.4.keep type fuse (rw,nosuid,nodev,allow_other,max_read=131072,user=crunch) crunch@humgen-02-02:/$ ls /data/crunch-tmp/crunch-job/task/humgen-02-02.4.keep crunch@humgen-02-02:/$ arv-mount /data/crunch-tmp/crunch-job/task/humgen-02-02.4.keep mount: according to mtab, /dev/fuse is already mounted on /data/crunch-tmp/crunch-job/task/humgen-02-02.4.keep 2017-04-19 00:23:39 arvados.arv-mount[9218] ERROR: arv-mount: exception during mount: fuse_mount failed Traceback (most recent call last): File "/usr/lib/python2.7/dist-packages/arvados_fuse/command.py", line 365, in _run_standalone with self: File "/usr/lib/python2.7/dist-packages/arvados_fuse/command.py", line 133, in __enter__ llfuse.init(self.operations, self.args.mountpoint, self._fuse_options()) File "llfuse/fuse_api.pxi", line 253, in llfuse.capi.init (src/llfuse/capi_linux.c:24362) RuntimeError: fuse_mount failed crunch@humgen-02-02:/$ arv-mount --unmount /data/crunch-tmp/crunch-job/task/humgen-02-02.4.keep crunch@humgen-02-02:/$ ls /data/crunch-tmp/crunch-job/task/humgen-02-02.4.keep crunch@humgen-02-02:/$ mount -t fuse crunch@humgen-02-02:/$ arv-mount /data/crunch-tmp/crunch-job/task/humgen-02-02.4.keep crunch@humgen-02-02:/$ ls /data/crunch-tmp/crunch-job/task/humgen-02-02.4.keep by_id by_tag home README shared crunch@humgen-02-02:/$ arv-mount --version /usr/bin/arv-mount 0.1.20170407172413
Updated by Joshua Randall over 7 years ago
Another wedged node. On this one, neither --unmount-all nor --unmount worked until after I attempted to mount at the wedged mount point. After that attempt, the --unmount worked:
crunch@humgen-05-03:/$ mount -t fuse /dev/fuse on /data/crunch-tmp/crunch-job/task/humgen-05-03.5.keep type fuse (rw,nosuid,nodev,allow_other,max_read=131072,user=crunch) crunch@humgen-05-03:/$ ls /data/crunch-tmp/crunch-job/task/humgen-05-03.5.keep crunch@humgen-05-03:/$ ps auxwww|grep arv-mount crunch 7047 0.0 0.0 9388 912 pts/2 S+ 00:27 0:00 grep arv-mount crunch@humgen-05-03:/$ arv-mount --unmount-all /data/crunch-tmp/crunch-job/task/humgen-05-03.5.keep crunch@humgen-05-03:/$ ps auxwww|grep arv-mount crunch 7140 0.0 0.0 9388 912 pts/2 S+ 00:28 0:00 grep arv-mount crunch@humgen-05-03:/$ mount -t fuse /dev/fuse on /data/crunch-tmp/crunch-job/task/humgen-05-03.5.keep type fuse (rw,nosuid,nodev,allow_other,max_read=131072,user=crunch) crunch@humgen-05-03:/$ arv-mount --unmount /data/crunch-tmp/crunch-job/task/humgen-05-03.5.keep crunch@humgen-05-03:/$ mount -t fuse /dev/fuse on /data/crunch-tmp/crunch-job/task/humgen-05-03.5.keep type fuse (rw,nosuid,nodev,allow_other,max_read=131072,user=crunch) crunch@humgen-05-03:/$ arv-mount /data/crunch-tmp/crunch-job/task/humgen-05-03.5.keep mount: according to mtab, /dev/fuse is already mounted on /data/crunch-tmp/crunch-job/task/humgen-05-03.5.keep 2017-04-19 00:29:12 arvados.arv-mount[7367] ERROR: arv-mount: exception during mount: fuse_mount failed Traceback (most recent call last): File "/usr/lib/python2.7/dist-packages/arvados_fuse/command.py", line 365, in _run_standalone with self: File "/usr/lib/python2.7/dist-packages/arvados_fuse/command.py", line 133, in __enter__ llfuse.init(self.operations, self.args.mountpoint, self._fuse_options()) File "llfuse/fuse_api.pxi", line 253, in llfuse.capi.init (src/llfuse/capi_linux.c:24362) RuntimeError: fuse_mount failed crunch@humgen-05-03:/$ arv-mount --unmount /data/crunch-tmp/crunch-job/task/humgen-05-03.5.keep crunch@humgen-05-03:/$ mount -t fuse crunch@humgen-05-03:/$ arv-mount /data/crunch-tmp/crunch-job/task/humgen-05-03.5.keep crunch@humgen-05-03:/$ ls /data/crunch-tmp/crunch-job/task/humgen-05-03.5.keep by_id by_tag home README shared crunch@humgen-05-03:/$ arv-mount --unmount /data/crunch-tmp/crunch-job/task/humgen-05-03.5.keep crunch@humgen-05-03:/$ ls /data/crunch-tmp/crunch-job/task/humgen-05-03.5.keep crunch@humgen-05-03:/$ mount -t fuse
Updated by Joshua Randall over 7 years ago
Found one last node that is wedged and managed to do some more diagnosing. It looks like, when a node is wedged on our systems:
- There is an entry in /etc/mtab
- There is initially no entry in /proc/self/mountinfo, so `arv-mount --unmount` and `arv-mount --unmount-all` fail
- After attempting (and failing) to mount over the existing mountpoint, the entry appears in /proc/self/mountinfo, after which `arv-mount --unmount` succeeds
crunch@humgen-05-16:/$ mount -t fuse /dev/fuse on /data/crunch-tmp/crunch-job/task/humgen-05-16.2.keep type fuse (rw,nosuid,nodev,allow_other,max_read=131072,user=crunch) crunch@humgen-05-16:/$ cat /proc/self/mountinfo | grep fuse 23 17 0:17 / /sys/fs/fuse/connections rw,relatime - fusectl none rw crunch@humgen-05-16:/$ ls /data/crunch-tmp/crunch-job/task/humgen-05-16.2.keep/ crunch@humgen-05-16:/$ ps auxwww|grep arv-m crunch 31545 0.0 0.0 9388 912 pts/2 S+ 00:33 0:00 grep arv-m crunch@humgen-05-16:/$ arv-mount --unmount-all /data/crunch-tmp/crunch-job/task/humgen-05-16.2.keep crunch@humgen-05-16:/$ mount -t fuse /dev/fuse on /data/crunch-tmp/crunch-job/task/humgen-05-16.2.keep type fuse (rw,nosuid,nodev,allow_other,max_read=131072,user=crunch) crunch@humgen-05-16:/$ arv-mount --unmount /data/crunch-tmp/crunch-job/task/humgen-05-16.2.keep crunch@humgen-05-16:/$ mount -t fuse /dev/fuse on /data/crunch-tmp/crunch-job/task/humgen-05-16.2.keep type fuse (rw,nosuid,nodev,allow_other,max_read=131072,user=crunch) crunch@humgen-05-16:/$ cat /proc/self/mountinfo | grep humgen-05-16.2.keep crunch@humgen-05-16:/$ ls -l /sys/fs/fuse/connections/ total 0 crunch@humgen-05-16:/$ mount -t fuse /dev/fuse on /data/crunch-tmp/crunch-job/task/humgen-05-16.2.keep type fuse (rw,nosuid,nodev,allow_other,max_read=131072,user=crunch) crunch@humgen-05-16:/$ fusermount -u -z /data/crunch-tmp/crunch-job/task/humgen-05-16.2.keep fusermount: failed to unmount /data/crunch-tmp/crunch-job/task/humgen-05-16.2.keep: Invalid argument crunch@humgen-05-16:/$ mount -t fuse /dev/fuse on /data/crunch-tmp/crunch-job/task/humgen-05-16.2.keep type fuse (rw,nosuid,nodev,allow_other,max_read=131072,user=crunch) crunch@humgen-05-16:/$ python Python 2.7.3 (default, Oct 26 2016, 21:01:49) [GCC 4.6.3] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> from arvados_fuse.unmount import unmount >>> unmount(path='/data/crunch-tmp/crunch-job/task/humgen-05-16.2.keep', subtype='', timeout=2.0, recursive=False) False >>> quit() crunch@humgen-05-16:/$ mount -t fuse /dev/fuse on /data/crunch-tmp/crunch-job/task/humgen-05-16.2.keep type fuse (rw,nosuid,nodev,allow_other,max_read=131072,user=crunch) crunch@humgen-05-16:/$ python Python 2.7.3 (default, Oct 26 2016, 21:01:49) [GCC 4.6.3] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> from arvados_fuse.unmount import unmount >>> unmount(path='/data/crunch-tmp/crunch-job/task/humgen-05-16.2.keep', subtype=None, timeout=2.0, recursive=False) False >>> quit() crunch@humgen-05-16:/$ mount -t fuse /dev/fuse on /data/crunch-tmp/crunch-job/task/humgen-05-16.2.keep type fuse (rw,nosuid,nodev,allow_other,max_read=131072,user=crunch) crunch@humgen-05-16:/$ python Python 2.7.3 (default, Oct 26 2016, 21:01:49) [GCC 4.6.3] on linux2 Type "help", "copyright", "credits" or "license" for more information. 
>>> from arvados_fuse.unmount import unmount >>> unmount(path='/data/crunch-tmp/crunch-job/task/humgen-05-16.2.keep', timeout=2.0) False >>> quit() crunch@humgen-05-16:/$ mount -t fuse /dev/fuse on /data/crunch-tmp/crunch-job/task/humgen-05-16.2.keep type fuse (rw,nosuid,nodev,allow_other,max_read=131072,user=crunch) crunch@humgen-05-16:/$ cat /proc/self/mountinfo | grep humgen-05-16.2.keep crunch@humgen-05-16:/$ arv-mount /data/crunch-tmp/crunch-job/task/humgen-05-16.2.keep mount: according to mtab, /dev/fuse is already mounted on /data/crunch-tmp/crunch-job/task/humgen-05-16.2.keep 2017-04-19 00:45:00 arvados.arv-mount[2074] ERROR: arv-mount: exception during mount: fuse_mount failed Traceback (most recent call last): File "/usr/lib/python2.7/dist-packages/arvados_fuse/command.py", line 365, in _run_standalone with self: File "/usr/lib/python2.7/dist-packages/arvados_fuse/command.py", line 133, in __enter__ llfuse.init(self.operations, self.args.mountpoint, self._fuse_options()) File "llfuse/fuse_api.pxi", line 253, in llfuse.capi.init (src/llfuse/capi_linux.c:24362) RuntimeError: fuse_mount failed crunch@humgen-05-16:/$ cat /proc/self/mountinfo | grep humgen-05-16.2.keep 43 39 0:31 / /data/crunch-tmp/crunch-job/task/humgen-05-16.2.keep rw,nosuid,nodev,relatime - fuse /dev/fuse rw,user_id=15324,group_id=1593,max_read=131072 crunch@humgen-05-16:/$ mount -t fuse /dev/fuse on /data/crunch-tmp/crunch-job/task/humgen-05-16.2.keep type fuse (rw,nosuid,nodev,allow_other,max_read=131072,user=crunch) crunch@humgen-05-16:/$ arv-mount --unmount /data/crunch-tmp/crunch-job/task/humgen-05-16.2.keep crunch@humgen-05-16:/$ mount -t fuse
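The wedged state described here (an entry in /etc/mtab with no matching entry in /proc/self/mountinfo) can be spotted mechanically; a small sketch, purely for diagnosis:

    def stale_mtab_entries():
        def mountpoints(path, field):
            with open(path) as f:
                return set(line.split()[field] for line in f if line.strip())
        in_mtab = mountpoints('/etc/mtab', 1)                  # mtab: device mountpoint fstype ...
        in_mountinfo = mountpoints('/proc/self/mountinfo', 4)  # mountinfo: field 5 is the mountpoint
        return in_mtab - in_mountinfo                          # mounted per mtab, unknown to the kernel

    # On a wedged node this would list the stuck keep mountpoint(s).
    print(stale_mtab_entries())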
Updated by Joshua Randall over 7 years ago
kernel and system versions:
root@humgen-05-13:~# uname -a
Linux humgen-05-13 3.13.0-85-generic #129~precise1-Ubuntu SMP Fri Mar 18 17:38:08 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
root@humgen-05-13:~# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 12.04.5 LTS
Release: 12.04
Codename: precise
Updated by Joshua Randall over 7 years ago
Another set of machines was wedged today. Did some more testing on a call with Tom:
First, on humgen-02-02 we established that `arv-mount --replace` does NOT work initially (before a failed attempt to mount):
crunch@humgen-02-02:/$ cat /proc/self/mountinfo | grep fuse 23 17 0:17 / /sys/fs/fuse/connections rw,relatime - fusectl none rw crunch@humgen-02-02:/$ cat /etc/mtab /dev/sda6 / ext4 rw,relatime,errors=remount-ro,user_xattr 0 0 proc /proc proc rw,noexec,nosuid,nodev 0 0 sysfs /sys sysfs rw,noexec,nosuid,nodev 0 0 none /sys/fs/fuse/connections fusectl rw 0 0 none /sys/kernel/debug debugfs rw 0 0 none /sys/kernel/security securityfs rw 0 0 udev /dev devtmpfs rw,mode=0755 0 0 devpts /dev/pts devpts rw,noexec,nosuid,gid=5,mode=0620 0 0 tmpfs /run tmpfs rw,noexec,nosuid,size=10%,mode=0755 0 0 none /run/lock tmpfs rw,noexec,nosuid,nodev,size=5242880 0 0 none /run/shm tmpfs rw,nosuid,nodev 0 0 cgroup /sys/fs/cgroup tmpfs rw,relatime,mode=755 0 0 cgroup /sys/fs/cgroup/cpuset cgroup rw,relatime,cpuset 0 0 cgroup /sys/fs/cgroup/cpu cgroup rw,relatime,cpu 0 0 cgroup /sys/fs/cgroup/cpuacct cgroup rw,relatime,cpuacct 0 0 cgroup /sys/fs/cgroup/memory cgroup rw,relatime,memory 0 0 cgroup /sys/fs/cgroup/devices cgroup rw,relatime,devices 0 0 /dev/sda1 /boot ext4 rw,errors=remount-ro 0 0 /dev/mapper/data-1 /data ext4 rw 0 0 rpc_pipefs /run/rpc_pipefs rpc_pipefs rw 0 0 /dev/fuse /data/crunch-tmp/crunch-job/task/humgen-02-02.3.keep fuse rw,nosuid,nodev,allow_other,max_read=131072,user=crunch 0 0 crunch@humgen-02-02:/$ arv-mount --replace /data/crunch-tmp/crunch-job/task/humgen-02-02.3.keep mount: according to mtab, /dev/fuse is already mounted on /data/crunch-tmp/crunch-job/task/humgen-02-02.3.keep 2017-04-20 16:26:27 arvados.arv-mount[61468] ERROR: arv-mount: exception during mount: fuse_mount failed Traceback (most recent call last): File "/usr/lib/python2.7/dist-packages/arvados_fuse/command.py", line 365, in _run_standalone with self: File "/usr/lib/python2.7/dist-packages/arvados_fuse/command.py", line 133, in __enter__ llfuse.init(self.operations, self.args.mountpoint, self._fuse_options()) File "llfuse/fuse_api.pxi", line 253, in llfuse.capi.init (src/llfuse/capi_linux.c:24362) RuntimeError: fuse_mount failed crunch@humgen-02-02:/$ cat /proc/self/mountinfo | grep fuse 23 17 0:17 / /sys/fs/fuse/connections rw,relatime - fusectl none rw 42 39 0:31 / /data/crunch-tmp/crunch-job/task/humgen-02-02.3.keep rw,nosuid,nodev,relatime - fuse /dev/fuse rw,user_id=15324,group_id=1593,max_read=131072 crunch@humgen-02-02:/$ arv-mount --replace /data/crunch-tmp/crunch-job/task/humgen-02-02.3.keep crunch@humgen-02-02:/$ mount -t fuse /dev/fuse on /data/crunch-tmp/crunch-job/task/humgen-02-02.3.keep type fuse (rw,nosuid,nodev,max_read=131072,user=crunch) crunch@humgen-02-02:/$ fusermount -u /data/crunch-tmp/crunch-job/task/humgen-02-02.3.keep crunch@humgen-02-02:/$ mount -t fuse crunch@humgen-02-02:/$ exit
Then, on humgen-05-03, we discovered that you can use --subtype to mount a different fuse subtype on top of the old mountpoint. However, once that is unmounted, the original "Transport endpoint is not connected" error returns (and there is still an entry in mtab, but none in mountinfo until after attempting to mount). It does work to call `arv-mount --replace` twice in a row (the first fails, the second succeeds):
crunch@humgen-05-03:/$ cat /proc/self/mountinfo | grep fuse 24 17 0:18 / /sys/fs/fuse/connections rw,relatime - fusectl none rw crunch@humgen-05-03:/$ clear crunch@humgen-05-03:/$ arv-mount --subtype foo --replace /data/crunch-tmp/crunch-job/task/humgen-05-03.10.keep crunch@humgen-05-03:/$ ls /data/crunch-tmp/crunch-job/task/humgen-05-03.10.keep by_id by_tag home README shared crunch@humgen-05-03:/$ mount -t fuse.foo foo on /data/crunch-tmp/crunch-job/task/humgen-05-03.10.keep type fuse.foo (rw,nosuid,nodev,max_read=131072,user=crunch) crunch@humgen-05-03:/$ cat /etc/mtab |grep fuse none /sys/fs/fuse/connections fusectl rw 0 0 /dev/fuse /data/crunch-tmp/crunch-job/task/humgen-05-03.10.keep fuse rw,nosuid,nodev,allow_other,max_read=131072,user=crunch 0 0 foo /data/crunch-tmp/crunch-job/task/humgen-05-03.10.keep fuse.foo rw,nosuid,nodev,max_read=131072,user=crunch 0 0 crunch@humgen-05-03:/$ cat /proc/self/mountinfo |grep fuse 24 17 0:18 / /sys/fs/fuse/connections rw,relatime - fusectl none rw 43 40 0:31 / /data/crunch-tmp/crunch-job/task/humgen-05-03.10.keep rw,nosuid,nodev,relatime - fuse.foo foo rw,user_id=15324,group_id=1593,max_read=131072 crunch@humgen-05-03:/$ mount -t fuse.foo foo on /data/crunch-tmp/crunch-job/task/humgen-05-03.10.keep type fuse.foo (rw,nosuid,nodev,max_read=131072,user=crunch) crunch@humgen-05-03:/$ mount -t fuse /dev/fuse on /data/crunch-tmp/crunch-job/task/humgen-05-03.10.keep type fuse (rw,nosuid,nodev,allow_other,max_read=131072,user=crunch) crunch@humgen-05-03:/$ arv-mount --subtype bar --replace /data/crunch-tmp/crunch-job/task/humgen-05-03.10.keep crunch@humgen-05-03:/$ mount -t fuse /dev/fuse on /data/crunch-tmp/crunch-job/task/humgen-05-03.10.keep type fuse (rw,nosuid,nodev,allow_other,max_read=131072,user=crunch) crunch@humgen-05-03:/$ mount -t fuse.foo crunch@humgen-05-03:/$ mount -t fuse.bar bar on /data/crunch-tmp/crunch-job/task/humgen-05-03.10.keep type fuse.bar (rw,nosuid,nodev,max_read=131072,user=crunch) crunch@humgen-05-03:/$ ps auxwww|grep arv-m crunch 23232 1.5 0.0 474972 27840 ? 
Sl 16:34 0:00 /usr/bin/python2.7 /usr/bin/arv-mount --subtype bar --replace /data/crunch-tmp/crunch-job/task/humgen-05-03.10.keep crunch 23297 0.0 0.0 9388 916 pts/2 S+ 16:34 0:00 grep arv-m crunch@humgen-05-03:/$ mount -t fuse /dev/fuse on /data/crunch-tmp/crunch-job/task/humgen-05-03.10.keep type fuse (rw,nosuid,nodev,allow_other,max_read=131072,user=crunch) crunch@humgen-05-03:/$ arv-mount --replace /data/crunch-tmp/crunch-job/task/humgen-05-03.10.keep mount: according to mtab, /dev/fuse is already mounted on /data/crunch-tmp/crunch-job/task/humgen-05-03.10.keep 2017-04-20 16:35:16 arvados.arv-mount[23408] ERROR: arv-mount: exception during mount: fuse_mount failed Traceback (most recent call last): File "/usr/lib/python2.7/dist-packages/arvados_fuse/command.py", line 365, in _run_standalone with self: File "/usr/lib/python2.7/dist-packages/arvados_fuse/command.py", line 133, in __enter__ llfuse.init(self.operations, self.args.mountpoint, self._fuse_options()) File "llfuse/fuse_api.pxi", line 253, in llfuse.capi.init (src/llfuse/capi_linux.c:24362) RuntimeError: fuse_mount failed crunch@humgen-05-03:/$ cat /proc/self/mountinfo | grep self crunch@humgen-05-03:/$ cat /proc/self/mountinfo | grep fuse 24 17 0:18 / /sys/fs/fuse/connections rw,relatime - fusectl none rw 43 40 0:31 / /data/crunch-tmp/crunch-job/task/humgen-05-03.10.keep rw,nosuid,nodev,relatime - fuse /dev/fuse rw,user_id=15324,group_id=1593,max_read=131072 crunch@humgen-05-03:/$ ls /data/crunch-tmp/crunch-job/task/humgen-05-03.10.keep ls: cannot access /data/crunch-tmp/crunch-job/task/humgen-05-03.10.keep: Transport endpoint is not connected crunch@humgen-05-03:/$ ls /sys/fs/fuse/connections/31/ abort congestion_threshold max_background waiting crunch@humgen-05-03:/$ ls /sys/fs/fuse/connections/31/ abort congestion_threshold max_background waiting crunch@humgen-05-03:/$ echo "1" | /sys/fs/fuse/connections/31/abort -su: /sys/fs/fuse/connections/31/abort: Permission denied crunch@humgen-05-03:/$ echo "1" > /sys/fs/fuse/connections/31/abort crunch@humgen-05-03:/$ cat /proc/self/mountinfo | grep fuse 24 17 0:18 / /sys/fs/fuse/connections rw,relatime - fusectl none rw 43 40 0:31 / /data/crunch-tmp/crunch-job/task/humgen-05-03.10.keep rw,nosuid,nodev,relatime - fuse /dev/fuse rw,user_id=15324,group_id=1593,max_read=131072 crunch@humgen-05-03:/$ ps auxwww|grep arv-m crunch 24618 0.0 0.0 9388 912 pts/2 S+ 16:39 0:00 grep arv-m crunch@humgen-05-03:/$ ls /data/crunch-tmp/crunch-job/task/humgen-05-03.10.keep ls: cannot access /data/crunch-tmp/crunch-job/task/humgen-05-03.10.keep: Transport endpoint is not connected crunch@humgen-05-03:/$ cat /etc/mtab | grep fuse none /sys/fs/fuse/connections fusectl rw 0 0 /dev/fuse /data/crunch-tmp/crunch-job/task/humgen-05-03.10.keep fuse rw,nosuid,nodev,allow_other,max_read=131072,user=crunch 0 0 crunch@humgen-05-03:/$ cat /etc/mtab | grep fuse none /sys/fs/fuse/connections fusectl rw 0 0 /dev/fuse /data/crunch-tmp/crunch-job/task/humgen-05-03.10.keep fuse rw,nosuid,nodev,allow_other,max_read=131072,user=crunch 0 0 crunch@humgen-05-03:/$ fusermount -u -z /data/crunch-tmp/crunch-job/task/humgen-05-03.10.keep crunch@humgen-05-03:/$ cat /etc/mtab | grep fuse none /sys/fs/fuse/connections fusectl rw 0 0 crunch@humgen-05-03:/$ mount -t fuse crunch@humgen-05-03:/$ exit
Finally, on humgen-05-10, we found that manually removing the offending line from /etc/mtab (as root) does work: after that, going ahead with the arv-mount just succeeds. This may suggest that the underlying problem is caused by a race condition in updating /etc/mtab.
crunch@humgen-05-10:/$ cat /etc/mtab /dev/sda6 / ext4 rw 0 0 proc /proc proc rw,noexec,nosuid,nodev 0 0 sysfs /sys sysfs rw,noexec,nosuid,nodev 0 0 none /sys/fs/fuse/connections fusectl rw 0 0 none /sys/kernel/debug debugfs rw 0 0 none /sys/kernel/security securityfs rw 0 0 udev /dev devtmpfs rw,mode=0755 0 0 devpts /dev/pts devpts rw,noexec,nosuid,gid=5,mode=0620 0 0 tmpfs /run tmpfs rw,noexec,nosuid,size=10%,mode=0755 0 0 none /run/lock tmpfs rw,noexec,nosuid,nodev,size=5242880 0 0 none /run/shm tmpfs rw,nosuid,nodev 0 0 cgroup /sys/fs/cgroup tmpfs rw,relatime,mode=755 0 0 cgroup /sys/fs/cgroup/cpuset cgroup rw,relatime,cpuset 0 0 cgroup /sys/fs/cgroup/cpu cgroup rw,relatime,cpu 0 0 cgroup /sys/fs/cgroup/cpuacct cgroup rw,relatime,cpuacct 0 0 cgroup /sys/fs/cgroup/memory cgroup rw,relatime,memory 0 0 cgroup /sys/fs/cgroup/devices cgroup rw,relatime,devices 0 0 cgroup /sys/fs/cgroup/freezer cgroup rw,relatime,freezer 0 0 cgroup /sys/fs/cgroup/blkio cgroup rw,relatime,blkio 0 0 cgroup /sys/fs/cgroup/perf_event cgroup rw,relatime,perf_event 0 0 cgroup /sys/fs/cgroup/hugetlb cgroup rw,relatime,hugetlb 0 0 /dev/sda7 /tmp ext4 rw 0 0 /dev/sda8 /data xfs rw 0 0 /dev/sda1 /boot ext4 rw,errors=remount-ro 0 0 rpc_pipefs /run/rpc_pipefs rpc_pipefs rw 0 0 /dev/fuse /data/crunch-tmp/crunch-job/task/humgen-05-10.7.keep fuse rw,nosuid,nodev,allow_other,max_read=131072,user=crunch 0 0 crunch@humgen-05-10:/$ cat /proc/self/mountinfo | grep fuse 23 17 0:17 / /sys/fs/fuse/connections rw,relatime - fusectl none rw crunch@humgen-05-10:/$ exit logout root@humgen-05-10:~# vi /etc/mtab ### MANUALLY REMOVE MTAB ENTRY FOR /data/crunch-tmp/crunch-job/task/humgen-05-10.7.keep root@humgen-05-10:~# su - crunch No directory, logging in with HOME=/ crunch@humgen-05-10:/$ export ARVADOS_API_HOST=api.arvados.sanger.ac.uk crunch@humgen-05-10:/$ export ARVADOS_API_TOKEN=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX crunch@humgen-05-10:/$ arv-mount /data/crunch-tmp/crunch-job/task/humgen-05-10.7.keep crunch@humgen-05-10:/$ ls -l /data/crunch-tmp/crunch-job/task/humgen-05-10.7.keep total 3 dr-xr-xr-x 1 crunch arvados 0 Apr 20 16:44 by_id dr-xr-xr-x 1 crunch arvados 0 Apr 20 16:44 by_tag dr-xr-xr-x 1 crunch arvados 0 Apr 20 16:44 home -r--r--r-- 1 crunch arvados 512 Apr 20 16:44 README dr-xr-xr-x 1 crunch arvados 0 Apr 20 16:44 shared
Updated by Joshua Randall over 7 years ago
There do seem to be general race conditions around FUSE filesystems updating /etc/mtab:
https://bugzilla.redhat.com/show_bug.cgi?id=651183
http://fuse.996288.n3.nabble.com/Security-Problem-in-fusermount-td12269.html
A workaround for us might be to get rid of /etc/mtab and just symlink it to /proc/self/mounts, as is done on newer systems?
Updated by Tom Clegg over 7 years ago
Joshua Randall wrote:
A workaround for us might be to get rid of /etc/mtab and just symlink it to /proc/self/mounts, as is done on newer systems?
Yes, this seems to be the way mtab-editing problems end up getting fixed. From https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=94076:
/etc/mtab is now a symlink to /proc/mounts. Bugs which were a result of editing /etc/mtab which make it get out of sync with the real kernel state are now no longer an issue.
Updated by Tom Clegg over 7 years ago
- Status changed from In Progress to Feedback
Updated by Tom Clegg over 7 years ago
- Target version deleted (2017-04-26 sprint)