Arvados - Bug #11209: stuck keep fuse mounts not cleared by crunch-job https://dev.arvados.org/issues/11209?journal_id=49409 2017-03-13T18:48:42Z Tom Clegg tom@curii.com
<ul></ul><p>Normally crunch-job frees up mount points using <code>fusermount -u -z</code> but for some reason it isn't working here:</p>
<pre>
2017-02-28_17:23:34 <a href="https://arvadosapi.com/z8ta6-8i9sb-8mp2qww92moa644">z8ta6-8i9sb-8mp2qww92moa644</a> 58397 clean work dirs: start
2017-02-28_17:23:34 <a href="https://arvadosapi.com/z8ta6-8i9sb-8mp2qww92moa644">z8ta6-8i9sb-8mp2qww92moa644</a> 58397 stderr starting: ['srun','--nodelist=humgen-05-07','-D','/data/crunch-tmp','bash','-ec','-o','pipefail','mount -t fuse,fuse.keep | awk "(index(\\$3, \\"$CRUNCH_TMP\\") == 1){print \\$3}" | xargs -r -n 1 fusermount -u -z; sleep 1; rm -rf $JOB_WORK $CRUNCH_INSTALL $CRUNCH_TMP/task $CRUNCH_TMP/src* $CRUNCH_TMP/*.cid']
2017-02-28_17:23:34 <a href="https://arvadosapi.com/z8ta6-8i9sb-8mp2qww92moa644">z8ta6-8i9sb-8mp2qww92moa644</a> 58397 stderr fusermount: failed to unmount /data/crunch-tmp/crunch-job/task/humgen-05-07.10.keep: Invalid argument
</pre>
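<p>For reference, the selection step of the cleanup command logged above (the <code>awk</code> filter that keeps only mount points beginning with <code>$CRUNCH_TMP</code>) can be sketched in Python. This is only an illustration, not the crunch-job code: the function name and the sample paths are made up, and it assumes <code>mount</code> prints lines of the form "device on mountpoint type fstype (options)".</p>

```python
def fuse_mounts_under(mount_output, prefix):
    """Return mount points from `mount`-style output that start with prefix.

    Mirrors the awk test index($3, prefix) == 1: the third
    whitespace-separated field of each line is taken as the mount point.
    """
    selected = []
    for line in mount_output.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[2].startswith(prefix):
            selected.append(fields[2])
    return selected


# Hypothetical sample output; the paths are illustrative only.
SAMPLE_MOUNT_OUTPUT = (
    "/dev/fuse on /data/crunch-tmp/crunch-job/task/example.keep type fuse (rw,nosuid,nodev)\n"
    "/dev/fuse on /home/someone/keep type fuse (rw,nosuid,nodev)\n"
)

if __name__ == "__main__":
    print(fuse_mounts_under(SAMPLE_MOUNT_OUTPUT, "/data/crunch-tmp"))
```

<p>Each selected path would then be handed to <code>fusermount -u -z</code>, which is the step that fails with "Invalid argument" in the log above.</p>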
<p>Could this be <a class="external" href="https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=632258">https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=632258</a> ? (Looks similar, seems to have been fixed by upgrading fuse from 2.8.5-3 to 2.9.2-4.)</p>
On Debian jessie and Ubuntu xenial test systems:
<ul>
<li>writing 1 to /sys/fs/fuse/connections/ZZZ/abort (where ZZZ is the device minor number from /proc/self/mountinfo) kills arv-mount and puts the mountpoint in "transport endpoint is not connected" state, but has no effect at all on a mountpoint that's in that state already. (The fuse docs claim this is the way to kill a mount that "always works".)</li>
<li>"umount", "umount -l", "umount -f" all fail EPERM</li>
<li>"fusermount -z -u" always works</li>
</ul>
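<p>To make the first bullet concrete, here is a rough Python sketch of how the abort control file could be located for a given mount point. It assumes the usual <code>/proc/self/mountinfo</code> layout (third field is <code>major:minor</code>, fifth field is the mount point, filesystem type follows the <code>-</code> separator). The function name and sample line are hypothetical, and the sketch only computes the path; it does not write to it.</p>

```python
import os


def fuse_abort_path(mountinfo_text, mountpoint):
    """Return the fusectl abort file for a fuse mount, or None if not listed."""
    for line in mountinfo_text.splitlines():
        fields = line.split()
        if len(fields) < 7 or fields[4] != mountpoint or "-" not in fields:
            continue
        # The filesystem type is the first field after the "-" separator.
        sep = fields.index("-")
        if sep + 1 >= len(fields):
            continue
        fstype = fields[sep + 1]
        if fstype != "fuse" and not fstype.startswith("fuse."):
            continue
        # fields[2] is "major:minor"; the connection directory is named
        # after the minor number.
        minor = fields[2].split(":")[1]
        return os.path.join("/sys/fs/fuse/connections", minor, "abort")
    return None


# Hypothetical mountinfo line; path and IDs are illustrative only.
SAMPLE_MOUNTINFO = (
    "43 39 0:31 / /data/crunch-tmp/task/example.keep rw,nosuid,nodev,relatime"
    " - fuse /dev/fuse rw,user_id=1000,group_id=1000,max_read=131072\n"
)

if __name__ == "__main__":
    print(fuse_abort_path(SAMPLE_MOUNTINFO, "/data/crunch-tmp/task/example.keep"))
```

<p>Note that this lookup returns nothing when the mount point is missing from mountinfo, so it cannot help with a mount that the kernel no longer lists.</p>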
<p>If "umount" needs root and "fusermount" doesn't work, I'm not sure what we should do. We could use a different mount point, but that would cause zombie mountpoints to accumulate over time, which could eventually put the system in an even worse state (although at least it would take longer to get there).</p> Arvados - Bug #11209: stuck keep fuse mounts not cleared by crunch-jobhttps://dev.arvados.org/issues/11209?journal_id=494102017-03-13T18:48:55ZTom Cleggtom@curii.com
<ul><li><strong>Category</strong> set to <i>FUSE</i></li><li><strong>Status</strong> changed from <i>New</i> to <i>In Progress</i></li><li><strong>Assigned To</strong> set to <i>Tom Clegg</i></li></ul> Arvados - Bug #11209: stuck keep fuse mounts not cleared by crunch-jobhttps://dev.arvados.org/issues/11209?journal_id=494722017-03-15T11:15:10ZJoshua Randalljr17@sanger.ac.uk
<ul></ul><p>When I run `umount -f` to clear the problem, it has always been as root. Never tried running it as any other user.</p> Arvados - Bug #11209: stuck keep fuse mounts not cleared by crunch-jobhttps://dev.arvados.org/issues/11209?journal_id=494792017-03-15T14:54:49ZTom Cleggtom@curii.com
<ul></ul><p>The <a href="https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=632258" class="external">fuse bug</a> seems to be related to a double-mounted mount point. Perhaps the trick is to avoid getting into this state by waiting for the mount to detach (perhaps by calling stat until it works) after calling "fusermount -u -z".</p>
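<p>A minimal sketch of that idea, assuming Python 3 and that a successful <code>stat</code> on the mount point is an acceptable signal that the old mount has detached. All function names here are hypothetical and this is not the arv-mount implementation. The loop always checks at least once, even with a zero timeout.</p>

```python
import os
import subprocess
import time


def wait_until(check, timeout, interval=0.1):
    """Poll check() until it returns True or timeout seconds have elapsed.

    check() is always called at least once, even if timeout <= 0.
    """
    deadline = time.monotonic() + timeout
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


def stat_works(path):
    """Return True if os.stat(path) succeeds, False on any OSError."""
    try:
        os.stat(path)
        return True
    except OSError:
        return False


def lazy_unmount_and_wait(mountpoint, timeout=10.0):
    """Request a lazy unmount, then wait until stat on the path works."""
    # The exit status is ignored here; the stat poll decides success.
    subprocess.call(["fusermount", "-u", "-z", mountpoint])
    return wait_until(lambda: stat_works(mountpoint), timeout)
```

<p>A caller would proceed to reuse or remove the directory only when <code>lazy_unmount_and_wait()</code> returns true. A successful <code>stat</code> does not by itself distinguish a detached mount from a healthy one, so a real implementation would probably also consult the mount table.</p>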
<p>(This problem is occurring on systems with fuse≥2.9.2-4, where supposedly that bug is fixed -- but this seems like good race-prevention behavior anyway.)</p> Arvados - Bug #11209: stuck keep fuse mounts not cleared by crunch-jobhttps://dev.arvados.org/issues/11209?journal_id=497392017-03-17T20:05:19ZTom Cleggtom@curii.com
<ul><li><strong>Target version</strong> set to <i>2017-03-29 sprint</i></li></ul> Arvados - Bug #11209: stuck keep fuse mounts not cleared by crunch-jobhttps://dev.arvados.org/issues/11209?journal_id=497442017-03-17T20:41:05ZTom Cleggtom@curii.com
<ul></ul><p>11209-unmount-replace @ <a class="changeset" title="11209: &quot;--unmount /path/...&quot; unmounts /path and all fuse mounts below it." href="https://dev.arvados.org/projects/arvados/repository/arvados/revisions/5752685c137c5e37e13845f5328e9a3930fa3100">5752685c137c5e37e13845f5328e9a3930fa3100</a></p>
<p>This should let us replace the <code>"mount|awk|grep|xargs fusermount;sleep"</code> script in crunch-job with <code>"arv-mount --unmount $CRUNCH_TMP/..."</code> and ensure we don't try to proceed any further until all fuse mounts are detached.</p> Arvados - Bug #11209: stuck keep fuse mounts not cleared by crunch-jobhttps://dev.arvados.org/issues/11209?journal_id=499222017-03-23T18:19:57ZLucas Di Pentimalucas.dipentima@curii.com
<ul></ul><ul>
<li>File <code>services/fuse/arvados_fuse/command.py</code>
<ul>
<li>Line 14: Can this line be eliminated because of line 15?</li>
<li>Shouldn’t <code>self.args.replace</code> have the same semantics as <code>self.args.unmount</code> regarding the <code>unmount_all()</code> feature?</li>
</ul>
</li>
<li>Reusing <code>self.args.unmount_timeout</code> in <code>unmount()</code>/<code>unmount_all()</code> may be problematic because it seems to have a different meaning when used in <code>__exit__</code>: there, if the user specifies <code>unmount_timeout=0</code>, the unmounting won’t have a timeout, whereas the rest of the code seems to use <code>unmount_timeout=0</code> as "don't wait", right?</li>
<li>Using "unmount_timeout &lt; 0" would always produce a timeout exception without trying to unmount even once.</li>
<li>Should these new flags have their related tests?</li>
</ul> Arvados - Bug #11209: stuck keep fuse mounts not cleared by crunch-jobhttps://dev.arvados.org/issues/11209?journal_id=499492017-03-23T22:15:16ZTom Cleggtom@curii.com
<ul></ul><p>Lucas Di Pentima wrote:</p>
<blockquote>
<ul>
<li>File <code>services/fuse/arvados_fuse/command.py</code>
<ul>
<li>Line 14: Can this line be eliminated because of line 15?</li>
</ul></li>
</ul>
</blockquote>
<p>Sure, don't see why not.</p>
<blockquote>
<ul>
<li>Shouldn’t <code>self.args.replace</code> have the same semantics as <code>self.args.unmount</code> regarding the <code>unmount_all()</code> feature?</li>
</ul>
</blockquote>
<p>The only difference is that "/path/..." means "/path and any mountpoint below it" in unmount_all(). So the question is about what should happen if someone runs</p>
<pre><code>arv-mount --replace /path/...</code></pre>
<p>I figure since we'll try to mount at the literal path "/path/..." we have to assume "/path/..." really means just "/path/..." and only unmount whatever we find at that specific path, not "everything under /path".</p>
<p>Does this make sense?</p>
<blockquote>
<ul>
<li>Reusing <code>self.args.unmount_timeout</code> in <code>unmount()</code>/<code>unmount_all()</code> may be problematic because it seems to have a different meaning when used in <code>__exit__</code>: there, if the user specifies <code>unmount_timeout=0</code>, the unmounting won’t have a timeout, whereas the rest of the code seems to use <code>unmount_timeout=0</code> as "don't wait", right?</li>
<li>Using "unmount_timeout &lt; 0" would always produce a timeout exception without trying to unmount even once.</li>
</ul>
</blockquote>
<p>Ah, yes, unmount(timeout=0) means "raise exception" which seems useless. Fixed so it always tries at least once.</p>
<blockquote>
<ul>
<li>Should these new flags have their related tests?</li>
</ul>
</blockquote>
<p>I'm dreading finding new ways for threads/processes to deadlock and leave fuse in weird states ... but yes, it should be possible to make a test case that runs some arv-mount child processes and unmounts them with another.</p> Arvados - Bug #11209: stuck keep fuse mounts not cleared by crunch-jobhttps://dev.arvados.org/issues/11209?journal_id=499642017-03-24T20:03:49ZTom Cleggtom@curii.com
<ul></ul>11209-unmount-replace @ <a class="changeset" title="11209: Remove unused imports." href="https://dev.arvados.org/projects/arvados/repository/arvados/revisions/b7a664f09052ac048e506bed9bb48b54bc2a9bd4">b7a664f09052ac048e506bed9bb48b54bc2a9bd4</a>
<ul>
<li>remove superfluous crunchstat import</li>
<li>unmount(timeout=0) tries unmount 1x</li>
<li>test cases for --unmount and --replace</li>
<li>fix missing import so --unmount and --replace actually work (thanks, new test cases!)</li>
</ul> Arvados - Bug #11209: stuck keep fuse mounts not cleared by crunch-jobhttps://dev.arvados.org/issues/11209?journal_id=499812017-03-27T18:29:31ZLucas Di Pentimalucas.dipentima@curii.com
<ul></ul><p>Tom Clegg wrote:</p>
<blockquote>
<p>I figure since we'll try to mount at the literal path "/path/..." we have to assume "/path/..." really means just "/path/..." and only unmount whatever we find at that specific path, not "everything under /path".<br />Does this make sense?</p>
</blockquote>
<p>It makes sense, but it raises another question: if we treat "/path/…" as a literal in the <code>args.replace</code> case, shouldn’t we check whether "/path/…" exists when using <code>args.unmount</code> before assuming we’re trying to unmount all mounted dirs below "/path/"? Or, if this convention is too confusing, maybe use an additional flag for the recursive unmount feature?</p>
<blockquote>
<p>I'm dreading finding new ways for threads/processes to deadlock and leave fuse in weird states ... but yes, it should be possible to make a test case that runs some arv-mount child processes and unmounts them with another.</p>
</blockquote>
<p>I've run them on my local machine, and got some errors, for example:</p>
<pre>
======================================================================
ERROR: test_replace (tests.test_unmount.UnmountTest)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/lucas/arvados_local/services/fuse/tests/test_unmount.py", line 29, in test_replace
'--exec', 'true'])
File "/usr/lib/python2.7/subprocess.py", line 540, in check_call
raise CalledProcessError(retcode, cmd)
CalledProcessError: Command '['arv-mount', '--subtype', 'test', '--replace', '--unmount-timeout', '10', '/tmp/tmp1_nFm2', '--exec', 'true']' returned non-zero exit status 1
======================================================================
ERROR: test_replace (tests.test_unmount.UnmountTest)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/lucas/arvados_local/services/fuse/tests/test_unmount.py", line 15, in tearDown
super(UnmountTest, self).tearDown()
File "/home/lucas/arvados_local/services/fuse/tests/integration_test.py", line 66, in tearDown
os.rmdir(self.mnt)
OSError: [Errno 16] Device or resource busy: '/tmp/tmp1_nFm2'
</pre> Arvados - Bug #11209: stuck keep fuse mounts not cleared by crunch-jobhttps://dev.arvados.org/issues/11209?journal_id=499942017-03-28T14:13:04ZTom Cleggtom@curii.com
<ul></ul><p>Fixed a race condition in the tests, and a problem with the refactored "standalone mode" code (evidently it's critical to do DaemonContext() before subscribing to websocket). That might have caused the unmount tests to fail unreliably in b7a66.</p>
<p>"--unmount /path/..." is now "--unmount-all /path"</p>
<p>11209-unmount-replace @ <a class="changeset" title="11209: Test using ./bin/arv-mount from source dir." href="https://dev.arvados.org/projects/arvados/repository/arvados/revisions/8b4d5991f9d5691b9fa2898d6f60eef8dbfdf987">8b4d5991f9d5691b9fa2898d6f60eef8dbfdf987</a></p> Arvados - Bug #11209: stuck keep fuse mounts not cleared by crunch-jobhttps://dev.arvados.org/issues/11209?journal_id=500242017-03-28T17:33:43ZLucas Di Pentimalucas.dipentima@curii.com
<ul></ul><p>LGTM. All tests passing now.</p> Arvados - Bug #11209: stuck keep fuse mounts not cleared by crunch-jobhttps://dev.arvados.org/issues/11209?journal_id=501482017-03-29T19:08:01ZTom Cleggtom@curii.com
<ul><li><strong>Target version</strong> changed from <i>2017-03-29 sprint</i> to <i>2017-04-12 sprint</i></li></ul> Arvados - Bug #11209: stuck keep fuse mounts not cleared by crunch-jobhttps://dev.arvados.org/issues/11209?journal_id=504612017-04-07T07:43:33ZTom Cleggtom@curii.com
<ul></ul><p>11209-unmount-subtype @ <a class="changeset" title="11209: Restrict --unmount* operations to given --subtype. Add warnings about affecting other fus..." href="https://dev.arvados.org/projects/arvados/repository/arvados/revisions/75184884ed798b474e8b9e254045dc0f4354379e">75184884ed798b474e8b9e254045dc0f4354379e</a></p> Arvados - Bug #11209: stuck keep fuse mounts not cleared by crunch-jobhttps://dev.arvados.org/issues/11209?journal_id=504692017-04-07T14:09:26ZLucas Di Pentimalucas.dipentima@curii.com
<ul></ul><p>LGTM.</p> Arvados - Bug #11209: stuck keep fuse mounts not cleared by crunch-jobhttps://dev.arvados.org/issues/11209?journal_id=506822017-04-12T19:05:15ZTom Cleggtom@curii.com
<ul><li><strong>Target version</strong> changed from <i>2017-04-12 sprint</i> to <i>2017-04-26 sprint</i></li></ul> Arvados - Bug #11209: stuck keep fuse mounts not cleared by crunch-jobhttps://dev.arvados.org/issues/11209?journal_id=508122017-04-14T17:25:21ZTom Cleggtom@curii.com
<ul></ul><p>11209-crunch-unmount-all @ <a class="changeset" title="11209: Use arv-mount --unmount-all instead of mount|awk|xargs script to clean up stale mounts fro..." href="https://dev.arvados.org/projects/arvados/repository/arvados/revisions/d64ed33e94700f8204ec8089c7b235cff918f9f7">d64ed33e94700f8204ec8089c7b235cff918f9f7</a></p>
<pre>
2017-04-14_17:20:15 <a href="https://arvadosapi.com/4xphq-8i9sb-5fhfjo3g28krpw5">4xphq-8i9sb-5fhfjo3g28krpw5</a> 1564 clean work dirs: start
2017-04-14_17:20:15 <a href="https://arvadosapi.com/4xphq-8i9sb-5fhfjo3g28krpw5">4xphq-8i9sb-5fhfjo3g28krpw5</a> 1564 stderr starting: ['srun','--nodelist=compute1','-D','/tmp','bash','-ec',' arv-mount --unmount-timeout 10 --unmount-all ${CRUNCH_TMP} rm -rf ${JOB_WORK} ${CRUNCH_INSTALL} ${CRUNCH_TMP}/task ${CRUNCH_TMP}/src* ${CRUNCH_TMP}/*.cid ']
2017-04-14_17:20:16 <a href="https://arvadosapi.com/4xphq-8i9sb-5fhfjo3g28krpw5">4xphq-8i9sb-5fhfjo3g28krpw5</a> 1564 clean work dirs: exit 0
</pre><br />-- <a class="external" href="https://workbench.4xphq.arvadosapi.com/jobs/4xphq-8i9sb-5fhfjo3g28krpw5#Log">https://workbench.4xphq.arvadosapi.com/jobs/4xphq-8i9sb-5fhfjo3g28krpw5#Log</a> Arvados - Bug #11209: stuck keep fuse mounts not cleared by crunch-jobhttps://dev.arvados.org/issues/11209?journal_id=508282017-04-14T20:43:10ZLucas Di Pentimalucas.dipentima@curii.com
<ul></ul><p><a class="changeset" title="11209: Use arv-mount --unmount-all instead of mount|awk|xargs script to clean up stale mounts fro..." href="https://dev.arvados.org/projects/arvados/repository/arvados/revisions/d64ed33e94700f8204ec8089c7b235cff918f9f7">d64ed33e94700f8204ec8089c7b235cff918f9f7</a> LGTM.</p> Arvados - Bug #11209: stuck keep fuse mounts not cleared by crunch-jobhttps://dev.arvados.org/issues/11209?journal_id=508882017-04-18T22:53:24ZJoshua Randalljr17@sanger.ac.uk
<ul></ul><p>I now have a wedged arv-mount on one of my compute nodes on which I have the new arv-mount.</p>
<p>Unfortunately, the new `--unmount-all` option does not appear to clear the stuck mount: <br /><pre>
root@humgen-05-13:~# arv-mount --version
/usr/bin/arv-mount 0.1.20170407172413
root@humgen-05-13:~# mount -t fuse
/dev/fuse on /data/crunch-tmp/crunch-job/task/humgen-05-13.1.keep type fuse (rw,nosuid,nodev,allow_other,max_read=131072,user=crunch)
root@humgen-05-13:~# arv-mount --unmount-all /data/crunch-tmp/crunch-job/task/humgen-05-13.1.keep
root@humgen-05-13:~# mount -t fuse
/dev/fuse on /data/crunch-tmp/crunch-job/task/humgen-05-13.1.keep type fuse (rw,nosuid,nodev,allow_other,max_read=131072,user=crunch)
</pre></p> Arvados - Bug #11209: stuck keep fuse mounts not cleared by crunch-jobhttps://dev.arvados.org/issues/11209?journal_id=508892017-04-18T23:20:23ZJoshua Randalljr17@sanger.ac.uk
<ul></ul><p>I've just been looking through the code to try to figure this out and it looks like the issue is that the wedged mount is not showing up in /proc/self/mountinfo (neither for root nor for the crunch user):</p>
<pre>
root@humgen-05-13:~# mount -t fuse
/dev/fuse on /data/crunch-tmp/crunch-job/task/humgen-05-13.1.keep type fuse (rw,nosuid,nodev,allow_other,max_read=131072,user=crunch)
root@humgen-05-13:~# cat /proc/self/mountinfo | grep fuse
23 17 0:17 / /sys/fs/fuse/connections rw,relatime - fusectl none rw
</pre>
<pre>
crunch@humgen-05-13:/$ cat /proc/self/mountinfo | grep fuse
23 17 0:17 / /sys/fs/fuse/connections rw,relatime - fusectl none rw
crunch@humgen-05-13:/$ mount -t fuse
/dev/fuse on /data/crunch-tmp/crunch-job/task/humgen-05-13.1.keep type fuse (rw,nosuid,nodev,allow_other,max_read=131072,user=crunch)
</pre>
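<p>The mismatch shown above (an entry reported by <code>mount</code>, which reads mtab on this system, but none in <code>/proc/self/mountinfo</code>) could be detected with something like the following Python sketch. The function names and sample data are hypothetical; it assumes the standard mtab and mountinfo field layouts and treats <code>fuse</code> and <code>fuse.*</code> as fuse filesystem types.</p>

```python
def is_fuse_type(fstype):
    """True for "fuse" and "fuse.<subtype>", but not for "fusectl"."""
    return fstype == "fuse" or fstype.startswith("fuse.")


def fuse_points_in_mtab(mtab_text):
    """Mount points of fuse entries in mtab-format text."""
    points = set()
    for line in mtab_text.splitlines():
        fields = line.split()
        # mtab fields: device, mount point, fstype, options, dump, pass
        if len(fields) >= 3 and is_fuse_type(fields[2]):
            points.add(fields[1])
    return points


def fuse_points_in_mountinfo(mountinfo_text):
    """Mount points of fuse entries in /proc/self/mountinfo-format text."""
    points = set()
    for line in mountinfo_text.splitlines():
        fields = line.split()
        if "-" not in fields:
            continue
        sep = fields.index("-")
        # The fifth field is the mount point; fstype follows the "-".
        if sep >= 6 and sep + 1 < len(fields) and is_fuse_type(fields[sep + 1]):
            points.add(fields[4])
    return points


def stale_fuse_mounts(mtab_text, mountinfo_text):
    """Fuse mount points listed in mtab but unknown to the kernel."""
    return sorted(fuse_points_in_mtab(mtab_text)
                  - fuse_points_in_mountinfo(mountinfo_text))


# Hypothetical sample data modelled on the output above.
SAMPLE_MTAB = (
    "none /sys/fs/fuse/connections fusectl rw 0 0\n"
    "/dev/fuse /data/crunch-tmp/task/example.keep fuse rw,nosuid,nodev 0 0\n"
)
SAMPLE_MOUNTINFO = (
    "23 17 0:17 / /sys/fs/fuse/connections rw,relatime - fusectl none rw\n"
)

if __name__ == "__main__":
    print(stale_fuse_mounts(SAMPLE_MTAB, SAMPLE_MOUNTINFO))
```

<p>On a system where <code>/etc/mtab</code> is a symlink to <code>/proc/self/mounts</code> the two sources should not disagree like this, so a check of this kind is mainly relevant to older setups where mtab is a regular file.</p>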
<p>However, running with `--replace` does seem to work:<br /><pre>
crunch@humgen-05-13:/$ arv-mount /data/crunch-tmp/crunch-job/task/humgen-05-13.1.keep
mount: according to mtab, /dev/fuse is already mounted on /data/crunch-tmp/crunch-job/task/humgen-05-13.1.keep
2017-04-19 00:10:46 arvados.arv-mount[20686] ERROR: arv-mount: exception during mount: fuse_mount failed
Traceback (most recent call last):
File "/usr/lib/python2.7/dist-packages/arvados_fuse/command.py", line 365, in _run_standalone
with self:
File "/usr/lib/python2.7/dist-packages/arvados_fuse/command.py", line 133, in __enter__
llfuse.init(self.operations, self.args.mountpoint, self._fuse_options())
File "llfuse/fuse_api.pxi", line 253, in llfuse.capi.init (src/llfuse/capi_linux.c:24362)
RuntimeError: fuse_mount failed
crunch@humgen-05-13:/$ arv-mount --replace /data/crunch-tmp/crunch-job/task/humgen-05-13.1.keep
crunch@humgen-05-13:/$ mount -t fuse
/dev/fuse on /data/crunch-tmp/crunch-job/task/humgen-05-13.1.keep type fuse (rw,nosuid,nodev,max_read=131072,user=crunch)
crunch@humgen-05-13:/$ ls /data/crunch-tmp/crunch-job/task/humgen-05-13.1.keep
by_id by_tag home README shared
</pre></p> Arvados - Bug #11209: stuck keep fuse mounts not cleared by crunch-jobhttps://dev.arvados.org/issues/11209?journal_id=508902017-04-18T23:25:05ZJoshua Randalljr17@sanger.ac.uk
<ul></ul><p>Another wedged node, this time arv-mount --unmount did work:</p>
<pre>
crunch@humgen-02-02:/$ mount -t fuse
/dev/fuse on /data/crunch-tmp/crunch-job/task/humgen-02-02.4.keep type fuse (rw,nosuid,nodev,allow_other,max_read=131072,user=crunch)
crunch@humgen-02-02:/$ ls /data/crunch-tmp/crunch-job/task/humgen-02-02.4.keep
crunch@humgen-02-02:/$ arv-mount /data/crunch-tmp/crunch-job/task/humgen-02-02.4.keep
mount: according to mtab, /dev/fuse is already mounted on /data/crunch-tmp/crunch-job/task/humgen-02-02.4.keep
2017-04-19 00:23:39 arvados.arv-mount[9218] ERROR: arv-mount: exception during mount: fuse_mount failed
Traceback (most recent call last):
File "/usr/lib/python2.7/dist-packages/arvados_fuse/command.py", line 365, in _run_standalone
with self:
File "/usr/lib/python2.7/dist-packages/arvados_fuse/command.py", line 133, in __enter__
llfuse.init(self.operations, self.args.mountpoint, self._fuse_options())
File "llfuse/fuse_api.pxi", line 253, in llfuse.capi.init (src/llfuse/capi_linux.c:24362)
RuntimeError: fuse_mount failed
crunch@humgen-02-02:/$ arv-mount --unmount /data/crunch-tmp/crunch-job/task/humgen-02-02.4.keep
crunch@humgen-02-02:/$ ls /data/crunch-tmp/crunch-job/task/humgen-02-02.4.keep
crunch@humgen-02-02:/$ mount -t fuse
crunch@humgen-02-02:/$ arv-mount /data/crunch-tmp/crunch-job/task/humgen-02-02.4.keep
crunch@humgen-02-02:/$ ls /data/crunch-tmp/crunch-job/task/humgen-02-02.4.keep
by_id by_tag home README shared
crunch@humgen-02-02:/$ arv-mount --version
/usr/bin/arv-mount 0.1.20170407172413
</pre> Arvados - Bug #11209: stuck keep fuse mounts not cleared by crunch-jobhttps://dev.arvados.org/issues/11209?journal_id=508912017-04-18T23:31:33ZJoshua Randalljr17@sanger.ac.uk
<ul></ul><p>Another wedged node. On this one, neither --unmount-all nor --unmount worked until after I attempted to mount at the wedged mount point. After that attempt, the --unmount worked:</p>
<pre>
crunch@humgen-05-03:/$ mount -t fuse
/dev/fuse on /data/crunch-tmp/crunch-job/task/humgen-05-03.5.keep type fuse (rw,nosuid,nodev,allow_other,max_read=131072,user=crunch)
crunch@humgen-05-03:/$ ls /data/crunch-tmp/crunch-job/task/humgen-05-03.5.keep
crunch@humgen-05-03:/$ ps auxwww|grep arv-mount
crunch 7047 0.0 0.0 9388 912 pts/2 S+ 00:27 0:00 grep arv-mount
crunch@humgen-05-03:/$ arv-mount --unmount-all /data/crunch-tmp/crunch-job/task/humgen-05-03.5.keep
crunch@humgen-05-03:/$ ps auxwww|grep arv-mount
crunch 7140 0.0 0.0 9388 912 pts/2 S+ 00:28 0:00 grep arv-mount
crunch@humgen-05-03:/$ mount -t fuse
/dev/fuse on /data/crunch-tmp/crunch-job/task/humgen-05-03.5.keep type fuse (rw,nosuid,nodev,allow_other,max_read=131072,user=crunch)
crunch@humgen-05-03:/$ arv-mount --unmount /data/crunch-tmp/crunch-job/task/humgen-05-03.5.keep
crunch@humgen-05-03:/$ mount -t fuse
/dev/fuse on /data/crunch-tmp/crunch-job/task/humgen-05-03.5.keep type fuse (rw,nosuid,nodev,allow_other,max_read=131072,user=crunch)
crunch@humgen-05-03:/$ arv-mount /data/crunch-tmp/crunch-job/task/humgen-05-03.5.keep
mount: according to mtab, /dev/fuse is already mounted on /data/crunch-tmp/crunch-job/task/humgen-05-03.5.keep
2017-04-19 00:29:12 arvados.arv-mount[7367] ERROR: arv-mount: exception during mount: fuse_mount failed
Traceback (most recent call last):
File "/usr/lib/python2.7/dist-packages/arvados_fuse/command.py", line 365, in _run_standalone
with self:
File "/usr/lib/python2.7/dist-packages/arvados_fuse/command.py", line 133, in __enter__
llfuse.init(self.operations, self.args.mountpoint, self._fuse_options())
File "llfuse/fuse_api.pxi", line 253, in llfuse.capi.init (src/llfuse/capi_linux.c:24362)
RuntimeError: fuse_mount failed
crunch@humgen-05-03:/$ arv-mount --unmount /data/crunch-tmp/crunch-job/task/humgen-05-03.5.keep
crunch@humgen-05-03:/$ mount -t fuse
crunch@humgen-05-03:/$ arv-mount /data/crunch-tmp/crunch-job/task/humgen-05-03.5.keep
crunch@humgen-05-03:/$ ls /data/crunch-tmp/crunch-job/task/humgen-05-03.5.keep
by_id by_tag home README shared
crunch@humgen-05-03:/$ arv-mount --unmount /data/crunch-tmp/crunch-job/task/humgen-05-03.5.keep
crunch@humgen-05-03:/$ ls /data/crunch-tmp/crunch-job/task/humgen-05-03.5.keep
crunch@humgen-05-03:/$ mount -t fuse
</pre> Arvados - Bug #11209: stuck keep fuse mounts not cleared by crunch-jobhttps://dev.arvados.org/issues/11209?journal_id=508922017-04-18T23:50:17ZJoshua Randalljr17@sanger.ac.uk
<ul></ul><p>Found one last node that is wedged, and managed to do some more diagnosing. It looks like when it is wedged on our systems:<br /> - There is an entry in /etc/mtab<br /> - There is initially no entry in /proc/self/mountinfo so `arv-mount --unmount` and `arv-mount --unmount-all` fail<br /> - After attempting (and failing) to mount over the existing mountpoint, the entry appears in /proc/self/mountinfo after which the `arv-mount --unmount` succeeds</p>
<pre>
crunch@humgen-05-16:/$ mount -t fuse
/dev/fuse on /data/crunch-tmp/crunch-job/task/humgen-05-16.2.keep type fuse (rw,nosuid,nodev,allow_other,max_read=131072,user=crunch)
crunch@humgen-05-16:/$ cat /proc/self/mountinfo | grep fuse
23 17 0:17 / /sys/fs/fuse/connections rw,relatime - fusectl none rw
crunch@humgen-05-16:/$ ls /data/crunch-tmp/crunch-job/task/humgen-05-16.2.keep/
crunch@humgen-05-16:/$ ps auxwww|grep arv-m
crunch 31545 0.0 0.0 9388 912 pts/2 S+ 00:33 0:00 grep arv-m
crunch@humgen-05-16:/$ arv-mount --unmount-all /data/crunch-tmp/crunch-job/task/humgen-05-16.2.keep
crunch@humgen-05-16:/$ mount -t fuse
/dev/fuse on /data/crunch-tmp/crunch-job/task/humgen-05-16.2.keep type fuse (rw,nosuid,nodev,allow_other,max_read=131072,user=crunch)
crunch@humgen-05-16:/$ arv-mount --unmount /data/crunch-tmp/crunch-job/task/humgen-05-16.2.keep
crunch@humgen-05-16:/$ mount -t fuse
/dev/fuse on /data/crunch-tmp/crunch-job/task/humgen-05-16.2.keep type fuse (rw,nosuid,nodev,allow_other,max_read=131072,user=crunch)
crunch@humgen-05-16:/$ cat /proc/self/mountinfo | grep humgen-05-16.2.keep
crunch@humgen-05-16:/$ ls -l /sys/fs/fuse/connections/
total 0
crunch@humgen-05-16:/$ mount -t fuse
/dev/fuse on /data/crunch-tmp/crunch-job/task/humgen-05-16.2.keep type fuse (rw,nosuid,nodev,allow_other,max_read=131072,user=crunch)
crunch@humgen-05-16:/$ fusermount -u -z /data/crunch-tmp/crunch-job/task/humgen-05-16.2.keep
fusermount: failed to unmount /data/crunch-tmp/crunch-job/task/humgen-05-16.2.keep: Invalid argument
crunch@humgen-05-16:/$ mount -t fuse
/dev/fuse on /data/crunch-tmp/crunch-job/task/humgen-05-16.2.keep type fuse (rw,nosuid,nodev,allow_other,max_read=131072,user=crunch)
crunch@humgen-05-16:/$ python
Python 2.7.3 (default, Oct 26 2016, 21:01:49)
[GCC 4.6.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from arvados_fuse.unmount import unmount
>>> unmount(path='/data/crunch-tmp/crunch-job/task/humgen-05-16.2.keep', subtype='', timeout=2.0, recursive=False)
False
>>> quit()
crunch@humgen-05-16:/$ mount -t fuse
/dev/fuse on /data/crunch-tmp/crunch-job/task/humgen-05-16.2.keep type fuse (rw,nosuid,nodev,allow_other,max_read=131072,user=crunch)
crunch@humgen-05-16:/$ python
Python 2.7.3 (default, Oct 26 2016, 21:01:49)
[GCC 4.6.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from arvados_fuse.unmount import unmount
>>> unmount(path='/data/crunch-tmp/crunch-job/task/humgen-05-16.2.keep', subtype=None, timeout=2.0, recursive=False)
False
>>> quit()
crunch@humgen-05-16:/$ mount -t fuse
/dev/fuse on /data/crunch-tmp/crunch-job/task/humgen-05-16.2.keep type fuse (rw,nosuid,nodev,allow_other,max_read=131072,user=crunch)
crunch@humgen-05-16:/$ python
Python 2.7.3 (default, Oct 26 2016, 21:01:49)
[GCC 4.6.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from arvados_fuse.unmount import unmount
>>> unmount(path='/data/crunch-tmp/crunch-job/task/humgen-05-16.2.keep', timeout=2.0)
False
>>> quit()
crunch@humgen-05-16:/$ mount -t fuse
/dev/fuse on /data/crunch-tmp/crunch-job/task/humgen-05-16.2.keep type fuse (rw,nosuid,nodev,allow_other,max_read=131072,user=crunch)
crunch@humgen-05-16:/$ cat /proc/self/mountinfo | grep humgen-05-16.2.keep
crunch@humgen-05-16:/$ arv-mount /data/crunch-tmp/crunch-job/task/humgen-05-16.2.keep
mount: according to mtab, /dev/fuse is already mounted on /data/crunch-tmp/crunch-job/task/humgen-05-16.2.keep
2017-04-19 00:45:00 arvados.arv-mount[2074] ERROR: arv-mount: exception during mount: fuse_mount failed
Traceback (most recent call last):
File "/usr/lib/python2.7/dist-packages/arvados_fuse/command.py", line 365, in _run_standalone
with self:
File "/usr/lib/python2.7/dist-packages/arvados_fuse/command.py", line 133, in __enter__
llfuse.init(self.operations, self.args.mountpoint, self._fuse_options())
File "llfuse/fuse_api.pxi", line 253, in llfuse.capi.init (src/llfuse/capi_linux.c:24362)
RuntimeError: fuse_mount failed
crunch@humgen-05-16:/$ cat /proc/self/mountinfo | grep humgen-05-16.2.keep
43 39 0:31 / /data/crunch-tmp/crunch-job/task/humgen-05-16.2.keep rw,nosuid,nodev,relatime - fuse /dev/fuse rw,user_id=15324,group_id=1593,max_read=131072
crunch@humgen-05-16:/$ mount -t fuse
/dev/fuse on /data/crunch-tmp/crunch-job/task/humgen-05-16.2.keep type fuse (rw,nosuid,nodev,allow_other,max_read=131072,user=crunch)
crunch@humgen-05-16:/$ arv-mount --unmount /data/crunch-tmp/crunch-job/task/humgen-05-16.2.keep
crunch@humgen-05-16:/$ mount -t fuse
</pre> Arvados - Bug #11209: stuck keep fuse mounts not cleared by crunch-jobhttps://dev.arvados.org/issues/11209?journal_id=509242017-04-20T15:19:48ZJoshua Randalljr17@sanger.ac.uk
<ul></ul><p>kernel and system versions:</p>
<pre>
root@humgen-05-13:~# uname -a
Linux humgen-05-13 3.13.0-85-generic #129~precise1-Ubuntu SMP Fri Mar 18 17:38:08 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
root@humgen-05-13:~# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 12.04.5 LTS
Release: 12.04
Codename: precise
</pre> Arvados - Bug #11209: stuck keep fuse mounts not cleared by crunch-jobhttps://dev.arvados.org/issues/11209?journal_id=509282017-04-20T16:07:19ZJoshua Randalljr17@sanger.ac.uk
<ul></ul><p>Another set of machines were wedged today. Did some more testing on the call with Tom:</p>
<p>First, on humgen-02-02 we established that `arv-mount --replace` does NOT work initially (before a failed attempt to mount):<br /><pre>
crunch@humgen-02-02:/$ cat /proc/self/mountinfo | grep fuse
23 17 0:17 / /sys/fs/fuse/connections rw,relatime - fusectl none rw
crunch@humgen-02-02:/$ cat /etc/mtab
/dev/sda6 / ext4 rw,relatime,errors=remount-ro,user_xattr 0 0
proc /proc proc rw,noexec,nosuid,nodev 0 0
sysfs /sys sysfs rw,noexec,nosuid,nodev 0 0
none /sys/fs/fuse/connections fusectl rw 0 0
none /sys/kernel/debug debugfs rw 0 0
none /sys/kernel/security securityfs rw 0 0
udev /dev devtmpfs rw,mode=0755 0 0
devpts /dev/pts devpts rw,noexec,nosuid,gid=5,mode=0620 0 0
tmpfs /run tmpfs rw,noexec,nosuid,size=10%,mode=0755 0 0
none /run/lock tmpfs rw,noexec,nosuid,nodev,size=5242880 0 0
none /run/shm tmpfs rw,nosuid,nodev 0 0
cgroup /sys/fs/cgroup tmpfs rw,relatime,mode=755 0 0
cgroup /sys/fs/cgroup/cpuset cgroup rw,relatime,cpuset 0 0
cgroup /sys/fs/cgroup/cpu cgroup rw,relatime,cpu 0 0
cgroup /sys/fs/cgroup/cpuacct cgroup rw,relatime,cpuacct 0 0
cgroup /sys/fs/cgroup/memory cgroup rw,relatime,memory 0 0
cgroup /sys/fs/cgroup/devices cgroup rw,relatime,devices 0 0
/dev/sda1 /boot ext4 rw,errors=remount-ro 0 0
/dev/mapper/data-1 /data ext4 rw 0 0
rpc_pipefs /run/rpc_pipefs rpc_pipefs rw 0 0
/dev/fuse /data/crunch-tmp/crunch-job/task/humgen-02-02.3.keep fuse rw,nosuid,nodev,allow_other,max_read=131072,user=crunch 0 0
crunch@humgen-02-02:/$ arv-mount --replace /data/crunch-tmp/crunch-job/task/humgen-02-02.3.keep
mount: according to mtab, /dev/fuse is already mounted on /data/crunch-tmp/crunch-job/task/humgen-02-02.3.keep
2017-04-20 16:26:27 arvados.arv-mount[61468] ERROR: arv-mount: exception during mount: fuse_mount failed
Traceback (most recent call last):
File "/usr/lib/python2.7/dist-packages/arvados_fuse/command.py", line 365, in _run_standalone
with self:
File "/usr/lib/python2.7/dist-packages/arvados_fuse/command.py", line 133, in __enter__
llfuse.init(self.operations, self.args.mountpoint, self._fuse_options())
File "llfuse/fuse_api.pxi", line 253, in llfuse.capi.init (src/llfuse/capi_linux.c:24362)
RuntimeError: fuse_mount failed
crunch@humgen-02-02:/$ cat /proc/self/mountinfo | grep fuse
23 17 0:17 / /sys/fs/fuse/connections rw,relatime - fusectl none rw
42 39 0:31 / /data/crunch-tmp/crunch-job/task/humgen-02-02.3.keep rw,nosuid,nodev,relatime - fuse /dev/fuse rw,user_id=15324,group_id=1593,max_read=131072
crunch@humgen-02-02:/$ arv-mount --replace /data/crunch-tmp/crunch-job/task/humgen-02-02.3.keep
crunch@humgen-02-02:/$ mount -t fuse
/dev/fuse on /data/crunch-tmp/crunch-job/task/humgen-02-02.3.keep type fuse (rw,nosuid,nodev,max_read=131072,user=crunch)
crunch@humgen-02-02:/$ fusermount -u /data/crunch-tmp/crunch-job/task/humgen-02-02.3.keep
crunch@humgen-02-02:/$ mount -t fuse
crunch@humgen-02-02:/$ exit
</pre></p>
<p>Then, on humgen-05-03, we discovered that you <strong>can</strong> use --subtype to mount a different fuse subtype on top of the old mountpoint. However, once unmounted the original "Transport endpoint is not connected" error returns (and there is still an entry in mtab but not in mountinfo until after attempting to mount). It does work to call `arv-mount --replace` twice in a row (first fails, second succeeds): <br /><pre>
crunch@humgen-05-03:/$ cat /proc/self/mountinfo | grep fuse
24 17 0:18 / /sys/fs/fuse/connections rw,relatime - fusectl none rw
crunch@humgen-05-03:/$ clear
crunch@humgen-05-03:/$ arv-mount --subtype foo --replace /data/crunch-tmp/crunch-job/task/humgen-05-03.10.keep
crunch@humgen-05-03:/$ ls /data/crunch-tmp/crunch-job/task/humgen-05-03.10.keep
by_id by_tag home README shared
crunch@humgen-05-03:/$ mount -t fuse.foo
foo on /data/crunch-tmp/crunch-job/task/humgen-05-03.10.keep type fuse.foo (rw,nosuid,nodev,max_read=131072,user=crunch)
crunch@humgen-05-03:/$ cat /etc/mtab |grep fuse
none /sys/fs/fuse/connections fusectl rw 0 0
/dev/fuse /data/crunch-tmp/crunch-job/task/humgen-05-03.10.keep fuse rw,nosuid,nodev,allow_other,max_read=131072,user=crunch 0 0
foo /data/crunch-tmp/crunch-job/task/humgen-05-03.10.keep fuse.foo rw,nosuid,nodev,max_read=131072,user=crunch 0 0
crunch@humgen-05-03:/$ cat /proc/self/mountinfo |grep fuse
24 17 0:18 / /sys/fs/fuse/connections rw,relatime - fusectl none rw
43 40 0:31 / /data/crunch-tmp/crunch-job/task/humgen-05-03.10.keep rw,nosuid,nodev,relatime - fuse.foo foo rw,user_id=15324,group_id=1593,max_read=131072
crunch@humgen-05-03:/$ mount -t fuse.foo
foo on /data/crunch-tmp/crunch-job/task/humgen-05-03.10.keep type fuse.foo (rw,nosuid,nodev,max_read=131072,user=crunch)
crunch@humgen-05-03:/$ mount -t fuse
/dev/fuse on /data/crunch-tmp/crunch-job/task/humgen-05-03.10.keep type fuse (rw,nosuid,nodev,allow_other,max_read=131072,user=crunch)
crunch@humgen-05-03:/$ arv-mount --subtype bar --replace /data/crunch-tmp/crunch-job/task/humgen-05-03.10.keep
crunch@humgen-05-03:/$ mount -t fuse
/dev/fuse on /data/crunch-tmp/crunch-job/task/humgen-05-03.10.keep type fuse (rw,nosuid,nodev,allow_other,max_read=131072,user=crunch)
crunch@humgen-05-03:/$ mount -t fuse.foo
crunch@humgen-05-03:/$ mount -t fuse.bar
bar on /data/crunch-tmp/crunch-job/task/humgen-05-03.10.keep type fuse.bar (rw,nosuid,nodev,max_read=131072,user=crunch)
crunch@humgen-05-03:/$ ps auxwww|grep arv-m
crunch 23232 1.5 0.0 474972 27840 ? Sl 16:34 0:00 /usr/bin/python2.7 /usr/bin/arv-mount --subtype bar --replace /data/crunch-tmp/crunch-job/task/humgen-05-03.10.keep
crunch 23297 0.0 0.0 9388 916 pts/2 S+ 16:34 0:00 grep arv-m
crunch@humgen-05-03:/$ mount -t fuse
/dev/fuse on /data/crunch-tmp/crunch-job/task/humgen-05-03.10.keep type fuse (rw,nosuid,nodev,allow_other,max_read=131072,user=crunch)
crunch@humgen-05-03:/$ arv-mount --replace /data/crunch-tmp/crunch-job/task/humgen-05-03.10.keep
mount: according to mtab, /dev/fuse is already mounted on /data/crunch-tmp/crunch-job/task/humgen-05-03.10.keep
2017-04-20 16:35:16 arvados.arv-mount[23408] ERROR: arv-mount: exception during mount: fuse_mount failed
Traceback (most recent call last):
File "/usr/lib/python2.7/dist-packages/arvados_fuse/command.py", line 365, in _run_standalone
with self:
File "/usr/lib/python2.7/dist-packages/arvados_fuse/command.py", line 133, in __enter__
llfuse.init(self.operations, self.args.mountpoint, self._fuse_options())
File "llfuse/fuse_api.pxi", line 253, in llfuse.capi.init (src/llfuse/capi_linux.c:24362)
RuntimeError: fuse_mount failed
crunch@humgen-05-03:/$ cat /proc/self/mountinfo | grep self
crunch@humgen-05-03:/$ cat /proc/self/mountinfo | grep fuse
24 17 0:18 / /sys/fs/fuse/connections rw,relatime - fusectl none rw
43 40 0:31 / /data/crunch-tmp/crunch-job/task/humgen-05-03.10.keep rw,nosuid,nodev,relatime - fuse /dev/fuse rw,user_id=15324,group_id=1593,max_read=131072
crunch@humgen-05-03:/$ ls /data/crunch-tmp/crunch-job/task/humgen-05-03.10.keep
ls: cannot access /data/crunch-tmp/crunch-job/task/humgen-05-03.10.keep: Transport endpoint is not connected
crunch@humgen-05-03:/$ ls /sys/fs/fuse/connections/31/
abort congestion_threshold max_background waiting
crunch@humgen-05-03:/$ ls /sys/fs/fuse/connections/31/
abort congestion_threshold max_background waiting
crunch@humgen-05-03:/$ echo "1" | /sys/fs/fuse/connections/31/abort
-su: /sys/fs/fuse/connections/31/abort: Permission denied
crunch@humgen-05-03:/$ echo "1" > /sys/fs/fuse/connections/31/abort
crunch@humgen-05-03:/$ cat /proc/self/mountinfo | grep fuse
24 17 0:18 / /sys/fs/fuse/connections rw,relatime - fusectl none rw
43 40 0:31 / /data/crunch-tmp/crunch-job/task/humgen-05-03.10.keep rw,nosuid,nodev,relatime - fuse /dev/fuse rw,user_id=15324,group_id=1593,max_read=131072
crunch@humgen-05-03:/$ ps auxwww|grep arv-m
crunch 24618 0.0 0.0 9388 912 pts/2 S+ 16:39 0:00 grep arv-m
crunch@humgen-05-03:/$ ls /data/crunch-tmp/crunch-job/task/humgen-05-03.10.keep
ls: cannot access /data/crunch-tmp/crunch-job/task/humgen-05-03.10.keep: Transport endpoint is not connected
crunch@humgen-05-03:/$ cat /etc/mtab | grep fuse
none /sys/fs/fuse/connections fusectl rw 0 0
/dev/fuse /data/crunch-tmp/crunch-job/task/humgen-05-03.10.keep fuse rw,nosuid,nodev,allow_other,max_read=131072,user=crunch 0 0
crunch@humgen-05-03:/$ cat /etc/mtab | grep fuse
none /sys/fs/fuse/connections fusectl rw 0 0
/dev/fuse /data/crunch-tmp/crunch-job/task/humgen-05-03.10.keep fuse rw,nosuid,nodev,allow_other,max_read=131072,user=crunch 0 0
crunch@humgen-05-03:/$ fusermount -u -z /data/crunch-tmp/crunch-job/task/humgen-05-03.10.keep
crunch@humgen-05-03:/$ cat /etc/mtab | grep fuse
none /sys/fs/fuse/connections fusectl rw 0 0
crunch@humgen-05-03:/$ mount -t fuse
crunch@humgen-05-03:/$ exit
</pre></p>
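<p>The mismatch between /etc/mtab and /proc/self/mountinfo seen in the session above can be detected mechanically. The following is a minimal Python sketch, not part of arv-mount or crunch-job: the function names are hypothetical, and the sample text is copied from the humgen-05-03 transcript. It reports FUSE entries that mtab lists but the kernel's mountinfo does not.</p>

```python
# Sample text copied from the humgen-05-03 session above.
SAMPLE_MTAB = """\
none /sys/fs/fuse/connections fusectl rw 0 0
/dev/fuse /data/crunch-tmp/crunch-job/task/humgen-05-03.10.keep fuse rw,nosuid,nodev,allow_other,max_read=131072,user=crunch 0 0
foo /data/crunch-tmp/crunch-job/task/humgen-05-03.10.keep fuse.foo rw,nosuid,nodev,max_read=131072,user=crunch 0 0
"""

SAMPLE_MOUNTINFO = """\
24 17 0:18 / /sys/fs/fuse/connections rw,relatime - fusectl none rw
43 40 0:31 / /data/crunch-tmp/crunch-job/task/humgen-05-03.10.keep rw,nosuid,nodev,relatime - fuse.foo foo rw,user_id=15324,group_id=1593,max_read=131072
"""


def parse_mtab(text):
    """Return the set of (mountpoint, fstype) pairs in mtab-style text."""
    entries = set()
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 3:
            entries.add((fields[1], fields[2]))
    return entries


def parse_mountinfo(text):
    """Return the set of (mountpoint, fstype) pairs in mountinfo-style text."""
    entries = set()
    for line in text.splitlines():
        # The mount point is the 5th field; the fstype follows the " - " separator.
        left, sep, right = line.partition(" - ")
        left_fields = left.split()
        right_fields = right.split()
        if sep and len(left_fields) >= 5 and right_fields:
            entries.add((left_fields[4], right_fields[0]))
    return entries


def stale_fuse_entries(mtab_text, mountinfo_text):
    """List FUSE entries that mtab has but the kernel mount table lacks."""
    live = parse_mountinfo(mountinfo_text)
    return sorted(
        entry
        for entry in parse_mtab(mtab_text)
        if (entry[1] == "fuse" or entry[1].startswith("fuse.")) and entry not in live
    )


print(stale_fuse_entries(SAMPLE_MTAB, SAMPLE_MOUNTINFO))
```

<p>On a live node the two arguments would be the contents of /etc/mtab and /proc/self/mountinfo. A mount that is listed in both files but answers "Transport endpoint is not connected" is a different case and is not reported by this check.</p>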
<p>Finally, on humgen-05-10, we found that manually removing the offending line from /etc/mtab (as root) does work: after that, simply running arv-mount succeeds. This may suggest that a race condition in updating /etc/mtab is what is causing the underlying problem? <br /><pre>
crunch@humgen-05-10:/$ cat /etc/mtab
/dev/sda6 / ext4 rw 0 0
proc /proc proc rw,noexec,nosuid,nodev 0 0
sysfs /sys sysfs rw,noexec,nosuid,nodev 0 0
none /sys/fs/fuse/connections fusectl rw 0 0
none /sys/kernel/debug debugfs rw 0 0
none /sys/kernel/security securityfs rw 0 0
udev /dev devtmpfs rw,mode=0755 0 0
devpts /dev/pts devpts rw,noexec,nosuid,gid=5,mode=0620 0 0
tmpfs /run tmpfs rw,noexec,nosuid,size=10%,mode=0755 0 0
none /run/lock tmpfs rw,noexec,nosuid,nodev,size=5242880 0 0
none /run/shm tmpfs rw,nosuid,nodev 0 0
cgroup /sys/fs/cgroup tmpfs rw,relatime,mode=755 0 0
cgroup /sys/fs/cgroup/cpuset cgroup rw,relatime,cpuset 0 0
cgroup /sys/fs/cgroup/cpu cgroup rw,relatime,cpu 0 0
cgroup /sys/fs/cgroup/cpuacct cgroup rw,relatime,cpuacct 0 0
cgroup /sys/fs/cgroup/memory cgroup rw,relatime,memory 0 0
cgroup /sys/fs/cgroup/devices cgroup rw,relatime,devices 0 0
cgroup /sys/fs/cgroup/freezer cgroup rw,relatime,freezer 0 0
cgroup /sys/fs/cgroup/blkio cgroup rw,relatime,blkio 0 0
cgroup /sys/fs/cgroup/perf_event cgroup rw,relatime,perf_event 0 0
cgroup /sys/fs/cgroup/hugetlb cgroup rw,relatime,hugetlb 0 0
/dev/sda7 /tmp ext4 rw 0 0
/dev/sda8 /data xfs rw 0 0
/dev/sda1 /boot ext4 rw,errors=remount-ro 0 0
rpc_pipefs /run/rpc_pipefs rpc_pipefs rw 0 0
/dev/fuse /data/crunch-tmp/crunch-job/task/humgen-05-10.7.keep fuse rw,nosuid,nodev,allow_other,max_read=131072,user=crunch 0 0
crunch@humgen-05-10:/$ cat /proc/self/mountinfo | grep fuse
23 17 0:17 / /sys/fs/fuse/connections rw,relatime - fusectl none rw
crunch@humgen-05-10:/$ exit
logout
root@humgen-05-10:~# vi /etc/mtab ### MANUALLY REMOVE MTAB ENTRY FOR /data/crunch-tmp/crunch-job/task/humgen-05-10.7.keep
root@humgen-05-10:~# su - crunch
No directory, logging in with HOME=/
crunch@humgen-05-10:/$ export ARVADOS_API_HOST=api.arvados.sanger.ac.uk
crunch@humgen-05-10:/$ export ARVADOS_API_TOKEN=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
crunch@humgen-05-10:/$ arv-mount /data/crunch-tmp/crunch-job/task/humgen-05-10.7.keep
crunch@humgen-05-10:/$ ls -l /data/crunch-tmp/crunch-job/task/humgen-05-10.7.keep
total 3
dr-xr-xr-x 1 crunch arvados 0 Apr 20 16:44 by_id
dr-xr-xr-x 1 crunch arvados 0 Apr 20 16:44 by_tag
dr-xr-xr-x 1 crunch arvados 0 Apr 20 16:44 home
-r--r--r-- 1 crunch arvados 512 Apr 20 16:44 README
dr-xr-xr-x 1 crunch arvados 0 Apr 20 16:44 shared
</pre></p> Arvados - Bug #11209: stuck keep fuse mounts not cleared by crunch-job https://dev.arvados.org/issues/11209?journal_id=50929 2017-04-20T16:24:24Z Joshua Randall jr17@sanger.ac.uk
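<p>The manual edit in the humgen-05-10 session amounts to dropping one line from the mtab text. A minimal Python sketch of that step follows; <code>drop_mtab_entry</code> is a hypothetical helper, not an Arvados or FUSE function, and it only computes the new file contents. Writing /etc/mtab back still needs root and can race with other mount and umount calls.</p>

```python
# Three lines taken from the humgen-05-10 mtab listing above.
SAMPLE_MTAB = (
    "/dev/sda8 /data xfs rw 0 0\n"
    "rpc_pipefs /run/rpc_pipefs rpc_pipefs rw 0 0\n"
    "/dev/fuse /data/crunch-tmp/crunch-job/task/humgen-05-10.7.keep fuse "
    "rw,nosuid,nodev,allow_other,max_read=131072,user=crunch 0 0\n"
)


def drop_mtab_entry(mtab_text, mountpoint, fstype="fuse"):
    """Return mtab_text without the lines for the given mountpoint and fstype."""
    kept = []
    for line in mtab_text.splitlines(keepends=True):
        fields = line.split()
        # mtab fields: source, mountpoint, fstype, options, dump, pass
        if len(fields) >= 3 and fields[1] == mountpoint and fields[2] == fstype:
            continue
        kept.append(line)
    return "".join(kept)


cleaned = drop_mtab_entry(
    SAMPLE_MTAB, "/data/crunch-tmp/crunch-job/task/humgen-05-10.7.keep"
)
print(cleaned, end="")
```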
<p>There do generally seem to be race conditions when FUSE filesystems update /etc/mtab:</p>
<p><a class="external" href="https://bugzilla.redhat.com/show_bug.cgi?id=651183">https://bugzilla.redhat.com/show_bug.cgi?id=651183</a><br /><a class="external" href="http://fuse.996288.n3.nabble.com/Security-Problem-in-fusermount-td12269.html">http://fuse.996288.n3.nabble.com/Security-Problem-in-fusermount-td12269.html</a></p>
<p>It may be a workaround for us to get rid of /etc/mtab and just symlink it to /proc/self/mounts as is done in newer systems?</p> Arvados - Bug #11209: stuck keep fuse mounts not cleared by crunch-job https://dev.arvados.org/issues/11209?journal_id=51066 2017-04-26T14:45:33Z Tom Clegg tom@curii.com
<p>Joshua Randall wrote:</p>
<blockquote>
<p>It may be a workaround for us to get rid of /etc/mtab and just symlink it to /proc/self/mounts as is done in newer systems?</p>
</blockquote>
<p>Yes, this seems to be the way mtab-editing problems end up getting fixed. From <a class="external" href="https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=94076">https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=94076</a>:</p>
<blockquote>
<p>/etc/mtab is now a symlink to /proc/mounts. Bugs which were a result of editing /etc/mtab which make it get out of sync with the real kernel state are now no longer an issue.</p>
</blockquote> Arvados - Bug #11209: stuck keep fuse mounts not cleared by crunch-job https://dev.arvados.org/issues/11209?journal_id=51116 2017-04-26T18:52:35Z Tom Clegg tom@curii.com
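<p>Whether a node already uses the symlink arrangement discussed above can be checked with a few lines of Python. This is a sketch under assumptions: the function names and the returned labels are made up for illustration, and any link target under /proc is treated as kernel-backed.</p>

```python
import os


def target_is_in_proc(link_target, link_dir="/etc"):
    """True if a symlink target, resolved relative to link_dir, lies under /proc."""
    resolved = os.path.normpath(os.path.join(link_dir, link_target))
    return resolved.startswith("/proc/")


def mtab_status(path="/etc/mtab"):
    """Classify path as 'kernel-backed', 'other-symlink', or 'not-symlink'."""
    if not os.path.islink(path):
        # A regular (or missing) mtab is maintained by userspace tools,
        # so it can drift from the kernel's mount table.
        return "not-symlink"
    target = os.readlink(path)
    link_dir = os.path.dirname(os.path.abspath(path))
    if target_is_in_proc(target, link_dir):
        return "kernel-backed"
    return "other-symlink"


if __name__ == "__main__":
    print(mtab_status())
```

<p>A result of <code>not-symlink</code> means mount and fusermount are still editing the file themselves, which is the situation where mtab and the kernel state can disagree.</p>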
<ul><li><strong>Status</strong> changed from <i>In Progress</i> to <i>Feedback</i></li></ul> Arvados - Bug #11209: stuck keep fuse mounts not cleared by crunch-job https://dev.arvados.org/issues/11209?journal_id=51117 2017-04-26T18:52:56Z Tom Clegg tom@curii.com
<ul><li><strong>Target version</strong> deleted (<del><i>2017-04-26 sprint</i></del>)</li></ul> Arvados - Bug #11209: stuck keep fuse mounts not cleared by crunch-job https://dev.arvados.org/issues/11209?journal_id=51811 2017-05-16T18:23:31Z Tom Clegg tom@curii.com
<ul><li><strong>Status</strong> changed from <i>Feedback</i> to <i>Resolved</i></li></ul>