Bug #15937

[arv-mount] [crunch-run] fusermount: failed to unmount

Added by Peter Amstutz 7 months ago. Updated 6 months ago.

Status: Resolved
Priority: Normal
Assigned To:
Category: -
Target version:
Start date: 01/08/2020
Due date:
% Done: 100%
Estimated time: (Total: 0.00 h)
Story points: -
Release relationship: Auto

Description

Saw this error (the "fusermount: failed to unmount" line near the end of the log below). Needs investigation.

2019-12-16T19:12:25.790683383Z Attaching container streams
2019-12-16T19:12:25.937617510Z Starting Docker container id '653468fc6855c900c5da0aee2d82b70687cf2f8c40c3b34c62b2ea09d4cfbfc8'
2019-12-16T19:12:27.044140997Z notice: reading stats from /sys/fs/cgroup/cpuacct/docker/653468fc6855c900c5da0aee2d82b70687cf2f8c40c3b34c62b2ea09d4cfbfc8/cgroup.procs
2019-12-16T19:12:27.044186897Z notice: monitoring temp dir /tmp/crunch-run.ce8i5-dz642-dv9tnw5a6tkjtms.228716346
2019-12-16T19:12:27.044345600Z notice: reading stats from /sys/fs/cgroup/memory/docker/653468fc6855c900c5da0aee2d82b70687cf2f8c40c3b34c62b2ea09d4cfbfc8/memory.stat
2019-12-16T19:12:27.044658404Z mem 0 cache 0 swap 0 pgmajfault 442368 rss
2019-12-16T19:12:27.044697305Z notice: reading stats from /sys/fs/cgroup/cpuacct/docker/653468fc6855c900c5da0aee2d82b70687cf2f8c40c3b34c62b2ea09d4cfbfc8/cpuacct.stat
2019-12-16T19:12:27.044750506Z notice: reading stats from /sys/fs/cgroup/cpuset/docker/653468fc6855c900c5da0aee2d82b70687cf2f8c40c3b34c62b2ea09d4cfbfc8/cpuset.cpus
2019-12-16T19:12:27.044782906Z cpu 0.0000 user 0.0100 sys 1 cpus
2019-12-16T19:12:27.044903608Z statfs 10676027392 available 44613632 used 10720641024 total
2019-12-16T19:12:27.961017612Z Hello, Crunch!
2019-12-16T19:12:28.113808721Z Waiting for container to finish
2019-12-16T19:12:28.464785570Z Container exited with code: 0
2019-12-16T19:12:28.516499314Z Complete
2019-12-16T19:12:28.726092228Z Running [arv-mount --unmount-timeout=8 --unmount /tmp/crunch-run.ce8i5-dz642-dv9tnw5a6tkjtms.228716346/keep231647569]
2019-12-16T19:12:29.405618705Z fusermount: failed to unmount /tmp/crunch-run.ce8i5-dz642-dv9tnw5a6tkjtms.228716346/keep231647569: Invalid argument
2019-12-16T19:12:29.531401284Z crunch-run finished

Subtasks

Task #15978: Review 15937-failed-to-unmount (Resolved, Peter Amstutz)

Associated revisions

Revision b27c4cbe
Added by Tom Clegg 6 months ago

Merge branch '15937-failed-to-unmount'

fixes #15937

Arvados-DCO-1.1-Signed-off-by: Tom Clegg <>

History

#1 Updated by Peter Amstutz 7 months ago

  • Status changed from New to In Progress

#2 Updated by Peter Amstutz 7 months ago

  • Status changed from In Progress to New
  • Description updated
  • Subject changed from Failed to unmount to [crunch-run] fusermount: failed to unmount

#3 Updated by Tom Clegg 7 months ago

  • Assigned To set to Tom Clegg
  • Subject changed from [crunch-run] fusermount: failed to unmount to [arv-mount] [crunch-run] fusermount: failed to unmount

#4 Updated by Tom Clegg 6 months ago

The "arv-mount --unmount" command uses a retry loop. Each iteration:
  • writes to /sys/fs/fuse/connections/$id/abort to kill the FUSE process
  • calls fusermount -u -z to detach the mount

It seems normal for fusermount to report "invalid argument" when the underlying fuse process is dead/unresponsive.

I think the appropriate fix is to hide fusermount's stderr until/unless we determine (in the next loop iteration) that the mount still exists.
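
A minimal sketch of the idea above, not the actual arv-mount code: stderr from each fusermount attempt is held back and only reported if the next loop iteration finds the mount still present. The names unmount, is_mounted, run_fusermount and the injectable mounted/detach/sleep/report parameters are hypothetical, chosen for illustration. The step that writes to /sys/fs/fuse/connections/$id/abort is omitted here.

```python
import subprocess
import time


def is_mounted(path, mountinfo="/proc/self/mountinfo"):
    """Return True if `path` is listed as a mount point in the mountinfo table."""
    with open(mountinfo) as f:
        for line in f:
            fields = line.split()
            # The fifth field is the mount point; spaces are escaped as \040.
            if len(fields) > 4 and fields[4].replace("\\040", " ") == path:
                return True
    return False


def run_fusermount(path):
    """Run `fusermount -u -z` and return its stderr instead of printing it."""
    proc = subprocess.run(
        ["fusermount", "-u", "-z", path],
        stdout=subprocess.DEVNULL, stderr=subprocess.PIPE, text=True)
    return proc.stderr


def unmount(path, timeout=8.0, mounted=is_mounted, detach=run_fusermount,
            sleep=time.sleep, report=print):
    """Retry until `path` is no longer mounted or `timeout` seconds elapse.

    Returns True if the mount is gone. stderr from a detach attempt is
    reported only if the following check finds the mount still exists.
    (Hypothetical helper; the abort step is intentionally left out.)
    """
    deadline = time.monotonic() + timeout
    pending = ""
    while True:
        if not mounted(path):
            return True  # mount is gone, so held-back stderr is discarded
        if pending:
            report(pending.rstrip())  # mount survived: the message matters
            pending = ""
        if time.monotonic() >= deadline:
            return False
        pending = detach(path)
        sleep(0.1)
```

With this shape, a "failed to unmount ... Invalid argument" message from an attempt that actually succeeded never reaches the log, while messages from attempts that left the mount in place still do.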

#5 Updated by Tom Clegg 6 months ago

  • Status changed from New to In Progress

#6 Updated by Tom Clegg 6 months ago

I haven't been able to reproduce the "invalid argument" message, but I found that repeating "mount && unmount" would occasionally report "fusermount: entry for [...] not found in /etc/mtab" on unmount, maybe 20% of the time. I think the above reasoning applies regardless of the exact error: messages from "fusermount" are only interesting if the mount still exists afterward.

With this change, I ran a few hundred "mount && unmount" with no reported errors.

15937-failed-to-unmount @ 6a4098aa1207c276e2e6bf9a6128b30084ad4b1e -- https://ci.arvados.org/view/Developer/job/developer-run-tests/1707/
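
For reference, a repeated "mount && unmount" check like the one described above could be scripted roughly as follows. This is a sketch under assumptions: count_noisy_runs is a hypothetical helper, and the exact mount and unmount command lines are not given in this ticket, so the command is passed in as a string.

```python
import subprocess


def count_noisy_runs(cmd, runs):
    """Run shell command `cmd` `runs` times.

    Returns how many of the runs wrote anything to stderr.
    """
    noisy = 0
    for _ in range(runs):
        proc = subprocess.run(
            cmd, shell=True,
            stdout=subprocess.DEVNULL, stderr=subprocess.PIPE, text=True)
        if proc.stderr.strip():
            noisy += 1
    return noisy
```

Passing the full "mount && unmount" command line and a run count of a few hundred gives the number of runs that reported an error; a result of 0 corresponds to "no reported errors".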

#7 Updated by Peter Amstutz 6 months ago

This LGTM.

#8 Updated by Anonymous 6 months ago

  • Status changed from In Progress to Resolved

#9 Updated by Peter Amstutz 6 months ago

  • Release set to 22
