Bug #14693

open

[arvbox] runsv fatal: unable to lock supervise/lock

Added by Eric Biagiotti over 5 years ago. Updated about 2 months ago.

Status: In Progress
Priority: Normal
Assigned To:
Category: -
Target version:
Story points: -
Release:
Release relationship: Auto

Description

- runit: $Id: 25da3b86f7bed4038b8a039d2f8e8c9bbcf0822b $: booting.
- runit: warning: unable to open /dev/console: file does not exist
- runit: enter stage: /etc/runit/1
- runit: leave stage: /etc/runit/1
- runit: enter stage: /etc/runit/2

Arvados-in-a-box starting

runsv workbench: fatal: unable to lock supervise/lock: temporary failure
runsv crunch-dispatch1: fatal: unable to lock supervise/lock: temporary failure
runsv crunch-dispatch-local: fatal: unable to lock supervise/lock: temporary failure
runsv sdk: fatal: unable to lock supervise/lock: temporary failure
runsv sso: fatal: unable to lock supervise/lock: temporary failure
runsv keepstore0: fatal: unable to lock supervise/lock: temporary failure
runsv arv-git-httpd: fatal: unable to lock supervise/lock: temporary failure
runsv keepstore1: fatal: unable to lock supervise/lock: temporary failure
runsv keep-web: fatal: unable to lock supervise/lock: temporary failure
runsv keep-web: fatal: unable to lock supervise/lock: temporary failure
runsv sso: fatal: unable to lock supervise/lock: temporary failure
runsv crunch-dispatch-local: fatal: unable to lock supervise/lock: temporary failure
runsv crunch-dispatch1: fatal: unable to lock supervise/lock: temporary failure
runsv arv-git-httpd: fatal: unable to lock supervise/lock: temporary failure
runsv sdk: fatal: unable to lock supervise/lock: temporary failure
runsv keepstore0: fatal: unable to lock supervise/lock: temporary failure
runsv keepstore1: fatal: unable to lock supervise/lock: temporary failure
runsv workbench: fatal: unable to lock supervise/lock: temporary failure
runsv crunch-dispatch-local: fatal: unable to lock supervise/lock: temporary failure

Actions #1

Updated by Peter Amstutz over 5 years ago

  • Subject changed from Running Arvbox on VM to [arvbox] runsv fatal: unable to lock supervise/lock

This is a very weird error.

Looking at the process tree inside the container shows (a) defunct runsv processes and (b) daemon processes that should be children of a runsvdir->runsv instance but are owned by the pid 1 runit process instead. This suggests that runsv is crashing or exiting abnormally.
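
For anyone trying to reproduce the observation above without eyeballing `ps` output, here is a rough Python sketch that lists zombie processes and processes whose parent is pid 1 by reading `/proc`. The helper names `parse_stat` and `scan` are made up for this note and are not part of arvbox or runit; it assumes a Linux `/proc` and is meant to be run inside the container.

```python
import os

def parse_stat(line):
    """Parse one /proc/<pid>/stat line into (pid, comm, state, ppid)."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')'.
    left, rest = line.split("(", 1)
    comm, tail = rest.rsplit(")", 1)
    fields = tail.split()
    return int(left), comm, fields[0], int(fields[1])

def scan(proc="/proc"):
    """Return (zombies, children_of_pid1), each a list of (pid, comm)."""
    zombies, reparented = [], []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc, entry, "stat"), errors="replace") as f:
                pid, comm, state, ppid = parse_stat(f.read())
        except (FileNotFoundError, ProcessLookupError, PermissionError):
            continue  # process exited (or is unreadable) while scanning
        if state == "Z":
            zombies.append((pid, comm))
        if ppid == 1:
            reparented.append((pid, comm))
    return zombies, reparented

if __name__ == "__main__":
    zombies, reparented = scan()
    print("defunct:", zombies)
    print("children of pid 1:", reparented)
```

Some children of pid 1 are expected (runsvdir itself, for example), so the interesting entries are service daemons that show up there instead of under their runsv.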

The locking error presumably happens because runsv shares the lock file descriptor with the child daemon process it spawns. As long as that daemon keeps running, a new runsv process cannot acquire the lock, so it reports the lock error and does not start another instance of the service.
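
The general mechanism described here can be illustrated in isolation. This is a minimal Python sketch, not runsv's actual code: it assumes the lock is a `flock`-style lock, which on Linux belongs to the open file description and therefore stays held for as long as any process that inherited the descriptor keeps it open. `try_lock` and `demo` are hypothetical names, and the child is just a sleeping Python process standing in for a daemon.

```python
import fcntl
import os
import subprocess
import sys
import tempfile

def try_lock(path):
    """Open path and try a non-blocking exclusive flock; return the fd or None."""
    fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o600)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)
        return None
    return fd

def demo():
    with tempfile.TemporaryDirectory() as d:
        lock_path = os.path.join(d, "lock")
        # The "supervisor" takes the lock first.
        supervisor_fd = try_lock(lock_path)
        assert supervisor_fd is not None
        # The "daemon" inherits the locked descriptor.
        child = subprocess.Popen(
            [sys.executable, "-c", "import time; time.sleep(30)"],
            pass_fds=[supervisor_fd],
        )
        # Simulate the supervisor going away: its copy of the fd is closed,
        # but the child still holds the same open file description.
        os.close(supervisor_fd)
        blocked_while_child_alive = try_lock(lock_path) is None
        child.kill()
        child.wait()
        # Once the child is gone, the lock is released.
        fd = try_lock(lock_path)
        free_after_child_exit = fd is not None
        if fd is not None:
            os.close(fd)
        return blocked_while_child_alive, free_after_child_exit

if __name__ == "__main__":
    print(demo())
```

If the theory holds, a replacement supervisor would hit the equivalent of the `BlockingIOError` branch for as long as the orphaned daemon is alive.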

Actions #2

Updated by Peter Amstutz over 5 years ago

  • Status changed from New to In Progress

Differences between my workstation and the systems showing the problem:

My system:
Debian 9
Docker 18.09.0
Kernel 4.9.0-8-amd64

VM:
Ubuntu 18.04
Docker 17.05.0-ce
Kernel 4.15.0-1036-azure

runsvdir uses the inode and device number to decide if a service directory matches one seen previously. I had a theory that overlayfs could be reporting a different inode or device number, which would cause runsvdir to start a new instance of the service, but I haven't been able to confirm that is what is happening.
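
One way to test the overlayfs theory would be to sample the device and inode numbers of a service directory repeatedly and see whether they ever change. The following Python sketch does that with `os.stat`; the function names and the example path are hypothetical, and it only approximates the comparison runsvdir is described as making above.

```python
import os

def service_identity(path):
    """Return the (device, inode) pair that identifies a directory."""
    st = os.stat(path)
    return (st.st_dev, st.st_ino)

def identity_is_stable(path, samples=5):
    """Stat the directory several times and report whether the pair ever changes."""
    first = service_identity(path)
    return all(service_identity(path) == first for _ in range(samples - 1))

if __name__ == "__main__":
    import sys
    # Example: python3 check_identity.py /path/to/service/dir
    for p in sys.argv[1:]:
        print(p, service_identity(p), "stable" if identity_is_stable(p) else "CHANGED")
```

Running this inside the container on both the working and the failing host, with pauses between samples if needed, would show whether the pair differs between scans.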

For some reason, everything except ssh eventually settles down and stops getting restarted. This suggests that the runsv warning is correlated with service restarts, despite the fact that service restarts are supposed to be handled by runsv, so runsvdir should only spin up new instances of runsv if runsv itself terminated. However, the runsv processes all have low pids, suggesting they haven't restarted.

Actions #3

Updated by Peter Amstutz about 1 year ago

  • Release set to 60

Actions #4

Updated by Peter Amstutz about 2 months ago

  • Target version set to Future