Bug #20748

open

Compute node AMI scripts may be run too late, making docker restart after a launched container

Added by Lucas Di Pentima about 1 year ago.

Status:
New
Priority:
Normal
Assigned To:
-
Category:
Crunch
Target version:
Story points:
-

Description

Our AMI build scripts use cloud-init's runcmd module to specify a series of config scripts to execute.
In these scripts, the docker service is briefly stopped, reconfigured and restarted. This is fine as long as the scripts run early in the system startup procedure, but that is not necessarily the case.

The runcmd module belongs to the "config" stage of cloud-init, but all it does at that stage is write out a script; the script itself is executed later, right after the user_data script.

We recently added to arvados-dispatch-cloud the ability to set up an arbitrary user_data script, and this is where the problem arises: if the configured user_data script takes too long, the dispatcher's BootProbe could pass early and a container could be launched; some minutes later, runcmd would restart the docker service, causing the container to fail.
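
To make the race easier to reason about, here is a toy timeline model. It is only a sketch: the class name, field names and all timings are hypothetical, and it assumes (as described above) that the runcmd script starts as soon as the user_data script finishes.

```python
from dataclasses import dataclass


@dataclass
class BootTimeline:
    """Seconds since instance start. All values are hypothetical."""

    boot_probe_passes_at: float
    user_data_duration: float
    user_data_starts_at: float = 30.0

    def docker_restart_at(self) -> float:
        # Assumption from the description: the runcmd script (which
        # restarts docker) runs right after the user_data script ends.
        return self.user_data_starts_at + self.user_data_duration

    def container_at_risk(self) -> bool:
        # A container may be launched as soon as the boot probe passes,
        # so it is at risk if docker is restarted after that point.
        return self.docker_restart_at() > self.boot_probe_passes_at


fast = BootTimeline(boot_probe_passes_at=60.0, user_data_duration=5.0)
slow = BootTimeline(boot_probe_passes_at=60.0, user_data_duration=300.0)
print(fast.container_at_risk())  # False
print(slow.container_at_risk())  # True
```

With a short user_data script the restart happens before the probe passes; with a slow one the restart lands after a container may already be running.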

I think the ideal solution is to move the scripts from runcmd to another module that ensures early execution, making it independent of whatever the user_data is at any given time.
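
One candidate for such a module might be cloud-init's bootcmd, which is documented as running early in the boot process. The sketch below shows the idea on a cloud-config represented as a Python dict. The helper name and the script path are hypothetical, and this is not what the AMI build scripts currently do. Note that bootcmd runs on every boot by default, so the moved scripts would need to tolerate being re-run.

```python
import copy


def move_runcmd_to_bootcmd(cloud_config: dict) -> dict:
    """Return a copy of cloud_config with runcmd entries appended to bootcmd."""
    result = copy.deepcopy(cloud_config)
    commands = result.pop("runcmd", [])
    if commands:
        # Keep any existing bootcmd entries and run the moved ones after them.
        result.setdefault("bootcmd", []).extend(commands)
    return result


example = {
    "runcmd": [
        ["/usr/local/bin/configure-docker.sh"],  # hypothetical script path
    ],
}
print(move_runcmd_to_bootcmd(example))
```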
