Bug #20748
open
Compute node AMI scripts may run too late, restarting docker after a container has been launched
Description
The runcmd module from cloud-init is used in our AMI build scripts to run a series of configuration scripts.
In these scripts, the docker service is briefly stopped, reconfigured and restarted. This is fine as long as the scripts run early in the system startup procedure, but that is not necessarily the case.
The runcmd module belongs to the "config" stage of cloud-init, but all runcmd does there is write out a script. That script is actually executed later, by the scripts_user module in the "final" stage, right after the user_data script.
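To make the ordering concrete, here is a minimal Python model of the final stage. It assumes that runcmd's generated file is named `runcmd`, that a shell-script user_data part is saved next to it as `part-001`, and that scripts_user runs the files in that directory in sorted filename order. The file names and the `execution_order` helper are illustrative assumptions, not cloud-init's API.

```python
from typing import List


def execution_order(script_names: List[str]) -> List[str]:
    """Return the order in which a run-parts style executor runs scripts.

    Assumption: the executor simply sorts file names, which is how
    cloud-init's scripts_user module appears to behave.
    """
    return sorted(script_names)


# Hypothetical contents of the per-instance scripts directory at boot:
# the user_data shell script saved as "part-001" and the file generated
# by the runcmd module from our AMI build configuration.
scripts_dir = ["runcmd", "part-001"]

order = execution_order(scripts_dir)
print(order)  # ['part-001', 'runcmd']

# The docker restart inside "runcmd" cannot begin until "part-001"
# (the user_data script) has exited, however long that takes.
assert order.index("part-001") < order.index("runcmd")
```

Under these assumptions, the delay before the docker restart is bounded only by how long user_data runs.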
We recently added to arvados-dispatch-cloud the ability to configure an arbitrary user_data script, which exposes the problem: if the configured user_data script takes too long, the dispatcher's BootProbe could pass early and a container could be launched; some minutes later, runcmd would restart the docker service, causing the container to fail.
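The race can be sketched as a simple timeline check. This is a hypothetical model, not dispatcher code: `BootTimeline` and its fields are made-up names, the times are illustrative seconds since boot (not measurements), and it assumes the docker restart happens the moment user_data exits.

```python
from dataclasses import dataclass


@dataclass
class BootTimeline:
    # All times are seconds since boot; every field is a hypothetical input.
    user_data_start: float     # when the final stage starts running user_data
    user_data_duration: float  # how long the configured user_data script runs
    probe_pass: float          # when the dispatcher's boot probe first succeeds
    container_runtime: float   # how long the launched container needs

    def docker_restart_time(self) -> float:
        # runcmd (and its docker restart) only runs once user_data has exited.
        return self.user_data_start + self.user_data_duration

    def container_interrupted(self) -> bool:
        # The container is hit if docker restarts while it is still running.
        launch = self.probe_pass
        restart = self.docker_restart_time()
        return launch <= restart < launch + self.container_runtime


fast = BootTimeline(user_data_start=30, user_data_duration=5,
                    probe_pass=60, container_runtime=3600)
slow = BootTimeline(user_data_start=30, user_data_duration=300,
                    probe_pass=60, container_runtime=3600)
print(fast.container_interrupted())  # False: docker restarted before launch
print(slow.container_interrupted())  # True: restart at t=330 hits the container
```

The only difference between the two cases is the user_data duration, which is exactly the part that is now configurable.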
I think the ideal solution is to move the scripts from runcmd to another module that guarantees early execution, so that they no longer depend on whatever the user_data is at any given time.
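One candidate (the issue does not name one) is cloud-init's bootcmd module, which runs in the early "init" stage. The following is a minimal sketch, assuming a cloud-config held as a dict of string command lists; `move_runcmd_to_bootcmd`, `render_cloud_config` and the script path are hypothetical names for illustration.

```python
import json
from typing import Dict, List


def move_runcmd_to_bootcmd(config: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Return a copy of a cloud-config dict with runcmd entries moved to bootcmd."""
    new = {key: list(value) for key, value in config.items()}
    commands = new.pop("runcmd", [])
    # Keep any existing bootcmd entries first, then the former runcmd ones.
    new["bootcmd"] = new.get("bootcmd", []) + commands
    return new


def render_cloud_config(config: Dict[str, List[str]]) -> str:
    """Render a dict of string lists as a #cloud-config YAML document."""
    lines = ["#cloud-config"]
    for key in sorted(config):
        lines.append(f"{key}:")
        for cmd in config[key]:
            # A JSON string literal is also a valid YAML double-quoted scalar.
            lines.append(f"  - {json.dumps(cmd)}")
    return "\n".join(lines) + "\n"


original = {"runcmd": ["/usr/local/bin/hypothetical-configure-docker.sh"]}
print(render_cloud_config(move_runcmd_to_bootcmd(original)))
```

Note that bootcmd runs on every boot, whereas runcmd runs once per instance, so any scripts moved this way would need to be safe to run repeatedly.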