Compute node AMI scripts may run too late, causing docker to restart after a container has launched
The runcmd module from cloud-init is used in our AMI build scripts to specify a series of config scripts to execute.
In these scripts, the docker service gets briefly stopped, reconfigured and restarted. This is fine as long as the scripts run early in the system startup procedure, but that's not necessarily the case.
The runcmd module is defined to be inside the "config" stage of cloud-init, but all runcmd does at that point is write the script that will be executed later on, right after the
We have recently added to arvados-dispatch-cloud the ability to set up an arbitrary user_data script, so here's the problem: if the configured user_data script takes too long, the dispatcher's BootProbe could pass early and a container could be launched. Some minutes later, the script written by runcmd would restart the docker service, causing the container to fail.
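To make the race concrete, here is a minimal sketch of a stricter readiness check, assuming the node uses cloud-init's default paths and default module ordering. It reports the node as ready only after cloud-init's final stage has written its boot-finished marker, which in the default ordering happens after the runcmd-generated script has run. The function name and the idea of wiring it into a probe are illustrative assumptions, not the dispatcher's actual BootProbe.

```python
import os

# Marker file written at the end of cloud-init's final stage.
# Default location; an assumption if the image relocates /var/lib/cloud.
BOOT_FINISHED = "/var/lib/cloud/instance/boot-finished"


def cloud_init_finished(marker=BOOT_FINISHED):
    """Return True once cloud-init has written its boot-finished marker."""
    return os.path.isfile(marker)
```

A probe built on this check would not pass while the docker restart is still pending, at the cost of waiting for the whole user_data script to finish.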
I think the ideal solution is to move the scripts from runcmd to another module that ensures early execution, making them independent of whatever the user_data is at any given time.
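As a rough sketch of what such a move could look like, the snippet below renders a cloud-config that uses bootcmd instead of runcmd. Picking bootcmd is an assumption on my part: it is one cloud-init module that runs its commands early in boot rather than deferring them to the final stage, but it also runs on every boot, which may or may not be what we want. The script path is hypothetical.

```python
import json


def cloud_config(bootcmd):
    """Render a minimal cloud-config whose commands run via bootcmd.

    Each command is an argv list; JSON flow sequences are valid YAML,
    so json.dumps is enough to serialize every entry.
    """
    lines = ["#cloud-config", "bootcmd:"]
    for cmd in bootcmd:
        lines.append("  - " + json.dumps(cmd))
    return "\n".join(lines) + "\n"


# Hypothetical script path, used only for illustration.
print(cloud_config([["sh", "/usr/local/bin/reconfigure-docker.sh"]]))
```

With this layout the docker reconfiguration no longer depends on how long the user_data script takes, since it is not queued behind it.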