Bug #8810

Updated by Brett Smith about 8 years ago

<pre>2016-03-22_16:33:38 wx7k5-8i9sb-ose8gk9vuxqe9gd 48074    stderr starting: ['srun','--nodelist=compute11','/bin/bash','-o','pipefail','-ec',' if ! docker.io images -q --no-trunc --all | grep -qxF d33416e64af4370471ed15d19211e84991a8e158626199f4e4747e4310144b83; then       arv-get 17b65db74aae73465b5e286d1cdb0e23\\+798\\/d33416e64af4370471ed15d19211e84991a8e158626199f4e4747e4310144b83\\.tar | docker.io load fi '] 
2016-03-22_16:33:40 wx7k5-8i9sb-ose8gk9vuxqe9gd 48074    stderr Post http:///var/run/docker.sock/v1.20/images/load: EOF.
2016-03-22_16:33:40 wx7k5-8i9sb-ose8gk9vuxqe9gd 48074    stderr * Are you trying to connect to a TLS-enabled daemon without TLS?
2016-03-22_16:33:40 wx7k5-8i9sb-ose8gk9vuxqe9gd 48074    stderr * Is your docker daemon up and running?
2016-03-22_16:41:14 wx7k5-8i9sb-ose8gk9vuxqe9gd 48074    stderr srun: error: Node failure on compute11
2016-03-22_16:41:14 wx7k5-8i9sb-ose8gk9vuxqe9gd 48074    stderr srun: Job step aborted: Waiting up to 2 seconds for job step to finish.
2016-03-22_16:41:14 wx7k5-8i9sb-ose8gk9vuxqe9gd 48074    load docker image: exit 0</pre>

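Even though the Docker image load failed and srun reported a node failure, the step was logged as "exit 0". From there, the job kept running and generating errors until the UID 0 check failed. Instead, crunch-job should detect this error and exit in a way that makes crunch-dispatch retry the job.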
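One way to picture the requested check is a small classifier that treats the image-load step as failed when either its exit status is nonzero or its stderr contains the daemon/node failure messages seen in the log above, and then picks a "retry me" exit code. This is a hypothetical Python sketch, not crunch-job's actual code: `StepResult`, `load_image_failed`, `exit_code_for_load_step`, and the marker list are illustrative names. `EX_TEMPFAIL` (75) is the standard sysexits.h value; that crunch-dispatch retries a job exiting with it is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional

# EX_TEMPFAIL from sysexits.h. ASSUMPTION: the dispatcher treats this
# exit code as "temporary failure, retry the job".
EX_TEMPFAIL = 75

# Illustrative stderr markers, taken from the log above, showing that the
# Docker daemon or the compute node was unavailable.
FATAL_MARKERS = (
    "Is your docker daemon up and running?",
    "srun: error: Node failure",
)


@dataclass
class StepResult:
    """Hypothetical record of one srun step: exit status plus stderr lines."""
    exit_code: int
    stderr_lines: List[str]


def load_image_failed(result: StepResult) -> bool:
    """Return True if the image-load step failed, even if it reported exit 0."""
    if result.exit_code != 0:
        return True
    return any(
        marker in line
        for line in result.stderr_lines
        for marker in FATAL_MARKERS
    )


def exit_code_for_load_step(result: StepResult) -> Optional[int]:
    """Return the code the job runner should exit with, or None to continue."""
    return EX_TEMPFAIL if load_image_failed(result) else None


if __name__ == "__main__":
    log = [
        "Post http:///var/run/docker.sock/v1.20/images/load: EOF.",
        "* Is your docker daemon up and running?",
        "srun: error: Node failure on compute11",
    ]
    # The step reported exit 0, but stderr shows the load failed.
    print(exit_code_for_load_step(StepResult(0, log)))  # 75
```

Checking stderr in addition to the exit status matters here because, as the log shows, the step's reported status alone did not reveal the failure.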
