Bug #9214

Updated by Ward Vandewege almost 8 years ago

 
 In su92l-d1hrv-xswkd33we27fopw, the docker load command failed: 

 <pre> 
 2016-05-12_19:19:29 salloc: Granted job allocation 14353 
 2016-05-12_19:19:29 48525    Sanity check is `docker.io ps -q` 
 2016-05-12_19:19:29 48525    sanity check: start 
 2016-05-12_19:19:29 48525    stderr starting: ['srun','--nodes=1','--ntasks-per-node=1','docker.io','ps','-q'] 
 2016-05-12_19:19:29 48525    sanity check: exit 0 
 2016-05-12_19:19:29 48525    Sanity check OK 
 2016-05-12_19:19:30 su92l-8i9sb-3wmb3ogss5hvqwb 48525    running from /usr/local/arvados/src/sdk/cli/bin/crunch-job with arvados-cli Gem version(s) 0.1.20151207150126, 0.1.20151023190001, 0.1.20150205181653 
 2016-05-12_19:19:30 su92l-8i9sb-3wmb3ogss5hvqwb 48525    check slurm allocation 
 2016-05-12_19:19:30 su92l-8i9sb-3wmb3ogss5hvqwb 48525    node compute0 - 1 slots 
 2016-05-12_19:19:30 su92l-8i9sb-3wmb3ogss5hvqwb 48525    start 
 2016-05-12_19:19:30 su92l-8i9sb-3wmb3ogss5hvqwb 48525    clean work dirs: start 
 2016-05-12_19:19:30 su92l-8i9sb-3wmb3ogss5hvqwb 48525    stderr starting: ['srun','--nodelist=compute0','-D','/tmp','bash','-ec','-o','pipefail','mount -t fuse,fuse.keep | awk "(index(\\$3, \\"$CRUNCH_TMP\\") == 1){print \\$3}" | xargs -r -n 1 fusermount -u -z; sleep 1; rm -rf $JOB_WORK $CRUNCH_INSTALL $CRUNCH_TMP/task $CRUNCH_TMP/src* $CRUNCH_TMP/*.cid'] 
 2016-05-12_19:19:31 su92l-8i9sb-3wmb3ogss5hvqwb 48525    clean work dirs: exit 0 
 2016-05-12_19:19:32 su92l-8i9sb-3wmb3ogss5hvqwb 48525    Install docker image 256f21bb3abfcd8e08a893886bf3e7c0+5082 
 2016-05-12_19:19:32 su92l-8i9sb-3wmb3ogss5hvqwb 48525    docker image hash is f4eafaf1e2d738e0f8d947feb725b5945f0219c5c4956eec6e164a0788abbab8 
 2016-05-12_19:19:32 su92l-8i9sb-3wmb3ogss5hvqwb 48525    load docker image: start 
 2016-05-12_19:19:32 su92l-8i9sb-3wmb3ogss5hvqwb 48525    stderr starting: ['srun','--nodelist=compute0','/bin/bash','-o','pipefail','-ec',' if docker.io images -q --no-trunc --all | grep -xF f4eafaf1e2d738e0f8d947feb725b5945f0219c5c4956eec6e164a0788abbab8 >/dev/null; then       exit 0 fi declare -a exit_codes=("${PIPESTATUS[@]}") if [ 0 != "${exit_codes[0]}" ]; then      exit "${exit_codes[0]}"    # `docker images` failed elif [ 1 != "${exit_codes[1]}" ]; then      exit "${exit_codes[1]}"    # `grep` encountered an error else      # Everything worked fine, but grep didn\'t find the image on this host.      arv-get 256f21bb3abfcd8e08a893886bf3e7c0\\+5082\\/f4eafaf1e2d738e0f8d947feb725b5945f0219c5c4956eec6e164a0788abbab8\\.tar | docker.io load fi '] 
 2016-05-12_19:37:30 su92l-8i9sb-3wmb3ogss5hvqwb 48525    stderr An error occurred trying to connect: Post http:///var/run/docker.sock/v1.21/images/load: EOF 
 2016-05-12_19:37:30 su92l-8i9sb-3wmb3ogss5hvqwb 48525    stderr srun: error: compute0: task 0: Exited with exit code 1 
 2016-05-12_19:37:30 su92l-8i9sb-3wmb3ogss5hvqwb 48525    load docker image: exit 1 
 2016-05-12_19:37:30 salloc: Relinquishing job allocation 14353 
 </pre> 
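 The timestamps in the log above show how long the load step ran before it failed. The following is a minimal Python sketch of that arithmetic, not part of crunch-job. The helper name @elapsed_seconds@ and the format constant are hypothetical; the timestamps are copied from the two log excerpts in this report.

```python
from datetime import datetime

# Timestamp layout used by the log lines quoted in this report.
LOG_TIME_FORMAT = "%Y-%m-%d_%H:%M:%S"


def elapsed_seconds(start_stamp: str, end_stamp: str) -> int:
    """Return the whole seconds between two log timestamps (hypothetical helper)."""
    start = datetime.strptime(start_stamp, LOG_TIME_FORMAT)
    end = datetime.strptime(end_stamp, LOG_TIME_FORMAT)
    return int((end - start).total_seconds())


def as_minutes(seconds: int) -> str:
    """Format a duration in seconds as e.g. '3m07s'."""
    minutes, remainder = divmod(seconds, 60)
    return f"{minutes}m{remainder:02d}s"


# "load docker image: start" and "load docker image: exit 1" in the first excerpt.
load_duration = elapsed_seconds("2016-05-12_19:19:32", "2016-05-12_19:37:30")

# "Relinquishing job allocation 14353" and "Granted job allocation 14354".
requeue_gap = elapsed_seconds("2016-05-12_19:37:30", "2016-05-12_20:05:56")

print("load step ran for", as_minutes(load_duration))
print("gap before next allocation:", as_minutes(requeue_gap))
```

 Based on those timestamps, the load step ran for just under 18 minutes before the EOF error, and the job then waited a little over 28 minutes for its next allocation.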

 After this failure, the job went back to the 'pending' state, yet no new node was started.

 This job only started running again after I queued another job, because a node was then available:

 <pre> 
 2016-05-12_20:05:56 salloc: Granted job allocation 14354 
 2016-05-12_20:05:57 7368    Sanity check is `docker.io ps -q` 
 2016-05-12_20:05:57 7368    sanity check: start 
 2016-05-12_20:05:57 7368    stderr starting: ['srun','--nodes=1','--ntasks-per-node=1','docker.io','ps','-q'] 
 2016-05-12_20:05:58 7368    sanity check: exit 0 
 2016-05-12_20:05:58 7368    Sanity check OK 
 2016-05-12_20:05:58 su92l-8i9sb-3wmb3ogss5hvqwb 7368    running from /usr/local/arvados/src/sdk/cli/bin/crunch-job with arvados-cli Gem version(s) 0.1.20151207150126, 0.1.20151023190001, 0.1.20150205181653 
 2016-05-12_20:05:58 su92l-8i9sb-3wmb3ogss5hvqwb 7368    check slurm allocation 
 2016-05-12_20:05:58 su92l-8i9sb-3wmb3ogss5hvqwb 7368    node compute1 - 1 slots 
 2016-05-12_20:05:58 su92l-8i9sb-3wmb3ogss5hvqwb 7368    start 
 2016-05-12_20:05:58 su92l-8i9sb-3wmb3ogss5hvqwb 7368    clean work dirs: start 
 2016-05-12_20:05:58 su92l-8i9sb-3wmb3ogss5hvqwb 7368    stderr starting: ['srun','--nodelist=compute1','-D','/tmp','bash','-ec','-o','pipefail','mount -t fuse,fuse.keep | awk "(index(\\$3, \\"$CRUNCH_TMP\\") == 1){print \\$3}" | xargs -r -n 1 fusermount -u -z; sleep 1; rm -rf $JOB_WORK $CRUNCH_INSTALL $CRUNCH_TMP/task $CRUNCH_TMP/src* $CRUNCH_TMP/*.cid'] 
 2016-05-12_20:05:59 su92l-8i9sb-3wmb3ogss5hvqwb 7368    clean work dirs: exit 0 
 2016-05-12_20:05:59 su92l-8i9sb-3wmb3ogss5hvqwb 7368    Install docker image 256f21bb3abfcd8e08a893886bf3e7c0+5082 
 2016-05-12_20:05:59 su92l-8i9sb-3wmb3ogss5hvqwb 7368    docker image hash is f4eafaf1e2d738e0f8d947feb725b5945f0219c5c4956eec6e164a0788abbab8 
 2016-05-12_20:05:59 su92l-8i9sb-3wmb3ogss5hvqwb 7368    load docker image: start 
 2016-05-12_20:05:59 su92l-8i9sb-3wmb3ogss5hvqwb 7368    stderr starting: ['srun','--nodelist=compute1','/bin/bash','-o','pipefail','-ec',' if docker.io images -q --no-trunc --all | grep -xF f4eafaf1e2d738e0f8d947feb725b5945f0219c5c4956eec6e164a0788abbab8 >/dev/null; then       exit 0 fi declare -a exit_codes=("${PIPESTATUS[@]}") if [ 0 != "${exit_codes[0]}" ]; then      exit "${exit_codes[0]}"    # `docker images` failed elif [ 1 != "${exit_codes[1]}" ]; then      exit "${exit_codes[1]}"    # `grep` encountered an error else      # Everything worked fine, but grep didn\'t find the image on this host.      arv-get 256f21bb3abfcd8e08a893886bf3e7c0\\+5082\\/f4eafaf1e2d738e0f8d947feb725b5945f0219c5c4956eec6e164a0788abbab8\\.tar | docker.io load fi '] 
 </pre>
