Bug #11494

bcbio NA12878 validation runs: failed steps due to being unable to set up container

Added by Brad Chapman 11 days ago.

Status: New
Start date: 04/13/2017
Priority: Normal
Due date:
Assignee: -
% Done: 0%
Category: Crunch
Target version: -
Story points: -
Velocity based estimate: -

Description

I'm running bcbio CWL validation workflows on NA12878 chr20 in preparation for the GA4GH workflow challenge:

https://github.com/bcbio/bcbio_validation_workflows

We had an almost successful run:

https://cloud.curoverse.com/pipeline_instances/qr1hi-d1hrv-xzppm82ydryksab

but have 3 failed variant calling jobs. It looks like bcbio never actually ran, and this appears to be
an issue with setting up the instance:

https://cloud.curoverse.com/jobs/qr1hi-8i9sb-h8g7y5er07o8dp7#Log
https://cloud.curoverse.com/jobs/qr1hi-8i9sb-m9okvuwbhem966d#Log
https://cloud.curoverse.com/jobs/qr1hi-8i9sb-07qhvcy8jpomdn1#Log

This looks like the useful part of the log:
```
stderr starting: ['srun','--nodes=1','/bin/sh','-ec','/usr/bin/docker.io run --user=crunch a458ac1b067f2938da2860b2d3212900660905e3713906ce20caa5c353cdb45a id --user']
stderr Unable to find user crunch
stderr Error response from daemon: Cannot start container d51100ab82905a45759faeb14fa7487102141930c256381a609b69f39ed05c0f: [8] System error: Unable to find user crunch
stderr srun: error: compute56: task 0: Exited with exit code 1
check whether user 'crunch' is UID 0: exit 1
check whether user 'nobody' is UID 0: start
stderr starting: ['srun','--nodes=1','/bin/sh','-ec','/usr/bin/docker.io run --user=nobody a458ac1b067f2938da2860b2d3212900660905e3713906ce20caa5c353cdb45a id --user']
stderr srun: error: Unable to create job step: Required node not available (down or drained)
```
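The excerpt seems to show two separate failures: the container has no `crunch` user, and the follow-up `srun` step then cannot be created because the node is down or drained. To check whether all three job logs fail the same way, a minimal Python sketch like the following could tally both signatures. The regular expressions are copied from the messages above; the category names `missing_user` and `node_unavailable` and the function `classify_log` are hypothetical labels of mine, not anything Arvados or Slurm defines.

```python
import re

# Patterns taken from the log excerpt; the dictionary keys are
# made-up labels for this sketch, not official error names.
SIGNATURES = {
    "missing_user": re.compile(r"Unable to find user (\S+)"),
    "node_unavailable": re.compile(r"Required node not available"),
}


def classify_log(lines):
    """Map each failure category to the log lines that match it."""
    hits = {name: [] for name in SIGNATURES}
    for line in lines:
        for name, pattern in SIGNATURES.items():
            if pattern.search(line):
                hits[name].append(line.strip())
    return hits


# Three lines copied from the excerpt, used as sample input.
sample = [
    "stderr Unable to find user crunch",
    "stderr srun: error: compute56: task 0: Exited with exit code 1",
    "stderr srun: error: Unable to create job step: "
    "Required node not available (down or drained)",
]

result = classify_log(sample)
print({name: len(found) for name, found in result.items()})
```

Running this over the saved text of each job log would show whether the node-unavailable error appears in all three jobs or only in some of them.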
Thanks for any suggestions and help debugging.
