
Bug #4471 (closed)

[Crunch] srun: error: Application launch failed: Communication connection failure

Added by Nancy Ouyang over 9 years ago. Updated over 9 years ago.

Status: Resolved
Priority: Normal
Assigned To: -
Category: Crunch
Target version:
Story points: 0.5

Description

$ arv run /bin/bash createtwofiles.sh

======
Upload local files: "createtwofiles.sh"
Uploaded to qr1hi-4zz18-ezuts5lpkpj6o6b
Running pipeline qr1hi-d1hrv-zv53r0mhykuj7cq
2014-11-07 22:37:27 arvados.events22240 WARNING: Got exception _ssl.c:331: No root certificates specified for verification of other-side certificates. trying to connect to websockets at wss://ws.qr1hi.arvadosapi.com/websocket
2014-11-07 22:37:27 arvados.events22240 WARNING: Websockets not available, falling back to log table polling
Fri Nov 7 22:37:39 2014 salloc: Granted job allocation 8146
Fri Nov 7 22:37:39 2014 qr1hi-8i9sb-ue3o9q5pi2r0eg7 29981 check slurm allocation
Fri Nov 7 22:37:39 2014 qr1hi-8i9sb-ue3o9q5pi2r0eg7 29981 node compute18 - 8 slots
Fri Nov 7 22:37:40 2014 qr1hi-8i9sb-ue3o9q5pi2r0eg7 29981 start
Fri Nov 7 22:37:40 2014 qr1hi-8i9sb-ue3o9q5pi2r0eg7 29981 Clean work dirs
Fri Nov 7 22:37:41 2014 qr1hi-8i9sb-ue3o9q5pi2r0eg7 29981 Cleanup command exited 0
Fri Nov 7 22:37:41 2014 qr1hi-8i9sb-ue3o9q5pi2r0eg7 29981 Looking for version 8a9fe2e16f1203f303afabc8c88b6e1ded9cec57 from repository arvados
Fri Nov 7 22:37:41 2014 qr1hi-8i9sb-ue3o9q5pi2r0eg7 29981 Using local repository '/var/lib/arvados/internal.git'
Fri Nov 7 22:37:41 2014 qr1hi-8i9sb-ue3o9q5pi2r0eg7 29981 Version 8a9fe2e16f1203f303afabc8c88b6e1ded9cec57 is commit 8a9fe2e16f1203f303afabc8c88b6e1ded9cec57
Fri Nov 7 22:37:41 2014 qr1hi-8i9sb-ue3o9q5pi2r0eg7 29981 Run install script on all workers
Fri Nov 7 22:37:41 2014 srun: error: Task launch for 8146.1 failed on node compute18: Communication connection failure
Fri Nov 7 22:37:41 2014 srun: error: Application launch failed: Communication connection failure
Fri Nov 7 22:37:41 2014 srun: Job step aborted: Waiting up to 2 seconds for job step to finish.
Fri Nov 7 22:37:43 2014 srun: error: Timed out waiting for job step to complete
Fri Nov 7 22:37:43 2014 qr1hi-8i9sb-ue3o9q5pi2r0eg7 29981 Install script exited 1
Fri Nov 7 22:37:48 2014 srun: error: Task launch for 8146.2 failed on node compute18: Communication connection failure
Fri Nov 7 22:37:48 2014 srun: error: Application launch failed: Communication connection failure
Fri Nov 7 22:37:48 2014 srun: Job step aborted: Waiting up to 2 seconds for job step to finish.
Fri Nov 7 22:37:50 2014 srun: error: Timed out waiting for job step to complete
Fri Nov 7 22:37:50 2014 qr1hi-8i9sb-ue3o9q5pi2r0eg7 29981 Installing Docker image from e22cdc86e1acc044f7cf446b37c7ead8+966 exited 1 at /usr/local/arvados/src/sdk/cli/bin/crunch-job line 603
Fri Nov 7 22:37:50 2014 qr1hi-8i9sb-ue3o9q5pi2r0eg7 29981 Freeze not implemented
Fri Nov 7 22:37:50 2014 qr1hi-8i9sb-ue3o9q5pi2r0eg7 29981 collate
Fri Nov 7 22:37:51 2014 Collection saved as 'Saved at 2014-11-07 22:37:40 UTC by '
Fri Nov 7 22:37:51 2014 qr1hi-8i9sb-ue3o9q5pi2r0eg7 29981 log manifest is 13095e803daf57a9c389deca80a46ed0+83
Fri Nov 7 22:37:51 2014 Died at /usr/local/arvados/src/sdk/cli/bin/crunch-job line 1464, <DATA> line 1.
Fri Nov 7 22:37:51 2014 salloc: Relinquishing job allocation 8146
Pipeline is Failed
No output
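Both failing job steps report the same error on the same node: step 8146.1 follows "Run install script on all workers", and step 8146.2 precedes "Installing Docker image ... exited 1". When a crunch-job log is longer than this one, a minimal sketch like the following can pull those lines out. It assumes the "Task launch for STEP failed on node NODE: REASON" line format seen above. The function name `failed_steps` and the regex are hypothetical helpers for illustration, not part of Arvados or SLURM.

```python
import re

# Hypothetical pattern inferred from the srun lines in this report, e.g.
# "srun: error: Task launch for 8146.1 failed on node compute18: Communication connection failure"
TASK_FAIL = re.compile(
    r"srun: error: Task launch for (?P<step>\d+\.\d+) "
    r"failed on node (?P<node>\S+): (?P<reason>.+)$"
)


def failed_steps(log_lines):
    """Return (step, node, reason) tuples for every failed srun task launch."""
    results = []
    for line in log_lines:
        match = TASK_FAIL.search(line.rstrip("\n"))
        if match:
            results.append(
                (match.group("step"), match.group("node"), match.group("reason").strip())
            )
    return results


if __name__ == "__main__":
    # Two lines copied from the log above; other log lines are simply skipped.
    sample = [
        "Fri Nov 7 22:37:41 2014 srun: error: Task launch for 8146.1 failed on node compute18: Communication connection failure",
        "Fri Nov 7 22:37:41 2014 qr1hi-8i9sb-ue3o9q5pi2r0eg7 29981 Install script exited 1",
        "Fri Nov 7 22:37:48 2014 srun: error: Task launch for 8146.2 failed on node compute18: Communication connection failure",
    ]
    for step, node, reason in failed_steps(sample):
        print(step, node, reason)
```

Grouping the results by node makes it easy to see whether every failure lands on one machine, as it does here with compute18.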
