Bug #5500
[Crunch] Detect temporary error conditions
Status: Resolved
Priority: Normal
Assigned To:
Category: Crunch
Target version:
Story points: -
Description
2015-03-17_21:29:22 tb05z-8i9sb-2e16dypy0eg7m59 15615 50 stderr You are using pip version 6.0.6, however version 6.0.8 is available.
2015-03-17_21:29:22 tb05z-8i9sb-2e16dypy0eg7m59 15615 50 stderr You should consider upgrading via the 'pip install --upgrade pip' command.
2015-03-17_21:29:22 tb05z-8i9sb-2e16dypy0eg7m59 15615 50 stderr Hash of the package https://pypi.python.org/packages/source/h/httplib2/httplib2-0.9.tar.gz#md5=09d8e8016911fc40e2e4c58f1aa3ec24 (from https://pypi.python.org/simple/httplib2/) (db87123118b60fecc4b91288e9f988c0) doesn't match the expected hash 09d8e8016911fc40e2e4c58f1aa3ec24!
2015-03-17_21:29:22 tb05z-8i9sb-2e16dypy0eg7m59 15615 50 stderr Bad md5 hash for package https://pypi.python.org/packages/source/h/httplib2/httplib2-0.9.tar.gz#md5=09d8e8016911fc40e2e4c58f1aa3ec24 (from https://pypi.python.org/simple/httplib2/)
2015-03-17_21:29:22 tb05z-8i9sb-2e16dypy0eg7m59 15615 50 stderr /tmp/crunch-job-work/.arvados.venv/bin/pip --quiet install -I /tmp/crunch-job/opt/python failed (): exit 1 signal 0 at - line 198.
2015-03-17_21:29:23 tb05z-8i9sb-2e16dypy0eg7m59 15615 50 stderr srun: error: compute1: task 0: Exited with exit code 29
2015-03-17_21:29:23 tb05z-8i9sb-2e16dypy0eg7m59 15615 50 child 11234 on compute1.1 exit 29 success=
2015-03-17_21:29:23 tb05z-8i9sb-2e16dypy0eg7m59 15615 50 failure (#1, permanent) after 44 seconds
2015-03-17_23:59:15 tb05z-8i9sb-2e16dypy0eg7m59 15615 105 stderr srun: error: Task launch for 381.108 failed on node compute1: Communication connection failure
2015-03-17_23:59:15 tb05z-8i9sb-2e16dypy0eg7m59 15615 105 stderr srun: error: Application launch failed: Communication connection failure
2015-03-17_23:59:15 tb05z-8i9sb-2e16dypy0eg7m59 15615 105 stderr srun: Job step aborted: Waiting up to 2 seconds for job step to finish.
2015-03-17_23:59:17 tb05z-8i9sb-2e16dypy0eg7m59 15615 105 stderr srun: error: Timed out waiting for job step to complete
2015-03-17_23:59:17 tb05z-8i9sb-2e16dypy0eg7m59 15615 105 child 13887 on compute1.16 exit 1 success=
2015-03-17_23:59:17 tb05z-8i9sb-2e16dypy0eg7m59 15615 105 failure (#1, permanent) after 7 seconds
Updated by Peter Amstutz over 9 years ago
- Description updated (diff)
- Category set to Crunch
- Status changed from New to In Progress
- Assigned To set to Peter Amstutz
Updated by Peter Amstutz over 9 years ago
- Target version changed from Bug Triage to 2015-04-01 sprint
Updated by Tom Clegg over 9 years ago
At 3365d47
- $tempfail should probably be called $exitcode: it doesn't actually signify tempfail unless it happens to be 111. If your intent is to make it more clear what 111 means, use constant TEMPFAIL => 111; might be better?
- I'm not sure $code >> 8 is necessarily non-zero if the child is killed by a signal. Perhaps we should exit (($code >> 8) || 1) to make sure we never accidentally exit 0 here?
- The "complain but don't exit" version of die is warn, not print STDERR: we should either use warn (which includes the line number in the install script, fwiw) or add a \n to the end of the message.
The slurm part lgtm. :)
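The concern about $code >> 8 in the review above can be checked with plain arithmetic on a POSIX wait status, which is what Perl's $? holds: the exit code sits in the high byte and the terminating signal in the low 7 bits. The following is only an illustrative sketch in Python, not the code from either commit; the function names are hypothetical, and TEMPFAIL = 111 is taken from the comment above.

```python
# Illustrative sketch only: function names are hypothetical and this is not
# the install script itself. It models the 16-bit wait status that Perl
# exposes as $? after a child process finishes.

TEMPFAIL = 111  # value the review comment associates with a temporary failure


def decode_wait_status(status):
    """Split a wait status into (exit_code, signal_number)."""
    exit_code = status >> 8      # high byte: the child's exit code
    signal_number = status & 0x7F  # low 7 bits: signal that killed the child
    return exit_code, signal_number


def exit_code_to_propagate(status):
    """Mirror `($code >> 8) || 1`: a failed child never maps to exit 0."""
    exit_code, _signal_number = decode_wait_status(status)
    return exit_code or 1


def is_tempfail(status):
    """True only when the child exited normally with the TEMPFAIL code."""
    exit_code, signal_number = decode_wait_status(status)
    return signal_number == 0 and exit_code == TEMPFAIL


# A child killed by signal 9 has status 9: the high byte is zero, so
# propagating `status >> 8` unchanged would look like success.
killed_by_signal = 9
exited_with_tempfail = TEMPFAIL << 8
```

With these definitions, decode_wait_status(killed_by_signal) gives an exit code of 0 and a signal of 9, which is the case the "|| 1" fallback guards against, while exit_code_to_propagate(exited_with_tempfail) still passes 111 through unchanged.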
Updated by Peter Amstutz over 9 years ago
All done. Now at a239e2db534cc36aa8c3e08077383d84bf6ba8e8
Updated by Peter Amstutz over 9 years ago
- Status changed from In Progress to Resolved