Bug #7225

closed

[SDKs] Script hangs on exit after writing a Collection file that spans multiple Keep blocks

Added by Brett Smith over 9 years ago. Updated about 9 years ago.

Status: Resolved
Priority: Normal
Assigned To:
Category: SDKs
Target version:
Story points: 2.0

Description

qr1hi-8i9sb-h5yt6xmpk8u6dps is a pretty typical BWA aligner job. The aligner apparently ran fine, but then run-command got stuck uploading the data. These are the last interesting lines that appear in the log:

2015-09-04_15:16:52 qr1hi-8i9sb-h5yt6xmpk8u6dps 4106 0 stderr run-command: /keep/39c6f22d40001074f4200a72559ae7eb+5745/bwa completed with exit code 0 (success)
2015-09-04_15:16:52 qr1hi-8i9sb-h5yt6xmpk8u6dps 4106 0 stderr run-command: the following output files will be saved to keep:
2015-09-04_15:16:52 qr1hi-8i9sb-h5yt6xmpk8u6dps 4106 0 stderr run-command: 1455988972 ./[filename].sai
2015-09-04_15:16:52 qr1hi-8i9sb-h5yt6xmpk8u6dps 4106 0 stderr run-command: start writing output to keep

After that, run-command was never heard from again. When I checked on the compute node, the run-command process was still alive, but not doing anything. strace reported it was stuck in a futex call.
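A futex wait is consistent with a thread blocked on a lock or condition variable, but strace alone does not show which Python code is waiting. As a rough diagnostic sketch (not part of run-command or the Arvados SDK), the standard library can report where each live thread is sitting. The names describe_threads, blocked_worker, and "stuck-writer" are hypothetical, and the blocked worker is only a stand-in for whatever thread was actually stuck in this job.

```python
import sys
import threading
import traceback


def describe_threads():
    """Return {thread name: formatted stack} for every live thread."""
    frames = sys._current_frames()  # maps thread ident -> current frame
    report = {}
    for thread in threading.enumerate():
        frame = frames.get(thread.ident)
        if frame is None:
            continue  # thread exited between the two calls
        report[thread.name] = "".join(traceback.format_stack(frame))
    return report


def blocked_worker(release):
    # Hypothetical stand-in for a worker that never gets its wake-up signal.
    release.wait()


if __name__ == "__main__":
    release = threading.Event()
    worker = threading.Thread(
        target=blocked_worker, args=(release,), name="stuck-writer"
    )
    worker.start()
    for name, stack in describe_threads().items():
        print("--- %s ---\n%s" % (name, stack))
    release.set()  # let the worker finish so this demo exits cleanly
    worker.join()
```

Calling something like describe_threads() from a signal handler or a debugging hook in the hung process would show whether the main thread is waiting to join a worker, and what that worker is blocked on.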

The last two lines in run-command update the task with success and output information, and exit. The API server logs show that it received and handled the task update with no problem, shortly after those last lines in the log, implying run-command got stuck somewhere between sending the request and exiting.

If the fix for this requires users to make specific API calls, Tom should sign off on those requirements as architect, and the requirements should be clearly documented.

The branch that fixes this is expected to include a test for the unsigned locator race condition.


Files

7225.py (488 Bytes), Brett Smith, 09/07/2015 01:53 PM

Subtasks 1 (0 open, 1 closed)

Task #7346: Review 7225-collection-hang (Resolved, Peter Amstutz, 09/07/2015)

Related issues 1 (0 open, 1 closed)

Related to Arvados - Bug #5496: [SDKs] PySDK test_rewrite_on_empty_file is not reliable (Resolved, 03/18/2015)