Bug #5906

[API] crunch-dispatch should mark a job failed when its repository cannot be fetched

Added by Abram Connelly over 4 years ago. Updated over 4 years ago.

Development story

If you build a pipeline that refers to a remote repository that cannot be fetched (e.g., typo in the URL), the pipeline will appear to "hang." Nothing will run, and the pipeline state will not change.

crunch-dispatch should detect this error case, mark the job failed, and report the error in such a way that the user can see it in Workbench's log tabs for the job and the parent pipeline (if any).
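
The requested behavior could be sketched roughly as follows. This is a hypothetical Python illustration, not crunch-dispatch's actual code: the `Job` fields, the `FetchError` exception, the `fetch` callable, and the `dispatch` function are all assumed names, used only to show the control flow (try to fetch; on failure, mark the job failed and record a message the user can read in the log).

```python
from dataclasses import dataclass, field
from typing import Callable, List


class FetchError(Exception):
    """Raised by a fetcher when the repository cannot be retrieved."""


@dataclass
class Job:
    # Hypothetical job record; the field names are illustrative only.
    uuid: str
    repository: str
    state: str = "Queued"
    log: List[str] = field(default_factory=list)


def dispatch(job: Job, fetch: Callable[[str], None]) -> bool:
    """Try to fetch the job's repository; fail the job if that is impossible.

    Returns True if the job may proceed, False if it was marked failed.
    """
    try:
        fetch(job.repository)
    except FetchError as err:
        # Leave a terminal state and a readable message instead of
        # letting the job sit in the queue indefinitely.
        job.state = "Failed"
        job.log.append(f"cannot fetch repository {job.repository!r}: {err}")
        return False
    job.state = "Running"
    return True
```

The point of the sketch is that a fetch failure ends in a terminal state plus a log entry, so whatever displays the job (and its parent pipeline, if any) has something to show.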

Original bug report

Pipeline su92l-d1hrv-6lv8np1xul1fcaf appears to have hung indefinitely. The cause appears to be a bad 'repository' parameter. Here is the relevant section:

      "script": "run-command",
      "script_version": "master",
      "repository": "https://git.curoverse.com/get-evidence-arvados-scripts",

Running git clone against the above repository hangs for 60 seconds and then fails:

$ git clone https://git.curoverse.com/get-evidence-arvados-scripts
Cloning into 'get-evidence-arvados-scripts'...
fatal: unable to access 'https://git.curoverse.com/get-evidence-arvados-scripts/': Failed to connect to git.curoverse.com port 443: Connection timed out
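
A dispatcher would not have to wait for the connection attempt to give up on its own. As a hedged sketch (Python is used only for illustration; `probe_repository` and its `run` parameter are hypothetical names and not part of crunch-dispatch), a repository can be probed with `git ls-remote` under an explicit deadline, returning an error message suitable for a job log:

```python
import subprocess
from typing import Callable, Tuple


def probe_repository(
    url: str,
    timeout: float = 10.0,
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> Tuple[bool, str]:
    """Check whether `url` answers `git ls-remote` within `timeout` seconds.

    Returns (True, "") on success, or (False, reason) on failure.
    `run` is injectable so the function can be exercised without a network.
    """
    try:
        result = run(
            ["git", "ls-remote", "--heads", url],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return False, f"timed out after {timeout} seconds"
    except FileNotFoundError:
        return False, "git executable not found"
    if result.returncode != 0:
        # Prefer git's own message (e.g. "fatal: unable to access ...").
        reason = result.stderr.strip() or f"git exited with status {result.returncode}"
        return False, reason
    return True, ""
```

The timeout value is an arbitrary choice for the sketch; a real dispatcher would presumably make it configurable.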

As of this writing, the pipeline has been queued for over 16 hours. Also as of this writing, the queue count from the dashboard indicates 0 pipelines queued. There is one 'busy' node that looks to be processing another active pipeline. Screenshot attached.

waiting.png (57.4 KB) Abram Connelly, 05/05/2015 02:36 PM

Related issues

Has duplicate Arvados - Bug #5845: Pipeline has failed but no jobs are marked as failed (Closed, 04/28/2015)


#1 Updated by Brett Smith over 4 years ago

  • Subject changed from "bad pointer to repository causes pipeline to hang" to "[API] crunch-dispatch should mark a job failed when its repository cannot be fetched"
  • Description updated (diff)
  • Category set to API
  • Target version changed from Bug Triage to Arvados Future Sprints
  • Story points set to 1.0

This happened because the remote repository URL was ill-specified. I worked with Abram to correct that problem so he could proceed. We should improve error reporting so that users can understand the problem and see that the job/pipeline will not run.
