[API] crunch-dispatch should mark a job failed when its repository cannot be fetched
If you build a pipeline that refers to a remote repository that cannot be fetched (e.g., a typo in the URL), the pipeline will appear to "hang." Nothing will run, and the pipeline state will not change.
crunch-dispatch should detect this error case, mark the job failed, and report the error in such a way that the user can see it in Workbench's log tabs for the job and the parent pipeline (if any).
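A minimal Python sketch of the requested behavior follows. It is not taken from crunch-dispatch. The Job class, RepositoryFetchError, dispatch(), and the injected fetch callable are all hypothetical, and the state names are only modeled on Arvados job states. The sketch shows one thing: a fetch failure moves the job to a terminal Failed state and records a log line that a UI such as Workbench's log tab could display. Without that, the job would sit in Queued indefinitely.

```python
from dataclasses import dataclass, field
from typing import Callable, List


class RepositoryFetchError(Exception):
    """Hypothetical error a fetcher raises when the repository cannot be fetched."""


@dataclass
class Job:
    # Hypothetical minimal job record; real Arvados jobs carry many more fields.
    uuid: str
    repository: str
    script_version: str
    state: str = "Queued"
    log: List[str] = field(default_factory=list)


def dispatch(job: Job, fetch: Callable[[str, str], str]) -> Job:
    """Fetch the job's code; on failure, fail the job instead of leaving it queued."""
    try:
        # fetch() is assumed to return the resolved commit for script_version.
        commit = fetch(job.repository, job.script_version)
    except RepositoryFetchError as err:
        # Terminal state plus a user-visible reason, so nothing "hangs" silently.
        job.state = "Failed"
        job.log.append(f"Unable to fetch repository {job.repository!r}: {err}")
        return job
    job.state = "Running"
    job.log.append(f"Fetched {job.script_version} at commit {commit}")
    return job
```

Injecting the fetcher keeps the failure path testable without network access. A real dispatcher would also need to propagate the failure to the parent pipeline.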
Original bug report
Pipeline su92l-d1hrv-6lv8np1xul1fcaf appears to have hung indefinitely. The cause seems to be a bad 'repository' parameter. Here is the relevant section:
... "script": "run-command", "script_version": "master", "repository": "https://git.curoverse.com/get-evidence-arvados-scripts", ...
Running a git clone command on the above repository hangs for 60 seconds and then fails:
$ git clone https://git.curoverse.com/get-evidence-arvados-scripts
Cloning into 'get-evidence-arvados-scripts'...
fatal: unable to access 'https://git.curoverse.com/get-evidence-arvados-scripts/': Failed to connect to git.curoverse.com port 443: Connection timed out
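The 60-second hang suggests that any pre-dispatch repository check should be bounded by an explicit timeout. The sketch below assumes a local git executable. It probes a remote with git ls-remote and reports failure instead of blocking. The name probe_repository() and its 30-second default are hypothetical choices, not Arvados behavior. Setting GIT_TERMINAL_PROMPT=0 stops git from waiting on a credential prompt, which would otherwise be a second way to hang.

```python
import os
import subprocess
from typing import Tuple


def probe_repository(url: str, timeout: float = 30.0) -> Tuple[bool, str]:
    """Return (ok, message) after asking git to list the remote's refs."""
    # Copy the environment and disable interactive credential prompts.
    env = dict(os.environ, GIT_TERMINAL_PROMPT="0")
    try:
        result = subprocess.run(
            ["git", "ls-remote", url],
            capture_output=True,
            text=True,
            timeout=timeout,  # subprocess.run kills git if this expires
            env=env,
        )
    except subprocess.TimeoutExpired:
        return False, f"timed out after {timeout} seconds contacting {url}"
    except FileNotFoundError:
        return False, "git executable not found"
    if result.returncode != 0:
        # git writes its "fatal: ..." explanation to stderr.
        return False, result.stderr.strip() or f"git exited with status {result.returncode}"
    return True, "ok"
```

Under the behavior reported above, probing the URL from the bug report would be expected to return False. The message would carry either git's connection error or the timeout text, whichever comes first. That message could then be written to the job log.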
As of this writing, the pipeline has been queued for over 16 hours. Also as of this writing, the queue count from the dashboard indicates 0 pipelines queued. There is one 'busy' node that looks to be processing another active pipeline. Screenshot attached.
#1 Updated by Brett Smith about 4 years ago
- Subject changed from bad pointer to repository causes pipeline to hang to [API] crunch-dispatch should mark a job failed when its repository cannot be fetched
- Description updated (diff)
- Category set to API
- Target version changed from Bug Triage to Arvados Future Sprints
- Story points set to 1.0
This happened because the remote repository URL was ill-specified. I worked with Abram to correct that problem so he could proceed. We should improve error reporting to help users understand the problem and that the job/pipeline will not run.