Bug #7620

[Crunch] crunch-job expect the scripts under a non-existing crunch_scripts/ folder

Added by Chen Chen over 4 years ago. Updated over 4 years ago.

Status: Closed
Priority: Normal
Assigned To: -
Category: Crunch
Target version: -
Start date: 10/21/2015
Due date:
% Done: 0%
Estimated time:
Story points: -

Description

When I ran a pipeline, the job log reported:
[Crunch] Using Arvados SDK:
stderr [Crunch] arvados-python-client==0.1.20151020215415
stderr Cannot exec `/tmp/crunch-job/src/crunch_scripts/hash.py`: No such file or directory at - line 108.

I logged in to the compute node and saw that the scripts sit directly under /tmp/crunch-job/src:
[root@compute0 src]# ls /tmp/crunch-job/src
hash.py

The crunch_scripts folder is not mentioned anywhere else. This pull request removes the nonexistent folder from the path, which should make everyone happy:
https://github.com/curoverse/arvados/pull/31


Related issues

Related to Arvados - Story #7621: [API] Job model validates that the script exists in the repository at script_version (New, 10/21/2015)

Related to Arvados - Bug #6027: [Documentation] Update "Working with an Arvados git repository" to document crunch_scripts subdirectory (Resolved, 12/02/2015)

History

#1 Updated by Brett Smith over 4 years ago

  • Subject changed from crunch-job expect the scripts under a non-existing folder to [Crunch] crunch-job expect the scripts under a non-existing crunch_scripts/ folder
  • Status changed from New to In Progress

Thanks very much for reporting this issue, and for making the pull request. I'm really glad you're engaged with Arvados at such a deep technical level. I wanted to take this opportunity to elaborate on some of the design background that motivated the current code, and what improvements might be most helpful from here.

Arvados expects Crunch scripts to exist in a crunch_scripts/ subdirectory inside the Git repository. The primary motivation for this is to make it easy for Crunch scripts to be added to existing repositories. For example, this is how the Crunch scripts shipped with Arvados itself work: the run-command script lives in crunch_scripts/run-command in our own repository. This is covered somewhat as you go through the user guide. If you go through the tutorial about writing your own Crunch script, it instructs you to create a crunch_scripts directory in your repository, and write the script there.
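To illustrate the layout described above, here is a minimal Python sketch of how a script name maps to a path under crunch_scripts/. It is not crunch-job's actual code: the function names and the hint text are hypothetical, and the only detail taken from this issue is that scripts are looked up in a crunch_scripts/ subdirectory of the checked-out repository.

```python
import os

# Subdirectory of the repository where Crunch scripts are expected to live.
CRUNCH_SCRIPTS_DIR = "crunch_scripts"


def resolve_script_path(src_root, script_name):
    """Return the path where the script is expected: <src_root>/crunch_scripts/<script_name>."""
    return os.path.join(src_root, CRUNCH_SCRIPTS_DIR, script_name)


def find_script(src_root, script_name):
    """Return the expected path if the script exists there.

    Hypothetical helper: raises FileNotFoundError otherwise, adding a hint
    when the script was committed at the repository root instead.
    """
    expected = resolve_script_path(src_root, script_name)
    if os.path.isfile(expected):
        return expected
    hint = ""
    if os.path.isfile(os.path.join(src_root, script_name)):
        hint = " (found at repository root; move it into crunch_scripts/)"
    raise FileNotFoundError(f"{expected} not found{hint}")
```

With the checkout from this report, where hash.py sits directly under /tmp/crunch-job/src, find_script("/tmp/crunch-job/src", "hash.py") would raise the error with the hint, because the expected location is /tmp/crunch-job/src/crunch_scripts/hash.py.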

Because so many existing Git repositories already follow this layout, we would break a lot of pipelines if we simply stopped searching for Crunch scripts there. I understand that the documentation could be clearer about this layout, and we already have a bug filed about that, #6027. I've also just filed #7621, as a feature enhancement that could help users debug this issue more easily. You're absolutely right that there's plenty of room for improvement here. We're very open to other suggestions for ways we might make this whole process more user-friendly. However, in order to preserve backward compatibility, those changes will need to be a little more nuanced than simply dropping the lookup under crunch_scripts/.

I hope this helps explain why we couldn't accept the pull request as written. If anything's unclear, or if you have any follow-up questions, please don't hesitate to ask. Thanks again for the contributions.

#2 Updated by Chen Chen over 4 years ago

Thanks for your detailed explanation of the repository layout. I was too hasty in filing the bug and the pull request before reading the documentation carefully. Please reject the pull request and close this issue if you wish.

Brett Smith wrote:

Arvados expects Crunch scripts to exist in a crunch_scripts/ subdirectory inside the Git repository. The primary motivation for this is to make it easy for Crunch scripts to be added to existing repositories. For example, this is how the Crunch scripts shipped with Arvados itself work: the run-command script lives in crunch_scripts/run-command in our own repository. This is covered somewhat as you go through the user guide. If you go through the tutorial about writing your own Crunch script, it instructs you to create a crunch_scripts directory in your repository, and write the script there.

I suggest that the "Add new Repository" web UI create a skeleton layout (e.g., with the crunch_scripts folder already created) in the master branch of the Git repository. In addition, "arv create pipeline_template" could generate a general template with blanks and hints for the user to fill in, because most users are not computer experts.
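As a rough illustration of the skeleton idea, the following Python sketch creates a crunch_scripts directory inside an existing repository working tree. The function name is hypothetical and this is not part of Arvados. The .gitkeep placeholder is only a common convention, used here because Git does not track empty directories.

```python
import os


def create_skeleton(repo_dir):
    """Create crunch_scripts/ under repo_dir and return its path.

    Hypothetical helper, safe to call more than once on the same repository.
    """
    scripts_dir = os.path.join(repo_dir, "crunch_scripts")
    os.makedirs(scripts_dir, exist_ok=True)
    # Git ignores empty directories, so add a placeholder file that can be
    # committed. The name ".gitkeep" is a convention, not a Git feature.
    placeholder = os.path.join(scripts_dir, ".gitkeep")
    if not os.path.exists(placeholder):
        with open(placeholder, "w"):
            pass
    return scripts_dir
```

The new directory and placeholder would still need to be committed and pushed to the repository before a job could make use of them.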

#3 Updated by Brett Smith over 4 years ago

  • Status changed from In Progress to Closed
