Feature #4042

[Crunch] run-command script supports running MxN tasks (e.g., M directories x N chromosomes)

Added by Tom Clegg almost 5 years ago. Updated almost 5 years ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Category:
Crunch
Target version:
Start date:
10/06/2014
Due date:
% Done:

100%

Estimated time:
(Total: 0.00 h)
Story points:
1.0

Description

Example job patterns that should be supported:
  • 6 directories, each with 8 pairs of fastq files → 48 tasks, each operating on one pair
  • 3 dirs with 1 pair of fastq files each + 3 dirs with 4 pairs of fastq files each → 15 tasks, each operating on one pair
  • 6 dirs, each with 8 bam files → 6 tasks, each operating on one dir
  • 6 dirs, each with 1 bam file, plus a static list of chromosomes → 6×23 tasks, each operating on one {bam file, chromosome}
  • 6 dirs, each with 23 bam files → 6×23 tasks, each operating on one file
  • 6 dirs, each with 23 bam files → 6 tasks, each operating on (all of the bam files in) one dir

The output should preserve the input directory structure: in each of the above examples, the output collection should have 6 directories with the same names as the 6 input directories.


Subtasks

Task #4103: Review 4042-run-command-MxNResolvedPeter Amstutz


Related issues

Has duplicate Arvados - Feature #3501: Ability to run one pipeline over a set of inputsClosed

Associated revisions

Revision dd59a50f
Added by Peter Amstutz almost 5 years ago

Merge branch '4042-run-command-MxN' closes #4042

History

#1 Updated by Tom Clegg almost 5 years ago

  • Description updated (diff)
  • Category set to Crunch

#2 Updated by Tom Clegg almost 5 years ago

  • Subject changed from [Crunch] run-command script supports queueing MxN tasks (e.g., M directories x N chromosomes) to [Crunch] run-command script supports running MxN tasks (e.g., M directories x N chromosomes)
  • Description updated (diff)

#3 Updated by Tom Clegg almost 5 years ago

  • Description updated (diff)

#4 Updated by Peter Amstutz almost 5 years ago

  • Target version changed from Arvados Future Sprints to 2014-10-08 sprint

#5 Updated by Peter Amstutz almost 5 years ago

  • Assigned To set to Peter Amstutz

#6 Updated by Peter Amstutz almost 5 years ago

  • Status changed from New to In Progress

#7 Updated by Brett Smith almost 5 years ago

General

In both error messages and documentation, "accessable" is misspelled; it should be "accessible."

Code

  • Python dictionaries can use tuples as keys. Given this, I think add_to_group would be better off using match.groups() directly as the key. That way, there's no risk that different group matches will look identical because of conflicts with the join string, even as cute as ^_^ is.
  • There are a couple of places in expand_item that say p = pattern.match(i). This masks the original p argument that was passed in. I don't think that causes any behavior bugs in the current code (we never try to read the original p after), but changing it now might help prevent heartache later.
  • This doesn't need to change for the branch, but in case it's helpful for the future, you might find os.path.isdir() and os.path.isfile() easier to use than mucking around with stat.

Documentation

  • The two run-command include examples are identical. _run_command_foreach_example.liquid doesn't use foreach anywhere.
  • Please run a spell checker on the run-command reference. Some typos I found: "evaluted", "substition", "commands’s", "concatinated."
  • The description of the first example says run “echo” with the first argument “hello”, but the code echoes "hello world".
  • The description for $(task.outdir) seems to be cut off.
  • The second hyphen appears to be a typo in "The directory containing the source code for the run-command- script "
  • I think "Note that parameter expansion is not performed lists produced this way." should say "performed on lists."
  • "list context, as described below" - it's described above.
  • "list items functions" should say "list item" (singular) for consistency with the rest of the documentation.
  • Similarly, the index documentation should use monospace fonts for literal inputs and outputs, for consistency with the other function documentation. Plus Textile turns -- into an em dash otherwise.
  • "write it output files" should say "its".

#8 Updated by Peter Amstutz almost 5 years ago

Brett Smith wrote:

General

In both error messages and documentation, "accessable" is misspelled; it should be "accessible."

Fixed.

Code

  • Python dictionaries can use tuples as keys. Given this, I think add_to_group would be better off using match.groups() directly as the key. That way, there's no risk that different group matches will look identical because of conflicts with the join string, even as cute as ^_^ is.

I tried that with a list and it said "lists are unhashable"; I didn't realize it worked for tuples. Fixed.

  • There are a couple of places in expand_item that say p = pattern.match(i). This masks the original p argument that was passed in. I don't think that causes any behavior bugs in the current code (we never try to read the original p after), but changing it now might help prevent heartache later.

Fixed.

  • This doesn't need to change for the branch, but in case it's helpful for the future, you might find os.path.isdir() and os.path.isfile() easier to use than mucking around with stat.

Good to know.

Documentation

  • The two run-command include examples are identical. _run_command_foreach_example.liquid doesn't use foreach anywhere.
  • Please run a spell checker on the run-command reference. Some typos I found: "evaluted", "substition", "commands’s", "concatinated."
  • The description of the first example says run “echo” with the first argument “hello”, but the code echoes "hello world".
  • The description for $(task.outdir) seems to be cut off.
  • The second hyphen appears to be a typo in "The directory containing the source code for the run-command- script "
  • I think "Note that parameter expansion is not performed lists produced this way." should say "performed on lists."
  • "list context, as described below" - it's described above.
  • "list items functions" should say "list item" (singular) for consistency with the rest of the documentation.
  • Similarly, the index documentation should use monospace fonts for literal inputs and outputs, for consistency with the other function documentation. Plus Textile turns -- into an em dash otherwise.
  • "write it output files" should say "its".

All fixed.

#9 Updated by Brett Smith almost 5 years ago

Peter Amstutz wrote:

  • Python dictionaries can use tuples as keys. Given this, I think add_to_group would be better off using match.groups() directly as the key. That way, there's no risk that different group matches will look identical because of conflicts with the join string, even as cute as ^_^ is.

I tried that with a list and it said "lists are unhashable"; I didn't realize it worked for tuples. Fixed.

The rule is "keys have to be immutable," since you'd get bizarre results if you put a key in a dictionary and then mutated it. Fortunately, mutability is the differentiator between lists and tuples.

Reviewing bdd309b. Just a few small typos, go ahead and merge once these are fixed.

  • The preface for the foreach example says "over multiple sample;" it should say "samples" plural.
  • The foreach example is missing a comma after the sample parameter hash.
  • Small capitalization fixes: I think case-sensitive names like "run-command" and "script_parameters" should always be written with their correct case (i.e., lowercase), even at the beginning of sentences, to avoid bad input. Please capitalize the proper name "Unix."

Thanks.

#10 Updated by Anonymous almost 5 years ago

  • Status changed from In Progress to Resolved
  • % Done changed from 0 to 100

Applied in changeset arvados|commit:dd59a50f9f3933c359930806516b43899a8b4957.

Also available in: Atom PDF