Project

General

Profile

Actions

Feature #4042

closed

[Crunch] run-command script supports running MxN tasks (e.g., M directories x N chromosomes)

Added by Tom Clegg over 9 years ago. Updated over 9 years ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Category:
Crunch
Target version:
Story points:
1.0

Description

Example job patterns that should be supported:
  • 6 directories, each with 8 pairs of fastq files → 48 tasks, each operating on one pair
  • 3 dirs with 1 pair of fastq files each + 3 dirs with 4 pairs of fastq files each → 15 tasks, each operating on one pair
  • 6 dirs, each with 8 bam files → 6 tasks, each operating on one dir
  • 6 dirs, each with 1 bam file, plus a static list of chromosomes → 6×23 tasks, each operating on one {bam file, chromosome}
  • 6 dirs, each with 23 bam files → 6×23 tasks, each operating on one file
  • 6 dirs, each with 23 bam files → 6 tasks, each operating on (all of the bam files in) one dir

The output should preserve the input directory structure: in each of the above examples, the output collection should have 6 directories with the same names as the 6 input directories.


Subtasks 1 (0 open1 closed)

Task #4103: Review 4042-run-command-MxNResolvedPeter Amstutz10/06/2014Actions

Related issues

Has duplicate Arvados - Feature #3501: Ability to run one pipeline over a set of inputsClosedActions
Actions #1

Updated by Tom Clegg over 9 years ago

  • Description updated (diff)
  • Category set to Crunch
Actions #2

Updated by Tom Clegg over 9 years ago

  • Subject changed from [Crunch] run-command script supports queueing MxN tasks (e.g., M directories x N chromosomes) to [Crunch] run-command script supports running MxN tasks (e.g., M directories x N chromosomes)
  • Description updated (diff)
Actions #3

Updated by Tom Clegg over 9 years ago

  • Description updated (diff)
Actions #4

Updated by Peter Amstutz over 9 years ago

  • Target version changed from Arvados Future Sprints to 2014-10-08 sprint
Actions #5

Updated by Peter Amstutz over 9 years ago

  • Assigned To set to Peter Amstutz
Actions #6

Updated by Peter Amstutz over 9 years ago

  • Status changed from New to In Progress
Actions #7

Updated by Brett Smith over 9 years ago

General

In both error messages and documentation, "accessable" is misspelled; it should be "accessible."

Code

  • Python dictionaries can use tuples as keys. Given this, I think add_to_group would be better off using match.groups() directly as the key. That way, there's no risk that different group matches will look identical because of conflicts with the join string, even as cute as ^_^ is.
  • There are a couple of places in expand_item that say p = pattern.match(i). This masks the original p argument that was passed in. I don't think that causes any behavior bugs in the current code (we never try to read the original p after), but changing it now might help prevent heartache later.
  • This doesn't need to change for the branch, but in case it's helpful for the future, you might find os.path.isdir() and os.path.isfile() easier to use than mucking around with stat.

Documentation

  • The two run-command include examples are identical. _run_command_foreach_example.liquid doesn't use foreach anywhere.
  • Please run a spell checker on the run-command reference. Some typos I found: "evaluted", "substition", "commands’s", "concatinated."
  • The description of the first example says run “echo” with the first argument “hello”, but the code echoes "hello world".
  • The description for $(task.outdir) seems to be cut off.
  • The second hyphen appears to be a typo in "The directory containing the source code for the run-command- script "
  • I think "Note that parameter expansion is not performed lists produced this way." should say "performed on lists."
  • "list context, as described below" - it's described above.
  • "list items functions" should say "list item" (singular) for consistency with the rest of the documentation.
  • Similarly, the index documentation should use monospace fonts for literal inputs and outputs, for consistency with the other function documentation. Plus Textile turns -- into an em dash otherwise.
  • "write it output files" should say "its".
Actions #8

Updated by Peter Amstutz over 9 years ago

Brett Smith wrote:

General

In both error messages and documentation, "accessable" is misspelled; it should be "accessible."

Fixed.

Code

  • Python dictionaries can use tuples as keys. Given this, I think add_to_group would be better off using match.groups() directly as the key. That way, there's no risk that different group matches will look identical because of conflicts with the join string, even as cute as ^_^ is.

I tried that with a list and it said "lists are unhashable"; I didn't realize it worked for tuples. Fixed.

  • There are a couple of places in expand_item that say p = pattern.match(i). This masks the original p argument that was passed in. I don't think that causes any behavior bugs in the current code (we never try to read the original p after), but changing it now might help prevent heartache later.

Fixed.

  • This doesn't need to change for the branch, but in case it's helpful for the future, you might find os.path.isdir() and os.path.isfile() easier to use than mucking around with stat.

Good to know.

Documentation

  • The two run-command include examples are identical. _run_command_foreach_example.liquid doesn't use foreach anywhere.
  • Please run a spell checker on the run-command reference. Some typos I found: "evaluted", "substition", "commands’s", "concatinated."
  • The description of the first example says run “echo” with the first argument “hello”, but the code echoes "hello world".
  • The description for $(task.outdir) seems to be cut off.
  • The second hyphen appears to be a typo in "The directory containing the source code for the run-command- script "
  • I think "Note that parameter expansion is not performed lists produced this way." should say "performed on lists."
  • "list context, as described below" - it's described above.
  • "list items functions" should say "list item" (singular) for consistency with the rest of the documentation.
  • Similarly, the index documentation should use monospace fonts for literal inputs and outputs, for consistency with the other function documentation. Plus Textile turns -- into an em dash otherwise.
  • "write it output files" should say "its".

All fixed.

Actions #9

Updated by Brett Smith over 9 years ago

Peter Amstutz wrote:

  • Python dictionaries can use tuples as keys. Given this, I think add_to_group would be better off using match.groups() directly as the key. That way, there's no risk that different group matches will look identical because of conflicts with the join string, even as cute as ^_^ is.

I tried that with a list and it said "lists are unhashable"; I didn't realize it worked for tuples. Fixed.

The rule is "keys have to be immutable," since you'd get bizarre results if you put a key in a dictionary and then mutated it. Fortunately, mutability is the differentiator between lists and tuples.

Reviewing bdd309b. Just a few small typos, go ahead and merge once these are fixed.

  • The preface for the foreach example says "over multiple sample;" it should say "samples" plural.
  • The foreach example is missing a comma after the sample parameter hash.
  • Small capitalization fixes: I think case-sensitive names like "run-command" and "script_parameters" should always be written with their correct case (i.e., lowercase), even at the beginning of sentences, to avoid bad input. Please capitalize the proper name "Unix."

Thanks.

Actions #10

Updated by Anonymous over 9 years ago

  • Status changed from In Progress to Resolved
  • % Done changed from 0 to 100

Applied in changeset arvados|commit:dd59a50f9f3933c359930806516b43899a8b4957.

Actions

Also available in: Atom PDF