Project

General

Profile

Actions

Idea #9369

closed

[Documentation] Update documentation & tutorials for running CWL on Arvados

Added by Peter Amstutz over 8 years ago. Updated over 8 years ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Category:
-
Target version:
Start date:
07/01/2016
Due date:
Story points:
1.0

Subtasks 2 (0 open2 closed)

Task #9377: Review 9369-arv-cwl-docsResolvedPeter Amstutz07/12/2016Actions
Task #9376: Write tutorialsResolvedPeter Amstutz07/01/2016Actions
Actions #1

Updated by Peter Amstutz over 8 years ago

  • Story points set to 1.0
Actions #2

Updated by Peter Amstutz over 8 years ago

  • Assigned To set to Peter Amstutz
Actions #3

Updated by Brett Smith over 8 years ago

  • Target version changed from 2016-06-22 sprint to 2016-07-06 sprint
Actions #4

Updated by Radhika Chippada over 8 years ago

  • Target version changed from 2016-07-06 sprint to 2016-07-20 sprint
Actions #5

Updated by Brett Smith over 8 years ago

Reviewing 51b150b.

I pushed 3f69ccb to the branch with small copyedits throughout. Just the kind of stuff that's easier to commit than describe. Beyond that:

I don't think we should add the install instructions to the Python SDK page. I don't think readers would look for it here, because the fact that arvados-cwl-runner is written in Python should be an implementation detail (and I bet that will change in the future). If it's going to get added to a current page, I would advocate we add it to the CLI SDK install guide. No, that currently doesn't cover any Python installation, but that's a bug anyway.

arvados-cwl-runner needs access to Docker. This isn't mentioned. This also puts us in the usual awkward position we have demoing Docker-based tools, where all the data is on qr1hi and we generally encourage people to run there, but since users don't get Docker access for security reasons they can't run the tools. I don't know how to square this circle.

In arv-copy --src cloud --dst settings 2463fa9efeb75e099685528b3b9071e0+438, suggest changing cloud to qr1hi because that's how it's set up in the arv-copy documentation.

bwa-mem-template.yml is executable but doesn't have a shebang line.

It's a small thing but we should probably change the 3d0ga UUID prefix throughout to qr1hi, for consistency with the rest of the documentation if nothing else.

Thanks.

Actions #6

Updated by Brett Smith over 8 years ago

Running the suggested cwltool command gave me this error on shell.qr1hi:

Got workflow error
Traceback (most recent call last):
  File "/data/scratch/brett/cwl/local/lib/python2.7/site-packages/cwltool/main.py", line 205, in single_job_executor
    for r in jobiter:
  File "/data/scratch/brett/cwl/local/lib/python2.7/site-packages/cwltool/draft2tool.py", line 290, in job
    j.command_line = flatten(map(builder.generate_arg, builder.bindings))
  File "/data/scratch/brett/cwl/local/lib/python2.7/site-packages/cwltool/builder.py", line 162, in generate_arg
    value = self.do_eval(binding["do_eval"], context=value)
  File "/data/scratch/brett/cwl/local/lib/python2.7/site-packages/cwltool/builder.py", line 204, in do_eval
    timeout=self.timeout)
  File "/data/scratch/brett/cwl/local/lib/python2.7/site-packages/cwltool/expression.py", line 146, in do_eval
    timeout=timeout)
  File "/data/scratch/brett/cwl/local/lib/python2.7/site-packages/cwltool/sandboxjs.py", line 173, in interpolate
    e = execjs(scan[w[0]+1:w[1]], jslib, timeout=timeout)
  File "/data/scratch/brett/cwl/local/lib/python2.7/site-packages/cwltool/sandboxjs.py", line 79, in execjs
    raise JavascriptException(u"Long-running script killed after %s seconds.\nscript was:\n%s\n" % (timeout, fn_linenum()))
JavascriptException: Long-running script killed after 20 seconds.
script was:
0001 "use strict";
0002 var inputs = {
0003     "reference": {
0004         "path": "/var/lib/cwl/job469616780_bwa-mem/19.fasta.bwt",
0005         "containerfs": true,
0006         "class": "File",
0007         "secondaryFiles": [
0008             {
0009                 "path": "/var/lib/cwl/job469616780_bwa-mem/19.fasta.ann",
0010                 "containerfs": true,
0011                 "class": "File" 
0012             },
0013             {
0014                 "path": "/var/lib/cwl/job469616780_bwa-mem/19.fasta.amb",
0015                 "containerfs": true,
0016                 "class": "File" 
0017             },
0018             {
0019                 "path": "/var/lib/cwl/job469616780_bwa-mem/19.fasta.pac",
0020                 "containerfs": true,
0021                 "class": "File" 
0022             },
0023             {
0024                 "path": "/var/lib/cwl/job469616780_bwa-mem/19.fasta.sa",
0025                 "containerfs": true,
0026                 "class": "File" 
0027             }
0028         ]
0029     },
0030     "read_p1": {
0031         "path": "/var/lib/cwl/job469616780_bwa-mem/HWI-ST1027_129_D0THKACXX.1_1.fastq",
0032         "containerfs": true,
0033         "class": "File" 
0034     },
0035     "read_p2": {
0036         "path": "/var/lib/cwl/job469616780_bwa-mem/HWI-ST1027_129_D0THKACXX.1_2.fastq",
0037         "containerfs": true,
0038         "class": "File" 
0039     },
0040     "sample_id": "HWI-ST1027_129",
0041     "group_id": "arvados_tutorial",
0042     "PL": "illumina" 
0043 };
0044 var self = {
0045     "path": "/var/lib/cwl/job469616780_bwa-mem/19.fasta.bwt",
0046     "containerfs": true,
0047     "class": "File",
0048     "secondaryFiles": [
0049         {
0050             "path": "/var/lib/cwl/job469616780_bwa-mem/19.fasta.ann",
0051             "containerfs": true,
0052             "class": "File" 
0053         },
0054         {
0055             "path": "/var/lib/cwl/job469616780_bwa-mem/19.fasta.amb",
0056             "containerfs": true,
0057             "class": "File" 
0058         },
0059         {
0060             "path": "/var/lib/cwl/job469616780_bwa-mem/19.fasta.pac",
0061             "containerfs": true,
0062             "class": "File" 
0063         },
0064         {
0065             "path": "/var/lib/cwl/job469616780_bwa-mem/19.fasta.sa",
0066             "containerfs": true,
0067             "class": "File" 
0068         }
0069     ]
0070 }; 
0071 var runtime = {
0072     "outdirSize": 1024,
0073     "ram": 1024,
0074     "tmpdirSize": 1024,
0075     "cores": 1,
0076     "tmpdir": "/tmp",
0077     "outdir": "/var/spool/cwl" 
0078 };
0079 (function(){return ((self.path.match(/(.*)\.[^.]+$/)[1]));})()
Actions #7

Updated by Peter Amstutz over 8 years ago

Brett Smith wrote:

Reviewing 51b150b.

I pushed 3f69ccb to the branch with small copyedits throughout. Just the kind of stuff that's easier to commit than describe. Beyond that:

I don't think we should add the install instructions to the Python SDK page. I don't think readers would look for it here, because the fact that arvados-cwl-runner is written in Python should be an implementation detail (and I bet that will change in the future). If it's going to get added to a current page, I would advocate we add it to the CLI SDK install guide. No, that currently doesn't cover any Python installation, but that's a bug anyway.

I moved the install instructions to the CLI page and also added some instructions about the Python SDK. Note that the Python SDK page also discusses mentions deb/rpm packages, but the CLI page does not.

arvados-cwl-runner needs access to Docker. This isn't mentioned. This also puts us in the usual awkward position we have demoing Docker-based tools, where all the data is on qr1hi and we generally encourage people to run there, but since users don't get Docker access for security reasons they can't run the tools. I don't know how to square this circle.

I added a paragraph discussing it.

In arv-copy --src cloud --dst settings 2463fa9efeb75e099685528b3b9071e0+438, suggest changing cloud to qr1hi because that's how it's set up in the arv-copy documentation.

Done.

bwa-mem-template.yml is executable but doesn't have a shebang line.

That wasn't intended to be executable, removed the x bit.

It's a small thing but we should probably change the 3d0ga UUID prefix throughout to qr1hi, for consistency with the rest of the documentation if nothing else.

Fixed.

JavascriptException: Long-running script killed after 20 seconds.

I added a section about what to do about that.

Actions #8

Updated by Brett Smith over 8 years ago

Pushed another round of commits in f17319c. Beyond that, all my comments are on the install section now.

  • The CWL instructions are still on the Python SDK install page, and this feels a little incongruous to me. I get that the CWL stuff is in princple an SDK written in Python, so from that perspective it makes sense. But literally nothing else in the Python SDK documentation in this branch discusses the CWL SDK; and I don't think we're really expecting anyone to use the library as an SDK right now, just to use the included tools. As long as both of those remain true, I think it'd be more sensible from readers' perspective to leave the CWL SDK out of the Python SDK documentation completely.
  • Why do the gem/pip shell prompts specify working in the Arvados source tree? (None of the shell in this section really follows the usual documentation convention for cd'ing around, but in general I'm fine to leave that as a separate issue, as long as the new stuff either follows convention correctly or just doesn't include paths like the rest of the page.)
  • The CLI install instructions say sudo -i python setup.py install. This is dangerous because it's the kind of invocation that can cause setuptools to overwrite system packages. Per Paul Tagliamonte, who maintains a lot of Python Debian packages, "Use pip without sudo always. Don't tell people to use sudo."
    The corresponding Python SDK install instructions don't use sudo. Neither should this.
  • The source instructions implicitly expect users to cd back to their home directory between installing each SDK component. It seems like it would be more user-friendly to make this more copy-paste friendly, by having commands like cd ../python and cd ../cwl.

Peter Amstutz wrote:

I moved the install instructions to the CLI page and also added some instructions about the Python SDK. Note that the Python SDK page also discusses mentions deb/rpm packages, but the CLI page does not.

This can be addressed separately, but is there any reason to believe this is anything besides an historical accident or oversight?

Actions #9

Updated by Peter Amstutz over 8 years ago

Brett Smith wrote:

Pushed another round of commits in f17319c. Beyond that, all my comments are on the install section now.

  • The CWL instructions are still on the Python SDK install page, and this feels a little incongruous to me. I get that the CWL stuff is in princple an SDK written in Python, so from that perspective it makes sense. But literally nothing else in the Python SDK documentation in this branch discusses the CWL SDK; and I don't think we're really expecting anyone to use the library as an SDK right now, just to use the included tools. As long as both of those remain true, I think it'd be more sensible from readers' perspective to leave the CWL SDK out of the Python SDK documentation completely.

That was an oversight, I took out the other references but left one in. Fixed.

  • Why do the gem/pip shell prompts specify working in the Arvados source tree? (None of the shell in this section really follows the usual documentation convention for cd'ing around, but in general I'm fine to leave that as a separate issue, as long as the new stuff either follows convention correctly or just doesn't include paths like the rest of the page.)

Copy and paste error, I think. Fixed.

I was copying the gem instructions which used sudo -i (for consistency) but per our conversation on engineering I took it out.

The corresponding Python SDK install instructions don't use sudo. Neither should this.
  • The source instructions implicitly expect users to cd back to their home directory between installing each SDK component. It seems like it would be more user-friendly to make this more copy-paste friendly, by having commands like cd ../python and cd ../cwl.

Fixed.

Peter Amstutz wrote:

I moved the install instructions to the CLI page and also added some instructions about the Python SDK. Note that the Python SDK page also discusses mentions deb/rpm packages, but the CLI page does not.

This can be addressed separately, but is there any reason to believe this is anything besides an historical accident or oversight?

No, I'm pretty sure it's just historical accident/oversight.

Actions #10

Updated by Brett Smith over 8 years ago

Pushed one last sentence structure copyedit at de4d6cb. That's good to merge. Thanks.

Actions #11

Updated by Peter Amstutz over 8 years ago

  • Status changed from New to Resolved
  • % Done changed from 50 to 100

Applied in changeset arvados|commit:931236cc8c9bea9c796aed23c1636abce3ee4534.

Actions

Also available in: Atom PDF