Project

General

Profile

Actions

Bug #11584

closed

Allow "keep:" references on command line.

Added by Peter Amstutz over 7 years ago. Updated almost 3 years ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Category:
CWL
Target version:
Story points:
-
Release relationship:
Auto

Description

arvados-cwl-runner lets you provide input parameter values on the command line. However, currently it assumes they are relative paths local files and doesn't allow you to use references to keep. This should be fixed to permit references to keep.

As a secondary problem, "file not found" when checking command line parameters results in a backtrace, rather being caught and printing a friendlier error message.


Related issues

Related to Arvados Epics - Idea #17848: CWL runner improvementsResolved07/01/202103/29/2023Actions
Actions #1

Updated by Peter Amstutz over 7 years ago

  • Subject changed from [CWL] Allow keep: references on command line. to [CWL] Allow "keep:" references on command line.
  • Description updated (diff)
Actions #2

Updated by Peter Amstutz over 7 years ago

  • Description updated (diff)
Actions #3

Updated by Tom Morris about 7 years ago

  • Target version set to Arvados Future Sprints
Actions #4

Updated by Peter Amstutz over 3 years ago

  • Related to Idea #16011: CWL support, docs, training, website added
Actions #5

Updated by Peter Amstutz over 3 years ago

  • Target version changed from Arvados Future Sprints to 2021-03-31 sprint
  • Assigned To set to Jiayong Li
Actions #6

Updated by Peter Amstutz over 3 years ago

  • Category set to CWL
  • Subject changed from [CWL] Allow "keep:" references on command line. to Allow "keep:" references on command line.
Actions #7

Updated by Jiayong Li over 3 years ago

  • Description updated (diff)
Actions #8

Updated by Peter Amstutz over 3 years ago

  • Target version changed from 2021-03-31 sprint to 2021-04-14 sprint
Actions #9

Updated by Peter Amstutz over 3 years ago

  • Target version changed from 2021-04-14 sprint to 2021-04-28 bughunt sprint
Actions #10

Updated by Peter Amstutz over 3 years ago

  • Target version deleted (2021-04-28 bughunt sprint)
Actions #11

Updated by Peter Amstutz over 3 years ago

  • Related to Idea #17848: CWL runner improvements added
Actions #12

Updated by Peter Amstutz over 3 years ago

  • Related to deleted (Idea #16011: CWL support, docs, training, website)
Actions #13

Updated by Lucas Di Pentima almost 3 years ago

  • Assigned To changed from Jiayong Li to Lucas Di Pentima
Actions #14

Updated by Lucas Di Pentima almost 3 years ago

I think I've tracked down the issue.

cwltool's main.py file does some argument parsing in init_job_order() and translates for example:

['--script', 'keep:zztop-4zz18-x9a7cx3mw6a2czo/OSC_Preprocessing.ssf', '--sessions', 'keep:3c2af8506ae4d7844701000867858f67+8593/20220113']

to

{'job_order': None, 'script': {'class': 'File', 'location': 'file:///Users/lucas/Devel/git/arvados/sdk/cwl/keep%3Azztop-4zz18-x9a7cx3mw6a2czo/OSC_Preprocessing.ssf'}, 'sessions': [{'class': 'Directory', 'location': 'file:///Users/lucas/Devel/git/arvados/sdk/cwl/keep%3A3c2af8506ae4d7844701000867858f67%2B8593/20220113'}]}

At a-c-r's side, there're some checks for inputs starting with "keep:" that don't get executed because of this translation.

What's not clear to me is if a-c-r should decode the file:///full-path/keep%3Asome-uuid/some-file URI to something understandable like keep:some-uuid/some-file, or if it's a cwltool bug (or feature that can be disabled)

Actions #15

Updated by Lucas Di Pentima almost 3 years ago

I think the problem might be here:

https://github.com/common-workflow-language/cwltool/blob/318876b264480df545456c1468c2b60eb577acad/cwltool/argparser.py#L738

The location of File and Directory inputs get translated to an os.path.abspath() without even checking that what we have is a valid (and existing) relative path.

Actions #16

Updated by Peter Amstutz almost 3 years ago

Lucas Di Pentima wrote:

I think the problem might be here:

https://github.com/common-workflow-language/cwltool/blob/318876b264480df545456c1468c2b60eb577acad/cwltool/argparser.py#L738

The location of File and Directory inputs get translated to an os.path.abspath() without even checking that what we have is a valid (and existing) relative path.

Good catch! Probably not that hard to fix either, it just needs to expand those file/URI references differently.

Actions #17

Updated by Lucas Di Pentima almost 3 years ago

  • Status changed from New to In Progress

What I did is something like this:

diff --git a/cwltool/argparser.py b/cwltool/argparser.py
index ea885b64..ca135a5d 100644
--- a/cwltool/argparser.py
+++ b/cwltool/argparser.py
@@ -730,12 +730,16 @@ class FSAction(argparse.Action):
         values: Union[AnyStr, Sequence[Any], None],
         option_string: Optional[str] = None,
     ) -> None:
+        if os.path.exists(os.path.abspath(cast(AnyStr, values))):
+            location = file_uri(str(os.path.abspath(cast(AnyStr, values))))
+        else:
+            location = cast(AnyStr, values)
         setattr(
             namespace,
             self.dest,
             {
                 "class": self.objclass,
-                "location": file_uri(str(os.path.abspath(cast(AnyStr, values)))),
+                "location": location,
             },
         )

@@ -762,10 +766,14 @@ class FSAppendAction(argparse.Action):
         if not g:
             g = []
             setattr(namespace, self.dest, g)
+        if os.path.exists(os.path.abspath(cast(AnyStr, values))):
+            location = file_uri(str(os.path.abspath(cast(AnyStr, values))))
+        else:
+            location = cast(AnyStr, values)
         g.append(
             {
                 "class": self.objclass,
-                "location": file_uri(str(os.path.abspath(cast(AnyStr, values)))),
+                "location": location,
             }
         )

My thought process is that as cwltool doesn't know the keep:xxx URIs, and it's not an existing relative path, it shouldn't translate it at all. I guess cwltool users will get an "file not found" error somewhere downstream anyways.

I've ran the tests and 5 fail, although the same number of tests fail on main, so I'm not sure if this change broke something.

What would be the correct procedure to submit these changes to the project? (provided that they make sense)

Actions #18

Updated by Peter Amstutz almost 3 years ago

There's a better way to do this, there is a urljoin method that cwltool uses, which arvados-cwl-runner hooks into to make it aware of "keep:" URIs. It should "urljoin" on the file URI of the current directory and the string from the command line, and that has the correct logic to recognize when something is a URI and when it is a relative file reference (that's how this works for all the file loading, it should have been used here and was just overlooked).

Actions #19

Updated by Peter Amstutz almost 3 years ago

  • Target version set to 2022-02-16 sprint
Actions #20

Updated by Lucas Di Pentima almost 3 years ago

Fix PR at: https://github.com/common-workflow-language/cwltool/pull/1613 -- we'll have to update arvados-cwl-runner's pin on cwltool when this gets merged & released.

Actions #21

Updated by Peter Amstutz almost 3 years ago

  • Release set to 49
Actions #22

Updated by Peter Amstutz almost 3 years ago

Ticket #18723 is merged which includes the cwltool update with the bug fix, could you confirm this bug is fixed in arvados-cwl-runner?

Actions #23

Updated by Lucas Di Pentima almost 3 years ago

Yes, it seems to be working now. Thanks!

Actions #24

Updated by Peter Amstutz almost 3 years ago

  • Assigned To changed from Lucas Di Pentima to Peter Amstutz
  • Status changed from In Progress to Resolved
Actions

Also available in: Atom PDF