Project

General

Profile

Actions

Bug #20933

closed

arv-copy needs to follow run: fields when scanning cwl dependencies

Added by Peter Amstutz over 1 year ago. Updated about 1 year ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Category:
SDKs
Story points:
-
Release relationship:
Auto

Subtasks 1 (0 open1 closed)

Task #20941: Review 20933-arv-copy-cwlResolvedPeter Amstutz09/08/2023Actions
Actions #1

Updated by Peter Amstutz over 1 year ago

  • Status changed from New to In Progress
Actions #3

Updated by Peter Amstutz over 1 year ago

  • Release set to 66
Actions #4

Updated by Peter Amstutz over 1 year ago

  • Release changed from 66 to 67
Actions #5

Updated by Peter Amstutz over 1 year ago

  • Target version changed from Development 2023-09-13 sprint to Development 2023-09-27 sprint
Actions #6

Updated by Peter Amstutz about 1 year ago

  • Target version changed from Development 2023-09-27 sprint to Development 2023-10-11 sprint
Actions #7

Updated by Peter Amstutz about 1 year ago

  • Category set to SDKs
Actions #8

Updated by Peter Amstutz about 1 year ago

20933-arv-copy-cwl @ 96a389052be846aed84b7c9aead713ce8a54075e

developer-run-tests: #3848

  • All agreed upon points are implemented / addressed.
    • arv-copy can now correctly copy workflows where workflow steps refer to other CWL objects with keep: or arvwf: references
  • Anything not implemented (discovered or discussed during work) has a follow-up story.
    • n/a
  • Code is tested and passing, both automated and manual, what manual testing was done is described
    • added automated tests, also tested manually copying a real-world workflow between two clusters and confirmed that all the dependencies were correctly copied
  • Documentation has been updated.
    • Added a mention that it requires arvados-cwl-runner and a small update to remove redundant --src and --dst from a couple of examples
  • Behaves appropriately at the intended scale (describe intended scale).
    • Any workflow that can reasonably be run should work since it's just parsing the file and copying collections
  • Considered backwards and forwards compatibility issues between client and server.
    • Can copy workflows registered with older versions of a-c-r, but also fixed a bug registering workflows with the current a-c-r (it wasn't adding dockerCollectionPDH in certain situations where it should have)
  • Follows our coding standards and GUI style guidelines.
    • yes

The problem this is solving is that following #19688, arv-copy could not longer copy workflows correctly. This is because the copying strategy previously relied on the fact that the entire workflow was shoved into the "definition" field of the workflow record, so there was only one data record that needed to be processed. With the new strategy (the workflow definition only contains a "wrapper" workflow that has an external reference to the real workflow, among other changes) it became necessary to be able to follow all the references between CWL files.

The solution works by adding a new command arvados-cwl-runner --print-keep-deps which loads a workflow definition, finds all keep collection PDHs and docker image references, and prints them out as a simple json list of strings. arv-copy uses this this to get the list of collection dependencies and copies them.

Having arv-copy depend on arvados-cwl-runner this way is a bit clunky. Unfortunately, it seems like the best option. This is because having arv-copy import the arvados_cwl module would be a circular dependency; duplicating the full logic for loading and traversing workflows is infeasible; moving arv-copy into a separate package would be create its own overhead and confusion; and arvados-cwl-runner is already complicated enough without adding workflow copying.

Actions #9

Updated by Brett Smith about 1 year ago

Peter Amstutz wrote in #note-8:

20933-arv-copy-cwl @ 96a389052be846aed84b7c9aead713ce8a54075e

In arvados.util:

keep_file_locator_pattern = re.compile(r'([0-9a-f]{32}\+\d+)/(.*)')
keepuri_pattern = re.compile(r'keep:([0-9a-f]{32}\+\d+)/(.*)')

These regexps are overpermissive in a couple ways. The first is a pre-existing problem with other regexps in arvados.util, but \d can match any Unicode digit, including non-ASCII ones:

>>> re.match(r'\d', '\u0668')
<re.Match object; span=(0, 1), match='٨'>

We could either say [0-9], or pass flags=re.ASCII.

Second, the other regexps in arvados.util typically use \S* for file paths, that would be nicer than .* to avoid matching more than intended.

In print_keep_deps:

arvRunner.stdout.write(json.dumps(sorted(list(references)))+"\n")

It would be more performant, and seems cleaner, to say:

json.dump(sorted(references), arvRunner.stdout)
print(file=arvRunner.stdout)

sorted always makes and returns a list, so sorted(list(it)) is redundant work, and this doesn't require json to build a potentially large string just to write it back out later.

In arv-copy:

        try:
            result = subprocess.run(["arvados-cwl-runner", "--quiet", "--print-keep-deps", "arvwf:"+wf_uuid],
                                    capture_output=True, env=env)
        except (FileNotFoundError, subprocess.CalledProcessError):
            logger.error('Copying workflows requires arvados-cwl-runner 2.7.1 or later to be installed in PATH.')
            return

Note that subprocess.run won't raise CalledProcessError unless you pass check=True.

But beyond that, it would be nice to limit the check to a returncode of 2 (EXIT_INVALIDARGUMENT), which argparse uses when it gets an unrecognized option. That way we can tell "arv-copy too old" apart from other problems. So it could look like:

        try:
            result = subprocess.run(["arvados-cwl-runner", "--quiet", "--print-keep-deps", "arvwf:"+wf_uuid],
                                    capture_output=True, env=env)
        except FileNotFoundError:
            no_arv_copy = True
        else:
            no_arv_copy = result.returncode == 2
        if no_arv_copy:
            logger.error('Copying workflows requires arvados-cwl-runner 2.7.1 or later to be installed in PATH.')
            return
        elif result.returncode != 0:
            # Other error handling?

I'm also concerned about how this error gets propagated. Basically, the error log is all you get. Even after you encounter this problem, arv-copy reports that everything is fine:

% PATH=/usr/bin /opt/arvados/arvenv/bin/arv-copy --project-uuid pirca-j7d0g-bbuwax2ne0jqkop pirca-j7d0g-gtfh1yipuc20vsj    
2023-10-06 14:38:58 arvados.arv-copy[656605] ERROR: Copying workflows requires arvados-cwl-runner 2.7.1 or later to be installed in PATH.
2023-10-06 14:38:58 arvados.arv-copy[656605] ERROR: Copying workflows requires arvados-cwl-runner 2.7.1 or later to be installed in PATH.
2023-10-06 14:38:58 arvados.arv-copy[656605] ERROR: Copying workflows requires arvados-cwl-runner 2.7.1 or later to be installed in PATH.
2023-10-06 14:38:58 arvados.arv-copy[656605] ERROR: Copying workflows requires arvados-cwl-runner 2.7.1 or later to be installed in PATH.
pirca-j7d0g-2o53rt3drgk4gfo                                                                                                              
2023-10-06 14:38:58 arvados.arv-copy[656605] INFO: Success: created copy with uuid pirca-j7d0g-2o53rt3drgk4gfo

I don't mind it copying what it can, but I feel like that could be better reflected in the final log message and exit code.

In setup.py:

'zipp<3.16.0'

Can you please add a comment explaining why this dependency is necessary? It doesn't seem to be used directly, so I assume it's a conflict resolution fix, which is fine but those situations become really hard to remember after a while. Especially since this won't get security updates.

Thanks.

Actions #10

Updated by Peter Amstutz about 1 year ago

  • Target version changed from Development 2023-10-11 sprint to Development 2023-10-25 sprint
Actions #11

Updated by Peter Amstutz about 1 year ago

Brett Smith wrote in #note-9:

Peter Amstutz wrote in #note-8:

20933-arv-copy-cwl @ 96a389052be846aed84b7c9aead713ce8a54075e

These regexps are overpermissive in a couple ways. The first is a pre-existing problem with other regexps in arvados.util, but \d can match any Unicode digit, including non-ASCII ones:

We could either say [0-9], or pass flags=re.ASCII.

Switched them over to using [0-9]

Second, the other regexps in arvados.util typically use \S* for file paths, that would be nicer than .* to avoid matching more than intended.

I did not change these as I am not 100% confident that they wouldn't sometimes be used with unquoted filenames containing spaces.

sorted always makes and returns a list, so sorted(list(it)) is redundant work, and this doesn't require json to build a potentially large string just to write it back out later.

Done.

But beyond that, it would be nice to limit the check to a returncode of 2 (EXIT_INVALIDARGUMENT), which argparse uses when it gets an unrecognized option. That way we can tell "arv-copy too old" apart from other problems. So it could look like:

Done.

I'm also concerned about how this error gets propagated. Basically, the error log is all you get. Even after you encounter this problem, arv-copy reports that everything is fine:

I don't mind it copying what it can, but I feel like that could be better reflected in the final log message and exit code.

The two places where copy_workflow is called now handle exceptions, so I switched these over to raising exceptions.

In setup.py:

[...]

Can you please add a comment explaining why this dependency is necessary? It doesn't seem to be used directly, so I assume it's a conflict resolution fix, which is fine but those situations become really hard to remember after a while. Especially since this won't get security updates.

Done. zipp 3.16 dropped Python 3.7 support and we're currently still supporting Python 3.7. The fact that a library backport, whose reason to exist is to support older Python versions, is dropping support for older Python versions is irksome.

It's likely that we'll drop Python 3.7 for the next major version of Arvados but for the time being this avoids rocking the boat.

20933-arv-copy-cwl @ 94c05a1d46dfda5766c3a3a6a220e3471fd4b5ec

developer-run-tests: #3860

Actions #12

Updated by Brett Smith about 1 year ago

Peter Amstutz wrote in #note-11:

Done. zipp 3.16 dropped Python 3.7 support and we're currently still supporting Python 3.7. The fact that a library backport, whose reason to exist is to support older Python versions, is dropping support for older Python versions is irksome.

That's funny. I know a lot of open source maintainers find the fact that companies make demands on their work, while contributing zero development or financial resources themselves, to be irksome.

Can we limit this pin to the affected Python versions? That should still solve the problem, while avoiding the security risks of a permanent pin.

          'zipp<3.16.0; python_version<"3.8"',

We're already doing this in services/dockercleaner/setup.py if another example would help.

If that sounds good to you I think this is good to merge with that change, thanks.

Actions #13

Updated by Peter Amstutz about 1 year ago

Brett Smith wrote in #note-12:

Peter Amstutz wrote in #note-11:

Done. zipp 3.16 dropped Python 3.7 support and we're currently still supporting Python 3.7. The fact that a library backport, whose reason to exist is to support older Python versions, is dropping support for older Python versions is irksome.

That's funny. I know a lot of open source maintainers find the fact that companies make demands on their work, while contributing zero development or financial resources themselves, to be irksome.

Can we limit this pin to the affected Python versions? That should still solve the problem, while avoiding the security risks of a permanent pin.

[...]

We're already doing this in services/dockercleaner/setup.py if another example would help.

If that sounds good to you I think this is good to merge with that change, thanks.

Done, thanks.

Actions #14

Updated by Peter Amstutz about 1 year ago

  • Status changed from In Progress to Resolved
Actions

Also available in: Atom PDF