Bug #20933
closedarv-copy needs to follow run: fields when scanning cwl dependencies
Updated by Peter Amstutz over 1 year ago
20933-arv-copy-cwl @ bb7f5448594e55b86efb54a69bd58f886e3833a9
Updated by Peter Amstutz over 1 year ago
- Target version changed from Development 2023-09-13 sprint to Development 2023-09-27 sprint
Updated by Peter Amstutz about 1 year ago
- Target version changed from Development 2023-09-27 sprint to Development 2023-10-11 sprint
Updated by Peter Amstutz about 1 year ago
20933-arv-copy-cwl @ 96a389052be846aed84b7c9aead713ce8a54075e
- All agreed upon points are implemented / addressed.
- arv-copy can now correctly copy workflows where workflow steps refer to other CWL objects with keep: or arvwf: references
- Anything not implemented (discovered or discussed during work) has a follow-up story.
- n/a
- Code is tested and passing, both automated and manual, what manual testing was done is described
- added automated tests, also tested manually copying a real-world workflow between two clusters and confirmed that all the dependencies were correctly copied
- Documentation has been updated.
- Added a mention that it requires arvados-cwl-runner and a small update to remove redundant
--src
and--dst
from a couple of examples
- Added a mention that it requires arvados-cwl-runner and a small update to remove redundant
- Behaves appropriately at the intended scale (describe intended scale).
- Any workflow that can reasonably be run should work since it's just parsing the file and copying collections
- Considered backwards and forwards compatibility issues between client and server.
- Can copy workflows registered with older versions of a-c-r, but also fixed a bug registering workflows with the current a-c-r (it wasn't adding dockerCollectionPDH in certain situations where it should have)
- Follows our coding standards and GUI style guidelines.
- yes
The problem this is solving is that following #19688, arv-copy
could not longer copy workflows correctly. This is because the copying strategy previously relied on the fact that the entire workflow was shoved into the "definition" field of the workflow record, so there was only one data record that needed to be processed. With the new strategy (the workflow definition only contains a "wrapper" workflow that has an external reference to the real workflow, among other changes) it became necessary to be able to follow all the references between CWL files.
The solution works by adding a new command arvados-cwl-runner --print-keep-deps
which loads a workflow definition, finds all keep collection PDHs and docker image references, and prints them out as a simple json list of strings. arv-copy
uses this this to get the list of collection dependencies and copies them.
Having arv-copy
depend on arvados-cwl-runner
this way is a bit clunky. Unfortunately, it seems like the best option. This is because having arv-copy
import the arvados_cwl
module would be a circular dependency; duplicating the full logic for loading and traversing workflows is infeasible; moving arv-copy
into a separate package would be create its own overhead and confusion; and arvados-cwl-runner
is already complicated enough without adding workflow copying.
Updated by Brett Smith about 1 year ago
Peter Amstutz wrote in #note-8:
20933-arv-copy-cwl @ 96a389052be846aed84b7c9aead713ce8a54075e
In arvados.util
:
keep_file_locator_pattern = re.compile(r'([0-9a-f]{32}\+\d+)/(.*)')
keepuri_pattern = re.compile(r'keep:([0-9a-f]{32}\+\d+)/(.*)')
These regexps are overpermissive in a couple ways. The first is a pre-existing problem with other regexps in arvados.util, but \d
can match any Unicode digit, including non-ASCII ones:
>>> re.match(r'\d', '\u0668') <re.Match object; span=(0, 1), match='٨'>
We could either say [0-9]
, or pass flags=re.ASCII
.
Second, the other regexps in arvados.util typically use \S*
for file paths, that would be nicer than .*
to avoid matching more than intended.
In print_keep_deps
:
arvRunner.stdout.write(json.dumps(sorted(list(references)))+"\n")
It would be more performant, and seems cleaner, to say:
json.dump(sorted(references), arvRunner.stdout)
print(file=arvRunner.stdout)
sorted
always makes and returns a list, so sorted(list(it))
is redundant work, and this doesn't require json
to build a potentially large string just to write it back out later.
In arv-copy
:
try:
result = subprocess.run(["arvados-cwl-runner", "--quiet", "--print-keep-deps", "arvwf:"+wf_uuid],
capture_output=True, env=env)
except (FileNotFoundError, subprocess.CalledProcessError):
logger.error('Copying workflows requires arvados-cwl-runner 2.7.1 or later to be installed in PATH.')
return
Note that subprocess.run
won't raise CalledProcessError
unless you pass check=True
.
But beyond that, it would be nice to limit the check to a returncode of 2 (EXIT_INVALIDARGUMENT
), which argparse uses when it gets an unrecognized option. That way we can tell "arv-copy too old" apart from other problems. So it could look like:
try:
result = subprocess.run(["arvados-cwl-runner", "--quiet", "--print-keep-deps", "arvwf:"+wf_uuid],
capture_output=True, env=env)
except FileNotFoundError:
no_arv_copy = True
else:
no_arv_copy = result.returncode == 2
if no_arv_copy:
logger.error('Copying workflows requires arvados-cwl-runner 2.7.1 or later to be installed in PATH.')
return
elif result.returncode != 0:
# Other error handling?
I'm also concerned about how this error gets propagated. Basically, the error log is all you get. Even after you encounter this problem, arv-copy
reports that everything is fine:
% PATH=/usr/bin /opt/arvados/arvenv/bin/arv-copy --project-uuid pirca-j7d0g-bbuwax2ne0jqkop pirca-j7d0g-gtfh1yipuc20vsj 2023-10-06 14:38:58 arvados.arv-copy[656605] ERROR: Copying workflows requires arvados-cwl-runner 2.7.1 or later to be installed in PATH. 2023-10-06 14:38:58 arvados.arv-copy[656605] ERROR: Copying workflows requires arvados-cwl-runner 2.7.1 or later to be installed in PATH. 2023-10-06 14:38:58 arvados.arv-copy[656605] ERROR: Copying workflows requires arvados-cwl-runner 2.7.1 or later to be installed in PATH. 2023-10-06 14:38:58 arvados.arv-copy[656605] ERROR: Copying workflows requires arvados-cwl-runner 2.7.1 or later to be installed in PATH. pirca-j7d0g-2o53rt3drgk4gfo 2023-10-06 14:38:58 arvados.arv-copy[656605] INFO: Success: created copy with uuid pirca-j7d0g-2o53rt3drgk4gfo
I don't mind it copying what it can, but I feel like that could be better reflected in the final log message and exit code.
In setup.py
:
'zipp<3.16.0'
Can you please add a comment explaining why this dependency is necessary? It doesn't seem to be used directly, so I assume it's a conflict resolution fix, which is fine but those situations become really hard to remember after a while. Especially since this won't get security updates.
Thanks.
Updated by Peter Amstutz about 1 year ago
- Target version changed from Development 2023-10-11 sprint to Development 2023-10-25 sprint
Updated by Peter Amstutz about 1 year ago
Brett Smith wrote in #note-9:
Peter Amstutz wrote in #note-8:
20933-arv-copy-cwl @ 96a389052be846aed84b7c9aead713ce8a54075e
These regexps are overpermissive in a couple ways. The first is a pre-existing problem with other regexps in arvados.util, but
\d
can match any Unicode digit, including non-ASCII ones:We could either say
[0-9]
, or passflags=re.ASCII
.
Switched them over to using [0-9]
Second, the other regexps in arvados.util typically use
\S*
for file paths, that would be nicer than.*
to avoid matching more than intended.
I did not change these as I am not 100% confident that they wouldn't sometimes be used with unquoted filenames containing spaces.
sorted
always makes and returns a list, sosorted(list(it))
is redundant work, and this doesn't requirejson
to build a potentially large string just to write it back out later.
Done.
But beyond that, it would be nice to limit the check to a returncode of 2 (
EXIT_INVALIDARGUMENT
), which argparse uses when it gets an unrecognized option. That way we can tell "arv-copy too old" apart from other problems. So it could look like:
Done.
I'm also concerned about how this error gets propagated. Basically, the error log is all you get. Even after you encounter this problem,
arv-copy
reports that everything is fine:I don't mind it copying what it can, but I feel like that could be better reflected in the final log message and exit code.
The two places where copy_workflow is called now handle exceptions, so I switched these over to raising exceptions.
In
setup.py
:[...]
Can you please add a comment explaining why this dependency is necessary? It doesn't seem to be used directly, so I assume it's a conflict resolution fix, which is fine but those situations become really hard to remember after a while. Especially since this won't get security updates.
Done. zipp 3.16 dropped Python 3.7 support and we're currently still supporting Python 3.7. The fact that a library backport, whose reason to exist is to support older Python versions, is dropping support for older Python versions is irksome.
It's likely that we'll drop Python 3.7 for the next major version of Arvados but for the time being this avoids rocking the boat.
20933-arv-copy-cwl @ 94c05a1d46dfda5766c3a3a6a220e3471fd4b5ec
Updated by Brett Smith about 1 year ago
Peter Amstutz wrote in #note-11:
Done. zipp 3.16 dropped Python 3.7 support and we're currently still supporting Python 3.7. The fact that a library backport, whose reason to exist is to support older Python versions, is dropping support for older Python versions is irksome.
That's funny. I know a lot of open source maintainers find the fact that companies make demands on their work, while contributing zero development or financial resources themselves, to be irksome.
Can we limit this pin to the affected Python versions? That should still solve the problem, while avoiding the security risks of a permanent pin.
'zipp<3.16.0; python_version<"3.8"',
We're already doing this in services/dockercleaner/setup.py
if another example would help.
If that sounds good to you I think this is good to merge with that change, thanks.
Updated by Peter Amstutz about 1 year ago
Brett Smith wrote in #note-12:
Peter Amstutz wrote in #note-11:
Done. zipp 3.16 dropped Python 3.7 support and we're currently still supporting Python 3.7. The fact that a library backport, whose reason to exist is to support older Python versions, is dropping support for older Python versions is irksome.
That's funny. I know a lot of open source maintainers find the fact that companies make demands on their work, while contributing zero development or financial resources themselves, to be irksome.
Can we limit this pin to the affected Python versions? That should still solve the problem, while avoiding the security risks of a permanent pin.
[...]
We're already doing this in
services/dockercleaner/setup.py
if another example would help.If that sounds good to you I think this is good to merge with that change, thanks.
Done, thanks.
Updated by Peter Amstutz about 1 year ago
- Status changed from In Progress to Resolved