Peter Amstutz wrote in #note-8:
20933-arv-copy-cwl @ 96a389052be846aed84b7c9aead713ce8a54075e
In arvados.util
:
keep_file_locator_pattern = re.compile(r'([0-9a-f]{32}\+\d+)/(.*)')
keepuri_pattern = re.compile(r'keep:([0-9a-f]{32}\+\d+)/(.*)')
These regexps are overpermissive in a couple ways. The first is a pre-existing problem with other regexps in arvados.util, but \d
can match any Unicode digit, including non-ASCII ones:
>>> re.match(r'\d', '\u0668')
<re.Match object; span=(0, 1), match='٨'>
We could either say [0-9]
, or pass flags=re.ASCII
.
Second, the other regexps in arvados.util typically use \S*
for file paths, that would be nicer than .*
to avoid matching more than intended.
In print_keep_deps
:
arvRunner.stdout.write(json.dumps(sorted(list(references)))+"\n")
It would be more performant, and seems cleaner, to say:
json.dump(sorted(references), arvRunner.stdout)
print(file=arvRunner.stdout)
sorted
always makes and returns a list, so sorted(list(it))
is redundant work, and this doesn't require json
to build a potentially large string just to write it back out later.
In arv-copy
:
try:
result = subprocess.run(["arvados-cwl-runner", "--quiet", "--print-keep-deps", "arvwf:"+wf_uuid],
capture_output=True, env=env)
except (FileNotFoundError, subprocess.CalledProcessError):
logger.error('Copying workflows requires arvados-cwl-runner 2.7.1 or later to be installed in PATH.')
return
Note that subprocess.run
won't raise CalledProcessError
unless you pass check=True
.
But beyond that, it would be nice to limit the check to a returncode of 2 (EXIT_INVALIDARGUMENT
), which argparse uses when it gets an unrecognized option. That way we can tell "arv-copy too old" apart from other problems. So it could look like:
try:
result = subprocess.run(["arvados-cwl-runner", "--quiet", "--print-keep-deps", "arvwf:"+wf_uuid],
capture_output=True, env=env)
except FileNotFoundError:
no_arv_copy = True
else:
no_arv_copy = result.returncode == 2
if no_arv_copy:
logger.error('Copying workflows requires arvados-cwl-runner 2.7.1 or later to be installed in PATH.')
return
elif result.returncode != 0:
# Other error handling?
I'm also concerned about how this error gets propagated. Basically, the error log is all you get. Even after you encounter this problem, arv-copy
reports that everything is fine:
% PATH=/usr/bin /opt/arvados/arvenv/bin/arv-copy --project-uuid pirca-j7d0g-bbuwax2ne0jqkop pirca-j7d0g-gtfh1yipuc20vsj
2023-10-06 14:38:58 arvados.arv-copy[656605] ERROR: Copying workflows requires arvados-cwl-runner 2.7.1 or later to be installed in PATH.
2023-10-06 14:38:58 arvados.arv-copy[656605] ERROR: Copying workflows requires arvados-cwl-runner 2.7.1 or later to be installed in PATH.
2023-10-06 14:38:58 arvados.arv-copy[656605] ERROR: Copying workflows requires arvados-cwl-runner 2.7.1 or later to be installed in PATH.
2023-10-06 14:38:58 arvados.arv-copy[656605] ERROR: Copying workflows requires arvados-cwl-runner 2.7.1 or later to be installed in PATH.
pirca-j7d0g-2o53rt3drgk4gfo
2023-10-06 14:38:58 arvados.arv-copy[656605] INFO: Success: created copy with uuid pirca-j7d0g-2o53rt3drgk4gfo
I don't mind it copying what it can, but I feel like that could be better reflected in the final log message and exit code.
In setup.py
:
'zipp<3.16.0'
Can you please add a comment explaining why this dependency is necessary? It doesn't seem to be used directly, so I assume it's a conflict resolution fix, which is fine but those situations become really hard to remember after a while. Especially since this won't get security updates.
Thanks.