Idea #13306
closedarvados-cwl-runner supports Python 3
Description
Check that all the dependencies support python3 (>= 3.4)
'cwltool==1.0.20180403145700' → yes
'schema-salad==2.6.20171201034858' → yes
'typing==3.5.3.0' → yes
'ruamel.yaml==0.13.7' → yes
'arvados-python-client>=0.1.20170526013812' → yes
'setuptools' → yes
'ciso8601 >=1.0.0, <=1.0.4' → yes
Related issues
Updated by Tom Morris over 6 years ago
- Related to Idea #11308: Support Python3 for arvados-python-client & command line utilities added
Updated by Tom Morris almost 6 years ago
- Blocks Idea #14532: [Epic] Port to Python 3 to for Python 2 sunset in December 2019 added
Updated by Tom Morris almost 6 years ago
- Assigned To set to Eric Biagiotti
- Target version changed from To Be Groomed to 2019-01-16 Sprint
Updated by Eric Biagiotti almost 6 years ago
- Status changed from New to In Progress
Updated by Eric Biagiotti almost 6 years ago
- Related to Bug #14698: Running Python2 and Python3 tests seperately added
Updated by Tom Morris almost 6 years ago
- Target version changed from 2019-01-16 Sprint to 2019-01-30 Sprint
Updated by Eric Biagiotti almost 6 years ago
- Target version changed from 2019-01-30 Sprint to 2019-02-13 Sprint
Updated by Peter Amstutz almost 6 years ago
- for f, p in generatemapper.items():
+ for f, p in list(generatemapper.items()):
Is this necessary? My understanding was that the difference is py2 .items() returns a copy of a list and in py3 it returns an iterator. However in this construct, wouldn't they have equivalent behavior?
- for k,v in job_spec["script_parameters"].items():
+ for k,v in viewitems(job_spec["script_parameters"]):
What's the difference between .items(), list(.items()), and viewitems() ?
with final.open("cwl.output.json", "w") as f:
- json.dump(outputObj, f, sort_keys=True, indent=4, separators=(',',': '))
+ res = json.dumps(outputObj, sort_keys=True, indent=4, separators=(',',': '), ensure_ascii=False).encode('utf-8').decode()
+ f.write(res)
Why can't json.dump() be used to write directly to f? Why do you call encode() and then decode() again? Should we be opening the file in text mode: final.open("cwl.output.json", "wt", encoding="utf-8")
There might be other places we should be opening files in text mode.
- return uri
+ return uri.encode("utf-8").decode()
Similar comment, what's with the encode/decode spin?
Updated by Eric Biagiotti almost 6 years ago
Peter Amstutz wrote:
[...]
Is this necessary? My understanding was that the difference is py2 .items() returns a copy of a list and in py3 it returns an iterator. However in this construct, wouldn't they have equivalent behavior?
This is actually a mistake introduced by futurize. generatemapper is cwltool PathMapper object and has an items() function like dict. items() already returns a list. I fixed this in executor.py, but missed it in arvcontainer.py and arvjob.py. Fixed in 7e1c8ea7.
Updated by Eric Biagiotti almost 6 years ago
Peter Amstutz wrote:
[...]
Why can't json.dump() be used to write directly to f? Why do you call encode() and then decode() again? Should we be opening the file in text mode:
final.open("cwl.output.json", "wt", encoding="utf-8")
There might be other places we should be opening files in text mode.
From what I understand, the problem is with json.dump, which produces a str object in python 2 (bytestring) and a str object in python 3 (unicode), resulting in incompatibility when writing to the file (expects unicode). In order to prevent using py2 and py3 specific code, the solution is to use json.dumps, which gives us an opportunuty to call encode, which will force the string to be bytes in both languages, then decode to unicode from there, before writing to the file. If this makes the code too confusing, we can alternitavely add py2/3 specific code here.
I explained this a bit in my commit message, but in the future I'll also reference the commit and put explanations like this on the issue page too.
Updated by Peter Amstutz almost 6 years ago
Eric Biagiotti wrote:
Peter Amstutz wrote:
[...]
Why can't json.dump() be used to write directly to f? Why do you call encode() and then decode() again? Should we be opening the file in text mode:
final.open("cwl.output.json", "wt", encoding="utf-8")
There might be other places we should be opening files in text mode.
From what I understand, the problem is with json.dump, which produces a str object in python 2 (bytestring) and a str object in python 3 (unicode), resulting in incompatibility when writing to the file (expects unicode). In order to prevent using py2 and py3 specific code, the solution is to use json.dumps, which gives us an opportunuty to call encode, which will force the string to be bytes in both languages, then decode to unicode from there, before writing to the file. If this makes the code too confusing, we can alternitavely add py2/3 specific code here.
If you open it in text mode, can you just do f.write(python.dumps(...))
?
Updated by Eric Biagiotti almost 6 years ago
Peter Amstutz wrote:
Eric Biagiotti wrote:
Peter Amstutz wrote:
[...]
Why can't json.dump() be used to write directly to f? Why do you call encode() and then decode() again? Should we be opening the file in text mode:
final.open("cwl.output.json", "wt", encoding="utf-8")
There might be other places we should be opening files in text mode.
From what I understand, the problem is with json.dump, which produces a str object in python 2 (bytestring) and a str object in python 3 (unicode), resulting in incompatibility when writing to the file (expects unicode). In order to prevent using py2 and py3 specific code, the solution is to use json.dumps, which gives us an opportunuty to call encode, which will force the string to be bytes in both languages, then decode to unicode from there, before writing to the file. If this makes the code too confusing, we can alternitavely add py2/3 specific code here.
If you open it in text mode, can you just do
f.write(python.dumps(...))
?
You mean json.dumps? If so, I did try that and it doesn't work. Text mode is the default, which is why its expecting unicode and dumps has the same problem with the string it returns as dump.
Updated by Eric Biagiotti almost 6 years ago
Peter Amstutz wrote:
[...]
What's the difference between .items(), list(.items()), and viewitems() ?
Using just items() for dict iteration in python2 is inefficient. As you previously said, it makes a copy, so the recommended way is to use viewitems. The reason you will still see items() in use in some places is that it is either an arvados collection object or a PathMapper object, which both have items() functions.
Though upon further review, I did notice some fixes that I missed. Updated in dd98e9fb.
Similar comment, what's with the encode/decode spin?
Regarding the 2 encode/decode comments, I think I found a better solution, which is to use the backwards compatible str object from builtins, which handles the differences automatically. Updated in 1e8da3c1.
Tests run: https://ci.curoverse.com/view/Developer/job/developer-run-tests/1050/
Might be worth running the conformance and integration tests again on your end. Thanks!
Updated by Peter Amstutz almost 6 years ago
Could you add a "classifiers" section to sdk/python/setup.py and sdk/cwl/setup.py to advertise Python 3 support?
setup(name='arvados-cwl-runner', ... classifiers=[ 'Programming Language :: Python :: 2', 'Programming Language :: Python :: 3', ] )
Updated by Peter Amstutz almost 6 years ago
2019-02-01 17:46:54 arvados.cwl-runner[25772] ERROR: [bwa-mem-tool.cwl] While getting final output object: 'dict_values' object does not support indexing (X-Request-Id: req-r95c5lrjbewtsqqce5y8) Traceback (most recent call last): File "/home/peter/.arvbox/arvbox/arvados/sdk/cwl/arvados_cwl/runner.py", line 460, in done done.logtail(logc, logger.error, "%s (%s) error log:" % (self.arvrunner.label(self), record["uuid"]), maxlen=40) File "/home/peter/.arvbox/arvbox/arvados/sdk/cwl/arvados_cwl/done.py", line 96, in logtail loglines = viewvalues(mergelogs)[0] TypeError: 'dict_values' object does not support indexing 2019-02-01 17:46:54 arvados.cwl-runner[25772] ERROR: Overall process status is permanentFail 2019-02-01 17:46:54 cwltool[25772] WARNING: Final process status is permanentFail
I hit this bug testing on crunchv1 (--api=jobs)
Updated by Peter Amstutz almost 6 years ago
Traceback (most recent call last): File "/usr/local/bin/arv-keepdocker", line 7, in <module> main() File "/usr/local/lib/python3.5/dist-packages/arvados/commands/keepdocker.py", line 384, in main image_hash = find_one_image_hash(args.image, args.tag) File "/usr/local/lib/python3.5/dist-packages/arvados/commands/keepdocker.py", line 161, in find_one_image_hash hashes = find_image_hashes(image_search, image_tag) File "/usr/local/lib/python3.5/dist-packages/arvados/commands/keepdocker.py", line 153, in find_image_hashes for image in docker_images(): File "/usr/local/lib/python3.5/dist-packages/arvados/commands/keepdocker.py", line 138, in docker_images ctime = ' '.join(words[3:size_index]) TypeError: sequence item 0: expected str instance, bytes found
edit: disregard this one, it seems to keep installing arvados-python-client from PyPi instead of the branch where this bug is fixed.
Updated by Peter Amstutz almost 6 years ago
I've added the ability to select between python/python3 and conformance/integration in test_with_arvbox.
New options:
--build (build the jobs image)
--pythoncmd (python|python3)
--suite (conformance|integration)
--api (jobs|containers)
$ test_with_arvbox.sh --no-reset-container --leave-running --config dev --build --pythoncmd python3 --suite integration -j3
Once the main branch is merged, we can update https://ci.curoverse.com/view/CWL/job/run-arvados-cwl-conformance-tests/ to run both sets of tests for both python and python3.
Updated by Peter Amstutz almost 6 years ago
31766825f158fa427e7b01cdc28d2b8e0abecea7 has a quick and dirty fix for #14806
The quick and dirty commit should be removed from the branch once the real fix is merged to master.
Updated by Peter Amstutz almost 6 years ago
Peter Amstutz wrote:
[...]
I hit this bug testing on crunchv1 (--api=jobs)
This is fixed in 9b712f30fe6784a5c8e747a8cf229c54cc02a509
Updated by Eric Biagiotti almost 6 years ago
Peter Amstutz wrote:
I've added the ability to select between python/python3 and conformance/integration in test_with_arvbox.
New options:
--build (build the jobs image)
--pythoncmd (python|python3)
--suite (conformance|integration)
--api (jobs|containers)
[...]
Once the main branch is merged, we can update https://ci.curoverse.com/view/CWL/job/run-arvados-cwl-conformance-tests/ to run both sets of tests for both python and python3.
Thank you, this is a huge help. In anticipation of the merge with master, I updated the wiki to reflect this. https://dev.arvados.org/projects/arvados/wiki/Arvados-cwl-runner_development
Updated by Eric Biagiotti almost 6 years ago
- Status changed from In Progress to Resolved