Story #13306

arvados-cwl-runner supports Python 3

Added by Peter Amstutz over 1 year ago. Updated 10 months ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Category:
-
Target version:
Start date:
01/30/2019
Due date:
% Done:

100%

Estimated time:
(Total: 0.00 h)
Story points:
-
Release relationship:
Auto

Description

Check that all the dependencies support python3 (>= 3.4)

'cwltool==1.0.20180403145700' → yes
'schema-salad==2.6.20171201034858' → yes
'typing==3.5.3.0' → yes
'ruamel.yaml==0.13.7' → yes
'arvados-python-client>=0.1.20170526013812' → yes
'setuptools' → yes
'ciso8601 >=1.0.0, <=1.0.4' → yes


Subtasks

Task #14667: Review 13306-arvados-cwl-runner-py3-supportResolvedEric Biagiotti


Related issues

Related to Arvados - Story #11308: Support Python3 for arvados-python-client & command line utilitiesResolved03/21/2017

Related to Arvados - Bug #14698: Running Python2 and Python3 tests seperatelyResolved

Blocks Arvados - Story #14532: [Epic] Port to Python 3 to prepare for Python 2 sunsetting in December 2019In Progress

Associated revisions

Revision 84e42cb0 (diff)
Added by Peter Amstutz 10 months ago

arvbox support for python3 refs #13306

Arvados-DCO-1.1-Signed-off-by: Peter Amstutz <>

Revision bdbf7c4e
Added by Eric Biagiotti 10 months ago

Merge branch '13306-arvados-cwl-runner-py3-support'

refs #13306

Arvados-DCO-1.1-Signed-off-by: Eric Biagiotti <>

Revision 60f3033a
Added by Eric Biagiotti 10 months ago

Merge branch 'master' into 13306-arvados-cwl-runner-py3-support

refs #13306
3

Arvados-DCO-1.1-Signed-off-by: Eric Biagiotti <>

Revision f7678076
Added by Eric Biagiotti 10 months ago

Merge branch '13306-arvados-cwl-runner-py3-support'

refs #13306
3

Arvados-DCO-1.1-Signed-off-by: Eric Biagiotti <>

History

#1 Updated by Peter Amstutz over 1 year ago

  • Tracker changed from Support to Bug

#2 Updated by Peter Amstutz over 1 year ago

  • Description updated (diff)

#3 Updated by Peter Amstutz over 1 year ago

  • Description updated (diff)

#4 Updated by Peter Amstutz over 1 year ago

  • Description updated (diff)

#5 Updated by Peter Amstutz over 1 year ago

  • Description updated (diff)

#6 Updated by Tom Morris over 1 year ago

  • Tracker changed from Bug to Story

#7 Updated by Tom Morris over 1 year ago

  • Related to Story #11308: Support Python3 for arvados-python-client & command line utilities added

#8 Updated by Tom Morris about 1 year ago

  • Blocks Story #14532: [Epic] Port to Python 3 to prepare for Python 2 sunsetting in December 2019 added

#9 Updated by Tom Morris 11 months ago

  • Assigned To set to Eric Biagiotti
  • Target version changed from To Be Groomed to 2019-01-16 Sprint

#10 Updated by Eric Biagiotti 11 months ago

  • Status changed from New to In Progress

#11 Updated by Eric Biagiotti 11 months ago

  • Related to Bug #14698: Running Python2 and Python3 tests seperately added

#12 Updated by Tom Morris 11 months ago

  • Target version changed from 2019-01-16 Sprint to 2019-01-30 Sprint

#13 Updated by Eric Biagiotti 11 months ago

  • Target version changed from 2019-01-30 Sprint to 2019-02-13 Sprint

#14 Updated by Peter Amstutz 11 months ago

-                    for f, p in generatemapper.items():
+                    for f, p in list(generatemapper.items()):

Is this necessary? My understanding was that the difference is py2 .items() returns a copy of a list and in py3 it returns an iterator. However in this construct, wouldn't they have equivalent behavior?

-        for k,v in job_spec["script_parameters"].items():
+        for k,v in viewitems(job_spec["script_parameters"]):

What's the difference between .items(), list(.items()), and viewitems() ?

         with final.open("cwl.output.json", "w") as f:
-            json.dump(outputObj, f, sort_keys=True, indent=4, separators=(',',': '))
+            res = json.dumps(outputObj, sort_keys=True, indent=4, separators=(',',': '), ensure_ascii=False).encode('utf-8').decode()
+            f.write(res)           

Why can't json.dump() be used to write directly to f? Why do you call encode() and then decode() again? Should we be opening the file in text mode: final.open("cwl.output.json", "wt", encoding="utf-8")

There might be other places we should be opening files in text mode.

-        return uri
+        return uri.encode("utf-8").decode()

Similar comment, what's with the encode/decode spin?

#15 Updated by Eric Biagiotti 11 months ago

Peter Amstutz wrote:

[...]

Is this necessary? My understanding was that the difference is py2 .items() returns a copy of a list and in py3 it returns an iterator. However in this construct, wouldn't they have equivalent behavior?

This is actually a mistake introduced by futurize. generatemapper is cwltool PathMapper object and has an items() function like dict. items() already returns a list. I fixed this in executor.py, but missed it in arvcontainer.py and arvjob.py. Fixed in 7e1c8ea7.

#16 Updated by Eric Biagiotti 11 months ago

Peter Amstutz wrote:

[...]

Why can't json.dump() be used to write directly to f? Why do you call encode() and then decode() again? Should we be opening the file in text mode: final.open("cwl.output.json", "wt", encoding="utf-8")

There might be other places we should be opening files in text mode.

From what I understand, the problem is with json.dump, which produces a str object in python 2 (bytestring) and a str object in python 3 (unicode), resulting in incompatibility when writing to the file (expects unicode). In order to prevent using py2 and py3 specific code, the solution is to use json.dumps, which gives us an opportunuty to call encode, which will force the string to be bytes in both languages, then decode to unicode from there, before writing to the file. If this makes the code too confusing, we can alternitavely add py2/3 specific code here.

I explained this a bit in my commit message, but in the future I'll also reference the commit and put explanations like this on the issue page too.

#17 Updated by Peter Amstutz 11 months ago

Eric Biagiotti wrote:

Peter Amstutz wrote:

[...]

Why can't json.dump() be used to write directly to f? Why do you call encode() and then decode() again? Should we be opening the file in text mode: final.open("cwl.output.json", "wt", encoding="utf-8")

There might be other places we should be opening files in text mode.

From what I understand, the problem is with json.dump, which produces a str object in python 2 (bytestring) and a str object in python 3 (unicode), resulting in incompatibility when writing to the file (expects unicode). In order to prevent using py2 and py3 specific code, the solution is to use json.dumps, which gives us an opportunuty to call encode, which will force the string to be bytes in both languages, then decode to unicode from there, before writing to the file. If this makes the code too confusing, we can alternitavely add py2/3 specific code here.

If you open it in text mode, can you just do f.write(python.dumps(...)) ?

#18 Updated by Eric Biagiotti 11 months ago

Peter Amstutz wrote:

Eric Biagiotti wrote:

Peter Amstutz wrote:

[...]

Why can't json.dump() be used to write directly to f? Why do you call encode() and then decode() again? Should we be opening the file in text mode: final.open("cwl.output.json", "wt", encoding="utf-8")

There might be other places we should be opening files in text mode.

From what I understand, the problem is with json.dump, which produces a str object in python 2 (bytestring) and a str object in python 3 (unicode), resulting in incompatibility when writing to the file (expects unicode). In order to prevent using py2 and py3 specific code, the solution is to use json.dumps, which gives us an opportunuty to call encode, which will force the string to be bytes in both languages, then decode to unicode from there, before writing to the file. If this makes the code too confusing, we can alternitavely add py2/3 specific code here.

If you open it in text mode, can you just do f.write(python.dumps(...)) ?

You mean json.dumps? If so, I did try that and it doesn't work. Text mode is the default, which is why its expecting unicode and dumps has the same problem with the string it returns as dump.

#19 Updated by Eric Biagiotti 10 months ago

Peter Amstutz wrote:

[...]

What's the difference between .items(), list(.items()), and viewitems() ?

Using just items() for dict iteration in python2 is inefficient. As you previously said, it makes a copy, so the recommended way is to use viewitems. The reason you will still see items() in use in some places is that it is either an arvados collection object or a PathMapper object, which both have items() functions.

Though upon further review, I did notice some fixes that I missed. Updated in dd98e9fb.

Similar comment, what's with the encode/decode spin?

Regarding the 2 encode/decode comments, I think I found a better solution, which is to use the backwards compatible str object from builtins, which handles the differences automatically. Updated in 1e8da3c1.

Tests run: https://ci.curoverse.com/view/Developer/job/developer-run-tests/1050/

Might be worth running the conformance and integration tests again on your end. Thanks!

#20 Updated by Peter Amstutz 10 months ago

Could you add a "classifiers" section to sdk/python/setup.py and sdk/cwl/setup.py to advertise Python 3 support?

setup(name='arvados-cwl-runner',
...
      classifiers=[
          'Programming Language :: Python :: 2',
          'Programming Language :: Python :: 3',
      ]
     )

#21 Updated by Peter Amstutz 10 months ago

2019-02-01 17:46:54 arvados.cwl-runner[25772] ERROR: [bwa-mem-tool.cwl] While getting final output object: 'dict_values' object does not support indexing (X-Request-Id: req-r95c5lrjbewtsqqce5y8)
Traceback (most recent call last):
  File "/home/peter/.arvbox/arvbox/arvados/sdk/cwl/arvados_cwl/runner.py", line 460, in done
    done.logtail(logc, logger.error, "%s (%s) error log:" % (self.arvrunner.label(self), record["uuid"]), maxlen=40)
  File "/home/peter/.arvbox/arvbox/arvados/sdk/cwl/arvados_cwl/done.py", line 96, in logtail
    loglines = viewvalues(mergelogs)[0]
TypeError: 'dict_values' object does not support indexing
2019-02-01 17:46:54 arvados.cwl-runner[25772] ERROR: Overall process status is permanentFail
2019-02-01 17:46:54 cwltool[25772] WARNING: Final process status is permanentFail

I hit this bug testing on crunchv1 (--api=jobs)

#22 Updated by Peter Amstutz 10 months ago

Traceback (most recent call last):
  File "/usr/local/bin/arv-keepdocker", line 7, in <module>
    main()
  File "/usr/local/lib/python3.5/dist-packages/arvados/commands/keepdocker.py", line 384, in main
    image_hash = find_one_image_hash(args.image, args.tag)
  File "/usr/local/lib/python3.5/dist-packages/arvados/commands/keepdocker.py", line 161, in find_one_image_hash
    hashes = find_image_hashes(image_search, image_tag)
  File "/usr/local/lib/python3.5/dist-packages/arvados/commands/keepdocker.py", line 153, in find_image_hashes
    for image in docker_images():
  File "/usr/local/lib/python3.5/dist-packages/arvados/commands/keepdocker.py", line 138, in docker_images
    ctime = ' '.join(words[3:size_index])
TypeError: sequence item 0: expected str instance, bytes found

edit: disregard this one, it seems to keep installing arvados-python-client from PyPi instead of the branch where this bug is fixed.

#23 Updated by Peter Amstutz 10 months ago

I've added the ability to select between python/python3 and conformance/integration in test_with_arvbox.

New options:

--build (build the jobs image)

--pythoncmd (python|python3)

--suite (conformance|integration)

--api (jobs|containers)

$ test_with_arvbox.sh --no-reset-container --leave-running --config dev --build --pythoncmd python3 --suite integration -j3

Once the main branch is merged, we can update https://ci.curoverse.com/view/CWL/job/run-arvados-cwl-conformance-tests/ to run both sets of tests for both python and python3.

#24 Updated by Peter Amstutz 10 months ago

31766825f158fa427e7b01cdc28d2b8e0abecea7 has a quick and dirty fix for #14806

The quick and dirty commit should be removed from the branch once the real fix is merged to master.

#25 Updated by Peter Amstutz 10 months ago

Peter Amstutz wrote:

[...]

I hit this bug testing on crunchv1 (--api=jobs)

This is fixed in 9b712f30fe6784a5c8e747a8cf229c54cc02a509

#26 Updated by Eric Biagiotti 10 months ago

Peter Amstutz wrote:

I've added the ability to select between python/python3 and conformance/integration in test_with_arvbox.

New options:

--build (build the jobs image)

--pythoncmd (python|python3)

--suite (conformance|integration)

--api (jobs|containers)

[...]

Once the main branch is merged, we can update https://ci.curoverse.com/view/CWL/job/run-arvados-cwl-conformance-tests/ to run both sets of tests for both python and python3.

Thank you, this is a huge help. In anticipation of the merge with master, I updated the wiki to reflect this. https://dev.arvados.org/projects/arvados/wiki/Arvados-cwl-runner_development

#27 Updated by Eric Biagiotti 10 months ago

  • Status changed from In Progress to Resolved

#28 Updated by Tom Morris 10 months ago

  • Release set to 15

Also available in: Atom PDF