Project

General

Profile

Actions

Idea #13306

closed

arvados-cwl-runner supports Python 3

Added by Peter Amstutz almost 6 years ago. Updated about 5 years ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Eric Biagiotti
Category:
-
Target version:
Story points:
-
Release relationship:
Auto

Description

Check that all the dependencies support python3 (>= 3.4)

'cwltool==1.0.20180403145700' → yes
'schema-salad==2.6.20171201034858' → yes
'typing==3.5.3.0' → yes
'ruamel.yaml==0.13.7' → yes
'arvados-python-client>=0.1.20170526013812' → yes
'setuptools' → yes
'ciso8601 >=1.0.0, <=1.0.4' → yes


Subtasks 1 (0 open1 closed)

Task #14667: Review 13306-arvados-cwl-runner-py3-supportResolvedEric Biagiotti01/30/2019Actions

Related issues

Related to Arvados - Idea #11308: Support Python3 for arvados-python-client & command line utilitiesResolvedTom Clegg03/21/2017Actions
Related to Arvados - Bug #14698: Running Python2 and Python3 tests seperatelyResolvedActions
Blocks Arvados Epics - Idea #14532: [Epic] Port to Python 3 to for Python 2 sunset in December 2019Resolved01/01/202009/16/2020Actions
Actions #1

Updated by Peter Amstutz almost 6 years ago

  • Tracker changed from Support to Bug
Actions #2

Updated by Peter Amstutz almost 6 years ago

  • Description updated (diff)
Actions #3

Updated by Peter Amstutz almost 6 years ago

  • Description updated (diff)
Actions #4

Updated by Peter Amstutz almost 6 years ago

  • Description updated (diff)
Actions #5

Updated by Peter Amstutz almost 6 years ago

  • Description updated (diff)
Actions #6

Updated by Tom Morris almost 6 years ago

  • Tracker changed from Bug to Idea
Actions #7

Updated by Tom Morris over 5 years ago

  • Related to Idea #11308: Support Python3 for arvados-python-client & command line utilities added
Actions #8

Updated by Tom Morris over 5 years ago

  • Blocks Idea #14532: [Epic] Port to Python 3 to for Python 2 sunset in December 2019 added
Actions #9

Updated by Tom Morris about 5 years ago

  • Assigned To set to Eric Biagiotti
  • Target version changed from To Be Groomed to 2019-01-16 Sprint
Actions #10

Updated by Eric Biagiotti about 5 years ago

  • Status changed from New to In Progress
Actions #11

Updated by Eric Biagiotti about 5 years ago

  • Related to Bug #14698: Running Python2 and Python3 tests seperately added
Actions #12

Updated by Tom Morris about 5 years ago

  • Target version changed from 2019-01-16 Sprint to 2019-01-30 Sprint
Actions #13

Updated by Eric Biagiotti about 5 years ago

  • Target version changed from 2019-01-30 Sprint to 2019-02-13 Sprint
Actions #14

Updated by Peter Amstutz about 5 years ago

-                    for f, p in generatemapper.items():
+                    for f, p in list(generatemapper.items()):

Is this necessary? My understanding was that the difference is py2 .items() returns a copy of a list and in py3 it returns an iterator. However in this construct, wouldn't they have equivalent behavior?

-        for k,v in job_spec["script_parameters"].items():
+        for k,v in viewitems(job_spec["script_parameters"]):

What's the difference between .items(), list(.items()), and viewitems() ?

         with final.open("cwl.output.json", "w") as f:
-            json.dump(outputObj, f, sort_keys=True, indent=4, separators=(',',': '))
+            res = json.dumps(outputObj, sort_keys=True, indent=4, separators=(',',': '), ensure_ascii=False).encode('utf-8').decode()
+            f.write(res)           

Why can't json.dump() be used to write directly to f? Why do you call encode() and then decode() again? Should we be opening the file in text mode: final.open("cwl.output.json", "wt", encoding="utf-8")

There might be other places we should be opening files in text mode.

-        return uri
+        return uri.encode("utf-8").decode()

Similar comment, what's with the encode/decode spin?

Actions #15

Updated by Eric Biagiotti about 5 years ago

Peter Amstutz wrote:

[...]

Is this necessary? My understanding was that the difference is py2 .items() returns a copy of a list and in py3 it returns an iterator. However in this construct, wouldn't they have equivalent behavior?

This is actually a mistake introduced by futurize. generatemapper is cwltool PathMapper object and has an items() function like dict. items() already returns a list. I fixed this in executor.py, but missed it in arvcontainer.py and arvjob.py. Fixed in 7e1c8ea7.

Actions #16

Updated by Eric Biagiotti about 5 years ago

Peter Amstutz wrote:

[...]

Why can't json.dump() be used to write directly to f? Why do you call encode() and then decode() again? Should we be opening the file in text mode: final.open("cwl.output.json", "wt", encoding="utf-8")

There might be other places we should be opening files in text mode.

From what I understand, the problem is with json.dump, which produces a str object in python 2 (bytestring) and a str object in python 3 (unicode), resulting in incompatibility when writing to the file (expects unicode). In order to prevent using py2 and py3 specific code, the solution is to use json.dumps, which gives us an opportunuty to call encode, which will force the string to be bytes in both languages, then decode to unicode from there, before writing to the file. If this makes the code too confusing, we can alternitavely add py2/3 specific code here.

I explained this a bit in my commit message, but in the future I'll also reference the commit and put explanations like this on the issue page too.

Actions #17

Updated by Peter Amstutz about 5 years ago

Eric Biagiotti wrote:

Peter Amstutz wrote:

[...]

Why can't json.dump() be used to write directly to f? Why do you call encode() and then decode() again? Should we be opening the file in text mode: final.open("cwl.output.json", "wt", encoding="utf-8")

There might be other places we should be opening files in text mode.

From what I understand, the problem is with json.dump, which produces a str object in python 2 (bytestring) and a str object in python 3 (unicode), resulting in incompatibility when writing to the file (expects unicode). In order to prevent using py2 and py3 specific code, the solution is to use json.dumps, which gives us an opportunuty to call encode, which will force the string to be bytes in both languages, then decode to unicode from there, before writing to the file. If this makes the code too confusing, we can alternitavely add py2/3 specific code here.

If you open it in text mode, can you just do f.write(python.dumps(...)) ?

Actions #18

Updated by Eric Biagiotti about 5 years ago

Peter Amstutz wrote:

Eric Biagiotti wrote:

Peter Amstutz wrote:

[...]

Why can't json.dump() be used to write directly to f? Why do you call encode() and then decode() again? Should we be opening the file in text mode: final.open("cwl.output.json", "wt", encoding="utf-8")

There might be other places we should be opening files in text mode.

From what I understand, the problem is with json.dump, which produces a str object in python 2 (bytestring) and a str object in python 3 (unicode), resulting in incompatibility when writing to the file (expects unicode). In order to prevent using py2 and py3 specific code, the solution is to use json.dumps, which gives us an opportunuty to call encode, which will force the string to be bytes in both languages, then decode to unicode from there, before writing to the file. If this makes the code too confusing, we can alternitavely add py2/3 specific code here.

If you open it in text mode, can you just do f.write(python.dumps(...)) ?

You mean json.dumps? If so, I did try that and it doesn't work. Text mode is the default, which is why its expecting unicode and dumps has the same problem with the string it returns as dump.

Actions #19

Updated by Eric Biagiotti about 5 years ago

Peter Amstutz wrote:

[...]

What's the difference between .items(), list(.items()), and viewitems() ?

Using just items() for dict iteration in python2 is inefficient. As you previously said, it makes a copy, so the recommended way is to use viewitems. The reason you will still see items() in use in some places is that it is either an arvados collection object or a PathMapper object, which both have items() functions.

Though upon further review, I did notice some fixes that I missed. Updated in dd98e9fb.

Similar comment, what's with the encode/decode spin?

Regarding the 2 encode/decode comments, I think I found a better solution, which is to use the backwards compatible str object from builtins, which handles the differences automatically. Updated in 1e8da3c1.

Tests run: https://ci.curoverse.com/view/Developer/job/developer-run-tests/1050/

Might be worth running the conformance and integration tests again on your end. Thanks!

Actions #20

Updated by Peter Amstutz about 5 years ago

Could you add a "classifiers" section to sdk/python/setup.py and sdk/cwl/setup.py to advertise Python 3 support?

setup(name='arvados-cwl-runner',
...
      classifiers=[
          'Programming Language :: Python :: 2',
          'Programming Language :: Python :: 3',
      ]
     )
Actions #21

Updated by Peter Amstutz about 5 years ago

2019-02-01 17:46:54 arvados.cwl-runner[25772] ERROR: [bwa-mem-tool.cwl] While getting final output object: 'dict_values' object does not support indexing (X-Request-Id: req-r95c5lrjbewtsqqce5y8)
Traceback (most recent call last):
  File "/home/peter/.arvbox/arvbox/arvados/sdk/cwl/arvados_cwl/runner.py", line 460, in done
    done.logtail(logc, logger.error, "%s (%s) error log:" % (self.arvrunner.label(self), record["uuid"]), maxlen=40)
  File "/home/peter/.arvbox/arvbox/arvados/sdk/cwl/arvados_cwl/done.py", line 96, in logtail
    loglines = viewvalues(mergelogs)[0]
TypeError: 'dict_values' object does not support indexing
2019-02-01 17:46:54 arvados.cwl-runner[25772] ERROR: Overall process status is permanentFail
2019-02-01 17:46:54 cwltool[25772] WARNING: Final process status is permanentFail

I hit this bug testing on crunchv1 (--api=jobs)

Actions #22

Updated by Peter Amstutz about 5 years ago

Traceback (most recent call last):
  File "/usr/local/bin/arv-keepdocker", line 7, in <module>
    main()
  File "/usr/local/lib/python3.5/dist-packages/arvados/commands/keepdocker.py", line 384, in main
    image_hash = find_one_image_hash(args.image, args.tag)
  File "/usr/local/lib/python3.5/dist-packages/arvados/commands/keepdocker.py", line 161, in find_one_image_hash
    hashes = find_image_hashes(image_search, image_tag)
  File "/usr/local/lib/python3.5/dist-packages/arvados/commands/keepdocker.py", line 153, in find_image_hashes
    for image in docker_images():
  File "/usr/local/lib/python3.5/dist-packages/arvados/commands/keepdocker.py", line 138, in docker_images
    ctime = ' '.join(words[3:size_index])
TypeError: sequence item 0: expected str instance, bytes found

edit: disregard this one, it seems to keep installing arvados-python-client from PyPi instead of the branch where this bug is fixed.

Actions #23

Updated by Peter Amstutz about 5 years ago

I've added the ability to select between python/python3 and conformance/integration in test_with_arvbox.

New options:

--build (build the jobs image)

--pythoncmd (python|python3)

--suite (conformance|integration)

--api (jobs|containers)

$ test_with_arvbox.sh --no-reset-container --leave-running --config dev --build --pythoncmd python3 --suite integration -j3

Once the main branch is merged, we can update https://ci.curoverse.com/view/CWL/job/run-arvados-cwl-conformance-tests/ to run both sets of tests for both python and python3.

Actions #24

Updated by Peter Amstutz about 5 years ago

31766825f158fa427e7b01cdc28d2b8e0abecea7 has a quick and dirty fix for #14806

The quick and dirty commit should be removed from the branch once the real fix is merged to master.

Actions #25

Updated by Peter Amstutz about 5 years ago

Peter Amstutz wrote:

[...]

I hit this bug testing on crunchv1 (--api=jobs)

This is fixed in 9b712f30fe6784a5c8e747a8cf229c54cc02a509

Actions #26

Updated by Eric Biagiotti about 5 years ago

Peter Amstutz wrote:

I've added the ability to select between python/python3 and conformance/integration in test_with_arvbox.

New options:

--build (build the jobs image)

--pythoncmd (python|python3)

--suite (conformance|integration)

--api (jobs|containers)

[...]

Once the main branch is merged, we can update https://ci.curoverse.com/view/CWL/job/run-arvados-cwl-conformance-tests/ to run both sets of tests for both python and python3.

Thank you, this is a huge help. In anticipation of the merge with master, I updated the wiki to reflect this. https://dev.arvados.org/projects/arvados/wiki/Arvados-cwl-runner_development

Actions #27

Updated by Eric Biagiotti about 5 years ago

  • Status changed from In Progress to Resolved
Actions #28

Updated by Tom Morris about 5 years ago

  • Release set to 15
Actions

Also available in: Atom PDF