Story #11308

Support Python3 for arvados-python-client & command line utilities

Added by Tom Morris about 1 month ago. Updated 2 days ago.

Status:In ProgressStart date:03/21/2017
Priority:NormalDue date:
Assignee:Tom Clegg% Done:


Target version:2017-05-10 sprint
Story pointsSRemaining (hours)0.00 hour
Velocity based estimate-


Python 3 has been available for many years and the Python 2/3 migration is reaching its final stages. As a first step, Arvados Python SDK + command line needs to run on both Python 3 and Python 2. needs to run tests using both Python 2.7 and Python 3.4+

Dependencies must support Python 3. May require updating dependencies and related side effects (API/behavior changes).

Use "__future__" to adopt Python 3 behavior on Python 2.7:

from __future__ import division, absolute_import, print_function, unicode_literals

Clean up usage of strings to conform to Python 3 unicode/bytes distinction.

Existing Arvados Python packages including FUSE, Node manager, arvados-cwl-runner (+ its dependencies, e.g. cwltool), and crunchstat-summary must continue to work on Python 2.7 (they will be ported in future stories). Ideally this can be resolved without divergent behavior between Py2 and Py3, however if there is a conflict between Py2 and Py3 behavior it needs to be resolved in favor of maintaining compatibility with existing dependencies on Py2.

Packages should be published to PyPi advertising compatibility with both Py2+Py3 .

Will need to build and publish separate Py2 and Py3 deb/rpm packages (due to hard dependency on Python interpreter).


Task #11419: support text-mode open() in Python 3NewTom Clegg

Task #11418: pass tests with python3ResolvedTom Clegg

Task #11379: Review 11308-python3In ProgressTom Clegg

Associated revisions

Revision a6be53f6
Added by Tom Clegg 17 days ago

Build packages for python "future" module.

refs #11349
refs #11308


#1 Updated by Peter Amstutz about 1 month ago

  • Description updated (diff)

#2 Updated by Tom Morris about 1 month ago

We appear to have suffered from serious scope creep here. Somehow we went from SDK & command line tools to almost all of our Python code. This ticket is only about the Python SDK and the command line utilities. (When we do all Python code, we'll want to include crunchstat-summary & arvados-docker-cleaner in addition to nodemanager & fuse driver).

pip install caniusepython3
find . -name PKG-INFO -exec caniusepython3 -m {} \; -print

claims that we're all set on the dependencies front with all packages having Python 3 compatible versions.

The Python 2->3 porting guide is here:

#3 Updated by Tom Morris about 1 month ago

  • Story points set to 3.0

#4 Updated by Peter Amstutz about 1 month ago

  • Description updated (diff)

#5 Updated by Tom Clegg about 1 month ago

  • Category set to SDKs
  • Assignee set to Tom Clegg

#6 Updated by Tom Clegg 29 days ago

  • Status changed from New to In Progress

#7 Updated by Tom Clegg 24 days ago

virtualenv --python python3 /tmp/v3
. /tmp/v3/bin/activate

git clone -b 11308-python3
cd arvados/sdk/python
python install

python <<EOF
import arvados

print("# api request")
print repr(a.collections().list(count='none').execute()['items'][0])

print("# read data from keep")
c = arvados.collection.Collection('4xphq-4zz18-09pu7btnmkhaxky')
with[0]) as f:
    print(f.readline()) # text
    print( # binary

print("# write data to keep")
c = arvados.collection.Collection()
with'foo.txt', 'wb') as f:
c.save_new()                  # '. acbd18db4cc2f85cedef654fccc4a4d8+3+A53d2e4dee0300ccd35c2b6b23d9a660cd6b41268@58f62cfb 0:3:foo.txt\n'
print(c.portable_data_hash()) # 83367e8913dcec0bf3fc25ed5a27eacb+49
print(c.manifest_locator())   # 4xphq-4zz18-nzwz3de4at90zaz

  • Currently all files are opened in binary mode (read() returns bytes and write() expects bytes) even without 'b' in the mode argument to open(). We should implement text mode -- or at least force Python 3 callers to specify 'b' in mode, otherwise people will write Python 3 code that works now but breaks when we implement text mode.
  • Build deb/rpm packages
  • Improve output (currently the py2sdk and py3sdk test suites are titled just "sdk/python" so it's hard to tell which version of python you're looking at).
  • Update Hacking wiki with new run-tests options ("--skip python2")

#8 Updated by Tom Clegg 23 days ago

11308-python3 @ 4a8883858028d4068b52710acd9c349108b6fc7c
  • raises an exception if files are opened in text mode using Python 3.

For the sake of minimizing merge conflicts, I think it makes sense to get this branch ("arvados module can be built, installed with setuptools, and tested in Python 3") reviewed/merged, then tackle deb/rpm packaging in a separate branch.

#9 Updated by Lucas Di Pentima 18 days ago

After lots of reading, this is my compiled list of things that may need a second look:

  • sdk/python/arvados/ - Should that be except (IOError, OSError): ?
  • sdk/python/tests/ - Double parenthesis
  • sdk/python/arvados/
    • L593: list(some_dict.values()) is said to be inefficient on Py2 (, it seems that an efficient way to do it is:
      from future.utils import listvalues
    • L706: Same as L593, but with listitems()
  • sdk/python/arvados/
    • Lines 430, 684, 727, 732, 737, 770, 1128: Py2 inefficient use of list(dict.values()) or similar
    • L695: Unnecessary conversion: iter(list(iter))
  • sdk/python/arvados/
    • Lines 66, 101: Py2 inefficient use of list()
  • sdk/python/arvados/commands/
    • L158: Unnecessary use of list()
  • sdk/python/arvados/commands/
    • Lines 511, 703, 780, 984: Unnecessary use of list() for iterations
  • sdk/python/tests/ - Unnecessary use of list()
  • sdk/python/tests/ Lines 45-47 commented code lines.

#10 Updated by Tom Clegg 18 days ago

Changed most of the list(...) stuff to viewkeys(), listitems(), etc.

  • sdk/python/tests/ - Unnecessary use of list()

Not unnecessary in this case -- without it, we get "dictionary changed size during iteration".

  • sdk/python/tests/ Lines 45-47 commented code lines.

Whoops, that was some half-done debugging and was preventing the test from failing on exceptions! Fixed.

11308-python3 @ f6b16b5ae75469167dbea08d69692d5f53dd9c84

#11 Updated by Lucas Di Pentima 17 days ago

  • sdk/python/tests/ - viewkeys is not used, is the import needed?
  • Why the use of viewkeys()? Supposedly on py2 and py3 we can iterate over dict keys using
    for key in a_dict:

Other than that, lgtm.

#12 Updated by Tom Clegg 16 days ago

  • Target version changed from 2017-04-12 sprint to 2017-04-26 sprint

#13 Updated by Tom Clegg 16 days ago

  • Story points changed from 3.0 to 0.5

#15 Updated by Tom Clegg 4 days ago

11308-python3 @ 3a0d849c08f750dca1d6a40153c0107001769c6d

This fixes a bug revealed by merging the recent truncate() tests: in append modes, the file pointer was not updated after a write.

#16 Updated by Lucas Di Pentima 4 days ago

Tom: The write in append mode update LGTM.

#17 Updated by Tom Morris 2 days ago

  • Target version changed from 2017-04-26 sprint to 2017-05-10 sprint
  • Story points changed from 0.5 to 0.0

Also available in: Atom PDF