Story #11308

Support Python3 for arvados-python-client & command line utilities

Added by Tom Morris 5 months ago. Updated 3 months ago.

Status:ResolvedStart date:03/21/2017
Priority:NormalDue date:
Assignee:Tom Clegg% Done:

100%

Category:SDKs
Target version:2017-05-10 sprint
Story pointsSRemaining (hours)0.00 hour
Velocity based estimate-

Description

Python 3 has been available for many years and the Python 2/3 migration is reaching its final stages. As a first step, Arvados Python SDK + command line needs to run on both Python 3 and Python 2.

run-tests.sh needs to run tests using both Python 2.7 and Python 3.4+

Dependencies must support Python 3. May require updating dependencies and related side effects (API/behavior changes).

Use "__future__" to adopt Python 3 behavior on Python 2.7:

from __future__ import division, absolute_import, print_function, unicode_literals

Clean up usage of strings to conform to Python 3 unicode/bytes distinction.

Existing Arvados Python packages including FUSE, Node manager, arvados-cwl-runner (+ its dependencies, e.g. cwltool), and crunchstat-summary must continue to work on Python 2.7 (they will be ported in future stories). Ideally this can be resolved without divergent behavior between Py2 and Py3, however if there is a conflict between Py2 and Py3 behavior it needs to be resolved in favor of maintaining compatibility with existing dependencies on Py2.

Packages should be published to PyPi advertising compatibility with both Py2+Py3 .

Will need to build and publish separate Py2 and Py3 deb/rpm packages (due to hard dependency on Python interpreter).


Subtasks

Task #11379: Review 11308-python3ResolvedTom Clegg

Task #11418: pass tests with python3ResolvedTom Clegg

Associated revisions

Revision a6be53f6
Added by Tom Clegg 4 months ago

Build packages for python "future" module.

refs #11349
refs #11308

Revision 9bf79049
Added by Tom Clegg 4 months ago

Merge branch '11308-python3'

refs #11308

Revision 4bc2f36b
Added by Tom Clegg 4 months ago

Merge branch '11308-collection-keys'

refs #11308

Revision 40ce7856
Added by Peter Amstutz 4 months ago

Add libgnutls28-dev and python3-dev to install package list to satisfy
run-tests.sh dependencies. refs #11308

History

#1 Updated by Peter Amstutz 5 months ago

  • Description updated (diff)

#2 Updated by Tom Morris 5 months ago

We appear to have suffered from serious scope creep here. Somehow we went from SDK & command line tools to almost all of our Python code. This ticket is only about the Python SDK and the command line utilities. (When we do all Python code, we'll want to include crunchstat-summary & arvados-docker-cleaner in addition to nodemanager & fuse driver).

pip install caniusepython3
find . -name PKG-INFO -exec caniusepython3 -m {} \; -print

claims that we're all set on the dependencies front with all packages having Python 3 compatible versions.

The Python 2->3 porting guide is here: https://docs.python.org/3/howto/pyporting.html

#3 Updated by Tom Morris 5 months ago

  • Story points set to 3.0

#4 Updated by Peter Amstutz 5 months ago

  • Description updated (diff)

#5 Updated by Tom Clegg 5 months ago

  • Category set to SDKs
  • Assignee set to Tom Clegg

#6 Updated by Tom Clegg 5 months ago

  • Status changed from New to In Progress

#7 Updated by Tom Clegg 5 months ago

virtualenv --python python3 /tmp/v3
. /tmp/v3/bin/activate

git clone -b 11308-python3 https://github.com/curoverse/arvados.git
cd arvados/sdk/python
python setup.py install

python <<EOF
import arvados
a=arvados.api()

print("# api request")
print repr(a.collections().list(count='none').execute()['items'][0])

print("# read data from keep")
c = arvados.collection.Collection('4xphq-4zz18-09pu7btnmkhaxky')
with c.open(c.keys()[0]) as f:
    f.seek(1234)
    print(f.readline()) # text
    print(f.read(1234)) # binary

print("# write data to keep")
c = arvados.collection.Collection()
with c.open('foo.txt', 'wb') as f:
    f.write(b'foo')
    f.close()
c.save_new()                  # '. acbd18db4cc2f85cedef654fccc4a4d8+3+A53d2e4dee0300ccd35c2b6b23d9a660cd6b41268@58f62cfb 0:3:foo.txt\n'
print(c.portable_data_hash()) # 83367e8913dcec0bf3fc25ed5a27eacb+49
print(c.manifest_locator())   # 4xphq-4zz18-nzwz3de4at90zaz

EOF
Todo:
  • Currently all files are opened in binary mode (read() returns bytes and write() expects bytes) even without 'b' in the mode argument to open(). We should implement text mode -- or at least force Python 3 callers to specify 'b' in mode, otherwise people will write Python 3 code that works now but breaks when we implement text mode.
  • Build deb/rpm packages
  • Improve run-tests.sh output (currently the py2sdk and py3sdk test suites are titled just "sdk/python" so it's hard to tell which version of python you're looking at).
  • Update Hacking wiki with new run-tests options ("--skip python2")

#8 Updated by Tom Clegg 5 months ago

11308-python3 @ 4a8883858028d4068b52710acd9c349108b6fc7c
  • raises an exception if files are opened in text mode using Python 3.

For the sake of minimizing merge conflicts, I think it makes sense to get this branch ("arvados module can be built, installed with setuptools, and tested in Python 3") reviewed/merged, then tackle deb/rpm packaging in a separate branch.

#9 Updated by Lucas Di Pentima 4 months ago

After lots of reading, this is my compiled list of things that may need a second look:

  • sdk/python/arvados/cache.py:42 - Should that be except (IOError, OSError): ?
  • sdk/python/tests/test_events.py:38 - Double parenthesis
  • sdk/python/arvados/arvfile.py
    • L593: list(some_dict.values()) is said to be inefficient on Py2 (http://python-future.org/compatible_idioms.html), it seems that an efficient way to do it is:
      from future.utils import listvalues
      listvalues(some_dict)
      
    • L706: Same as L593, but with listitems()
  • sdk/python/arvados/collection.py
    • Lines 430, 684, 727, 732, 737, 770, 1128: Py2 inefficient use of list(dict.values()) or similar
    • L695: Unnecessary conversion: iter(list(iter))
  • sdk/python/arvados/stream.py
    • Lines 66, 101: Py2 inefficient use of list()
  • sdk/python/arvados/commands/arv_copy.py
    • L158: Unnecessary use of list()
  • sdk/python/arvados/commands/put.py
    • Lines 511, 703, 780, 984: Unnecessary use of list() for iterations
  • sdk/python/tests/run_test_server.py:740 - Unnecessary use of list()
  • sdk/python/tests/test_cache.py Lines 45-47 commented code lines.

#10 Updated by Tom Clegg 4 months ago

Changed most of the list(...) stuff to viewkeys(), listitems(), etc.

  • sdk/python/tests/run_test_server.py:740 - Unnecessary use of list()

Not unnecessary in this case -- without it, we get "dictionary changed size during iteration".

  • sdk/python/tests/test_cache.py Lines 45-47 commented code lines.

Whoops, that was some half-done debugging and was preventing the test from failing on exceptions! Fixed.

11308-python3 @ f6b16b5ae75469167dbea08d69692d5f53dd9c84

#11 Updated by Lucas Di Pentima 4 months ago

  • sdk/python/tests/run_test_server.py:3 - viewkeys is not used, is the import needed?
  • Why the use of viewkeys()? Supposedly on py2 and py3 we can iterate over dict keys using
    for key in a_dict:
        ....
    

Other than that, lgtm.

#12 Updated by Tom Clegg 4 months ago

  • Target version changed from 2017-04-12 sprint to 2017-04-26 sprint

#13 Updated by Tom Clegg 4 months ago

  • Story points changed from 3.0 to 0.5

#15 Updated by Tom Clegg 4 months ago

11308-python3 @ 3a0d849c08f750dca1d6a40153c0107001769c6d https://ci.curoverse.com/job/developer-run-tests/244/

This fixes a bug revealed by merging the recent truncate() tests: in append modes, the file pointer was not updated after a write.

#16 Updated by Lucas Di Pentima 4 months ago

Tom: The write in append mode update LGTM.

#17 Updated by Tom Morris 4 months ago

  • Target version changed from 2017-04-26 sprint to 2017-05-10 sprint
  • Story points changed from 0.5 to 0.0

#19 Updated by Peter Amstutz 4 months ago

API incompatibility:

Collection.keys(), Collection.values() and Collection.items() return lists in Python 2.7. However in Python 3 they return dict_keys, dict_values and dict_items. Unfortunately there is code that uses Collection.keys()[0] and Collection.value()[0] as a way to get the item of a single-entry dict, and this is doesn't work in Python 3 because the dict_* types don't support indexing.

The Collection methods should use an explicit cast to list().

#20 Updated by Tom Clegg 4 months ago

#21 Updated by Peter Amstutz 4 months ago

11308-collection-keys LGTM, tested with arvados-cwl-runner

#22 Updated by Tom Clegg 3 months ago

  • Status changed from In Progress to Resolved

Packaging moved to #11657

Also available in: Atom PDF