Project

General

Profile

Actions

Idea #11308

closed

Support Python3 for arvados-python-client & command line utilities

Added by Tom Morris about 7 years ago. Updated almost 7 years ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Category:
SDKs
Target version:
Story points:
0.0

Description

Python 3 has been available for many years and the Python 2/3 migration is reaching its final stages. As a first step, Arvados Python SDK + command line needs to run on both Python 3 and Python 2.

run-tests.sh needs to run tests using both Python 2.7 and Python 3.4+

Dependencies must support Python 3. May require updating dependencies and related side effects (API/behavior changes).

Use "__future__" to adopt Python 3 behavior on Python 2.7:

from __future__ import division, absolute_import, print_function, unicode_literals

Clean up usage of strings to conform to Python 3 unicode/bytes distinction.

Existing Arvados Python packages including FUSE, Node manager, arvados-cwl-runner (+ its dependencies, e.g. cwltool), and crunchstat-summary must continue to work on Python 2.7 (they will be ported in future stories). Ideally this can be resolved without divergent behavior between Py2 and Py3, however if there is a conflict between Py2 and Py3 behavior it needs to be resolved in favor of maintaining compatibility with existing dependencies on Py2.

Packages should be published to PyPi advertising compatibility with both Py2+Py3 .

Will need to build and publish separate Py2 and Py3 deb/rpm packages (due to hard dependency on Python interpreter).


Subtasks 2 (0 open2 closed)

Task #11379: Review 11308-python3ResolvedTom Clegg03/21/2017Actions
Task #11418: pass tests with python3ResolvedTom Clegg03/21/2017Actions

Related issues

Related to Arvados - Idea #13306: arvados-cwl-runner supports Python 3ResolvedEric Biagiotti01/30/2019Actions
Related to Arvados - Idea #11419: [SDKs] support text-mode open() in Python 3ResolvedTom Clegg03/21/2017Actions
Related to Arvados - Bug #14326: Our custom-compiled `python-future` and `python3-future` packages can't be installed together and have precedenceResolvedWard VandewegeActions
Actions #1

Updated by Peter Amstutz about 7 years ago

  • Description updated (diff)
Actions #2

Updated by Tom Morris about 7 years ago

We appear to have suffered from serious scope creep here. Somehow we went from SDK & command line tools to almost all of our Python code. This ticket is only about the Python SDK and the command line utilities. (When we do all Python code, we'll want to include crunchstat-summary & arvados-docker-cleaner in addition to nodemanager & fuse driver).

pip install caniusepython3
find . -name PKG-INFO -exec caniusepython3 -m {} \; -print

claims that we're all set on the dependencies front with all packages having Python 3 compatible versions.

The Python 2->3 porting guide is here: https://docs.python.org/3/howto/pyporting.html

Actions #3

Updated by Tom Morris about 7 years ago

  • Story points set to 3.0
Actions #4

Updated by Peter Amstutz about 7 years ago

  • Description updated (diff)
Actions #5

Updated by Tom Clegg about 7 years ago

  • Category set to SDKs
  • Assigned To set to Tom Clegg
Actions #6

Updated by Tom Clegg about 7 years ago

  • Status changed from New to In Progress
Actions #7

Updated by Tom Clegg about 7 years ago

virtualenv --python python3 /tmp/v3
. /tmp/v3/bin/activate

git clone -b 11308-python3 https://github.com/curoverse/arvados.git
cd arvados/sdk/python
python setup.py install

python <<EOF
import arvados
a=arvados.api()

print("# api request")
print repr(a.collections().list(count='none').execute()['items'][0])

print("# read data from keep")
c = arvados.collection.Collection('4xphq-4zz18-09pu7btnmkhaxky')
with c.open(c.keys()[0]) as f:
    f.seek(1234)
    print(f.readline()) # text
    print(f.read(1234)) # binary

print("# write data to keep")
c = arvados.collection.Collection()
with c.open('foo.txt', 'wb') as f:
    f.write(b'foo')
    f.close()
c.save_new()                  # '. acbd18db4cc2f85cedef654fccc4a4d8+3+A53d2e4dee0300ccd35c2b6b23d9a660cd6b41268@58f62cfb 0:3:foo.txt\n'
print(c.portable_data_hash()) # 83367e8913dcec0bf3fc25ed5a27eacb+49
print(c.manifest_locator())   # 4xphq-4zz18-nzwz3de4at90zaz

EOF
Todo:
  • Currently all files are opened in binary mode (read() returns bytes and write() expects bytes) even without 'b' in the mode argument to open(). We should implement text mode -- or at least force Python 3 callers to specify 'b' in mode, otherwise people will write Python 3 code that works now but breaks when we implement text mode.
  • Build deb/rpm packages
  • Improve run-tests.sh output (currently the py2sdk and py3sdk test suites are titled just "sdk/python" so it's hard to tell which version of python you're looking at).
  • Update Hacking wiki with new run-tests options ("--skip python2")
Actions #8

Updated by Tom Clegg about 7 years ago

11308-python3 @ 4a8883858028d4068b52710acd9c349108b6fc7c
  • raises an exception if files are opened in text mode using Python 3.

For the sake of minimizing merge conflicts, I think it makes sense to get this branch ("arvados module can be built, installed with setuptools, and tested in Python 3") reviewed/merged, then tackle deb/rpm packaging in a separate branch.

Actions #9

Updated by Lucas Di Pentima about 7 years ago

After lots of reading, this is my compiled list of things that may need a second look:

  • sdk/python/arvados/cache.py:42 - Should that be except (IOError, OSError): ?
  • sdk/python/tests/test_events.py:38 - Double parenthesis
  • sdk/python/arvados/arvfile.py
    • L593: list(some_dict.values()) is said to be inefficient on Py2 (http://python-future.org/compatible_idioms.html), it seems that an efficient way to do it is:
      from future.utils import listvalues
      listvalues(some_dict)
      
    • L706: Same as L593, but with listitems()
  • sdk/python/arvados/collection.py
    • Lines 430, 684, 727, 732, 737, 770, 1128: Py2 inefficient use of list(dict.values()) or similar
    • L695: Unnecessary conversion: iter(list(iter))
  • sdk/python/arvados/stream.py
    • Lines 66, 101: Py2 inefficient use of list()
  • sdk/python/arvados/commands/arv_copy.py
    • L158: Unnecessary use of list()
  • sdk/python/arvados/commands/put.py
    • Lines 511, 703, 780, 984: Unnecessary use of list() for iterations
  • sdk/python/tests/run_test_server.py:740 - Unnecessary use of list()
  • sdk/python/tests/test_cache.py Lines 45-47 commented code lines.
Actions #10

Updated by Tom Clegg about 7 years ago

Changed most of the list(...) stuff to viewkeys(), listitems(), etc.

  • sdk/python/tests/run_test_server.py:740 - Unnecessary use of list()

Not unnecessary in this case -- without it, we get "dictionary changed size during iteration".

  • sdk/python/tests/test_cache.py Lines 45-47 commented code lines.

Whoops, that was some half-done debugging and was preventing the test from failing on exceptions! Fixed.

11308-python3 @ f6b16b5ae75469167dbea08d69692d5f53dd9c84

Actions #11

Updated by Lucas Di Pentima about 7 years ago

  • sdk/python/tests/run_test_server.py:3 - viewkeys is not used, is the import needed?
  • Why the use of viewkeys()? Supposedly on py2 and py3 we can iterate over dict keys using
    for key in a_dict:
        ....
    

Other than that, lgtm.

Actions #12

Updated by Tom Clegg about 7 years ago

  • Target version changed from 2017-04-12 sprint to 2017-04-26 sprint
Actions #13

Updated by Tom Clegg about 7 years ago

  • Story points changed from 3.0 to 0.5
Actions #15

Updated by Tom Clegg almost 7 years ago

11308-python3 @ 3a0d849c08f750dca1d6a40153c0107001769c6d https://ci.curoverse.com/job/developer-run-tests/244/

This fixes a bug revealed by merging the recent truncate() tests: in append modes, the file pointer was not updated after a write.

Actions #16

Updated by Lucas Di Pentima almost 7 years ago

Tom: The write in append mode update LGTM.

Actions #17

Updated by Tom Morris almost 7 years ago

  • Target version changed from 2017-04-26 sprint to 2017-05-10 sprint
  • Story points changed from 0.5 to 0.0
Actions #19

Updated by Peter Amstutz almost 7 years ago

API incompatibility:

Collection.keys(), Collection.values() and Collection.items() return lists in Python 2.7. However in Python 3 they return dict_keys, dict_values and dict_items. Unfortunately there is code that uses Collection.keys()[0] and Collection.value()[0] as a way to get the item of a single-entry dict, and this is doesn't work in Python 3 because the dict_* types don't support indexing.

The Collection methods should use an explicit cast to list().

Actions #20

Updated by Tom Clegg almost 7 years ago

Actions #21

Updated by Peter Amstutz almost 7 years ago

11308-collection-keys LGTM, tested with arvados-cwl-runner

Actions #22

Updated by Tom Clegg almost 7 years ago

  • Status changed from In Progress to Resolved

Packaging moved to #11657

Actions #23

Updated by Tom Morris over 5 years ago

  • Related to Idea #13306: arvados-cwl-runner supports Python 3 added
Actions #25

Updated by Tom Morris over 5 years ago

  • Related to Idea #11419: [SDKs] support text-mode open() in Python 3 added
Actions #26

Updated by Nico César over 5 years ago

  • Related to Bug #14326: Our custom-compiled `python-future` and `python3-future` packages can't be installed together and have precedence added
Actions

Also available in: Atom PDF