Project

General

Profile

Actions

Idea #14939

closed

Update crunchstat-summary to Python 3

Added by Tom Morris about 5 years ago. Updated almost 5 years ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Tom Morris
Category:
-
Target version:
Story points:
1.0
Release relationship:
Auto

Subtasks 2 (0 open2 closed)

Task #14940: Review 14939-crunchstat-summary-python3 -> 14939-crunchstat-summary-python3ResolvedTom Morris03/08/2019Actions
Task #15127: Review 14939-crunchstat-summary-python3-fixupClosedTom Morris04/22/2019Actions

Related issues

Blocks Arvados Epics - Idea #14532: [Epic] Port to Python 3 to for Python 2 sunset in December 2019Resolved01/01/202009/16/2020Actions
Actions #1

Updated by Tom Morris about 5 years ago

  • Status changed from New to In Progress
  • Start date set to 03/08/2019
Actions #2

Updated by Tom Morris about 5 years ago

  • Parent task deleted (#14532)
Actions #3

Updated by Tom Morris about 5 years ago

  • Tracker changed from Task to Idea
Actions #4

Updated by Tom Morris about 5 years ago

  • Blocks Idea #14532: [Epic] Port to Python 3 to for Python 2 sunset in December 2019 added
Actions #5

Updated by Tom Clegg about 5 years ago

LiveLogReader threads are crashing in the test suite, which seems to indicate two broken things:
  • Crashing threads don't cause the tests to fail, because it's Python and we don't explicitly catch exceptions in the thread and check/propagate them after calling join()... I guess we need to do that.
  • LiveLogReader.next() is still needed by Python 2

Should update run-tests.sh to test crunchstat-summary with Python 3 as well (add "tools/crunchstat-summary:py3" to pythonstuff like "sdk/cwl:py3"). It looks like there are some bytes-vs.-str things to sort out.

Actions #6

Updated by Tom Morris about 5 years ago

  • Target version changed from Arvados Future Sprints to 2019-03-13 Sprint
Actions #7

Updated by Tom Morris about 5 years ago

  • Target version changed from 2019-03-13 Sprint to 2019-03-27 Sprint
Actions #8

Updated by Tom Morris about 5 years ago

Tom Clegg wrote:

LiveLogReader threads are crashing in the test suite, which seems to indicate two broken things:
  • Crashing threads don't cause the tests to fail, because it's Python and we don't explicitly catch exceptions in the thread and check/propagate them after calling join()... I guess we need to do that.
  • LiveLogReader.next() is still needed by Python 2

Fixed the iterator and other thread crashers, but not the general thread exception catching.

Should update run-tests.sh to test crunchstat-summary with Python 3 as well (add "tools/crunchstat-summary:py3" to pythonstuff like "sdk/cwl:py3"). It looks like there are some bytes-vs.-str things to sort out.

Done.

Also fixed Python 3 log file reading, test failure caused by integer division diffs, and a couple of other things.

All tests pass locally in both Python 2 & 3 and CI job running here: https://ci.curoverse.com/view/Developer/job/developer-run-tests/1122/

Actions #9

Updated by Tom Clegg about 5 years ago

Could tests/test_examples.py use crunchstat_summary.command.GzipText instead of having its own copy of it?

Nit: usually kwargs are foo=bar but there are a couple of foo = bar here. meh

The rest LGTM, thanks.

Actions #10

Updated by Tom Morris about 5 years ago

I force pushed a new branch 14939-crunchstat-summary-python3 at 6cf111041f450c5d33665615ac915869c4279f9f which addresses your two comments as well as fixes:
  • resource leak warned by Python 3 (only)
  • deprecation warnings in Python 3

I also eliminated some unnecessary u'' additions and squashed some of the minor commits together and gave the UTF8 decoding wrapper class a better name.

If you could take a quick look to make sure it looks better, not worse, that'd be great.

Actions #11

Updated by Tom Morris about 5 years ago

I've put up another branch based on this one with a number of small fixes for #12026, #10570, & #14451 which all have Eric as the reviewer. Please coordinate with him to see if it makes more sense for one person to review all the changes as a batch.

Actions #12

Updated by Eric Biagiotti about 5 years ago

Calling .items() to iterate on a dict is inefficient in Python 2 according to the following: http://python-future.org/compatible_idioms.html#iterating-through-dict-keys-values-items. It recommends using viewitems(), though this will require a future.utils import. When working on the ACR, I ran into packaging problems after adding the future import. If you add it, its probably worth running the package building and tests locally.

Also a nit about an extra line of whitespace here: https://dev.arvados.org/projects/arvados/repository/revisions/14939-crunchstat-summary-python3/entry/tools/crunchstat-summary/tests/test_examples.py#L63

Looks like this change from cpu to cp was unintentional: https://dev.arvados.org/projects/arvados/repository/revisions/14939-crunchstat-summary-python3/entry/tools/crunchstat-summary/crunchstat_summary/webchart.py#L48

Also, do we use crunchstat-summary output to run cwl integration/conformance tests at all? If so, its worth running those as well.

Actions #13

Updated by Tom Morris about 5 years ago

Eric Biagiotti wrote:

Calling .items() to iterate on a dict is inefficient in Python 2 according to the following: http://python-future.org/compatible_idioms.html#iterating-through-dict-keys-values-items. It recommends using viewitems(), though this will require a future.utils import. When working on the ACR, I ran into packaging problems after adding the future import. If you add it, its probably worth running the package building and tests locally.

I think I'm willing to live with a few months of Python 2 only (unquantified) inefficiency in a non-performance critical section of code.

Also a nit about an extra line of whitespace here: https://dev.arvados.org/projects/arvados/repository/revisions/14939-crunchstat-summary-python3/entry/tools/crunchstat-summary/tests/test_examples.py#L63

Fixed.

Looks like this change from cpu to cp was unintentional: https://dev.arvados.org/projects/arvados/repository/revisions/14939-crunchstat-summary-python3/entry/tools/crunchstat-summary/crunchstat_summary/webchart.py#L48

Oops! Good catch. Fixed.

Also, do we use crunchstat-summary output to run cwl integration/conformance tests at all? If so, its worth running those as well.

No. I'm pretty sure there are no interactions with other components.

Actions #14

Updated by Tom Morris about 5 years ago

  • Status changed from In Progress to Resolved
  • % Done changed from 0 to 100
Actions #15

Updated by Tom Morris about 5 years ago

  • Status changed from Resolved to In Progress
  • Target version changed from 2019-03-27 Sprint to 2019-04-10 Sprint
Actions #16

Updated by Tom Morris almost 5 years ago

  • Target version changed from 2019-04-10 Sprint to 2019-04-24 Sprint
Actions #17

Updated by Tom Morris almost 5 years ago

  • Story points set to 1.0

Tom noticed that I failed to include this in the Python 3 tests (each component needs to be added individually). Adding it caused it to fail in CI because there was a latent bug in the code that was dependent on dictionary iteration order. It worked fine on Python 2.7 & 3.7, but failed on Python 3.5.

CI build at https://ci.curoverse.com/view/Developer/job/developer-run-tests/1201/
(successful except for failing fuse tests)

Actions #18

Updated by Eric Biagiotti almost 5 years ago

I'm trying to test locally (run crunchstat-summary with the test data) in a 3.5.7 virtualenv, and the arvados-python-client fails installation because of the ciso8601 dependency. Is there a specific way I should be testing this?

Actions #19

Updated by Tom Morris almost 5 years ago

I don't think I did anything special to set up my conda environment, but it doesn't have any specific arvados-api-client dependency, so you can use whatever relatively recent version that works for you.

Here's what I'm testing with:

python                    3.5.5                h5001a0f_2    conda-forge
arvados-python-client     1.3.1.20190308210512           <pip>
ciso8601                  2.1.1                     <pip>

I can't tell what versions are in use on Jenkins, but it runs fine there too, so perhaps replicating that environment is the best way to go.

Actions #20

Updated by Eric Biagiotti almost 5 years ago

If our code is relying on a dict being ordered, we should probably just use an OrderedDict.

From cf8a475732f7a62108704b6d7803400681a69eba, changing line 187 in summarizer.py to the following got tests passing in 2.7, 3.5, and 3.7

stats = collections.OrderedDict()

Since python 3.7, dicts are officially ordered, but it is still recommended to use OrderedDict if we are supporting multiple python versions.

Actions #21

Updated by Eric Biagiotti almost 5 years ago

As discussed in chat, there is no real need to require the dict to be ordered in this case, and the changes in 5254ba765ef759bb59584d7304ec22b77238772c remove the order requirement. Subsequently, this LGTM. Thanks!

Actions #22

Updated by Tom Morris almost 5 years ago

  • Status changed from In Progress to Resolved
  • % Done changed from 50 to 100
Actions #23

Updated by Tom Morris almost 5 years ago

  • Release set to 15
Actions

Also available in: Atom PDF