Bug #10359

closed

[crunchstat-summary] Limit concurrency to keep memory use under control

Added by Tom Morris over 7 years ago. Updated over 6 years ago.

Status: Resolved
Priority: Normal
Assigned To:
Category: -
Target version:
Story points: -

Description

Currently crunchstat-summary processes all components of a pipeline in parallel. This can mean hundreds of threads all competing for memory and cycles at the same time, leading to memory exhaustion in extreme cases.

We should dial this back to a reasonable number of threads for the machine and workload being processed.


Subtasks 1 (0 open, 1 closed)

Task #10379: Review 10359-crunchstat-summary-serial (Closed, Tom Morris, 10/26/2016)

Related issues

Related to Arvados - Idea #11309: [Crunch2] crunchstat-summary --container UUID should summarize container logs (Resolved, Tom Clegg, 08/16/2017)
Related to Arvados - Bug #12196: [crunchstat-summary] avoid opening too many files at once when working on a large container tree (Resolved, Tom Clegg, 08/30/2017)
Actions #1

Updated by Tom Morris over 7 years ago

  • Assigned To set to Tom Morris
  • Target version set to 2016-11-09 sprint
Actions #2

Updated by Tom Morris over 7 years ago

  • Status changed from New to In Progress
  • Target version changed from 2016-11-09 sprint to 2016-11-23 sprint
Actions #3

Updated by Tom Morris over 7 years ago

  • Story points set to 0.5
Actions #4

Updated by Tom Morris over 7 years ago

  • Target version changed from 2016-11-23 sprint to 2016-12-14 sprint
Actions #5

Updated by Tom Morris over 7 years ago

  • Target version changed from 2016-12-14 sprint to 2017-01-04 sprint
Actions #6

Updated by Tom Morris over 7 years ago

  • Target version changed from 2017-01-04 sprint to 2017-01-18 sprint
Actions #7

Updated by Peter Amstutz over 7 years ago

$ crunchstat-summary --format html --job 962eh-8i9sb-vrfiobkau7bilws > blah.html
Traceback (most recent call last):
  File "/home/peter/work/scripts/venv/bin/crunchstat-summary", line 4, in <module>
    __import__('pkg_resources').run_script('crunchstat-summary==0.1.20170105025304', 'crunchstat-summary')
  File "/home/peter/work/scripts/venv/local/lib/python2.7/site-packages/pkg_resources/__init__.py", line 739, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/home/peter/work/scripts/venv/local/lib/python2.7/site-packages/pkg_resources/__init__.py", line 1494, in run_script
    exec(code, namespace, namespace)
  File "/home/peter/work/scripts/venv/lib/python2.7/site-packages/crunchstat_summary-0.1.20170105025304-py2.7.egg/EGG-INFO/scripts/crunchstat-summary", line 15, in <module>
    for r in cmd.report():
  File "/home/peter/work/scripts/venv/local/lib/python2.7/site-packages/crunchstat_summary-0.1.20170105025304-py2.7.egg/crunchstat_summary/command.py", line 65, in report
    yield self.summer.html_header()
AttributeError: 'JobSummarizer' object has no attribute 'html_header'
$ crunchstat-summary --format text --job 962eh-8i9sb-vrfiobkau7bilws > blah.html
Traceback (most recent call last):
  File "/home/peter/work/scripts/venv/bin/crunchstat-summary", line 4, in <module>
    __import__('pkg_resources').run_script('crunchstat-summary==0.1.20170105025304', 'crunchstat-summary')
  File "/home/peter/work/scripts/venv/local/lib/python2.7/site-packages/pkg_resources/__init__.py", line 739, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/home/peter/work/scripts/venv/local/lib/python2.7/site-packages/pkg_resources/__init__.py", line 1494, in run_script
    exec(code, namespace, namespace)
  File "/home/peter/work/scripts/venv/lib/python2.7/site-packages/crunchstat_summary-0.1.20170105025304-py2.7.egg/EGG-INFO/scripts/crunchstat-summary", line 15, in <module>
    for r in cmd.report():
  File "/home/peter/work/scripts/venv/local/lib/python2.7/site-packages/crunchstat_summary-0.1.20170105025304-py2.7.egg/crunchstat_summary/command.py", line 60, in report
    yield self.summer.text_header()
AttributeError: 'JobSummarizer' object has no attribute 'text_header'

This fits the story description so long as we define a "reasonable number of threads" as N=1. Parallel processing with a thread pool would be better, since the reason for having threads in the first place is that going through 100s of jobs serially means that (at ~5 seconds per job) it will take crunchstat-summary 10 minutes or more to analyze a large workflow.
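
To make the thread pool idea concrete, here is a minimal sketch, not the actual crunchstat-summary code. The names summarize_all, summarize and max_workers are hypothetical, and it assumes Python 3's concurrent.futures (Python 2.7 would need the futures backport).

```python
from concurrent.futures import ThreadPoolExecutor


def summarize_all(jobs, summarize, max_workers=4):
    """Yield summarize(job) for each job, running at most max_workers at once.

    `summarize` is a hypothetical callable standing in for whatever
    per-job work the tool does; it is not a real crunchstat-summary API.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map() queues every job up front, but only max_workers
        # of them execute concurrently. Results come back in input order.
        for result in pool.map(summarize, jobs):
            yield result
```

With max_workers=1 this degenerates to the serial behaviour, so the cap could be exposed as a single tunable.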

Actions #8

Updated by Peter Amstutz over 7 years ago

An easy solution might be something like:

  1. Take the next N jobs
  2. Spin them out to N threads, wait for all of them to complete (basically the existing logic)
  3. yield N results
  4. repeat until everything is processed
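
The steps above could be sketched roughly as follows. This is only an illustration under assumptions: summarize_in_batches, summarize and batch_size are hypothetical names, and the real per-job logic is not shown.

```python
import threading


def summarize_in_batches(jobs, summarize, batch_size=8):
    """Process jobs N at a time: start N threads, join them all, yield N results."""
    jobs = list(jobs)
    for start in range(0, len(jobs), batch_size):
        # Step 1: take the next N jobs.
        batch = jobs[start:start + batch_size]
        results = [None] * len(batch)

        def worker(index, job, out):
            # Each thread writes to its own slot, so no lock is needed here.
            # If summarize() raises, the slot is left as None.
            out[index] = summarize(job)

        # Step 2: spin the batch out to N threads and wait for all of them.
        threads = [
            threading.Thread(target=worker, args=(i, job, results))
            for i, job in enumerate(batch)
        ]
        for t in threads:
            t.start()
        for t in threads:
            t.join()

        # Step 3: yield N results, in the same order as the input jobs.
        for r in results:
            yield r
        # Step 4: the loop repeats until everything is processed.
```

One trade-off of this approach is that each batch waits for its slowest job before the next batch starts, so a pool that refills idle threads immediately would usually finish sooner.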
Actions #9

Updated by Tom Morris over 7 years ago

Thanks for the quick review. I'll look at the job failure, but the cluster you used isn't familiar and doesn't seem to be resolvable via *.arvadosapi.com. Where is it? I was mostly focused on pipeline instances, so it wouldn't surprise me if there were issues specific to jobs (although any bugs are likely to be in the other branch that this one depends on).

As for performance, reports for a pipeline with 370 jobs that runs for 3 days and uses thousands of core hours take 11.5 minutes for text and 13.8 minutes for html, which is acceptable to me.

I have a branch with a capped number of threads, but decided the complexity wasn't warranted.

Actions #10

Updated by Tom Morris over 7 years ago

  • Target version changed from 2017-01-18 sprint to 2017-02-01 sprint
Actions #11

Updated by Tom Morris about 7 years ago

  • Target version changed from 2017-02-01 sprint to 2017-02-15 sprint
Actions #12

Updated by Tom Morris about 7 years ago

  • Target version changed from 2017-02-15 sprint to 2017-03-01 sprint
Actions #13

Updated by Tom Morris about 7 years ago

  • Target version changed from 2017-03-01 sprint to 2017-03-15 sprint
Actions #14

Updated by Radhika Chippada about 7 years ago

  • Target version changed from 2017-03-15 sprint to 2017-03-29 sprint
Actions #15

Updated by Tom Morris about 7 years ago

  • Target version changed from 2017-03-29 sprint to 2017-04-12 sprint
Actions #16

Updated by Tom Morris about 7 years ago

  • Target version changed from 2017-04-12 sprint to 2017-04-26 sprint
Actions #17

Updated by Tom Morris almost 7 years ago

  • Target version changed from 2017-04-26 sprint to 2017-05-10 sprint
Actions #18

Updated by Tom Morris almost 7 years ago

  • Target version changed from 2017-05-10 sprint to 2017-05-24 sprint
Actions #19

Updated by Tom Morris almost 7 years ago

  • Target version changed from 2017-05-24 sprint to 2017-06-07 sprint
Actions #20

Updated by Tom Morris almost 7 years ago

  • Target version changed from 2017-06-07 sprint to 2017-06-21 sprint
Actions #21

Updated by Tom Morris almost 7 years ago

  • Target version changed from 2017-06-21 sprint to 2017-07-05 sprint
Actions #22

Updated by Tom Morris almost 7 years ago

  • Target version changed from 2017-07-05 sprint to 2017-07-19 sprint
Actions #23

Updated by Tom Morris almost 7 years ago

  • Target version changed from 2017-07-19 sprint to 2017-08-02 sprint
Actions #24

Updated by Tom Morris over 6 years ago

  • Target version changed from 2017-08-02 sprint to 2017-08-16 sprint
Actions #25

Updated by Tom Clegg over 6 years ago

  • Subject changed from Reduce amount of parallelism in crunchstat-summary to [crunchstat-summary] Limit concurrency to keep memory use under control
  • Assigned To changed from Tom Morris to Tom Clegg
  • Target version changed from 2017-08-16 sprint to 2017-08-30 Sprint
  • Story points changed from 0.5 to 0.0
Actions #26

Updated by Tom Clegg over 6 years ago

  • Status changed from In Progress to Resolved
  • Story points deleted (0.0)