Bug #11170

closed

Stale squeue processes on c97qk caused by "crunch-dispatch --jobs"

Added by Ward Vandewege about 7 years ago. Updated about 7 years ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Category:
-
Target version:
Story points:
-

Description

There were a total of 4694 defunct squeue processes on c97qk, all children of "crunch-dispatch.rb --jobs", representing a significant resource leak.

# ps auxwf 

...
root      1339  0.0  0.0    196    32 ?        Ss    2016   0:52 runsvdir -P /etc/service log: ......................................................................................................
.....................................................................................................................................................................................................
................................................................................................
root      1357  0.0  0.0    176    32 ?        Ss    2016   0:00  \_ runsv crunch-dispatch-jobs-0
root      1433  0.0  0.0    192    48 ?        S     2016   0:57  |   \_ svlogd -tt /etc/sv/crunch-dispatch-jobs-0/log/main
root     46325  7.4  1.9 461112 138060 ?       Sl   Feb25 177:18  |   \_ ./script/crunch-dispatch.rb --jobs
root     46919  0.0  0.0      0     0 ?        Z    Feb25   0:00  |       \_ [squeue] <defunct>
root     47929  0.0  0.0      0     0 ?        Z    Feb25   0:00  |       \_ [squeue] <defunct>
root     48991  0.0  0.0      0     0 ?        Z    Feb25   0:00  |       \_ [squeue] <defunct>
root     49948  0.0  0.0      0     0 ?        Z    Feb25   0:00  |       \_ [squeue] <defunct>
root     51172  0.0  0.0      0     0 ?        Z    Feb25   0:00  |       \_ [squeue] <defunct>
root     52131  0.0  0.0      0     0 ?        Z    Feb25   0:00  |       \_ [squeue] <defunct>
root     53174  0.0  0.0      0     0 ?        Z    Feb25   0:00  |       \_ [squeue] <defunct>
...
root      3988  0.0  0.0      0     0 ?        Z    18:52   0:00  |       \_ [squeue] <defunct>
root      5015  0.0  0.0      0     0 ?        Z    18:53   0:00  |       \_ [squeue] <defunct>
root      6157  0.0  0.0      0     0 ?        Z    18:54   0:00  |       \_ [squeue] <defunct>
root      7388  0.0  0.0      0     0 ?        Z    18:55   0:00  |       \_ [squeue] <defunct>
root      8527  0.0  0.0      0     0 ?        Z    18:56   0:00  |       \_ [squeue] <defunct>
root      9515  0.0  0.0      0     0 ?        Z    18:57   0:00  |       \_ [squeue] <defunct>
root     10627  0.0  0.0      0     0 ?        Z    18:58   0:00  |       \_ [squeue] <defunct>
root     11657  0.0  0.0      0     0 ?        Z    18:59   0:00  |       \_ [squeue] <defunct>
root     12996  0.0  0.0      0     0 ?        Z    19:00   0:00  |       \_ [squeue] <defunct>
root     14366  0.0  0.0      0     0 ?        Z    19:02   0:00  |       \_ [squeue] <defunct>
root     14676  0.0  0.0  10468  2192 pts/0    S+   19:02   0:00          \_ grep --color=auto squeu
c97qk:~# ps auxwf |grep squeu |wc
   4695   65730  450725


Subtasks 1 (0 open, 1 closed)

Task #11269: Review 11170-stale-squeue-procs | Resolved | Peter Amstutz | 03/23/2017
Actions #1

Updated by Ward Vandewege about 7 years ago

  • Description updated (diff)
Actions #2

Updated by Ward Vandewege about 7 years ago

  • Subject changed from stale squeue processes on c97qk to stale squeue processes on c97qk caused by crunch-dispatch --jobs
Actions #3

Updated by Tom Morris about 7 years ago

  • Project changed from 40 to Arvados
  • Subject changed from stale squeue processes on c97qk caused by crunch-dispatch --jobs to Stale squeue processes on c97qk caused by "crunch-dispatch --jobs"
  • Description updated (diff)
  • Target version set to 2017-03-29 sprint
Actions #4

Updated by Lucas Di Pentima about 7 years ago

  • Assigned To set to Lucas Di Pentima
Actions #5

Updated by Lucas Di Pentima about 7 years ago

  • Status changed from New to In Progress
Actions #6

Updated by Lucas Di Pentima about 7 years ago

Updated at branch 11170-stale-squeue-procs - f31475d
Test run: https://ci.curoverse.com/job/developer-run-tests/195/

Used Process::detach on both File.popen(...) cases so that each child's exit status gets collected by a separate thread on completion.
Ref: https://ruby-doc.org/core-2.1.1/Process.html#method-c-detach
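
For illustration, a minimal sketch of the Process.detach approach described in this note. It is not the code from the branch: the helper name read_with_detach is hypothetical, and 'sh -c ...' stands in for squeue so the sketch runs without Slurm installed.

```ruby
# Sketch only: 'sh -c ...' stands in for squeue, and read_with_detach
# is a hypothetical helper name, not the one used in crunch-dispatch.rb.
def read_with_detach(cmd)
  io = IO.popen(cmd)
  # Process.detach starts a thread whose only job is to wait on the pid,
  # so the child is reaped once it exits instead of lingering as <defunct>.
  waiter = Process.detach(io.pid)
  lines = io.readlines.map(&:strip)
  status = waiter.value   # join the waiter thread; yields a Process::Status
  io.close                # release the pipe's file descriptor
  [lines, status]
end

lines, status = read_with_detach(['sh', '-c', 'echo job-a; echo job-b'])
puts lines.inspect
puts status.success?
```

Process.detach only takes care of collecting the exit status. The pipe is a separate resource and still has to be closed by the caller, which is why the sketch calls io.close after joining the waiter thread.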

Actions #7

Updated by Peter Amstutz about 7 years ago

squeue_jobs and scancel should use the block form of IO.popen() so that the pipe is closed automatically. See stdout_s
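
A minimal sketch of what the block form looks like, assuming a helper that only needs the command's output lines. The name job_names is hypothetical, and 'sh -c ...' stands in for squeue.

```ruby
# Sketch only: job_names is a hypothetical helper, and 'sh -c ...'
# stands in for the real squeue invocation.
def job_names(cmd)
  # With a block, IO.popen closes the pipe when the block returns.
  # Closing waits for the child, so its exit status is collected and
  # stored in $?. The block's value becomes IO.popen's return value.
  IO.popen(cmd) do |io|
    io.readlines.map(&:strip)
  end
end

names = job_names(['sh', '-c', 'echo job-a; echo job-b'])
exit_status = $?
puts names.inspect
puts exit_status.success?
```

With this form there is no separate waiter thread to manage: the close at the end of the block both releases the file descriptor and reaps the child.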

Actions #9

Updated by Lucas Di Pentima about 7 years ago

New updates at 077878d
Test run: https://ci.curoverse.com/job/developer-run-tests/197/

I've updated the tests so they stub the IO class instead of File.

Actions #10

Updated by Peter Amstutz about 7 years ago

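Can we get something like this?

      # Open a pipe to squeue; the array form avoids going through a shell.
      p = IO.popen(['squeue', '-a', '-h', '-o', '%j'])
      begin
        l = p.readlines.map {|line| line.strip}
      ensure
        # Always close the pipe: this waits for squeue and collects its exit status.
        p.close
      end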
Actions #11

Updated by Lucas Di Pentima about 7 years ago

Done: 2741b54

Actions #12

Updated by Peter Amstutz about 7 years ago

Lucas Di Pentima wrote:

Done: 2741b54

LGTM

Actions #13

Updated by Lucas Di Pentima about 7 years ago

  • Status changed from In Progress to Resolved
  • % Done changed from 0 to 100

Applied in changeset arvados|commit:83203f5c739ee0b0199e76babccb60e832a0de8e.
