Bug #11170
Closed
Stale squeue processes on c97qk caused by "crunch-dispatch --jobs"
Description
There were a total of 4694 defunct squeue processes left under crunch-dispatch.rb, representing a significant resource leak.
# ps auxwf
...
root 1339 0.0 0.0 196 32 ? Ss 2016 0:52 runsvdir -P /etc/service log: ...................................................................................................... ..................................................................................................................................................................................................... ................................................................................................
root 1357 0.0 0.0 176 32 ? Ss 2016 0:00  \_ runsv crunch-dispatch-jobs-0
root 1433 0.0 0.0 192 48 ? S 2016 0:57  |   \_ svlogd -tt /etc/sv/crunch-dispatch-jobs-0/log/main
root 46325 7.4 1.9 461112 138060 ? Sl Feb25 177:18  |   \_ ./script/crunch-dispatch.rb --jobs
root 46919 0.0 0.0 0 0 ? Z Feb25 0:00  |       \_ [squeue] <defunct>
root 47929 0.0 0.0 0 0 ? Z Feb25 0:00  |       \_ [squeue] <defunct>
root 48991 0.0 0.0 0 0 ? Z Feb25 0:00  |       \_ [squeue] <defunct>
root 49948 0.0 0.0 0 0 ? Z Feb25 0:00  |       \_ [squeue] <defunct>
root 51172 0.0 0.0 0 0 ? Z Feb25 0:00  |       \_ [squeue] <defunct>
root 52131 0.0 0.0 0 0 ? Z Feb25 0:00  |       \_ [squeue] <defunct>
root 53174 0.0 0.0 0 0 ? Z Feb25 0:00  |       \_ [squeue] <defunct>
...
root 3988 0.0 0.0 0 0 ? Z 18:52 0:00  |       \_ [squeue] <defunct>
root 5015 0.0 0.0 0 0 ? Z 18:53 0:00  |       \_ [squeue] <defunct>
root 6157 0.0 0.0 0 0 ? Z 18:54 0:00  |       \_ [squeue] <defunct>
root 7388 0.0 0.0 0 0 ? Z 18:55 0:00  |       \_ [squeue] <defunct>
root 8527 0.0 0.0 0 0 ? Z 18:56 0:00  |       \_ [squeue] <defunct>
root 9515 0.0 0.0 0 0 ? Z 18:57 0:00  |       \_ [squeue] <defunct>
root 10627 0.0 0.0 0 0 ? Z 18:58 0:00  |       \_ [squeue] <defunct>
root 11657 0.0 0.0 0 0 ? Z 18:59 0:00  |       \_ [squeue] <defunct>
root 12996 0.0 0.0 0 0 ? Z 19:00 0:00  |       \_ [squeue] <defunct>
root 14366 0.0 0.0 0 0 ? Z 19:02 0:00  |       \_ [squeue] <defunct>
root 14676 0.0 0.0 10468 2192 pts/0 S+ 19:02 0:00  \_ grep --color=auto squeu

c97qk:~# ps auxwf |grep squeu |wc
4695 65730 450725
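For anyone wanting to check for the same symptom elsewhere, the count above can be broken down per parent process. The following is only a sketch: the helper name `zombies_by_parent` is hypothetical, the sample rows are illustrative rather than copied from c97qk, and it assumes input in the column layout of `ps -eo pid,ppid,stat,comm`.

```ruby
# Illustrative input in the layout of `ps -eo pid,ppid,stat,comm`.
# These rows are made up for the example, not real output from c97qk.
sample_ps = <<~PS
    PID  PPID STAT COMMAND
  46325  1357 Sl   ruby
  46919 46325 Z    squeue
  47929 46325 Z    squeue
  14676 14000 S+   grep
PS

# Hypothetical helper: count zombie (state "Z") processes per parent pid.
def zombies_by_parent(ps_text)
  counts = Hash.new(0)
  ps_text.each_line.drop(1).each do |line|   # drop(1) skips the header row
    _pid, ppid, stat, _comm = line.split(' ', 4)
    next if stat.nil?                          # ignore blank or short lines
    counts[ppid.to_i] += 1 if stat.start_with?('Z')
  end
  counts
end

puts zombies_by_parent(sample_ps).inspect
```

A single parent pid holding nearly all of the zombies, as with crunch-dispatch.rb here, suggests that the parent is not waiting on its children.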
Updated by Ward Vandewege over 7 years ago
- Subject changed from stale squeue processes on c97qk to stale squeue processes on c97qk caused by crunch-dispatch --jobs
Updated by Tom Morris over 7 years ago
- Project changed from 40 to Arvados
- Subject changed from stale squeue processes on c97qk caused by crunch-dispatch --jobs to Stale squeue processes on c97qk caused by "crunch-dispatch --jobs"
- Description updated (diff)
- Target version set to 2017-03-29 sprint
Updated by Lucas Di Pentima over 7 years ago
- Assigned To set to Lucas Di Pentima
Updated by Lucas Di Pentima over 7 years ago
- Status changed from New to In Progress
Updated by Lucas Di Pentima over 7 years ago
Updates at branch 11170-stale-squeue-procs, commit f31475d
Test run: https://ci.curoverse.com/job/developer-run-tests/195/
Used Process::detach on both File.popen(...) cases so that the process status gets collected by a separate thread on completion.
Ref: https://ruby-doc.org/core-2.1.1/Process.html#method-c-detach
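A minimal sketch of the approach described in this comment, not the code from the branch: the helper name `read_lines_detached` and the `echo` stand-in command are assumptions for illustration. `Process.detach` returns a thread that waits on the child's pid, so the child is reaped even though the caller never waits for it.

```ruby
# Hypothetical helper: run a command, read its output, and hand the child
# pid to a waiter thread instead of waiting for it inline.
def read_lines_detached(cmd)
  io = IO.popen(cmd)
  waiter = Process.detach(io.pid)   # thread that reaps the child when it exits
  lines = io.readlines.map(&:strip)
  [lines, waiter]
end

# `echo` stands in for squeue so the sketch runs without Slurm installed.
lines, waiter = read_lines_detached(['echo', 'job-a'])
status = waiter.value               # Process::Status of the reaped child
puts lines.inspect
puts status.success?
```

Note that this only takes care of reaping the child. The pipe object itself is not explicitly closed in this sketch.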
Updated by Peter Amstutz over 7 years ago
squeue_jobs and scancel should use the block form of IO.popen() so that the pipe is closed automatically. See stdout_s.
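As a rough illustration of the block form suggested here (a sketch, not the code that was merged): the helper name `squeue_job_names` is hypothetical, its default command is the squeue invocation quoted elsewhere in this ticket, and a small Ruby one-liner stands in for squeue so the example runs without Slurm.

```ruby
require 'rbconfig'

# Hypothetical helper. With the block form, IO.popen closes the pipe and
# waits for the child when the block returns, even if the block raises.
def squeue_job_names(cmd = ['squeue', '-a', '-h', '-o', '%j'])
  IO.popen(cmd) do |io|
    io.readlines.map(&:strip)
  end
end

# Stand-in command (an assumption for the example) that prints two job names.
fake_squeue = [RbConfig.ruby, '-e', 'puts "job-a"; puts "job-b"']
names = squeue_job_names(fake_squeue)
puts names.inspect
puts $?.success?   # the child's exit status is available once the block finishes
```

Because closing the pipe also waits for the child, no separate detach or wait call is needed with this form.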
Updated by Lucas Di Pentima over 7 years ago
Updates at 79e53c0
Test run: https://ci.curoverse.com/job/developer-run-tests/196/
Updated by Lucas Di Pentima over 7 years ago
New updates at 077878d
Test run: https://ci.curoverse.com/job/developer-run-tests/197/
I've updated the tests so they stub the IO class instead of File.
Updated by Peter Amstutz over 7 years ago
Can we get
    p = IO.popen(['squeue', '-a', '-h', '-o', '%j'])
    begin
      l = p.readlines.map {|line| line.strip}
    ensure
      p.close
    end
Updated by Peter Amstutz over 7 years ago
Updated by Lucas Di Pentima over 7 years ago
- Status changed from In Progress to Resolved
- % Done changed from 0 to 100
Applied in changeset arvados|commit:83203f5c739ee0b0199e76babccb60e832a0de8e.