Bug #4185

[Crunch] crunchstat memory reports seem suspect for multithreaded programs

Added by Brett Smith almost 5 years ago. Updated almost 5 years ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Category:
Crunch
Target version:
Start date:
10/14/2014
Due date:
% Done:

100%

Estimated time:
(Total: 1.00 h)
Story points:
0.5

Description

qr1hi-8i9sb-zs7g4zo303dhdnu tries to run the GATK HaplotypeCaller with multiple threads and Java's maximum heap size set to 4g. The job ends when GATK aborts, claiming that not enough RAM is available to run the analysis.

qr1hi-8i9sb-l7vv9qqxozezv38 successfully ran the same analysis with a maximum heap size of 15g. However, the rss lines from crunchstat report that the maximum RAM used was about .5GiB (grep -F 'memory.stat rss' qr1hi-8i9sb-l7vv9qqxozezv38.log.txt | python3 -c 'import sys; print(max(int(line.split()[-1]) for line in sys.stdin))').

This reporting seems inconsistent with the results from the first job. I haven't dug deeply, but I'm guessing this has something to do with the way cgroups report memory use in the context of multithreaded programs. If possible, it would be good to find and use a number that more accurately reflects the job's "real" memory use.


Subtasks

Task #4372: Find out whether current stats make sense to usersResolvedTom Clegg


Related issues

Related to Arvados - Task #4086: [Crunch] Not enough memory to run GATKResolved10/03/2014

Related to Arvados - Story #3826: [Crunch] Display network activity in crunchstatResolved10/10/2014

Associated revisions

Revision 5870334b (diff)
Added by Tom Clegg almost 5 years ago

Report "swap" memory statistic if kernel provides it. refs #4185

History

#1 Updated by Ward Vandewege almost 5 years ago

  • Target version changed from Bug Triage to Arvados Future Sprints

#2 Updated by Tom Clegg almost 5 years ago

This could be a clue, in https://www.kernel.org/doc/Documentation/cgroups/memory.txt

Only anonymous and swap cache memory is listed as part of 'rss' stat. This should not be confused with the true 'resident set size' or the amount of physical memory used by the cgroup.

While crunchstat and cgroups are not above suspicion, I think it's possible that the "rss" figure is correct. Java allocates a large heap (as requested with -Xmx) but this is "virt" size, not resident size. AIUI it's possible for a Java program to allocate large amounts of memory from the heap without writing to much of it, and in this case Java could reach its heap limit well before Linux devotes a comparable amount of resident RAM to the process.

#3 Updated by Tom Clegg almost 5 years ago

Watching a java program on compute0.q (qr1hi-8i9sb-brduspenge0fw63)

Wed Oct 22 15:17:18 2014 qr1hi-8i9sb-brduspenge0fw63 9565 1 stderr crunchstat: memory.stat rss 289296384

However:

cat /sys/fs/cgroup/memory/docker/68f7b9b13e4ac6c1644809d0733931f157886aa8b7a6e772461d8d2abe58182f/memory.stat
cache 3121152
rss 9482752000
rss_huge 0
mapped_file 417792
writeback 4096
swap 1263448064
pgpgin 8254441
pgpgout 5938554
pgfault 5931378
pgmajfault 488211
inactive_anon 1126043648
active_anon 8356839424
inactive_file 1638400
active_file 1482752
unevictable 0
hierarchical_memory_limit 18446744073709551615
hierarchical_memsw_limit 18446744073709551615
total_cache 3121152
total_rss 9482752000
total_rss_huge 0
total_mapped_file 417792
total_writeback 4096
total_swap 1263448064
total_pgpgin 8254441
total_pgpgout 5938554
total_pgfault 5931378
total_pgmajfault 488211
total_inactive_anon 1126043648
total_active_anon 8356839424
total_inactive_file 1638400
total_active_file 1482752
total_unevictable 0

#4 Updated by Tom Clegg almost 5 years ago

OTOH maybe I was looking at the wrong task above, because now top says

 3016 1000      20   0 12.9g 387m 1884 S   7.7  2.6 216:25.50 java              
 3064 1000      20   0 12.9g 534m 3712 S   6.0  3.5  47:12.57 java              
 3192 1000      20   0 13.0g 1.1g 2152 S   7.3  7.4  60:46.94 java              
 3194 1000      20   0 12.9g 2.1g 1996 S   6.3 14.3  90:26.65 java              
 3193 1000      20   0 12.9g 2.1g 2268 S   7.3 14.1  84:10.51 java              
 3191 1000      20   0 12.9g 2.4g 2164 S   6.7 16.1 123:10.97 java              
 3063 1000      20   0 12.9g 2.5g 1880 S   7.0 17.1 120:49.54 java              
 2952 1000      20   0 12.9g 2.9g 2252 S  25.9 19.5  70:22.32 java              

while egrep -h ^rss\ /sys/fs/cgroup/memory/docker/*/memory.stat | sort -nk2 says

rss 406056960
rss 562262016
rss 1167360000
rss 2233044992
rss 2270339072
rss 2546380800
rss 2706317312
rss 3091529728

...which looks totally fine if you assume "g" and "m" mean 2^30 and 2^20.

#5 Updated by Tom Clegg almost 5 years ago

  • Status changed from New to In Progress
  • Story points set to 0.5

#6 Updated by Tom Clegg almost 5 years ago

Added "swap" to crunchstat memory stats, to help expose low-memory conditions. ("pgmajfault" was also added, in #3826.)

(Rss has probably been correct the whole time.)

#7 Updated by Tom Clegg almost 5 years ago

  • Status changed from In Progress to Feedback

#8 Updated by Tom Clegg almost 5 years ago

  • Target version changed from Arvados Future Sprints to 2014-11-19 sprint

#9 Updated by Tom Clegg almost 5 years ago

  • Story points deleted (0.5)

#10 Updated by Tom Clegg almost 5 years ago

  • Assigned To set to Tom Clegg

#11 Updated by Ward Vandewege almost 5 years ago

  • Story points set to 0.5

#12 Updated by Tom Clegg almost 5 years ago

  • Status changed from Feedback to Resolved

Also available in: Atom PDF