Bug #4185
closed[Crunch] crunchstat memory reports seem suspect for multithreaded programs
Description
qr1hi-8i9sb-zs7g4zo303dhdnu tries to run the GATK HaplotypeCaller with multiple threads and Java's maximum heap size set to 4g. The job ends when GATK aborts, claiming that not enough RAM is available to run the analysis.
qr1hi-8i9sb-l7vv9qqxozezv38 successfully ran the same analysis with a maximum heap size of 15g. However, the rss lines from crunchstat report that the maximum RAM used was about .5GiB (grep -F 'memory.stat rss' qr1hi-8i9sb-l7vv9qqxozezv38.log.txt | python3 -c 'import sys; print(max(int(line.split()[-1]) for line in sys.stdin))'
).
This reporting seems inconsistent with the results from the first job. I haven't dug deeply, but I'm guessing this has something to do with the way cgroups report memory use in the context of multithreaded programs. If possible, it would be good to find and use a number that more accurately reflects the job's "real" memory use.
Updated by Ward Vandewege about 10 years ago
- Target version changed from Bug Triage to Arvados Future Sprints
Updated by Tom Clegg about 10 years ago
This could be a clue, in https://www.kernel.org/doc/Documentation/cgroups/memory.txt
Only anonymous and swap cache memory is listed as part of 'rss' stat. This should not be confused with the true 'resident set size' or the amount of physical memory used by the cgroup.
While crunchstat and cgroups are not above suspicion, I think it's possible that the "rss" figure is correct. Java allocates a large heap (as requested with -Xmx) but this is "virt" size, not resident size. AIUI it's possible for a Java program to allocate large amounts of memory from the heap without writing to much of it, and in this case Java could reach its heap limit well before Linux devotes a comparable amount of resident RAM to the process.
Updated by Tom Clegg about 10 years ago
Watching a java program on compute0.q (qr1hi-8i9sb-brduspenge0fw63)
Wed Oct 22 15:17:18 2014 qr1hi-8i9sb-brduspenge0fw63 9565 1 stderr crunchstat: memory.stat rss 289296384
However:
cat /sys/fs/cgroup/memory/docker/68f7b9b13e4ac6c1644809d0733931f157886aa8b7a6e772461d8d2abe58182f/memory.stat cache 3121152 rss 9482752000 rss_huge 0 mapped_file 417792 writeback 4096 swap 1263448064 pgpgin 8254441 pgpgout 5938554 pgfault 5931378 pgmajfault 488211 inactive_anon 1126043648 active_anon 8356839424 inactive_file 1638400 active_file 1482752 unevictable 0 hierarchical_memory_limit 18446744073709551615 hierarchical_memsw_limit 18446744073709551615 total_cache 3121152 total_rss 9482752000 total_rss_huge 0 total_mapped_file 417792 total_writeback 4096 total_swap 1263448064 total_pgpgin 8254441 total_pgpgout 5938554 total_pgfault 5931378 total_pgmajfault 488211 total_inactive_anon 1126043648 total_active_anon 8356839424 total_inactive_file 1638400 total_active_file 1482752 total_unevictable 0
Updated by Tom Clegg about 10 years ago
OTOH maybe I was looking at the wrong task above, because now top says
3016 1000 20 0 12.9g 387m 1884 S 7.7 2.6 216:25.50 java 3064 1000 20 0 12.9g 534m 3712 S 6.0 3.5 47:12.57 java 3192 1000 20 0 13.0g 1.1g 2152 S 7.3 7.4 60:46.94 java 3194 1000 20 0 12.9g 2.1g 1996 S 6.3 14.3 90:26.65 java 3193 1000 20 0 12.9g 2.1g 2268 S 7.3 14.1 84:10.51 java 3191 1000 20 0 12.9g 2.4g 2164 S 6.7 16.1 123:10.97 java 3063 1000 20 0 12.9g 2.5g 1880 S 7.0 17.1 120:49.54 java 2952 1000 20 0 12.9g 2.9g 2252 S 25.9 19.5 70:22.32 java
while egrep -h ^rss\ /sys/fs/cgroup/memory/docker/*/memory.stat | sort -nk2
says
rss 406056960 rss 562262016 rss 1167360000 rss 2233044992 rss 2270339072 rss 2546380800 rss 2706317312 rss 3091529728
...which looks totally fine if you assume "g" and "m" mean 2^30 and 2^20.
Updated by Tom Clegg about 10 years ago
- Status changed from New to In Progress
- Story points set to 0.5
Updated by Tom Clegg about 10 years ago
Added "swap" to crunchstat memory stats, to help expose low-memory conditions. ("pgmajfault" was also added, in #3826.)
(Rss has probably been correct the whole time.)
Updated by Tom Clegg about 10 years ago
- Status changed from In Progress to Feedback
Updated by Tom Clegg about 10 years ago
- Target version changed from Arvados Future Sprints to 2014-11-19 sprint
Updated by Tom Clegg about 10 years ago
- Status changed from Feedback to Resolved