Bug #6997 (closed)
[Keep] keepstore reboots GCE host under heavy load
Description
Steps to reproduce
- Start keepstore [1].
- Write 3K small files [2] (so the index isn't trivially small).
- Hit keepstore with 15 concurrent 40MiB writes [3] (so it allocates 15 data buffers). Repeat, just to be sure.
- Hit keepstore with 20 concurrent /index requests [4]. Repeat a few times.
Runtime | Build | Env | After writes | After many indexes |
go1.3.3 | 0d9da68 | (default) | 1.8G | 3.7G |
go1.3.3 | 0d9da68 | GOGC=10 | 2.1G | 2.3G |
go1.3.3 | 0d9da68 | GOMAXPROCS=8 | 2.4G | >4.5G |
go1.3.3 | 0d9da68 | GOMAXPROCS=8 GOGC=10 | 2.4G | 2.6G |
go1.4.2 | 0d9da68 | GOMAXPROCS=8 | 2.3G | 3.5G |
go1.4.2 | 0d9da68 | GOMAXPROCS=8 GOGC=10 | 2.1G | 2.4G |
Evidently, "index" generates a lot of garbage, and the default GOGC=100 is much too high for a program like keepstore that normally fills >50% of the system's RAM with active buffers.
See http://golang.org/pkg/runtime/debug/#SetGCPercent
[1] mkdir /tmp/zzz; echo abc | GOMAXPROCS=8 keepstore -volume=/tmp/zzz -data-manager-token-file=/dev/stdin
[2] for i in `seq 0 3000`; do echo -n "$i "; curl -X PUT -H 'Authorization: OAuth2 abc' --data-binary $i localhost:25107/`echo -n $i | md5sum | head -c32`; done
[3] dd if=/dev/zero of=/tmp/zero bs=1000000 count=40; for i in `seq 0 15`; do curl -X PUT -H 'Authorization: OAuth2 foo' --data-binary @/tmp/zero localhost:25107/48e9a108a3ec623652e7988af2f88867 & done
[4] for i in `seq 0 20`; do curl -s -H 'Authorization: OAuth2 abc' localhost:25107/index >/dev/null & done
Fix
Specify GOGC=10 in the keepstore run script, both in our own deployments and the Keepstore install guide.
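For illustration, a minimal runit-style run script along these lines (the volume path and flag values here are placeholders, not our actual deployment script):
#!/bin/sh
# Hypothetical keepstore run script: GOGC=10 caps GC headroom at 10% of the live heap.
export GOGC=10
exec keepstore -volume=/data/keep-0 -max-buffers=100 2>&1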
Updated by Brett Smith over 9 years ago
- Target version set to Arvados Future Sprints
Updated by Ward Vandewege over 9 years ago
Added swap on all the su92l keep nodes. I had to fight to keep keep9-13 in the air; 9 and 11 actually went down (OOM) and I reset them. The others had very high load.
I also noticed that keep11 was being hit way harder than the others (many messages about max_buffers being reached). Stats gaps as a consequence:
/data/keep11 on manage.s:
-rw-r--r-- 1 root root 5762 Aug 20 21:41 status.json.3312
-rw-r--r-- 1 root root 9342 Aug 20 21:26 status.json.3311
The very high pressure on that node continued until roughly 21:50.
Updated by Nico César over 9 years ago
memory keeps being allocated.
I wish we could graph this easily ...
manage.su92l:/data/keep10# for a in $( ls -1tr ) ; do date --reference=$a +%s.%N | tr "\n" " " ; jq '.["Memory"]["TotalAlloc"]' $a ; done
ANY GNUPLOT SKILLS?
Updated by Nico César over 9 years ago
Nico Cesar wrote:
memory keeps being allocated.
I wish we could graph this easily ...
[...]
ANY GNUPLOT SKILLS?
I just improved it to have all the keys.
KEYS=$(jq '.["Memory"] | keys' $(ls status.json* | head --lines=1) | tr "," " " | tr -d '[]"' | grep -v PauseNs | grep -v PauseEnd | grep -v BySize | tr "\n" " ")
for a in $( ls -1tr ) ; do date --reference=$a +%s.%N | tr "\n" " " ; for col in $KEYS ; do jq '.["Memory"]["'$col'"]' $a | tr "\n" " " ; done ; echo ; done
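To answer the gnuplot question: assuming the loop's output is redirected to a file (memstats.dat is a made-up name) with the timestamp in column 1 and TotalAlloc in column 2 (adjust the column number to match the key order), something like this plots it:
gnuplot -e 'set terminal png size 1200,400; set output "keep10-totalalloc.png"; plot "memstats.dat" using 1:2 with lines title "TotalAlloc"'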
Updated by Nico César over 9 years ago
- File keep10-pausetotalns.png keep10-pausetotalns.png added
- File keep10-nextlastgc.png keep10-nextlastgc.png added
- File keep10-mallocs.png keep10-mallocs.png added
- File keep10-sys.png keep10-sys.png added
- File keep10-gc.png keep10-gc.png added
- File keep10-mspans.png keep10-mspans.png added
- File keep10-mcache.png keep10-mcache.png added
- File keep10-lookups.png keep10-lookups.png added
- File keep10-stack.png keep10-stack.png added
- File keep10-heap.png keep10-heap.png added
Updated by Nico César over 9 years ago
check the references here: http://golang.org/pkg/runtime/#MemStats
Updated by Nico César over 9 years ago
we got "compute.instances.hostError an instance" on all the failing nodes and GCE rebooted the machines.
compute.instances.hostError an instance su92l-keep10 system start: Aug 21, 2015, 3:52:03 PM End: Aug 21, 2015, 3:52:03 PM
All the nodes that we've seen rebooted (keep7, keep9, keep10, keep11) were rebooted by GCE.
My guess is that adding swap only made the load get higher once swap was being used, and GCE kills VMs with high load.
My recommendation is to take swap off and lower the slot from 100 to 60
Updated by Ward Vandewege over 9 years ago
- Target version changed from Arvados Future Sprints to 2015-09-02 sprint
Updated by Ward Vandewege over 9 years ago
- Status changed from New to In Progress
Updated by Ward Vandewege over 9 years ago
Nico Cesar wrote:
My recommendation is to take swap off and lower the slot from 100 to 60
Done for all the su92l nodes.
Updated by Ward Vandewege over 9 years ago
Ward Vandewege wrote:
Nico Cesar wrote:
My recommendation is to take swap off and lower the slot from 100 to 60
Done for all the su92l nodes.
Perhaps unsurprisingly, this didn't really make a difference. Both keep10 and keep11 have died in the past 30 minutes (I restarted them).
Update 23:38: keep10/11/12 died once more.
Updated by Tom Clegg over 9 years ago
100x 64MiB buffers + 10% GOGC means we expect at least 7381975040 bytes (7208960 KiB) rss. We've seen the OOM-killer kill keepstore at: total-vm:7514700kB, anon-rss:6979172kB, file-rss:2280kB
If it's a problem with actual memory, it means max-buffers is too high, and I'd expect -max-buffers=60 to have helped: with that, rss should not get (much) past 4325376 KiB for more than one GC cycle.
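(Back-of-envelope check of those figures, not from the original comment: buffers × 64 MiB, plus 10% GOGC headroom.)
echo $(( 100 * 64 * 1024 * 1024 * 110 / 100 ))   # 7381975040 bytes = 7208960 KiB
echo $(( 60 * 64 * 1024 * 1024 * 110 / 100 ))    # 4429185024 bytes = 4325376 KiB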
If it's a problem with virtual memory, we could try ulimit -v (it works as expected, unlike "chpst -m", which makes keepstore panic at startup). At least this would kill/restart the keepstore process when it hits 7.5GiB vsize, instead of taking down the whole host. But AFAICT vsize is not tied to a physical resource and therefore Go doesn't ever reduce it or try particularly hard to reduce growth, so this would just be a clue, not a solution.
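A sketch of what the ulimit -v idea could look like in the run script (7.5 GiB expressed in KiB, the unit bash's ulimit -v uses; hypothetical and untested, volume path is a placeholder):
ulimit -v 7864320   # 7.5 GiB address-space cap: keepstore dies and runit restarts it, instead of the whole host going down
exec keepstore -volume=/data/keep-0 -max-buffers=60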
https://www.kernel.org/doc/Documentation/sysctl/vm.txt → overcommit_memory: "This feature can be very useful because there are a lot of programs that malloc() huge amounts of memory "just-in-case" and don't use much of it."
Should we be using echo 1 >/proc/sys/vm/overcommit_memory ...?
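(For reference, the standard mechanics for checking/changing that knob; whether we should is the open question above.)
cat /proc/sys/vm/overcommit_memory    # 0 = heuristic overcommit (the default)
echo 1 > /proc/sys/vm/overcommit_memory    # takes effect immediately, lost on reboot
echo 'vm.overcommit_memory = 1' > /etc/sysctl.d/90-overcommit.conf && sysctl -p /etc/sysctl.d/90-overcommit.conf    # persistent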
Updated by Nico César over 9 years ago
- Status changed from In Progress to New
- Target version changed from 2015-09-02 sprint to Arvados Future Sprints
about overcommit:
here are the 3 modes:
https://www.kernel.org/doc/Documentation/vm/overcommit-accounting
basically 1 is "overcommit blindly" and 2 is a programmable ratio that takes RAM+swap into account
I checked all keep servers in su92l and the value is 0 (Heuristic overcommit)
We can give it a try.
Tom, do you know if there is a way to dump the memory from a running Go program to analyze?
Updated by Tom Clegg over 9 years ago
Nico Cesar wrote:
about overcommit:
here are the 3 modes:
https://www.kernel.org/doc/Documentation/vm/overcommit-accounting
basically 1 is "overcommit blindly" and 2 is a programmable ratio that takes RAM+swap into account
I checked all keep servers in su92l and the value is 0 (Heuristic overcommit)
This "heuristic" is a bit mysterious. The link I posted (which says kernel 2.6.29 -- not sure when the doc page you posted was last updated) says: "When this flag is 0, the kernel attempts to estimate the amount of free memory left when userspace requests more memory." That sort of vaguely sounds like "never overcommit if no swap". But only vaguely. Slightly less vague than the words "heuristic" and "obvious" in the other doc, maybe, sort of.
Tom, do you know if there is a way to dump the memory from a running Go program to analyze?
Yes, https://golang.org/pkg/runtime/debug/#WriteHeapDump ... what do you have in mind?
All of the evidence I've seen in my own testing and the dmesg stuff here indicates:
- RSS is no bigger than we would expect when max-buffers=100 and GOGC=10 (and is larger than the OOM-killing point, so that was simply our fault for not getting the numbers right)
- VSIZE is explicitly not controllable; Go (like Java) assumes the OS is able to handle an address space larger than physical RAM as long as it isn't all in use
Do we have dmesg from a crash that happened after reducing max-buffers to 60? That could confirm that it's really VSIZE and not RSS that's the issue.
Updated by Peter Amstutz over 9 years ago
I found something!
Total: 3228.0 MB
 2048.0  63.4%  63.4%   2048.0  63.4%  main.func·003
  993.5  30.8%  94.2%   1010.0  31.3%  os.(*File).readdirnames
   55.0   1.7%  95.9%     91.5   2.8%  os.Lstat
   32.5   1.0%  96.9%     32.5   1.0%  strings.Join
   26.5   0.8%  97.8%     26.5   0.8%  os.fileInfoFromStat
   16.5   0.5%  98.3%     16.5   0.5%  syscall.ParseDirent
   16.0   0.5%  98.8%     16.0   0.5%  syscall.ByteSliceFromString
   11.0   0.3%  99.1%     13.0   0.4%  main.func·007
    9.5   0.3%  99.4%      9.5   0.3%  os.NewFile
    4.0   0.1%  99.5%      4.0   0.1%  runtime.convT2I
Updated by Peter Amstutz over 9 years ago
After running a little longer:
Total: 5247.5 MB
 2691.0  51.3%  51.3%   2729.5  52.0%  os.(*File).readdirnames
 2048.0  39.0%  90.3%   2048.0  39.0%  main.func·003
  153.5   2.9%  93.2%    274.5   5.2%  os.Lstat
   87.0   1.7%  94.9%     87.0   1.7%  os.fileInfoFromStat
   85.5   1.6%  96.5%     85.5   1.6%  strings.Join
   44.5   0.8%  97.4%     44.5   0.8%  syscall.ByteSliceFromString
   38.5   0.7%  98.1%     38.5   0.7%  syscall.ParseDirent
   34.5   0.7%  98.8%     43.5   0.8%  main.func·007
   24.5   0.5%  99.2%     24.5   0.5%  os.NewFile
   17.0   0.3%  99.6%     17.0   0.3%  runtime.convT2I
Updated by Peter Amstutz over 9 years ago
For the curious, main.func·003 is newBufferPool(), so it makes sense that it would be responsible for the vast majority of memory allocations (at least it would be in a correctly functioning keepstore).
The evidence now strongly points towards os.(*File).readdirnames either having some kind of memory leak, or us failing to clean up resources.
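(For anyone wanting to reproduce listings like the ones above: they have the shape of go tool pprof "top" output. One common way to get that, assuming a heap profile has been collected from keepstore and saved to a file, is something like the following; the binary and profile paths are placeholders.)
go tool pprof --inuse_space /usr/bin/keepstore /tmp/keepstore-heap.prof
# then run "top10" at the (pprof) prompt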
Updated by Tom Clegg over 9 years ago
vmstat 1 on keep11.s just before rebooting.
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
2 3 0 146812 5436 4750848 0 0 105556 40 8065 13417 36 36 6 22
3 2 0 152764 5428 4704452 0 0 120336 122860 4265 6017 56 44 0 0
5 3 0 137760 5428 4731620 0 0 77876 410 5096 7521 29 38 5 28
6 0 0 153012 5428 4716660 0 0 119212 159748 5593 9728 51 34 2 13
0 6 0 143820 5428 4687208 0 0 11060 147492 3302 4862 27 47 0 26
2 3 0 170356 5428 4689856 0 0 2688 28 2155 2697 3 28 0 70
1 3 0 121500 5428 4776252 0 0 137504 40 3544 6452 15 25 24 36
2 3 0 128940 5428 4769868 0 0 179508 40 2711 4790 24 35 13 28
1 2 0 142704 5428 4756640 0 0 176152 48 3967 7054 38 32 7 23
1 3 0 134372 5492 4752808 0 0 172872 112 5528 8438 29 36 6 29
6 1 0 127256 5492 4726248 0 0 146580 113900 3553 5763 29 44 8 19
0 3 0 185016 5492 4689800 0 0 17552 24640 3339 5223 14 26 10 50
0 2 0 152832 5492 4706236 0 0 13412 131216 4590 6827 33 46 7 14
2 1 0 164860 5492 4729348 0 0 130820 60 5502 10497 18 22 10 49
1 3 0 160396 5496 4703072 0 0 42516 156163 4649 7657 28 42 0 30
3 1 0 130016 5496 4753988 0 0 46068 8578 7509 12456 30 31 6 33
2 2 0 173160 5492 4694048 0 0 29176 188356 8109 15390 23 51 8 18
1 4 0 169728 5492 4700612 0 0 6584 156512 3321 5755 9 29 21 41
3 2 0 166256 5492 4726304 0 0 2960 48 5348 7885 12 29 6 54
2 3 0 146128 5492 4749516 0 0 189892 40 6861 10528 40 38 5 17
2 2 0 171836 5492 4673392 0 0 93580 98348 5507 10270 45 44 1 11
4 2 0 120872 5492 4751396 0 0 113572 40 7332 13733 24 33 8 36
1 5 0 178408 5492 4681284 0 0 49848 152232 3029 4431 28 48 0 24
2 2 0 127196 5492 4747944 0 0 105240 40 7656 13315 26 26 11 36
6 1 0 125336 5492 4747864 0 0 157092 236 5203 8342 28 34 12 26
6 3 0 139348 5432 4736820 0 0 75732 98512 4272 5490 57 43 0 0
3 2 0 132892 5432 4729192 0 0 135448 98248 5349 9208 37 36 8 19
2 2 0 137200 5432 4751196 0 0 43400 24 3769 6483 17 28 23 32
3 3 0 163404 5432 4719552 0 0 140820 24616 7868 12755 41 43 3 13
1 3 0 136380 5432 4731280 0 0 170812 56 5818 10388 37 37 4 23
1 6 0 148532 5428 4672164 0 0 48152 323544 1997 4239 25 45 0 31
1 5 0 160056 5432 4673324 0 0 1156 32 4644 6918 14 31 0 55
Write failed: Broken pipe
Shared connection to keep11.s closed.
atop (2015/08/24 19:20:01) shows a lot of write IO in the last sample.
LVM | su92l-keep-1 | busy 17% | | read 12223 | write 44065 | KiB/r 126 | | KiB/w 157 | MBr/s 2.51 | MBw/s 11.31 | | avq 269.81 | avio 1.64 ms |
LVM | su92l-keep-0 | busy 16% | | read 12121 | write 42154 | KiB/r 126 | | KiB/w 163 | MBr/s 2.50 | MBw/s 11.19 | | avq 286.06 | avio 1.69 ms |
LVM | su92l-keep-2 | busy 14% | | read 7256 | write 41840 | KiB/r 125 | | KiB/w 160 | MBr/s 1.48 | MBw/s 10.92 | | avq 311.11 | avio 1.60 ms |
  PID  MINFLT  MAJFLT  VSTEXT  VSIZE   RSIZE  VGROW   RGROW  RUID  EUID  MEM  CMD        1/42
15099   49155      13   2346K   3.5G  917.2M     0K  192.0M  root  root  12%  keepstore
  PID  RDDSK  WRDSK  WCANCL  DSK  CMD        1/42
15099   3.8G  19.8G      0K 100%  keepstore
Updated by Tom Clegg over 9 years ago
- Subject changed from [Keep] keepstore holds on to too much RAM with default settings to [Keep] keepstore reboots GCE host under heavy load
Updated by Tom Clegg over 9 years ago
- linux kernel bug prevents memory from being freed as expected
- workaround added to Go in feb 2015 (1.5 is the only release since then)
- linux bug https://bugzilla.kernel.org/show_bug.cgi?id=93111 says pri=low
- linux bug says "this seems intentional" and "If I set /sys/kernel/mm/transparent_hugepage/enabled to "never" the problem goes away."
So far I've only found one single-character change needed to make everything build & pass tests with Go1.5: 687d757.
This doesn't quite seem to fit our case though, because [a] we don't see RSS growing -- unless we hit some trigger where RSS suddenly grows really huge, or the missing RAM is somehow not being reported in RSS like it is for other people -- and [b] atop reports 6.0G cache just before reboot, which seems like an indication linux thinks it has lots of RAM available.
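(For reference, checking and applying the THP workaround quoted from the kernel bug is just the following; note it does not persist across reboots.)
cat /sys/kernel/mm/transparent_hugepage/enabled    # e.g. [always] madvise never
echo never > /sys/kernel/mm/transparent_hugepage/enabled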
Updated by Tom Clegg over 9 years ago
Can we try with non-zero swap, and max-buffers set to not exceed total RAM? It looks like we've tried both of these things but not at the same time. That would at least put us in the realm of "normal ways to use linux". Going all the way with that approach would mean max-buffers=40 and GOGC=100.
Updated by Tom Clegg over 9 years ago
Tom Clegg wrote:
Can we try with non-zero swap, and max-buffers set to not exceed total RAM? It looks like we've tried both of these things but not at the same time. That would at least put us in the realm of "normal ways to use linux". Going all the way with that approach would mean max-buffers=40 and GOGC=100.
Emphasis on max-buffers=40, not 60 -- with GOGC=100 and max-buffers=40 we expect about 5.5G res, vs. >8G res with max-buffers=60.
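(Where those estimates come from, roughly: GOGC=100 lets the heap grow to about twice the live buffer set between collections. A quick check, not from the original comment:)
echo $(( 40 * 64 * 2 )) MiB   # 5120 MiB heap ceiling with 40 buffers → ~5.5G res including runtime overhead
echo $(( 60 * 64 * 2 )) MiB   # 7680 MiB with 60 buffers → >8G res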
Updated by Ward Vandewege over 9 years ago
Okay so the kernel issue has been ruled out (keep11 and keep12 died again).
Tom Clegg wrote:
Tom Clegg wrote:
Can we try with non-zero swap, and max-buffers set to not exceed total RAM? It looks like we've tried both of these things but not at the same time. That would at least put us in the realm of "normal ways to use linux". Going all the way with that approach would mean max-buffers=40 and GOGC=100.
Emphasis on max-buffers=40, not 60 -- with GOGC=100 and max-buffers=40 we expect about 5.5G res, vs. >8G res with max-buffers=60.
OK, re-running like that now, with memory overcommit settings also returned to defaults.
Updated by Tom Clegg over 9 years ago
From https://groups.google.com/forum/#!topic/golang-nuts/0WSOKnHGBZE/discussion (re. "not releasing memory")
We now (as of 1.4) release goroutine stack memory to the OS. So your example should behave much better. Please try 1.4.2 and tip if you can, and open a new bug if you're still seeing problems. There are two known caveats which you might run into:
1) There is a kernel bug where freed pages aren't actually freed because of huge page support, see https://github.com/golang/go/issues/8832 . There is a workaround checked in for 1.5.
2) Stacks are freed, but not G structures, see https://github.com/golang/go/issues/8832 . We might get to this in 1.6.
Updated by Ward Vandewege over 9 years ago
Ward Vandewege wrote:
Okay so the kernel issue has been ruled out (keep11 and keep12 died again).
Tom Clegg wrote:
Tom Clegg wrote:
Can we try with non-zero swap, and max-buffers set to not exceed total RAM? It looks like we've tried both of these things but not at the same time. That would at least put us in the realm of "normal ways to use linux". Going all the way with that approach would mean max-buffers=40 and GOGC=100.
Emphasis on max-buffers=40, not 60 -- with GOGC=100 and max-buffers=40 we expect about 5.5G res, vs. >8G res with max-buffers=60.
OK, re-running like that now, with memory overcommit settings also returned to defaults.
keep11 just died. Memory pressure was low. Load was high-ish (between 6 and 7). But it had been hitting max-buffers for a while.
manage.su92l:/data/keep11# cat status.json.150114 |jq . { "volumes": [ { "mount_point": "/data/su92l-keep-0/keep", "device_num": 65024, "bytes_free": 503465967616, "bytes_used": 1106358243328 }, { "mount_point": "/data/su92l-keep-1/keep", "device_num": 65025, "bytes_free": 504915349504, "bytes_used": 1104908861440 }, { "mount_point": "/data/su92l-keep-2/keep", "device_num": 65026, "bytes_free": 509382160384, "bytes_used": 1100442050560 } ], "BufferPool": { "BytesAllocated": 24159191040, "BuffersMax": 40, "BuffersInUse": 40 }, "PullQueue": { "InProgress": 0, "Queued": 0 }, "TrashQueue": { "InProgress": 0, "Queued": 0 }, "Memory": { "Alloc": 2823535072, "TotalAlloc": 24285081416, "Sys": 3147569400, "Lookups": 20400, "Mallocs": 2597023, "Frees": 2527636, "HeapAlloc": 2823535072, "HeapSys": 2955673600, "HeapIdle": 131031040, "HeapInuse": 2824642560, "HeapReleased": 0, "HeapObjects": 69387, "StackInuse": 2359296, "StackSys": 2359296, "MSpanInuse": 89232, "MSpanSys": 98304, "MCacheInuse": 4800, "MCacheSys": 16384, "BuckHashSys": 1462456, "GCSys": 184891625, "OtherSys": 3067735, "NextGC": 5373197152, "LastGC": 1440511844804422100, "PauseTotalNs": 1217760239, "PauseNs": [ 288891, 171261, 253452, 256588, 159510, 180090, 431716, 429565, 420385, 422571, 423969, 262370, 422536, 413530, 270318, 444949, 452848, 543686, 18722193, 10836439, 8036607, 856033, 1039134, 6204598, 7405841, 10484042, 2165888, 2208708, 2341573, 2232237, 2047327, 1928057, 1390085, 1608044, 1613724, 98236224, 1624971, 15629257, 1626479, 1767185, 2179084, 2893032, 2193078, 2252229, 34353902, 2071667, 2180512, 2491072, 21727833, 2560965, 2537353, 2578280, 2397659, 2494659, 28775788, 23844646, 2425616, 2681864, 2497078, 2502960, 2384777, 2339549, 2509545, 9410810, 16971710, 2626032, 2611664, 9486162, 2750199, 2526121, 2477242, 2005752, 1898899, 2054844, 1952794, 1885540, 1968658, 2124501, 2512091, 2691505, 2590028, 2643929, 2773625, 2640361, 2643612, 2816181, 2664651, 8911443, 2077330, 2018793, 2158362, 2091485, 2139338, 2008764, 1832929, 1635072, 1760993, 1792141, 1631576, 1797422, 1809128, 1806544, 1806701, 1637645, 8627923, 1549485, 1709460, 1775591, 1690660, 7694376, 1809275, 1735509, 1824709, 2252965, 1794998, 1557886, 1609016, 1749600, 45204017, 1771277, 1490283, 1700191, 1658202, 1591519, 1463743, 1615452, 1476564, 1633691, 1661715, 1709581, 1734592, 1486707, 1483298, 3847125, 44080247, 1472380, 1554775, 1696490, 1610149, 1805244, 1694561, 1606165, 2252718, 8124888, 1821957, 1909194, 1935240, 1861954, 1970669, 52786586, 1753619, 1377337, 1458417, 1303150, 1447242, 1401994, 1270971, 1446665, 1284288, 1256256, 1268772, 6694757, 1490182, 1681862, 1542291, 1575682, 1675663, 1835794, 1894608, 1840730, 10866699, 43830201, 157875996, 2363329, 37473967, 65091815, 2643227, 3417279, 119590990, 15245223, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 ], "PauseEnd": [ 21144069386626, 21144069774726, 21144070492080, 21144071352084, 21144072239120, 21144073003663, 21162862644540, 21205235121713, 21249717045480, 21294956413533, 21338645772781, 21382512426333, 21410551170000, 21438819408680, 21466792887497, 21482717664062, 21510011514387, 21511252724284, 21511552844827, 21515790613483, 21599449793565, 21719801964449, 21735212067477, 21737339022889, 21758872228561, 21760403627855, 21882133449185, 21916356799322, 22027783330723, 22149841059644, 22173587768457, 
22237702831331, 22257034186634, 22268744884216, 22271933432530, 22275227547786, 22279150052984, 22281368731241, 22402647917595, 22481455276949, 22521018273826, 22525980129220, 22648046416352, 22664590374620, 22667967362046, 22730657215717, 22732180982804, 22741171416359, 22749000373057, 22870006020393, 22900150388371, 22904918446158, 22910373097683, 22911687705405, 22913953480185, 22920634865574, 22955388607314, 23075392715669, 23075524656998, 23078546845863, 23079841299715, 23081918088579, 23082924670851, 23083499419745, 23203609053068, 23207834983073, 23213279633062, 23215602388104, 23336359227278, 23433128479823, 23463944284623, 23482166142349, 23509055081235, 23608394217279, 23728087387963, 23733199641643, 23752045485846, 23785537942114, 23806685402734, 23834246019615, 23955076101922, 24040606313748, 24041231070569, 24056609660241, 24059844385711, 24070194061779, 24072706892990, 24075233451884, 24196391364621, 24318363404405, 24340211705606, 24378183687038, 24406404119928, 24528356163105, 24589022239972, 24593511309655, 24602614001863, 24607937670090, 24616388162527, 24647448103492, 24701502781370, 24810226367352, 24820126266900, 24824883203393, 24850407938090, 24965114901308, 24971460172457, 25054448032047, 25063530049610, 25093383933560, 25163345867002, 25172400270101, 25176015313993, 25260462205366, 25276206977660, 25290654066711, 25297467707883, 25319031396887, 25332207406320, 25452457153298, 25459103051376, 25479081135408, 25482232503603, 25504499026608, 25525940595854, 25535839047543, 25537172310383, 25565489563002, 25567067232990, 25596686208720, 25611504995769, 25726917220724, 25733055556672, 25747370975091, 25762371350900, 25842861874425, 25962925177918, 25963243910083, 25972433263715, 25990030396687, 26011810556728, 26013303301605, 26030834139587, 26033653702910, 26107118252921, 26108413487574, 26120282806414, 26121615721850, 26140000775826, 26143214686915, 26265489897336, 26272591663738, 26299361834445, 26378047861962, 26423911362980, 26431054852212, 26453582341520, 26462786200726, 26477779511895, 26493482080562, 26536146486618, 26602574372671, 26722725973243, 26726601016189, 26732206472248, 26746858829488, 26765142880409, 26770047906289, 26783368236800, 26871648944545, 26970657625177, 27005698334121, 27041656973885, 27100624053379, 27220670629375, 27340735767601, 27460771257807, 27580914861042, 27701042226581, 27821228489194, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 ], "NumGC": 180, "EnableGC": true, "DebugGC": false, "BySize": [ { "Size": 0, "Mallocs": 0, "Frees": 0 }, { "Size": 8, "Mallocs": 13475, "Frees": 12919 }, { "Size": 16, "Mallocs": 1042849, "Frees": 996573 }, { "Size": 32, "Mallocs": 164941, "Frees": 158633 }, { "Size": 48, "Mallocs": 124074, "Frees": 119478 }, { "Size": 64, "Mallocs": 111228, "Frees": 106742 }, { "Size": 80, "Mallocs": 18732, "Frees": 17839 }, { "Size": 96, "Mallocs": 22046, "Frees": 20910 }, { "Size": 112, "Mallocs": 32586, "Frees": 31426 }, { "Size": 128, "Mallocs": 1169, "Frees": 1073 }, { "Size": 144, "Mallocs": 27672, "Frees": 26940 }, { "Size": 160, "Mallocs": 17422, "Frees": 16851 }, { "Size": 176, "Mallocs": 2879, "Frees": 2844 }, { "Size": 192, "Mallocs": 9, "Frees": 8 }, { "Size": 208, "Mallocs": 14826, "Frees": 14359 }, { "Size": 224, "Mallocs": 8, "Frees": 8 }, { "Size": 240, "Mallocs": 7, "Frees": 2 }, { "Size": 256, "Mallocs": 596, "Frees": 572 }, { 
"Size": 288, "Mallocs": 18602, "Frees": 17677 }, { "Size": 320, "Mallocs": 26, "Frees": 23 }, { "Size": 352, "Mallocs": 17658, "Frees": 17306 }, { "Size": 384, "Mallocs": 5, "Frees": 1 }, { "Size": 416, "Mallocs": 15, "Frees": 8 }, { "Size": 448, "Mallocs": 10, "Frees": 10 }, { "Size": 480, "Mallocs": 1, "Frees": 0 }, { "Size": 512, "Mallocs": 38, "Frees": 37 }, { "Size": 576, "Mallocs": 2875, "Frees": 2840 }, { "Size": 640, "Mallocs": 31, "Frees": 14 }, { "Size": 704, "Mallocs": 1223, "Frees": 1211 }, { "Size": 768, "Mallocs": 0, "Frees": 0 }, { "Size": 896, "Mallocs": 37, "Frees": 13 }, { "Size": 1024, "Mallocs": 23, "Frees": 18 }, { "Size": 1152, "Mallocs": 2873, "Frees": 2842 }, { "Size": 1280, "Mallocs": 15, "Frees": 13 }, { "Size": 1408, "Mallocs": 0, "Frees": 0 }, { "Size": 1536, "Mallocs": 4, "Frees": 0 }, { "Size": 1664, "Mallocs": 7, "Frees": 3 }, { "Size": 2048, "Mallocs": 560, "Frees": 497 }, { "Size": 2304, "Mallocs": 2875, "Frees": 2837 }, { "Size": 2560, "Mallocs": 13, "Frees": 3 }, { "Size": 2816, "Mallocs": 0, "Frees": 0 }, { "Size": 3072, "Mallocs": 0, "Frees": 0 }, { "Size": 3328, "Mallocs": 3, "Frees": 1 }, { "Size": 4096, "Mallocs": 1864, "Frees": 1476 }, { "Size": 4608, "Mallocs": 2868, "Frees": 2837 }, { "Size": 5376, "Mallocs": 4, "Frees": 0 }, { "Size": 6144, "Mallocs": 2, "Frees": 0 }, { "Size": 6400, "Mallocs": 0, "Frees": 0 }, { "Size": 6656, "Mallocs": 1, "Frees": 0 }, { "Size": 6912, "Mallocs": 0, "Frees": 0 }, { "Size": 8192, "Mallocs": 44, "Frees": 38 }, { "Size": 8448, "Mallocs": 0, "Frees": 0 }, { "Size": 8704, "Mallocs": 0, "Frees": 0 }, { "Size": 9472, "Mallocs": 2765, "Frees": 2734 }, { "Size": 10496, "Mallocs": 0, "Frees": 0 }, { "Size": 12288, "Mallocs": 0, "Frees": 0 }, { "Size": 13568, "Mallocs": 0, "Frees": 0 }, { "Size": 14080, "Mallocs": 0, "Frees": 0 }, { "Size": 16384, "Mallocs": 0, "Frees": 0 }, { "Size": 16640, "Mallocs": 0, "Frees": 0 }, { "Size": 17664, "Mallocs": 0, "Frees": 0 } ] } }
Updated by Tom Clegg over 9 years ago
- half GET, half PUT, plus a couple of status and index thrown in → max-buffers reached, load avg ~8, but no problem: this ran all night.
- nearly all PUT → no problem.
- vary the data in each PUT request so actual filesystem writes and disk IO are needed → write to disk at ~120 MiB/s for a short while (~1 minute?), then crash/reboot.
(I started up a separate keepstore proc for the last experiment, so I could divert the useless data to a separate dir on the storage disks. Should confirm the previous experiments still don't crash with that setup.)
<1s before crash:
top - 13:17:02 up 1 day, 12:38, 6 users, load average: 5.00, 2.40, 4.26
Tasks: 136 total, 3 running, 133 sleeping, 0 stopped, 0 zombie
%Cpu(s): 25.3 us, 41.9 sy, 0.0 ni, 0.0 id, 29.3 wa, 0.0 hi, 3.5 si, 0.0 st
KiB Mem: 7679608 total, 7358572 used, 321036 free, 5720 buffers
KiB Swap: 5242876 total, 61152 used, 5181724 free, 3318740 cached
  PID USER  PR NI  VIRT  RES  SHR S %CPU %MEM   TIME+ COMMAND
 7025 tom   20  0 2068m 1.4g 6584 S 48.7 19.7 1:03.61 python
 7001 root  20  0 3125m 2.0g 5384 S 38.8 27.9 0:44.51 keepstore
 6780 root  20  0     0    0    0 D 22.9  0.0 0:20.47 kworker/1:2
 2749 root  20  0     0    0    0 R 13.9  0.0 0:09.84 kworker/0:1
 6750 root  20  0     0    0    0 R  9.0  0.0 0:18.89 kworker/0:2
25652 root  20  0     0    0    0 S  7.0  0.0 0:01.55 kworker/u4:1
(top looked like this for a while before crash.)
<10s before crash:
PRC | sys 8.37s | user 4.34s | | | #proc 136 | #trun 9 | #tslpi 233 | #tslpu 1 | #zombie 0 | | clones 60 | | #exit 38 |
CPU | sys 77% | user 41% | irq 6% | | | idle 3% | wait 74% | | steal 0% | | guest 0% | curf 2.60GHz | curscal ?% |
cpu | sys 41% | user 17% | irq 5% | | | idle 2% | cpu000 w 35% | | steal 0% | | guest 0% | curf 2.60GHz | curscal ?% |
cpu | sys 36% | user 23% | irq 0% | | | idle 1% | cpu001 w 39% | | steal 0% | | guest 0% | curf 2.60GHz | curscal ?% |
CPL | avg1 4.99 | | avg5 2.35 | avg15 4.26 | | | | csw 84894 | | intr 27756 | | | numcpu 2 |
MEM | tot 7.3G | free 419.5M | | cache 3.1G | dirty 246.9M | buff 5.6M | | slab 211.1M | | | | | |
SWP | tot 5.0G | free 4.9G | | | | | | | | | | vmcom 8.1G | vmlim 8.7G |
PAG | scan 268660 | | | stall 0 | | | | | | swin 0 | | | swout 0 |
LVM | zzzzz-keep-0 | busy 103% | | read 1 | write 10434 | KiB/r 4 | | KiB/w 126 | MBr/s 0.00 | MBw/s 128.71 | | avq 2789.18 | avio 0.96 ms |
DSK | sdb | busy 102% | | read 1 | write 9727 | KiB/r 4 | | KiB/w 126 | MBr/s 0.00 | MBw/s 119.93 | | avq 126.12 | avio 1.02 ms |
DSK | sda | busy 44% | | read 8 | write 109 | KiB/r 4 | | KiB/w 6 | MBr/s 0.00 | MBw/s 0.07 | | avq 3.64 | avio 36.4 ms |
NET | transport | tcpi 29275 | tcpo 29309 | udpi 2 | udpo 2 | tcpao 158 | | tcppo 157 | tcprs 4 | tcpie 0 | tcpor 5 | udpnp 0 | udpip 0 |
NET | network | ipi 29277 | | ipo 29278 | ipfrw 0 | deliv 29277 | | | | | | icmpi 0 | icmpo 0 |
NET | lo ---- | pcki 29165 | pcko 29165 | | si 965 Mbps | so 965 Mbps | coll 0 | mlti 0 | erri 0 | | erro 0 | drpi 0 | drpo 0 |
NET | eth0 ---- | pcki 112 | pcko 113 | | si 13 Kbps | so 75 Kbps | coll 0 | mlti 0 | erri 0 | | erro 0 | drpi 0 | drpo 0 |
Updated by Tom Clegg over 9 years ago
Many similar attempts have stayed alive happily for hours, then another one died within seconds of startup. It's not clear whether the test program is actually causing the crashes.
Updated by Tom Clegg over 9 years ago
- 96 python threads across 3 machines, reading/writing at once
- Turn off http keepalive by adding "Connection: close" header → 15K entries in netstat
- max-buffers=100 → vsize 12G; keepstore gets OOM-killed when rsize is ~7G, and runit restarts it
- Lots of repetitive PUTs (low disk activity)
- Lots of non-repetitive PUTs (high disk activity)
- ~Equal mix of GET and PUT
- index and status requests thrown in
- lots of IO (~2TB disk consumed during test trials over the last few days)
But something did crash keep14 while I was running tests, so the problem (or at least some problem) is still present.
Updated by Tom Clegg almost 6 years ago
- Status changed from In Progress to Resolved
- Target version deleted (Arvados Future Sprints)