Bug #6997


[Keep] keepstore reboots GCE host under heavy load

Added by Tom Clegg over 8 years ago. Updated about 5 years ago.

Status:
Resolved
Priority:
Normal
Assigned To:
-
Category:
Deployment
Target version:
-
Story points:
-

Description

Steps to reproduce

  1. Start keepstore [1].
  2. Write 3K small files [2] (so the index isn't trivially small).
  3. Hit keepstore with 15 concurrent 40MiB writes [3] (so it allocates 15 data buffers). Repeat, just to be sure.
  4. Hit keepstore with 20 concurrent /index requests [4]. Repeat a few times.
Runtime  Build    Env                   After writes  After many indexes
go1.3.3  0d9da68                        1.8G          3.7G
go1.3.3  0d9da68  GOGC=10               2.1G          2.3G
go1.3.3  0d9da68  GOMAXPROCS=8          2.4G          >4.5G
go1.3.3  0d9da68  GOMAXPROCS=8 GOGC=10  2.4G          2.6G
go1.4.2  0d9da68  GOMAXPROCS=8          2.3G          3.5G
go1.4.2  0d9da68  GOMAXPROCS=8 GOGC=10  2.1G          2.4G

Evidently, "index" generates a lot of garbage, and the default GOGC=100 is much too high for a program like keepstore that normally fills >50% of the system's RAM with active buffers.

See http://golang.org/pkg/runtime/debug/#SetGCPercent

[1] mkdir /tmp/zzz; echo abc | GOMAXPROCS=8 keepstore -volume=/tmp/zzz -data-manager-token-file=/dev/stdin

[2] for i in `seq 0 3000`; do echo -n "$i "; curl -X PUT -H 'Authorization: OAuth2 abc' --data-binary $i localhost:25107/`echo -n $i | md5sum | head -c32`; done

[3] dd if=/dev/zero of=/tmp/zero bs=1000000 count=40; for i in `seq 0 15`; do curl -X PUT -H 'Authorization: OAuth2 foo' --data-binary @/tmp/zero localhost:25107/48e9a108a3ec623652e7988af2f88867 & done

[4] for i in `seq 0 20`; do curl -s -H 'Authorization: OAuth2 abc' localhost:25107/index >/dev/null & done

Fix

Specify GOGC=10 in the keepstore run script, both in our own deployments and the Keepstore install guide.
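
Equivalently, the GC target can be lowered from inside the program via runtime/debug.SetGCPercent. A minimal sketch (not the actual keepstore change, which just sets GOGC in the environment):

package main

import (
    "fmt"
    "runtime/debug"
)

func main() {
    // Equivalent to starting the process with GOGC=10: trigger GC when the
    // heap has grown 10% past the live set, instead of the default 100%.
    old := debug.SetGCPercent(10)
    fmt.Printf("GC target lowered from %d%% to 10%%\n", old)
    // ... start the server ...
}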


Files

keep10-nextlastgc.png (29.1 KB) Nico César, 08/21/2015 08:06 PM
keep10-pausetotalns.png (33.9 KB) Nico César, 08/21/2015 08:06 PM
keep10-mallocs.png (29.3 KB) Nico César, 08/21/2015 08:06 PM
keep10-sys.png (30 KB) Nico César, 08/21/2015 08:06 PM
keep10-gc.png (27.1 KB) Nico César, 08/21/2015 08:06 PM
keep10-mspans.png (32 KB) Nico César, 08/21/2015 08:06 PM
keep10-mcache.png (25.7 KB) Nico César, 08/21/2015 08:06 PM
keep10-lookups.png (29.5 KB) Nico César, 08/21/2015 08:06 PM
keep10-stack.png (29.3 KB) Nico César, 08/21/2015 08:06 PM
keep10-heap.png (36.2 KB) Nico César, 08/21/2015 08:06 PM

Related issues

Related to Arvados - Bug #7121: [Keep] keepstore should use only one buffer for each PUT (and should not deadlock) (Resolved, Tom Clegg, 08/24/2015)
Related to Arvados - Bug #7165: [Keep] Write replayable activity logs (Closed, 08/28/2015)
Has duplicate Arvados - Bug #7119: Pipeline instance failed and log collection is empty (Duplicate, 08/24/2015)
#1

Updated by Brett Smith over 8 years ago

  • Description updated (diff)
#2

Updated by Brett Smith over 8 years ago

  • Target version set to Arvados Future Sprints
#9

Updated by Ward Vandewege over 8 years ago

Added swap on all the su92l keep nodes. I had to fight to keep keep9-13 in the air; 9 and 11 actually went down (OOM) and I reset them. The others had very high load.

I also noticed that keep11 was being hit way harder than the others (many messages about max_buffers being reached), with gaps in the stats as a consequence:

/data/keep11 on manage.s:

-rw-r--r--  1 root root   5762 Aug 20 21:41 status.json.3312
-rw-r--r--  1 root root   9342 Aug 20 21:26 status.json.3311

The very high pressure on that node continued until roughly 21:50.

#12

Updated by Nico César over 8 years ago

memory keeps being allocated.

I wish we could graph this easily ...

manage.su92l:/data/keep10# for a in $( ls -1tr ) ; do date --reference=$a +%s.%N | tr "\n" " " ; jq '.["Memory"]["TotalAlloc"]' $a ; done

ANY GNUPLOT SKILLS?

#13

Updated by Nico César over 8 years ago

Nico Cesar wrote:

memory keeps being allocated.

I wish we could graph this easily ...

[...]

ANY GNUPLOT SKILLS?

I just improved it to have all the keys.

KEYS=$(jq '.["Memory"] | keys' $(ls status.json* | head --lines=1) | tr ","  " " | tr -d '[]"' | grep -v PauseNs | grep -v PauseEnd | grep -v BySize | tr "\n" " ")
for a in $( ls -1tr ) ; do date --reference=$a +%s.%N | tr "\n" " " ;  for col in $KEYS ; do jq '.["Memory"]["'$col'"]' $a | tr "\n" " " ; done ; echo ;done
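
In answer to the gnuplot question above, a hypothetical standalone Go helper (name and paths illustrative) that emits the same two-column "timestamp TotalAlloc" series for plotting:

package main

import (
    "encoding/json"
    "fmt"
    "io/ioutil"
    "os"
    "path/filepath"
)

func main() {
    // Usage: dump-totalalloc /data/keep10 > keep10.dat
    // Then:  gnuplot> plot 'keep10.dat' using 1:2 with lines
    files, err := filepath.Glob(filepath.Join(os.Args[1], "status.json*"))
    if err != nil {
        panic(err)
    }
    for _, f := range files {
        fi, err := os.Stat(f)
        if err != nil {
            continue
        }
        var status struct {
            Memory struct{ TotalAlloc uint64 }
        }
        buf, err := ioutil.ReadFile(f)
        if err != nil || json.Unmarshal(buf, &status) != nil {
            continue
        }
        fmt.Printf("%d %d\n", fi.ModTime().Unix(), status.Memory.TotalAlloc)
    }
}
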
#15

Updated by Nico César over 8 years ago

check the references here: http://golang.org/pkg/runtime/#MemStats

#16

Updated by Nico César over 8 years ago

We got "compute.instances.hostError an instance" events on all the failing nodes, and GCE rebooted the machines.

compute.instances.hostError an instance   su92l-keep10  system      start: Aug 21, 2015, 3:52:03 PM  End: Aug 21, 2015, 3:52:03 PM 

All the reboots we've seen (keep7, keep9, keep10, keep11) were done by GCE.

My guess is that adding swap only made the load climb higher once the swap was actually in use, and GCE kills VMs with high load.

My recommendation is to take swap off and lower max-buffers from 100 to 60.

#17

Updated by Ward Vandewege over 8 years ago

  • Target version changed from Arvados Future Sprints to 2015-09-02 sprint
#18

Updated by Ward Vandewege over 8 years ago

  • Status changed from New to In Progress
#19

Updated by Ward Vandewege over 8 years ago

Nico Cesar wrote:

My recommendation is to take swap off and lower max-buffers from 100 to 60.

Done for all the su92l nodes.

#20

Updated by Ward Vandewege over 8 years ago

Ward Vandewege wrote:

Nico Cesar wrote:

My recommendation is to take swap off and lower max-buffers from 100 to 60.

Done for all the su92l nodes.

Perhaps unsurprisingly, this didn't really make a difference. Both keep10 and keep11 have died in the past 30 minutes (I restarted them).

Update 23:38: keep10/11/12 died once more.

#21

Updated by Tom Clegg over 8 years ago

100x 64MiB buffers + 10% GOGC means we expect at least 7381975040 bytes (7208960 KiB) rss. We've seen the OOM-killer kill keepstore at: total-vm:7514700kB, anon-rss:6979172kB, file-rss:2280kB

If it's a problem with actual memory, it means max-buffers is too high, and I'd expect -max-buffers=60 to have helped: with that, rss should not get (much) past 4325376 KiB for more than one GC cycle.
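
A quick check of those numbers (a sketch; 64 MiB is the keepstore data buffer size):

package main

import "fmt"

func main() {
    const bufSize int64 = 64 << 20 // 64 MiB per data buffer
    // With GOGC=10 the heap may grow ~10% past the live set before GC runs.
    fmt.Println(100 * bufSize * 110 / 100) // 7381975040 bytes = 7208960 KiB
    fmt.Println(60 * bufSize * 110 / 100)  // 4429185024 bytes = 4325376 KiB
}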

If it's a problem with virtual memory, we could try ulimit -v (it works as expected, unlike "chpst -m", which makes keepstore panic at startup). At least this would kill/restart the keepstore process when it hits 7.5GiB vsize, instead of taking down the whole host. But AFAICT vsize is not tied to a physical resource and therefore Go doesn't ever reduce it or try particularly hard to reduce growth, so this would just be a clue, not a solution.

https://www.kernel.org/doc/Documentation/sysctl/vm.txt → overcommit_memory: "This feature can be very useful because there are a lot of programs that malloc() huge amounts of memory "just-in-case" and don't use much of it."

Should we be using echo 1 >/proc/sys/vm/overcommit_memory ...?

#22

Updated by Nico César over 8 years ago

  • Status changed from In Progress to New
  • Target version changed from 2015-09-02 sprint to Arvados Future Sprints

About overcommit: here are the 3 modes:

https://www.kernel.org/doc/Documentation/vm/overcommit-accounting

Basically, 1 is "overcommit blindly" and 2 enforces a configurable ratio, taking RAM+swap into account.

I checked all the keep servers in su92l and the value is 0 (heuristic overcommit).

We can give it a try.

Tom, do you know if there is a way to dump the memory of a running Go program for analysis?

#23

Updated by Tom Clegg over 8 years ago

Nico Cesar wrote:

About overcommit: here are the 3 modes:

https://www.kernel.org/doc/Documentation/vm/overcommit-accounting

Basically, 1 is "overcommit blindly" and 2 enforces a configurable ratio, taking RAM+swap into account.

I checked all the keep servers in su92l and the value is 0 (heuristic overcommit).

This "heuristic" is a bit mysterious. The link I posted (which says kernel 2.6.29 -- not sure when the doc page you posted was last updated) says: "When this flag is 0, the kernel attempts to estimate the amount of free memory left when userspace requests more memory." That sort of vaguely sounds like "never overcommit if no swap". But only vaguely. Slightly less vague than the words "heuristic" and "obvious" in the other doc, maybe, sort of.

Tom, do you know if there is a way to dump the memory of a running Go program for analysis?

Yes, https://golang.org/pkg/runtime/debug/#WriteHeapDump ... what do you have in mind?
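
A minimal sketch of wiring that up; dumping on SIGUSR1 is an invented convention here, not existing keepstore behavior:

package main

import (
    "os"
    "os/signal"
    "runtime/debug"
    "syscall"
)

func main() {
    c := make(chan os.Signal, 1)
    signal.Notify(c, syscall.SIGUSR1)
    go func() {
        for range c {
            // Dump the heap to a file for offline analysis.
            f, err := os.Create("/tmp/keepstore.heapdump")
            if err != nil {
                continue
            }
            debug.WriteHeapDump(f.Fd())
            f.Close()
        }
    }()
    // ... rest of the server ...
    select {}
}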

All of the evidence I've seen in my own testing and the dmesg stuff here indicates:
  • RSS is no bigger than we would expect when max-buffers=100 and GOGC=10 (and is larger than the OOM-killing point, so that was simply our fault for not getting the numbers right)
  • VSIZE is explicitly not controllable; Go (like Java) assumes the OS is able to handle an address space larger than physical RAM as long as it isn't all in use

Do we have dmesg from a crash that happened after reducing max-buffers to 60? That could confirm that it's really VSIZE and not RSS that's the issue.

#25

Updated by Peter Amstutz over 8 years ago

I found something!

Total: 3228.0 MB
  2048.0  63.4%  63.4%   2048.0  63.4% main.func·003
   993.5  30.8%  94.2%   1010.0  31.3% os.(*File).readdirnames
    55.0   1.7%  95.9%     91.5   2.8% os.Lstat
    32.5   1.0%  96.9%     32.5   1.0% strings.Join
    26.5   0.8%  97.8%     26.5   0.8% os.fileInfoFromStat
    16.5   0.5%  98.3%     16.5   0.5% syscall.ParseDirent
    16.0   0.5%  98.8%     16.0   0.5% syscall.ByteSliceFromString
    11.0   0.3%  99.1%     13.0   0.4% main.func·007
     9.5   0.3%  99.4%      9.5   0.3% os.NewFile
     4.0   0.1%  99.5%      4.0   0.1% runtime.convT2I
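
(The table above is the shape of "go tool pprof" top output over a heap profile. For anyone reproducing this, a minimal sketch of exposing such a profile from a Go server via net/http/pprof; the port is arbitrary:)

package main

import (
    "net/http"
    _ "net/http/pprof" // registers the /debug/pprof/ handlers
)

func main() {
    // Then: go tool pprof --inuse_space http://localhost:6060/debug/pprof/heap
    go http.ListenAndServe("localhost:6060", nil)
    // ... rest of the server ...
    select {}
}
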
#26

Updated by Peter Amstutz over 8 years ago

After running a little longer:

Total: 5247.5 MB
  2691.0  51.3%  51.3%   2729.5  52.0% os.(*File).readdirnames
  2048.0  39.0%  90.3%   2048.0  39.0% main.func·003
   153.5   2.9%  93.2%    274.5   5.2% os.Lstat
    87.0   1.7%  94.9%     87.0   1.7% os.fileInfoFromStat
    85.5   1.6%  96.5%     85.5   1.6% strings.Join
    44.5   0.8%  97.4%     44.5   0.8% syscall.ByteSliceFromString
    38.5   0.7%  98.1%     38.5   0.7% syscall.ParseDirent
    34.5   0.7%  98.8%     43.5   0.8% main.func·007
    24.5   0.5%  99.2%     24.5   0.5% os.NewFile
    17.0   0.3%  99.6%     17.0   0.3% runtime.convT2I
#27

Updated by Peter Amstutz over 8 years ago

For the curious, main.func·003 is newBufferPool(), so it makes sense that it would be responsible for the vast majority of memory allocations (at least, it would be in a correctly functioning keepstore).

The evidence now strongly points towards os.(*File).readdirnames either having some kind of memory leak, or us failing to clean up resources.
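
For illustration, the classic way to lose resources around readdirnames is an unclosed directory handle. A hypothetical sketch of that failure mode, not keepstore's actual index code:

package main

import (
    "fmt"
    "os"
)

// listBlocks returns every entry name in one volume directory.
// Readdirnames allocates a string per entry, so each full index walk
// produces a burst of garbage; forgetting the Close additionally leaks
// the fd and the File's internal readdir buffer until its finalizer runs.
func listBlocks(dir string) ([]string, error) {
    d, err := os.Open(dir)
    if err != nil {
        return nil, err
    }
    defer d.Close() // the easy line to forget
    return d.Readdirnames(-1)
}

func main() {
    names, err := listBlocks("/tmp")
    if err != nil {
        panic(err)
    }
    fmt.Println(len(names), "entries")
}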

#30

Updated by Tom Clegg over 8 years ago

  • Status changed from New to In Progress
#31

Updated by Tom Clegg over 8 years ago

vmstat 1 on keep11.s just before rebooting.

procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 2  3      0 146812   5436 4750848    0    0 105556    40 8065 13417 36 36  6 22
 3  2      0 152764   5428 4704452    0    0 120336 122860 4265 6017 56 44  0  0
 5  3      0 137760   5428 4731620    0    0 77876   410 5096 7521 29 38  5 28
 6  0      0 153012   5428 4716660    0    0 119212 159748 5593 9728 51 34  2 13
 0  6      0 143820   5428 4687208    0    0 11060 147492 3302 4862 27 47  0 26
 2  3      0 170356   5428 4689856    0    0  2688    28 2155 2697  3 28  0 70
 1  3      0 121500   5428 4776252    0    0 137504    40 3544 6452 15 25 24 36
 2  3      0 128940   5428 4769868    0    0 179508    40 2711 4790 24 35 13 28
 1  2      0 142704   5428 4756640    0    0 176152    48 3967 7054 38 32  7 23
 1  3      0 134372   5492 4752808    0    0 172872   112 5528 8438 29 36  6 29
 6  1      0 127256   5492 4726248    0    0 146580 113900 3553 5763 29 44  8 19
 0  3      0 185016   5492 4689800    0    0 17552 24640 3339 5223 14 26 10 50
 0  2      0 152832   5492 4706236    0    0 13412 131216 4590 6827 33 46  7 14
 2  1      0 164860   5492 4729348    0    0 130820    60 5502 10497 18 22 10 49
 1  3      0 160396   5496 4703072    0    0 42516 156163 4649 7657 28 42  0 30
 3  1      0 130016   5496 4753988    0    0 46068  8578 7509 12456 30 31  6 33
 2  2      0 173160   5492 4694048    0    0 29176 188356 8109 15390 23 51  8 18
 1  4      0 169728   5492 4700612    0    0  6584 156512 3321 5755  9 29 21 41
 3  2      0 166256   5492 4726304    0    0  2960    48 5348 7885 12 29  6 54
 2  3      0 146128   5492 4749516    0    0 189892    40 6861 10528 40 38  5 17
 2  2      0 171836   5492 4673392    0    0 93580 98348 5507 10270 45 44  1 11
 4  2      0 120872   5492 4751396    0    0 113572    40 7332 13733 24 33  8 36
 1  5      0 178408   5492 4681284    0    0 49848 152232 3029 4431 28 48  0 24
 2  2      0 127196   5492 4747944    0    0 105240    40 7656 13315 26 26 11 36
 6  1      0 125336   5492 4747864    0    0 157092   236 5203 8342 28 34 12 26
 6  3      0 139348   5432 4736820    0    0 75732 98512 4272 5490 57 43  0  0
 3  2      0 132892   5432 4729192    0    0 135448 98248 5349 9208 37 36  8 19
 2  2      0 137200   5432 4751196    0    0 43400    24 3769 6483 17 28 23 32
 3  3      0 163404   5432 4719552    0    0 140820 24616 7868 12755 41 43  3 13
 1  3      0 136380   5432 4731280    0    0 170812    56 5818 10388 37 37  4 23
 1  6      0 148532   5428 4672164    0    0 48152 323544 1997 4239 25 45  0 31
 1  5      0 160056   5432 4673324    0    0  1156    32 4644 6918 14 31  0 55
Write failed: Broken pipe
Shared connection to keep11.s closed.

atop (2015/08/24 19:20:01) indicates lots of IO writing in the last sample.

LVM | su92l-keep-1  | busy     17%  |               | read   12223  | write  44065  | KiB/r    126  |              |  KiB/w    157 |  MBr/s   2.51 |  MBw/s  11.31 |               |  avq   269.81 |  avio 1.64 ms |
LVM | su92l-keep-0  | busy     16%  |               | read   12121  | write  42154  | KiB/r    126  |              |  KiB/w    163 |  MBr/s   2.50 |  MBw/s  11.19 |               |  avq   286.06 |  avio 1.69 ms |
LVM | su92l-keep-2  | busy     14%  |               | read    7256  | write  41840  | KiB/r    125  |              |  KiB/w    160 |  MBr/s   1.48 |  MBw/s  10.92 |               |  avq   311.11 |  avio 1.60 ms |

  PID           MINFLT            MAJFLT            VSTEXT             VSIZE             RSIZE             VGROW             RGROW            RUID                EUID                 MEM           CMD        1/42
15099            49155                13             2346K              3.5G            917.2M                0K            192.0M            root                root                 12%           keepstore

  PID                                   RDDSK                                   WRDSK                                  WCANCL                                   DSK                                   CMD       1/42
15099                                    3.8G                                   19.8G                                      0K                                  100%                                   keepstore

#32

Updated by Tom Clegg over 8 years ago

  • Subject changed from [Keep] keepstore holds on to too much RAM with default settings to [Keep] keepstore reboots GCE host under heavy load
#33

Updated by Tom Clegg over 8 years ago

https://github.com/golang/go/issues/8832 suggests
  • linux kernel bug prevents memory from being freed as expected
  • workaround added to Go in Feb 2015 (1.5 is the only release since then)
  • linux bug https://bugzilla.kernel.org/show_bug.cgi?id=93111 says pri=low
  • linux bug says "this seems intentional" and "If I set /sys/kernel/mm/transparent_hugepage/enabled to "never" the problem goes away."

So far I've only found one single-character change needed to make everything build & pass tests with Go1.5: 687d757.

This doesn't quite seem to fit our case though, because [a] we don't see RSS growing -- unless we hit some trigger where RSS suddenly grows really huge, or the missing RAM is somehow not being reported in RSS like it is for other people -- and [b] atop reports 6.0G cache just before reboot, which seems like an indication linux thinks it has lots of RAM available.

#34

Updated by Tom Clegg over 8 years ago

Can we try with non-zero swap, and max-buffers set to not exceed total RAM? It looks like we've tried both of these things but not at the same time. That would at least put us in the realm of "normal ways to use linux". Going all the way with that approach would mean max-buffers=40 and GOGC=100.

#37

Updated by Tom Clegg over 8 years ago

Tom Clegg wrote:

Can we try with non-zero swap, and max-buffers set to not exceed total RAM? It looks like we've tried both of these things but not at the same time. That would at least put us in the realm of "normal ways to use linux". Going all the way with that approach would mean max-buffers=40 and GOGC=100.

Emphasis on max-buffers=40, not 60 -- with GOGC=100 and max-buffers=40 we expect about 5.5G res, vs. >8G res with max-buffers=60.
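
Same back-of-the-envelope as in #21, except with GOGC=100 the heap may grow to roughly 2x the live set between collections (a sketch):

package main

import "fmt"

func main() {
    const bufSize int64 = 64 << 20 // 64 MiB per data buffer
    fmt.Println(40 * bufSize * 2) // 5368709120 bytes, ~5.5G res with overhead
    fmt.Println(60 * bufSize * 2) // 8053063680 bytes, >8G res with overhead
}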

#38

Updated by Ward Vandewege over 8 years ago

Okay so the kernel issue has been ruled out (keep11 and keep12 died again).

Tom Clegg wrote:

Tom Clegg wrote:

Can we try with non-zero swap, and max-buffers set to not exceed total RAM? It looks like we've tried both of these things but not at the same time. That would at least put us in the realm of "normal ways to use linux". Going all the way with that approach would mean max-buffers=40 and GOGC=100.

Emphasis on max-buffers=40, not 60 -- with GOGC=100 and max-buffers=40 we expect about 5.5G res, vs. >8G res with max-buffers=60.

OK, re-running like that now, with memory overcommit settings also returned to defaults.

#39

Updated by Tom Clegg over 8 years ago

From https://groups.google.com/forum/#!topic/golang-nuts/0WSOKnHGBZE/discussion (re. "not releasing memory")

We now (as of 1.4) release goroutine stack memory to the OS. So your example should behave much better. Please try 1.4.2 and tip if you can, and open a new bug if you're still seeing problems. There are two known caveats which you might run into:

1) There is a kernel bug where freed pages aren't actually freed because of huge page support, see https://github.com/golang/go/issues/8832 . There is a workaround checked in for 1.5.
2) Stacks are freed, but not G structures, see https://github.com/golang/go/issues/8832 . We might get to this in 1.6.

#40

Updated by Ward Vandewege over 8 years ago

Ward Vandewege wrote:

Okay so the kernel issue has been ruled out (keep11 and keep12 died again).

Tom Clegg wrote:

Tom Clegg wrote:

Can we try with non-zero swap, and max-buffers set to not exceed total RAM? It looks like we've tried both of these things but not at the same time. That would at least put us in the realm of "normal ways to use linux". Going all the way with that approach would mean max-buffers=40 and GOGC=100.

Emphasis on max-buffers=40, not 60 -- with GOGC=100 and max-buffers=40 we expect about 5.5G res, vs. >8G res with max-buffers=60.

OK, re-running like that now, with memory overcommit settings also returned to defaults.

keep11 just died. Memory pressure was low. Load was high-ish (between 6 and 7). But it had been hitting max-buffers for a while.

manage.su92l:/data/keep11# cat status.json.150114 |jq . 
{
  "volumes": [
    {
      "mount_point": "/data/su92l-keep-0/keep",
      "device_num": 65024,
      "bytes_free": 503465967616,
      "bytes_used": 1106358243328
    },
    {
      "mount_point": "/data/su92l-keep-1/keep",
      "device_num": 65025,
      "bytes_free": 504915349504,
      "bytes_used": 1104908861440
    },
    {
      "mount_point": "/data/su92l-keep-2/keep",
      "device_num": 65026,
      "bytes_free": 509382160384,
      "bytes_used": 1100442050560
    }
  ],
  "BufferPool": {
    "BytesAllocated": 24159191040,
    "BuffersMax": 40,
    "BuffersInUse": 40
  },
  "PullQueue": {
    "InProgress": 0,
    "Queued": 0
  },
  "TrashQueue": {
    "InProgress": 0,
    "Queued": 0
  },
  "Memory": {
    "Alloc": 2823535072,
    "TotalAlloc": 24285081416,
    "Sys": 3147569400,
    "Lookups": 20400,
    "Mallocs": 2597023,
    "Frees": 2527636,
    "HeapAlloc": 2823535072,
    "HeapSys": 2955673600,
    "HeapIdle": 131031040,
    "HeapInuse": 2824642560,
    "HeapReleased": 0,
    "HeapObjects": 69387,
    "StackInuse": 2359296,
    "StackSys": 2359296,
    "MSpanInuse": 89232,
    "MSpanSys": 98304,
    "MCacheInuse": 4800,
    "MCacheSys": 16384,
    "BuckHashSys": 1462456,
    "GCSys": 184891625,
    "OtherSys": 3067735,
    "NextGC": 5373197152,
    "LastGC": 1440511844804422100,
    "PauseTotalNs": 1217760239,
    "PauseNs": [
      288891,
      171261,
      253452,
      256588,
      159510,
      180090,
      431716,
      429565,
      420385,
      422571,
      423969,
      262370,
      422536,
      413530,
      270318,
      444949,
      452848,
      543686,
      18722193,
      10836439,
      8036607,
      856033,
      1039134,
      6204598,
      7405841,
      10484042,
      2165888,
      2208708,
      2341573,
      2232237,
      2047327,
      1928057,
      1390085,
      1608044,
      1613724,
      98236224,
      1624971,
      15629257,
      1626479,
      1767185,
      2179084,
      2893032,
      2193078,
      2252229,
      34353902,
      2071667,
      2180512,
      2491072,
      21727833,
      2560965,
      2537353,
      2578280,
      2397659,
      2494659,
      28775788,
      23844646,
      2425616,
      2681864,
      2497078,
      2502960,
      2384777,
      2339549,
      2509545,
      9410810,
      16971710,
      2626032,
      2611664,
      9486162,
      2750199,
      2526121,
      2477242,
      2005752,
      1898899,
      2054844,
      1952794,
      1885540,
      1968658,
      2124501,
      2512091,
      2691505,
      2590028,
      2643929,
      2773625,
      2640361,
      2643612,
      2816181,
      2664651,
      8911443,
      2077330,
      2018793,
      2158362,
      2091485,
      2139338,
      2008764,
      1832929,
      1635072,
      1760993,
      1792141,
      1631576,
      1797422,
      1809128,
      1806544,
      1806701,
      1637645,
      8627923,
      1549485,
      1709460,
      1775591,
      1690660,
      7694376,
      1809275,
      1735509,
      1824709,
      2252965,
      1794998,
      1557886,
      1609016,
      1749600,
      45204017,
      1771277,
      1490283,
      1700191,
      1658202,
      1591519,
      1463743,
      1615452,
      1476564,
      1633691,
      1661715,
      1709581,
      1734592,
      1486707,
      1483298,
      3847125,
      44080247,
      1472380,
      1554775,
      1696490,
      1610149,
      1805244,
      1694561,
      1606165,
      2252718,
      8124888,
      1821957,
      1909194,
      1935240,
      1861954,
      1970669,
      52786586,
      1753619,
      1377337,
      1458417,
      1303150,
      1447242,
      1401994,
      1270971,
      1446665,
      1284288,
      1256256,
      1268772,
      6694757,
      1490182,
      1681862,
      1542291,
      1575682,
      1675663,
      1835794,
      1894608,
      1840730,
      10866699,
      43830201,
      157875996,
      2363329,
      37473967,
      65091815,
      2643227,
      3417279,
      119590990,
      15245223,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0
    ],
    "PauseEnd": [
      21144069386626,
      21144069774726,
      21144070492080,
      21144071352084,
      21144072239120,
      21144073003663,
      21162862644540,
      21205235121713,
      21249717045480,
      21294956413533,
      21338645772781,
      21382512426333,
      21410551170000,
      21438819408680,
      21466792887497,
      21482717664062,
      21510011514387,
      21511252724284,
      21511552844827,
      21515790613483,
      21599449793565,
      21719801964449,
      21735212067477,
      21737339022889,
      21758872228561,
      21760403627855,
      21882133449185,
      21916356799322,
      22027783330723,
      22149841059644,
      22173587768457,
      22237702831331,
      22257034186634,
      22268744884216,
      22271933432530,
      22275227547786,
      22279150052984,
      22281368731241,
      22402647917595,
      22481455276949,
      22521018273826,
      22525980129220,
      22648046416352,
      22664590374620,
      22667967362046,
      22730657215717,
      22732180982804,
      22741171416359,
      22749000373057,
      22870006020393,
      22900150388371,
      22904918446158,
      22910373097683,
      22911687705405,
      22913953480185,
      22920634865574,
      22955388607314,
      23075392715669,
      23075524656998,
      23078546845863,
      23079841299715,
      23081918088579,
      23082924670851,
      23083499419745,
      23203609053068,
      23207834983073,
      23213279633062,
      23215602388104,
      23336359227278,
      23433128479823,
      23463944284623,
      23482166142349,
      23509055081235,
      23608394217279,
      23728087387963,
      23733199641643,
      23752045485846,
      23785537942114,
      23806685402734,
      23834246019615,
      23955076101922,
      24040606313748,
      24041231070569,
      24056609660241,
      24059844385711,
      24070194061779,
      24072706892990,
      24075233451884,
      24196391364621,
      24318363404405,
      24340211705606,
      24378183687038,
      24406404119928,
      24528356163105,
      24589022239972,
      24593511309655,
      24602614001863,
      24607937670090,
      24616388162527,
      24647448103492,
      24701502781370,
      24810226367352,
      24820126266900,
      24824883203393,
      24850407938090,
      24965114901308,
      24971460172457,
      25054448032047,
      25063530049610,
      25093383933560,
      25163345867002,
      25172400270101,
      25176015313993,
      25260462205366,
      25276206977660,
      25290654066711,
      25297467707883,
      25319031396887,
      25332207406320,
      25452457153298,
      25459103051376,
      25479081135408,
      25482232503603,
      25504499026608,
      25525940595854,
      25535839047543,
      25537172310383,
      25565489563002,
      25567067232990,
      25596686208720,
      25611504995769,
      25726917220724,
      25733055556672,
      25747370975091,
      25762371350900,
      25842861874425,
      25962925177918,
      25963243910083,
      25972433263715,
      25990030396687,
      26011810556728,
      26013303301605,
      26030834139587,
      26033653702910,
      26107118252921,
      26108413487574,
      26120282806414,
      26121615721850,
      26140000775826,
      26143214686915,
      26265489897336,
      26272591663738,
      26299361834445,
      26378047861962,
      26423911362980,
      26431054852212,
      26453582341520,
      26462786200726,
      26477779511895,
      26493482080562,
      26536146486618,
      26602574372671,
      26722725973243,
      26726601016189,
      26732206472248,
      26746858829488,
      26765142880409,
      26770047906289,
      26783368236800,
      26871648944545,
      26970657625177,
      27005698334121,
      27041656973885,
      27100624053379,
      27220670629375,
      27340735767601,
      27460771257807,
      27580914861042,
      27701042226581,
      27821228489194,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0
    ],
    "NumGC": 180,
    "EnableGC": true,
    "DebugGC": false,
    "BySize": [
      {
        "Size": 0,
        "Mallocs": 0,
        "Frees": 0
      },
      {
        "Size": 8,
        "Mallocs": 13475,
        "Frees": 12919
      },
      {
        "Size": 16,
        "Mallocs": 1042849,
        "Frees": 996573
      },
      {
        "Size": 32,
        "Mallocs": 164941,
        "Frees": 158633
      },
      {
        "Size": 48,
        "Mallocs": 124074,
        "Frees": 119478
      },
      {
        "Size": 64,
        "Mallocs": 111228,
        "Frees": 106742
      },
      {
        "Size": 80,
        "Mallocs": 18732,
        "Frees": 17839
      },
      {
        "Size": 96,
        "Mallocs": 22046,
        "Frees": 20910
      },
      {
        "Size": 112,
        "Mallocs": 32586,
        "Frees": 31426
      },
      {
        "Size": 128,
        "Mallocs": 1169,
        "Frees": 1073
      },
      {
        "Size": 144,
        "Mallocs": 27672,
        "Frees": 26940
      },
      {
        "Size": 160,
        "Mallocs": 17422,
        "Frees": 16851
      },
      {
        "Size": 176,
        "Mallocs": 2879,
        "Frees": 2844
      },
      {
        "Size": 192,
        "Mallocs": 9,
        "Frees": 8
      },
      {
        "Size": 208,
        "Mallocs": 14826,
        "Frees": 14359
      },
      {
        "Size": 224,
        "Mallocs": 8,
        "Frees": 8
      },
      {
        "Size": 240,
        "Mallocs": 7,
        "Frees": 2
      },
      {
        "Size": 256,
        "Mallocs": 596,
        "Frees": 572
      },
      {
        "Size": 288,
        "Mallocs": 18602,
        "Frees": 17677
      },
      {
        "Size": 320,
        "Mallocs": 26,
        "Frees": 23
      },
      {
        "Size": 352,
        "Mallocs": 17658,
        "Frees": 17306
      },
      {
        "Size": 384,
        "Mallocs": 5,
        "Frees": 1
      },
      {
        "Size": 416,
        "Mallocs": 15,
        "Frees": 8
      },
      {
        "Size": 448,
        "Mallocs": 10,
        "Frees": 10
      },
      {
        "Size": 480,
        "Mallocs": 1,
        "Frees": 0
      },
      {
        "Size": 512,
        "Mallocs": 38,
        "Frees": 37
      },
      {
        "Size": 576,
        "Mallocs": 2875,
        "Frees": 2840
      },
      {
        "Size": 640,
        "Mallocs": 31,
        "Frees": 14
      },
      {
        "Size": 704,
        "Mallocs": 1223,
        "Frees": 1211
      },
      {
        "Size": 768,
        "Mallocs": 0,
        "Frees": 0
      },
      {
        "Size": 896,
        "Mallocs": 37,
        "Frees": 13
      },
      {
        "Size": 1024,
        "Mallocs": 23,
        "Frees": 18
      },
      {
        "Size": 1152,
        "Mallocs": 2873,
        "Frees": 2842
      },
      {
        "Size": 1280,
        "Mallocs": 15,
        "Frees": 13
      },
      {
        "Size": 1408,
        "Mallocs": 0,
        "Frees": 0
      },
      {
        "Size": 1536,
        "Mallocs": 4,
        "Frees": 0
      },
      {
        "Size": 1664,
        "Mallocs": 7,
        "Frees": 3
      },
      {
        "Size": 2048,
        "Mallocs": 560,
        "Frees": 497
      },
      {
        "Size": 2304,
        "Mallocs": 2875,
        "Frees": 2837
      },
      {
        "Size": 2560,
        "Mallocs": 13,
        "Frees": 3
      },
      {
        "Size": 2816,
        "Mallocs": 0,
        "Frees": 0
      },
      {
        "Size": 3072,
        "Mallocs": 0,
        "Frees": 0
      },
      {
        "Size": 3328,
        "Mallocs": 3,
        "Frees": 1
      },
      {
        "Size": 4096,
        "Mallocs": 1864,
        "Frees": 1476
      },
      {
        "Size": 4608,
        "Mallocs": 2868,
        "Frees": 2837
      },
      {
        "Size": 5376,
        "Mallocs": 4,
        "Frees": 0
      },
      {
        "Size": 6144,
        "Mallocs": 2,
        "Frees": 0
      },
      {
        "Size": 6400,
        "Mallocs": 0,
        "Frees": 0
      },
      {
        "Size": 6656,
        "Mallocs": 1,
        "Frees": 0
      },
      {
        "Size": 6912,
        "Mallocs": 0,
        "Frees": 0
      },
      {
        "Size": 8192,
        "Mallocs": 44,
        "Frees": 38
      },
      {
        "Size": 8448,
        "Mallocs": 0,
        "Frees": 0
      },
      {
        "Size": 8704,
        "Mallocs": 0,
        "Frees": 0
      },
      {
        "Size": 9472,
        "Mallocs": 2765,
        "Frees": 2734
      },
      {
        "Size": 10496,
        "Mallocs": 0,
        "Frees": 0
      },
      {
        "Size": 12288,
        "Mallocs": 0,
        "Frees": 0
      },
      {
        "Size": 13568,
        "Mallocs": 0,
        "Frees": 0
      },
      {
        "Size": 14080,
        "Mallocs": 0,
        "Frees": 0
      },
      {
        "Size": 16384,
        "Mallocs": 0,
        "Frees": 0
      },
      {
        "Size": 16640,
        "Mallocs": 0,
        "Frees": 0
      },
      {
        "Size": 17664,
        "Mallocs": 0,
        "Frees": 0
      }
    ]
  }
}
#43

Updated by Tom Clegg over 8 years ago

Made a Python program that starts 60 threads and hammers away at a Keep server, testing on GCE.
  • half GET, half PUT, plus a couple of status and index thrown in → max-buffers reached, load avg ~8, but no problem: this ran all night.
  • nearly all PUT → no problem.
  • vary the data in each PUT request so actual filesystem writes and disk IO are needed → write to disk at ~120 MiB/s for a short while (~1 minute?), then crash/reboot.

(I started up a separate keepstore proc for the last experiment, so I could divert the useless data to a separate dir on the storage disks. Should confirm the previous experiments still don't crash with that setup.)
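
(The Python script itself isn't attached here; a rough Go equivalent of the non-repetitive PUT hammer, using the port and token format from the description. Concurrency and block contents are illustrative:)

package main

import (
    "bytes"
    "crypto/md5"
    "encoding/binary"
    "fmt"
    "net/http"
    "sync"
)

const (
    server = "http://localhost:25107" // assumed keepstore address
    token  = "abc"                    // assumed token, as in the description
)

func hammer(seed uint64) {
    data := make([]byte, 40<<20) // one 40 MiB block per request
    for n := seed; ; n++ {
        // Vary the data each time so real filesystem writes are needed.
        binary.BigEndian.PutUint64(data, n)
        hash := fmt.Sprintf("%x", md5.Sum(data))
        req, err := http.NewRequest("PUT", server+"/"+hash, bytes.NewReader(data))
        if err != nil {
            panic(err)
        }
        req.Header.Set("Authorization", "OAuth2 "+token)
        resp, err := http.DefaultClient.Do(req)
        if err != nil {
            continue
        }
        resp.Body.Close()
    }
}

func main() {
    var wg sync.WaitGroup
    for i := 0; i < 60; i++ { // 60 concurrent writers, like the Python version
        wg.Add(1)
        go func(i int) {
            defer wg.Done()
            hammer(uint64(i) << 32) // disjoint sequence per goroutine
        }(i)
    }
    wg.Wait()
}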

<1s before crash:

top - 13:17:02 up 1 day, 12:38,  6 users,  load average: 5.00, 2.40, 4.26
Tasks: 136 total,   3 running, 133 sleeping,   0 stopped,   0 zombie
%Cpu(s): 25.3 us, 41.9 sy,  0.0 ni,  0.0 id, 29.3 wa,  0.0 hi,  3.5 si,  0.0 st
KiB Mem:   7679608 total,  7358572 used,   321036 free,     5720 buffers
KiB Swap:  5242876 total,    61152 used,  5181724 free,  3318740 cached

  PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM    TIME+  COMMAND                                                                                                                                               
 7025 tom       20   0 2068m 1.4g 6584 S  48.7 19.7   1:03.61 python
 7001 root      20   0 3125m 2.0g 5384 S  38.8 27.9   0:44.51 keepstore
 6780 root      20   0     0    0    0 D  22.9  0.0   0:20.47 kworker/1:2
 2749 root      20   0     0    0    0 R  13.9  0.0   0:09.84 kworker/0:1                                                                                                                                           
 6750 root      20   0     0    0    0 R   9.0  0.0   0:18.89 kworker/0:2                                                                                                                                           
25652 root      20   0     0    0    0 S   7.0  0.0   0:01.55 kworker/u4:1

(top looked like this for a while before crash.)

<10s before crash:

PRC | sys    8.37s  | user   4.34s  |               |               | #proc    136  | #trun      9  | #tslpi   233 |  #tslpu     1 |  #zombie    0 |               |  clones    60 |               |  #exit     38 |
CPU | sys      77%  | user     41%  | irq       6%  |               |               | idle      3%  | wait     74% |               |  steal     0% |               |  guest     0% |  curf 2.60GHz |  curscal   ?% |
cpu | sys      41%  | user     17%  | irq       5%  |               |               | idle      2%  | cpu000 w 35% |               |  steal     0% |               |  guest     0% |  curf 2.60GHz |  curscal   ?% |
cpu | sys      36%  | user     23%  | irq       0%  |               |               | idle      1%  | cpu001 w 39% |               |  steal     0% |               |  guest     0% |  curf 2.60GHz |  curscal   ?% |
CPL | avg1    4.99  |               | avg5    2.35  | avg15   4.26  |               |               |              |  csw    84894 |               |  intr   27756 |               |               |  numcpu     2 |
MEM | tot     7.3G  | free  419.5M  |               | cache   3.1G  | dirty 246.9M  | buff    5.6M  |              |  slab  211.1M |               |               |               |               |               |
SWP | tot     5.0G  | free    4.9G  |               |               |               |               |              |               |               |               |               |  vmcom   8.1G |  vmlim   8.7G |
PAG | scan  268660  |               |               | stall      0  |               |               |              |               |               |  swin       0 |               |               |  swout      0 |
LVM | zzzzz-keep-0  | busy    103%  |               | read       1  | write  10434  | KiB/r      4  |              |  KiB/w    126 |  MBr/s   0.00 |  MBw/s 128.71 |               |  avq  2789.18 |  avio 0.96 ms |
DSK |          sdb  | busy    102%  |               | read       1  | write   9727  | KiB/r      4  |              |  KiB/w    126 |  MBr/s   0.00 |  MBw/s 119.93 |               |  avq   126.12 |  avio 1.02 ms |
DSK |          sda  | busy     44%  |               | read       8  | write    109  | KiB/r      4  |              |  KiB/w      6 |  MBr/s   0.00 |  MBw/s   0.07 |               |  avq     3.64 |  avio 36.4 ms |
NET | transport     | tcpi   29275  | tcpo   29309  | udpi       2  | udpo       2  | tcpao    158  |              |  tcppo    157 |  tcprs      4 |  tcpie      0 |  tcpor      5 |  udpnp      0 |  udpip      0 |
NET | network       | ipi    29277  |               | ipo    29278  | ipfrw      0  | deliv  29277  |              |               |               |               |               |  icmpi      0 |  icmpo      0 |
NET | lo      ----  | pcki   29165  | pcko   29165  |               | si  965 Mbps  | so  965 Mbps  | coll       0 |  mlti       0 |  erri       0 |               |  erro       0 |  drpi       0 |  drpo       0 |
NET | eth0    ----  | pcki     112  | pcko     113  |               | si   13 Kbps  | so   75 Kbps  | coll       0 |  mlti       0 |  erri       0 |               |  erro       0 |  drpi       0 |  drpo       0 |
#44

Updated by Tom Clegg over 8 years ago

Many similar attempts have stayed alive happily for hours, then another one died within seconds of startup. It's not clear whether the test program is actually causing the crashes.

#45

Updated by Tom Clegg over 8 years ago

Things that don't crash the expendable test server:
  • 96 python threads across 3 machines, reading/writing at once
  • Turn off http keepalive by adding "Connection: close" header → 15K entries in netstat
  • max-buffers=100 → vsize 12G; keepstore gets OOM-killed when rsize reaches ~7G, and runit restarts it
  • Lots of repetitive PUTs (low disk activity)
  • Lots of non-repetitive PUTs (high disk activity)
  • ~Equal mix of GET and PUT
  • index and status requests thrown in
  • lots of IO (~2TB disk consumed during test trials over the last few days)

But something did crash keep14 while I was running tests, so the problem (or at least some problem) is still present.

#47

Updated by Tom Clegg about 5 years ago

  • Status changed from In Progress to Resolved
  • Target version deleted (Arvados Future Sprints)