Bug #5748 (closed)

[Keep] sometimes, Keep sucks up 100% cpu and becomes really slow after a while

Added by Ward Vandewege almost 9 years ago. Updated almost 9 years ago.

Status: Resolved
Priority: Normal
Assigned To:
Category: Keep
Target version:
Story points: -

Description

We have some keepstore servers that have only read-only disks. After running for a few hours, the keepstore process becomes bloated (high RAM use) and pegged at (almost) 100% CPU. It still responds to requests, but very slowly. A slow memory leak?

root     24414 60.4 59.0 2491440 2243984 ?     Sl   14:21 263:13      \_ keepstore -enforce-permissions=false -permission-key-file=/etc/keepstore/permission.key -data-manager-token-file=/etc/keepstore/datamanager.key -listen=:25107 -serialize=true -volumes=/...

All CPU time is %user, not iowait:

# iostat -x
Linux 3.16-0.bpo.2-amd64 (keep5.su92l.arvadosapi.com)     04/15/2015     _x86_64_    (1 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           6.24    0.00    1.13    0.21    0.00   92.42

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00     1.30    0.02    2.09     0.54    26.92    26.07     0.04   21.36    8.53   21.49   1.72   0.36
sdb               0.00     0.02    0.06    1.48     3.94   186.27   247.27     0.17  108.47    6.69  112.33   2.07   0.32
sdc               0.00     0.02    0.05    1.49     3.18   186.98   246.86     0.17  109.06    7.23  112.59   2.08   0.32
dm-0              0.00     0.00    0.06    1.50     3.94   186.27   244.24     0.25  157.84    8.05  163.45   2.09   0.32
dm-1              0.00     0.00    0.05    1.51     3.18   186.98   243.76     0.29  184.28    8.66  190.28   2.09   0.33
sdd               0.00     0.03    0.01    2.01     0.05   255.84   253.40     0.07   35.18   14.01   35.27   2.08   0.42
dm-2              0.00     0.00    0.01    2.04     0.05   255.84   249.51     0.26  129.09   29.02  129.51   2.06   0.42

Version:

# dpkg -l |grep keepstore
ii  keepstore                               0.1.20150413210544.0f119f7    amd64        Keepstore is the Keep storage daemon, accessible to clients on the LAN


Subtasks 4 (0 open, 4 closed)

Task #5933: Configurable limit on block buffers allocated at a given time (Resolved, Tom Clegg, 05/06/2015)
Task #5925: Review 5748-keepstore-leak (Resolved, Brett Smith, 05/06/2015)
Task #5950: review 5748-max-buffers (Resolved, Peter Amstutz, 05/06/2015)
Task #6031: Review 5748-max-buffers-leak (Resolved, 05/06/2015)

Related issues

Blocks Arvados - Bug #5662: [FUSE] cd/ls sometimes takes too long on su92l (Closed, 04/03/2015)
Blocks Arvados - Bug #5713: [FUSE] File access sometimes takes too long on su92l (New, Tom Morris)
#1

Updated by Ward Vandewege almost 9 years ago

  • Description updated (diff)
#2

Updated by Ward Vandewege almost 9 years ago

  • Description updated (diff)
#3

Updated by Ward Vandewege almost 9 years ago

  • Subject changed from [Keep] in a scenario where Keep can't write to its disks, it sucks up 100% cpu and becomes really slow after a while to [Keep] sometimes, Keep sucks up 100% cpu and becomes really slow after a while
#4

Updated by Ward Vandewege almost 9 years ago

This doesn't appear to be related to read-only disks; I have seen it happen on tb05z as well, where all disks are writable.

While it was happening there, datamanager was trying to do its thing, and failing, and restarting, over and over and over (due to #5834).

#5

Updated by Tom Clegg almost 9 years ago

  • Target version changed from Bug Triage to 2015-05-20 sprint
#6

Updated by Tom Clegg almost 9 years ago

  • Category set to Keep
  • Assigned To set to Tom Clegg
  • Priority changed from Normal to High
#7

Updated by Tom Clegg almost 9 years ago

  • Status changed from New to In Progress
#8

Updated by Tom Clegg almost 9 years ago

a memory use bug

Currently, keepstore will allocate as many buffers as it has clients. There is no limit on the number of concurrent clients.
  • During a PUT request, the buffer grows (up to 64 mebibytes) while receiving data from the client, and of course has to last long enough to write the block to disk. When -serialize is enabled, this interval includes the "wait for serialize mutex" phase, which can last many seconds.
  • During a successful GET request, the buffer grows (up to 64 mebibytes) while reading data from disk and lasts long enough to verify the checksum and send the data to the client. Sending the data over the network can take many seconds (e.g., network congestion, slow client, slow network on the client side of a keepproxy).

If keepstore gets busy enough to exceed the available disk bandwidth, the PUT requests waiting for the serialize mutex can exhaust available RAM and the kernel will start swapping some buffers out to disk. At this point, each additional request will make all queued requests slower and less likely to finish at all.

Additionally, if a client times out and disconnects while its request is waiting in the queue, keepstore won't notice this: abandoned PUT requests waste memory just by staying in the queue, and (assuming the worst) get swapped in and written to a volume before the dead connection is noticed; meanwhile abandoned GET requests waste disk activity and (albeit for a short time, between reading and the first Write call, which presumably fails) memory.
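To make the lifecycle above concrete, here is a minimal sketch of the PUT path as described (the handler, lock, and volume-write names below are illustrative stand-ins, not keepstore's actual code):

package main

import (
	"io"
	"io/ioutil"
	"net/http"
	"sync"
)

// Illustrative stand-ins for keepstore's serialize lock and volume write.
var serializeMutex sync.Mutex

func writeBlockToVolume(buf []byte) error { return nil } // placeholder

// handlePUT shows where the per-request buffer lives: it grows (up to 64 MiB)
// while the client sends the body, stays resident while waiting for the
// serialize mutex, and is only released after the block is written out.
func handlePUT(w http.ResponseWriter, req *http.Request) {
	buf, err := ioutil.ReadAll(io.LimitReader(req.Body, 64<<20))
	if err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}
	serializeMutex.Lock() // buffer held while queued behind other writers...
	defer serializeMutex.Unlock()
	if err := writeBlockToVolume(buf); err != nil { // ...and while writing to disk
		http.Error(w, err.Error(), http.StatusInternalServerError)
	}
}

func main() {
	http.HandleFunc("/", handlePUT)
	http.ListenAndServe(":25107", nil)
}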

proposed improvement

Rather than allocating them on the fly, use a pool [1] of data buffers. The total number of buffers is configured with a command-line flag. Whenever a handler needs a non-trivial buffer to store a data block -- for either PUT or GET -- it gets one from the pool. If the pool is empty, the request waits in a queue to get one.

This should be a reasonable way to limit keepstore's memory without unduly restricting operations that don't need much of it (like index, status, and GET 404).

[1] Probably based on http://golang.org/pkg/sync/#Pool -- maybe WaitingPool, wrapping Pool with Get() and Put() methods that use a sync.Cond to ensure size never goes above MaxSize.
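A minimal sketch of that footnote's idea, assuming the WaitingPool/MaxSize names it floats (the constructor, field names, and buffer handling here are guesses, not the actual implementation): Get blocks while MaxSize buffers are already outstanding, and Put returns a buffer and wakes one waiting handler.

package main

import "sync"

// WaitingPool wraps sync.Pool so that at most MaxSize buffers are
// outstanding at once; Get blocks (the request queues) when the limit
// is reached. Sketch only, not keepstore's code.
type WaitingPool struct {
	MaxSize int
	pool    sync.Pool
	mtx     sync.Mutex
	cond    *sync.Cond
	inUse   int
}

func NewWaitingPool(maxSize, bufSize int) *WaitingPool {
	p := &WaitingPool{MaxSize: maxSize}
	p.pool.New = func() interface{} { return make([]byte, 0, bufSize) }
	p.cond = sync.NewCond(&p.mtx)
	return p
}

// Get returns an empty buffer (with bufSize capacity), waiting until
// fewer than MaxSize buffers are in use.
func (p *WaitingPool) Get() []byte {
	p.mtx.Lock()
	for p.inUse >= p.MaxSize {
		p.cond.Wait() // GET/PUT handlers queue here when buffers run out
	}
	p.inUse++
	p.mtx.Unlock()
	return p.pool.Get().([]byte)
}

// Put returns a buffer to the pool and wakes one waiting handler.
func (p *WaitingPool) Put(buf []byte) {
	p.pool.Put(buf[:0])
	p.mtx.Lock()
	p.inUse--
	p.mtx.Unlock()
	p.cond.Signal()
}

func main() {
	pool := NewWaitingPool(2, 64<<20) // e.g. 2 buffers of 64 MiB each
	buf := pool.Get()                 // a handler takes a buffer...
	pool.Put(buf)                     // ...and returns it when the response is done
}

Handlers that never need a block-sized buffer (index, status, GET 404) simply never call Get, so they are unaffected by the limit.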

#9

Updated by Brett Smith almost 9 years ago

Reviewing 4d53c7e. This is good, just a couple of cleanliness things.

  • There are a couple of references to the old Index name still lurking around: in the last sentence of the IndexTo doc comment, and in gotCall() in the MockVolume implementation.
  • In IndexTo, the comment "Print filenames beginning with prefix" seems misplaced right above a condition that skips this loop. Would it improve readability to merge that condition into the previous one? Or if not, move the comment down above the Fprintf?

Assuming those are all patched straightforwardly, I'm happy to see this merged. Thanks.

#10

Updated by Peter Amstutz almost 9 years ago

Reviewing 5748-max-buffers 978afd4

  • The default maxBuffers = 128 seems excessive (that's 8 GiB). With serialize=true, it should probably be a small multiple of the number of volumes (between 2 and 4 buffers per volume), since we can't do more concurrent work than we have volumes, and the goal here is to prevent keepstore from overcommitting on the number of requests it can handle at once (since it Gets the block from the buffer pool before taking the volume serialize lock).
  • Suggest moving bufferpool into the Go SDK, it might be useful for other services such as keepproxy.
#11

Updated by Tom Clegg almost 9 years ago

Peter Amstutz wrote:

Reviewing 5748-max-buffers 978afd4

  • The default maxBuffers = 128 seems excessive (that's 8 GiB). With serialize=true, it should probably be a small multiple of the number of volumes (between 2 and 4 buffers per volume), since we can't do more concurrent work than we have volumes, and the goal here is to prevent keepstore from overcommitting on the number of requests it can handle at once (since it Gets the block from the buffer pool before taking the volume serialize lock).

Not exactly. The goal here is to limit memory use, and memory (unlike the serialize lock) stays occupied longer when clients send/receive slowly. If you have 8 GiB of RAM, and you are waiting for 100 clients to receive the response data for their GET requests, you've got 6.4GiB of block data stored in buffers but the serialize locks are all free. Similarly, while you are waiting to receive PUT body content from 100 clients, you need buffers but you don't need any serialize locks yet. Essentially the minimum buffers:spindles ratio corresponds to the diskspeed:networkspeed ratio, and (especially since keepproxy pipelines reads instead of buffering them) that ratio can easily be much higher than 2 or 4. (If we could assume/detect that the network bottleneck is the keepstore node itself, it would be reasonable to stall new clients to help older clients finish, but we can't/don't.)

I thought "total system RAM ÷ block size" would be a good default, and a way to specify "50%" like sort(1) (perhaps based on https://godoc.org/github.com/cloudfoundry/gosigar#Mem.Get) would probably be handy, but rather than overthink/predict this I figured we should start with a simple config knob and a probably-not-too-low default, and get operational feedback.

I figured it would be better to err on the "too big" side because the sysadmin story "keepstore is using too much ram" seems easier to understand than "clients are starved but keepstore looks idle". (Hm, perhaps the "waiting..." message should recommend increasing -max-buffers?)

  • Suggest moving bufferpool into the Go SDK, it might be useful for other services such as keepproxy.

True. I thought about this too, but figured we can do those things together (move it to SDK and adopt it in keepproxy) / we might confirm how well it works before we arrange too much more code around it...

#12

Updated by Peter Amstutz almost 9 years ago

Tom Clegg wrote:

I figured it would be better to err on the "too big" side because the sysadmin story "keepstore is using too much ram" seems easier to understand than "clients are starved but keepstore looks idle". (Hm, perhaps the "waiting..." message should recommend increasing -max-buffers?)

That's true; it's easy enough to configure the usage. The important thing will be to communicate to our admins that setting -max-buffers is not really optional, if it's not going to be smart about adjusting its default buffer count to the RAM size of the system.

  • Suggest moving bufferpool into the Go SDK, it might be useful for other services such as keepproxy.

True. I thought about this too, but figured we can do those things together (move it to SDK and adopt it in keepproxy) / we might confirm how well it works before we arrange too much more code around it...

OK, we should file a separate story for that; I think there have been some issues with memory usage of Keepproxy as well.

#13

Updated by Tom Clegg almost 9 years ago

Peter Amstutz wrote:

That's true; it's easy enough to configure the usage. The important thing will be to communicate to our admins that setting -max-buffers is not really optional, if it's not going to be smart about adjusting its default buffer count to the RAM size of the system.

True. Added a note to http://doc.arvados.org/install/install-keepstore.html. (Probably a good idea to point this out even if/when we do default to total system RAM, in case the node is expected to have memory available for anything else.)

OK, we should file a separate story for that; I think there have been some issues with memory usage of Keepproxy as well.

Added #5962.

#14

Updated by Tom Clegg almost 9 years ago

  • Status changed from In Progress to Feedback
#15

Updated by Peter Amstutz almost 9 years ago

  • Status changed from Feedback to In Progress

Aside from the above comments, LGTM; please merge.

#16

Updated by Tom Clegg almost 9 years ago

  • Status changed from In Progress to Resolved