Bug #19563

open

crunch-run upload step possibly buffers too much

Added by Peter Amstutz 7 days ago. Updated about 10 hours ago.

Status: New
Priority: Normal
Assigned To: -
Category: Crunch
Target version:
Start date:
Due date:
% Done: 0%
Estimated time:
Story points: -

Description

We have crunch-run processes that are getting OOM-killed (and restarted) during the upload phase.

crunch-run is uploading very large files (30+ GB) while running on very small nodes (t3.small), which have 1 core, 2 GB RAM, and throttled network bandwidth. The hoststat numbers show far more data being received than transmitted.

The suspicion is that the crunch-run process is buffering data in RAM faster than it can upload it, so the data piles up until the process gets OOM-killed.

a) Determine whether the queue of blocks waiting to be uploaded really is uncapped.

b) If so, make it possible to set a cap that provides backpressure, blocking the uploader until more buffer space is available. Experimentally, I think we've found optimal upload rates with around 4-6 parallel block uploads.
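To illustrate what (b) could look like, here is a minimal sketch of a capped upload queue in Go. It is not taken from the crunch-run source: `boundedUploader`, `Submit`, `Close`, and `peakConcurrentUploads` are hypothetical names, and the upload function is a stand-in that only sleeps. The idea is that a buffered channel of fixed capacity makes `Submit` block once the queue is full, so the code producing blocks cannot run ahead of the workers.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// boundedUploader is a hypothetical type for illustration only; it is not
// part of crunch-run. It holds at most `queued` blocks in its channel plus
// one block per worker.
type boundedUploader struct {
	queue  chan []byte
	upload func([]byte) error
	wg     sync.WaitGroup
	mu     sync.Mutex
	err    error // first upload error seen, if any
}

func newBoundedUploader(workers, queued int, upload func([]byte) error) *boundedUploader {
	u := &boundedUploader{queue: make(chan []byte, queued), upload: upload}
	for i := 0; i < workers; i++ {
		u.wg.Add(1)
		go func() {
			defer u.wg.Done()
			for block := range u.queue {
				if err := u.upload(block); err != nil {
					u.mu.Lock()
					if u.err == nil {
						u.err = err
					}
					u.mu.Unlock()
				}
			}
		}()
	}
	return u
}

// Submit blocks the caller while the queue is full. This is the backpressure.
func (u *boundedUploader) Submit(block []byte) { u.queue <- block }

// Close waits for all queued blocks to be uploaded and returns the first error.
func (u *boundedUploader) Close() error {
	close(u.queue)
	u.wg.Wait()
	return u.err
}

// peakConcurrentUploads pushes n small blocks through a bounded uploader and
// reports the highest number of simultaneous uploads observed and the total
// number of blocks uploaded.
func peakConcurrentUploads(workers, queued, n int) (int64, int64) {
	var inFlight, peak, done int64
	u := newBoundedUploader(workers, queued, func(b []byte) error {
		cur := atomic.AddInt64(&inFlight, 1)
		for {
			old := atomic.LoadInt64(&peak)
			if cur <= old || atomic.CompareAndSwapInt64(&peak, old, cur) {
				break
			}
		}
		time.Sleep(time.Millisecond) // stand-in for a slow, throttled network write
		atomic.AddInt64(&inFlight, -1)
		atomic.AddInt64(&done, 1)
		return nil
	})
	for i := 0; i < n; i++ {
		u.Submit(make([]byte, 16))
	}
	if err := u.Close(); err != nil {
		panic(err)
	}
	return atomic.LoadInt64(&peak), atomic.LoadInt64(&done)
}

func main() {
	peak, done := peakConcurrentUploads(4, 2, 50)
	fmt.Printf("uploaded %d blocks, peak concurrent uploads %d\n", done, peak)
}
```

With this shape, buffered memory is bounded by roughly (workers + queued + 1) blocks, the extra one being the block held by a producer waiting in `Submit`. Assuming 64 MiB blocks, 4 workers and 2 queued slots would hold at most about 448 MiB, which fits in the 2 GB of a t3.small. Both numbers would presumably need to be configurable.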

#1

Updated by Peter Amstutz 7 days ago

  • Description updated (diff)