Bug #19563

crunch-run upload step possibly buffers too much

Added by Peter Amstutz over 1 year ago. Updated over 1 year ago.

Status: Closed
Priority: Normal
Assigned To:
Category: Crunch
Target version:
Story points: -
Release relationship: Auto

Description

We have crunch-run processes that are getting OOM killed (and restarted) in the upload phase.

Crunch-run is uploading very large files (30+ GB) and running on very small nodes (t3.small) which have 1 core, 2 GB RAM, and throttled network bandwidth. The hoststat numbers show a much greater amount of data being received than transmitted.

The suspicion is that the crunch-run process is buffering data in RAM, which is piling up until it gets OOM killed.

a) Determine whether the queue of blocks to be uploaded is in fact uncapped.

b) If so, make it possible to set a cap that provides backpressure, blocking the uploader until more buffer space is available. Experimentally, I think we've found optimal upload rates with around 4-6 parallel block uploads.


Subtasks 1 (0 open, 1 closed)

Task #19577: Review 19563-log-cr-mem | Resolved | Peter Amstutz | 10/25/2022