Bug #13100

closed

[crunch-run] Replace custom manifest-writing code with collectionFS

Added by Joshua Randall about 6 years ago. Updated almost 6 years ago.

Status: Resolved
Priority: Normal
Assigned To:
Category: -
Target version:
Story points: 2.0
Release:
Release relationship: Auto

Description

After a lot of debugging to figure out why every job was being killed for exceeding memory limits, seemingly regardless of the limit specified, I finally caught crunch-run in the act of consuming a truly massive amount of RAM towards the end of a job. (In the end I just set the container's RAM requirement to nearly all the RAM on the node so I could see what was going on.)

Mid-run, crunch-run was using ~200MB, but then it suddenly started allocating memory at a rate of 850MB/s. After 30s it had consumed 25.5GB, at which point it levelled off and held steady for 230s, after which the job finished successfully. It kept the full 25.5GB allocated until it exited (or until within a second of exiting).

The container logs show that the point at which it started to allocate lots of RAM corresponds with the end of the container: it begins at the "Container exited with code: 0" line and continues through the upload of the output, which in this case was specified (in CWL) as a Directory.

The output collection (5ca01264c4721b24c9d36320a00027ce+328812) contains 4005 files totalling 8.9GiB, so crunch-run allocated enough memory to hold the full output collection in memory 2.8x over, which seems somewhat excessive.
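
For contrast, here is a minimal sketch of what a bounded-memory upload loop might look like. This is not crunch-run's code and not the collectionFS API: `blockLocators` and `locatorsForBytes` are hypothetical helpers, the block size is an arbitrary parameter, and the "md5+size" strings merely mirror the shape of the collection identifier above. The point is only that one reusable buffer keeps peak memory near a single block, regardless of how large the output is.

```go
package main

import (
	"bytes"
	"crypto/md5"
	"fmt"
	"io"
)

// blockLocators reads r in chunks of at most blockSize bytes and returns one
// "md5hex+size" string per chunk. Only a single blockSize buffer is held at a
// time, so peak memory does not grow with the total input size.
// Hypothetical helper for illustration; not crunch-run's actual code.
func blockLocators(r io.Reader, blockSize int) ([]string, error) {
	if blockSize <= 0 {
		return nil, fmt.Errorf("blockSize must be positive, got %d", blockSize)
	}
	buf := make([]byte, blockSize) // reused for every block
	var locators []string
	for {
		n, err := io.ReadFull(r, buf)
		if n > 0 {
			// A real uploader would send buf[:n] to storage at this point,
			// before the buffer is overwritten by the next read.
			locators = append(locators, fmt.Sprintf("%x+%d", md5.Sum(buf[:n]), n))
		}
		if err == io.EOF || err == io.ErrUnexpectedEOF {
			return locators, nil // input exhausted
		}
		if err != nil {
			return nil, err
		}
	}
}

// locatorsForBytes is a convenience wrapper for in-memory test data.
func locatorsForBytes(data []byte, blockSize int) ([]string, error) {
	return blockLocators(bytes.NewReader(data), blockSize)
}

func main() {
	// 11 bytes split into blocks of at most 4 bytes gives sizes 4, 4, 3.
	locs, err := locatorsForBytes([]byte("hello world"), 4)
	if err != nil {
		panic(err)
	}
	for _, l := range locs {
		fmt.Println(l)
	}
}
```

With a loop of this shape, the list of locators is the only thing that grows with output size; the file data itself never accumulates in memory.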


Subtasks 2 (0 open, 2 closed)

Task #13233: Review 13100-crunch-run-memory (Resolved, Peter Amstutz, 03/15/2018)
Task #13294: Review 13100-crunch-run-output (Resolved, Peter Amstutz, 03/15/2018)

Related issues

Related to Arvados - Bug #11583: [crunch-run] Fix excessive memory use (Resolved)
Related to Arvados - Idea #13048: Refactor crunch2 logging (New)
Related to Arvados - Bug #12606: Symlink in output points to invalid location -- no such file or directory (Resolved, Peter Amstutz, 11/17/2017)