Bug #13093
closedadd configuration option for crunch-dispatch-slurm to add a fixed amount of memory to the slurm mem limit
Description
Currently, crunch-dispatch-slurm sets the memory limit in the sbatch command by adding the RAM runtime constraint and the KeepCacheRAM runtime constraint: https://github.com/curoverse/arvados/blob/master/services/crunch-dispatch-slurm/crunch-dispatch-slurm.go#L209
When running on SLURM with cgroup memory limits enabled, this can result in all containers that actually use close to the memory limit (such as pretty much anything involving the JVM inside the container) failing.
I believe this is because, in addition to the KeepCacheRAM, both crunch-run and arv-mount also use additional memory for their internal data structures such as the keep manifest caches, fuse inode mappings, etc. Ultimately, there should probably be a story to properly sort out how to set the memory limit to something appropriate and then stick within that limit. However, for now it would be helpful to just have a simple config option to add a specified constant amount of memory to every SLURM request as a reservation for what crunch-run and arv-mount use outside of the container.
I plan to implement this and submit a pull request shortly.
Updated by Joshua Randall about 6 years ago
I would propose the configuration option `ReserveExtraRAM` unless there are better suggestions.
I've implemented this taking an int64 already, but perhaps it would be better (and consistent with other options such as those that take Duration strings) to use something such as https://godoc.org/github.com/dustin/go-humanize#ParseBytes to parse a string representation?
In other words, I could say:
ReserveExtraRAM: 256MiB
Instead of:
ReserveExtraRAM: 268435456
Any opinions on this?
Updated by Tom Clegg about 6 years ago
Supporting suffixes in config files seems like a good convenience, but I think that should be considered as a separate topic and it should apply to other ram/disk size configs too (to avoid trying to remember which configs take suffixes and which don't).
So, for now, I think we should just stick with bytes.
Updated by Joshua Randall about 6 years ago
- % Done changed from 0 to 100