Joshua Randall

Activity

02/18/2018

10:21 AM Arvados Bug #13102 (New): containers are not reused unless runtime constraints (including RAM) match exactly
The logic for deciding when to reuse an existing container appears to include the full set of runtime_constraints. I ...

02/17/2018

09:25 PM Arvados Bug #13100: crunch-run consumes a MASSIVE amount of ram at the end of a job after the container exits
This may be related to https://dev.arvados.org/issues/11583 which points out an apparent memory leak in crunch-run wh...

02/16/2018

04:52 PM Arvados Bug #13100 (New): crunch-run consumes a MASSIVE amount of ram at the end of a job after the container exits
After a lot of debugging trying to figure out why every job was being killed for exceeding memory limits seemingly re...
04:21 PM Arvados Bug #13099 (New): crunch-dispatch-slurm occasionally logs a lot of "runner is handling updates slowly" debug messages
https://github.com/curoverse/arvados/blob/b51d376ed64efc68f7ee27fd061323da43faabd5/sdk/go/dispatch/dispatch.go#L304
...
02:51 PM Arvados Bug #13093: add configuration option for crunch-dispatch-slurm to add a fixed amount of memory to the slurm mem limit
https://github.com/curoverse/arvados/pull/63
02:50 PM Arvados Bug #13095 (New): when slurm murders a crunch2 job because it exceeds the memory limit, the container is left with a null `log`
If a crunch2 job exceeds its memory limit (with cgroup memory limits enabled), SLURM kills it, but no record of it ha...

02/15/2018

06:55 PM Arvados Bug #13093: add configuration option for crunch-dispatch-slurm to add a fixed amount of memory to the slurm mem limit
I would propose the configuration option `ReserveExtraRAM` unless there are better suggestions.
I've implemented ...
06:20 PM Arvados Bug #13093 (New): add configuration option for crunch-dispatch-slurm to add a fixed amount of memory to the slurm mem limit
Currently, crunch-dispatch-slurm sets the memory limit in the sbatch command by adding the RAM runtime constraint and...

02/13/2018

11:58 AM Arvados Bug #13067: "Uh oh" While uploading output files ... bad address
After tracing this (to `io.Copy()`), it seems likely this is somehow related to memory cgroup limits which have recent...
11:46 AM Arvados Bug #13067 (New): "Uh oh" While uploading output files ... bad address
We have a container that repeatedly fails "while uploading output files" after the container itself has apparently su...
