Bug #19981
Containers that used an old DefaultKeepCacheRAM no longer get reused after a configuration change
Description
The Bug
beagle.cwl has the resource requirement
ResourceRequirement:
  coresMin: 2
  ramMin: 10000
A new run:
https://workbench.2xpu4.arvadosapi.com/container_requests/2xpu4-xvhdp-ph1xry8mxbsol3j
An old run:
https://workbench.2xpu4.arvadosapi.com/container_requests/2xpu4-xvhdp-p571e0xq4g85ac7
The resource requirement did not change, and no keep_cache requirement was specified. Even so, the recent run did not reuse the old run, because their runtime_constraints differ as follows.
new runtime_constraints:
  keep_cache_disk: 10485760000
  keep_cache_ram: 0
  ram: 10485760000
  vcpus: 2
new node type:
"ProviderType": "m5.8xlarge", "VCPUs": 32, "RAM": 137438953472, "IncludedScratch": 4000000000, "AddedScratch": 100000000000, "Price": 1.542,
old runtime_constraints:
  keep_cache_disk: 0
  keep_cache_ram: 268435456
  ram: 10485760000
  vcpus: 2
old node type:
"ProviderType": "m5.xlarge", "VCPUs": 4, "RAM": 17179869184, "IncludedScratch": 4000000000, "AddedScratch": 0, "Price": 0.192,
The Fix
This happened because we changed the DefaultKeepCacheRAM setting on the cluster to start using disk cache instead of memory. As a consequence, Container.find_reusable can no longer find containers that used the old default, because it searches for matching runtime_constraints with a hash match, and it doesn't know what the old value of DefaultKeepCacheRAM was to search for.
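The mismatch can be modeled in a few lines. This is a simplified Python sketch, not the actual Container.find_reusable code; it assumes the reuse lookup compares the whole serialized runtime_constraints value, and the helper name constraints_key is hypothetical. The two constraint sets are the ones from the runs listed in this report.

```python
import json


def constraints_key(runtime_constraints):
    """Hypothetical helper: canonical serialization used as the lookup key."""
    return json.dumps(runtime_constraints, sort_keys=True, separators=(",", ":"))


# Values copied from the old and new runs in this report.
old_run = {"keep_cache_disk": 0, "keep_cache_ram": 268435456,
           "ram": 10485760000, "vcpus": 2}
new_run = {"keep_cache_disk": 10485760000, "keep_cache_ram": 0,
           "ram": 10485760000, "vcpus": 2}

# A whole-value comparison fails even though ram and vcpus are identical.
keys_match = constraints_key(old_run) == constraints_key(new_run)

# Only the Keep cache fields differ between the two runs.
differing = sorted(k for k in old_run if old_run[k] != new_run[k])

print(keys_match)
print(differing)
```

Because the lookup key covers every field, a change to a cluster-wide default alters the key for all new requests, even when the workflow's own requirements are unchanged.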
Ideally we would like to exclude the Keep cache constraints from reuse entirely, but to do that we need to change the way we store runtime_constraints in the database; right now it's just plain text. Ideas that have been suggested:
- Convert the column to jsonb and do richer queries on it (Brett in note-14)
- Add a column reusable_runtime_constraints that's limited to recording the constraints that affect reuse (Tom in note-15)
Agree on one and implement it.
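For discussion, the idea of recording only the reuse-relevant constraints might look roughly like the sketch below. It is written in Python for illustration and is not the API server's code; the set KEEP_CACHE_KEYS and the function body are assumptions, and the exact list of constraints that should affect reuse is still to be agreed.

```python
# Assumption: only the Keep cache settings are excluded from reuse matching.
KEEP_CACHE_KEYS = frozenset({"keep_cache_ram", "keep_cache_disk"})


def reusable_runtime_constraints(runtime_constraints):
    """Return the subset of constraints that would be compared for reuse."""
    return {k: v for k, v in runtime_constraints.items()
            if k not in KEEP_CACHE_KEYS}


# Values copied from the old and new runs in this report.
old_run = {"keep_cache_disk": 0, "keep_cache_ram": 268435456,
           "ram": 10485760000, "vcpus": 2}
new_run = {"keep_cache_disk": 10485760000, "keep_cache_ram": 0,
           "ram": 10485760000, "vcpus": 2}

old_subset = reusable_runtime_constraints(old_run)
new_subset = reusable_runtime_constraints(new_run)

# With the Keep cache fields dropped, the two runs compare equal.
print(old_subset == new_subset)
print(old_subset)
```

With either approach, the stored value used for matching would no longer depend on the cluster's Keep cache defaults, so changing DefaultKeepCacheRAM later would not invalidate existing containers for reuse.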