Containers that used an old DefaultKeepCacheRAM no longer get reused after a configuration change
beagle.cwl has the resource requirement
ResourceRequirement:
  coresMin: 2
  ramMin: 10000
A new run:
An old run:
The resource requirement didn't change, and no keep_cache requirement was specified. The new run didn't reuse the old run because their runtime constraints differ as follows.
Runtime constraints of the new run:
keep_cache_disk: 10485760000
keep_cache_ram: 0
ram: 10485760000
vcpus: 2
new node type:
"ProviderType": "m5.8xlarge", "VCPUs": 32, "RAM": 137438953472, "IncludedScratch": 4000000000, "AddedScratch": 100000000000, "Price": 1.542,
Runtime constraints of the old run:
keep_cache_disk: 0
keep_cache_ram: 268435456
ram: 10485760000
vcpus: 2
old node type:
"ProviderType": "m5.xlarge", "VCPUs": 4, "RAM": 17179869184, "IncludedScratch": 4000000000, "AddedScratch": 0, "Price": 0.192,
This happened because we changed the DefaultKeepCacheRAM setting on the cluster to start using disk cache instead of memory. As a consequence, Container.find_reusable can no longer find containers that used the old default, because it searches for matching runtime_constraints with a hash match, and it doesn't know what the old value of DefaultKeepCacheRAM was to search for.
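To illustrate why an exact hash match breaks here, a minimal sketch follows. It is not the actual Container.find_reusable code: the function constraints_digest and its serialization (sorted-key JSON hashed with MD5) are assumptions made for illustration. The constraint values are the ones recorded for the two runs above.

```python
import hashlib
import json


def constraints_digest(constraints):
    """Hypothetical stand-in for hashing the stored runtime_constraints text."""
    # Assumption: the stored form is a deterministic serialization of the whole hash.
    serialized = json.dumps(constraints, sort_keys=True)
    return hashlib.md5(serialized.encode("utf-8")).hexdigest()


# Constraints stored with the old run (RAM cache default was in effect).
old_run = {
    "keep_cache_disk": 0,
    "keep_cache_ram": 268435456,
    "ram": 10485760000,
    "vcpus": 2,
}

# Constraints resolved for the new run (disk cache default is in effect).
new_run = {
    "keep_cache_disk": 10485760000,
    "keep_cache_ram": 0,
    "ram": 10485760000,
    "vcpus": 2,
}

# ram and vcpus are identical, but the digest covers every key,
# so the differing Keep cache defaults prevent a match.
digests_match = constraints_digest(old_run) == constraints_digest(new_run)
print(digests_match)
```

Because the digest covers the full set of constraints, the search would have to know the old default value to reproduce the old digest.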
Ideally we would like to exclude the Keep cache constraints from reuse entirely, but to do that we need to change the way we store runtime_constraints in the database. Right now it's just plain text. Ideas that have been suggested:
- Convert the column to jsonb and do richer queries on it (Brett in note-14)
- Add a column reusable_runtime_constraints that's limited to recording the constraints that affect reuse (Tom in note-15)
Agree on one and implement it.
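As a rough sketch of the second idea, the reuse-relevant subset could be derived by dropping the Keep cache keys before comparison. The helper name reusable_runtime_constraints and the KEEP_CACHE_KEYS set are hypothetical; which keys should actually be excluded is still to be decided.

```python
# Assumption: these are the only constraints that should not affect reuse.
KEEP_CACHE_KEYS = frozenset({"keep_cache_ram", "keep_cache_disk"})


def reusable_runtime_constraints(constraints):
    """Return only the constraints that would be recorded for reuse matching."""
    return {
        key: value
        for key, value in constraints.items()
        if key not in KEEP_CACHE_KEYS
    }


old_run = {
    "keep_cache_disk": 0,
    "keep_cache_ram": 268435456,
    "ram": 10485760000,
    "vcpus": 2,
}
new_run = {
    "keep_cache_disk": 10485760000,
    "keep_cache_ram": 0,
    "ram": 10485760000,
    "vcpus": 2,
}

# With the Keep cache keys removed, both runs reduce to the same subset,
# so a match on this subset would find the old container.
same_subset = (
    reusable_runtime_constraints(old_run) == reusable_runtime_constraints(new_run)
)
print(same_subset)
```

The same filtering could in principle be expressed as a query on a jsonb column (the first idea); this sketch only shows the comparison the new column would enable.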