Bug #19981

Updated by Brett Smith about 1 year ago

*The Bug* 

 beagle.cwl has the resource requirement 
 <pre> 
   ResourceRequirement: 
     coresMin: 2 
     ramMin: 10000 
 </pre> 

 A new run: 
 https://workbench.2xpu4.arvadosapi.com/container_requests/2xpu4-xvhdp-ph1xry8mxbsol3j 

 An old run: 
 https://workbench.2xpu4.arvadosapi.com/container_requests/2xpu4-xvhdp-p571e0xq4g85ac7 

 The resource requirement didn't change, and no keep_cache requirement was specified. The recent run didn't reuse the old run because their runtime constraints differ, as shown below.

 new runtime_constraints: 
 <pre> 
 keep_cache_disk 	 10485760000 
 keep_cache_ram 	 0 
 ram 	 10485760000 
 vcpus 	 2 
 </pre> 

 new node type: 
 <pre> 
 "ProviderType": "m5.8xlarge", 
 "VCPUs": 32, 
 "RAM": 137438953472, 
 "IncludedScratch": 4000000000, 
 "AddedScratch": 100000000000, 
 "Price": 1.542, 
 </pre> 

 old runtime_constraints: 
 <pre> 
 keep_cache_disk 	 0 
 keep_cache_ram 	 268435456 
 ram 	 10485760000 
 vcpus 	 2 
 </pre> 

 old node type: 
 <pre> 
 "ProviderType": "m5.xlarge", 
 "VCPUs": 4, 
 "RAM": 17179869184, 
 "IncludedScratch": 4000000000, 
 "AddedScratch": 0, 
 "Price": 0.192, 
 </pre> 
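
 To make the mismatch concrete, the sketch below compares the two @runtime_constraints@ sets quoted above. It is only an illustration in Python, not Arvados code: the dictionaries are copied from this ticket, and @constraint_diff@ is a hypothetical helper. It also checks that @ramMin: 10000@ (CWL expresses this in MiB) corresponds to the @ram@ value seen in both runs.

```python
# Values copied from the two container requests quoted in this ticket.
NEW_CONSTRAINTS = {
    "keep_cache_disk": 10485760000,
    "keep_cache_ram": 0,
    "ram": 10485760000,
    "vcpus": 2,
}
OLD_CONSTRAINTS = {
    "keep_cache_disk": 0,
    "keep_cache_ram": 268435456,
    "ram": 10485760000,
    "vcpus": 2,
}


def constraint_diff(old, new):
    """Hypothetical helper: return {key: (old, new)} for keys whose values differ."""
    keys = sorted(set(old) | set(new))
    return {k: (old.get(k), new.get(k)) for k in keys if old.get(k) != new.get(k)}


# CWL's ramMin is expressed in mebibytes (2**20 bytes).
assert 10000 * 2**20 == NEW_CONSTRAINTS["ram"] == OLD_CONSTRAINTS["ram"]

diff = constraint_diff(OLD_CONSTRAINTS, NEW_CONSTRAINTS)
for key, (old_value, new_value) in diff.items():
    print(f"{key}: {old_value} -> {new_value}")
```

 Only the two Keep cache keys differ; @ram@ and @vcpus@ are identical, yet an exact match on the whole set still fails.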

 *The Fix* 

 This happened because we changed the @DefaultKeepCacheRAM@ setting on the cluster: we want to start using disk cache instead of memory. As a consequence, @Container.find_reusable@ can no longer find containers that used the old default, because it searches for matching @runtime_constraints@ with a hash match, and it doesn't know what the old value of @DefaultKeepCacheRAM@ was to search for.

 We want to fix this by unconditionally excluding the @keep_cache_ram@ and @keep_cache_disk@ runtime constraints from reuse considerations. A container's output really shouldn't be affected by these variables, similar to other inputs we've decided not to track by default like clock time.

 Another reason to do this is to avoid repeating this problem if we change defaults again in the future, like tweaking the numbers for default @keep_cache_disk@.

 Excluding them requires some change to the way we store @runtime_constraints@ in the database; right now it's just plain text. Ideas that have been suggested:

 * Convert the column to @jsonb@ and do richer queries on it (Brett in note-14)
 * Add a column @reusable_runtime_constraints@ that's limited to recording the constraints that affect reuse (Tom in note-15)

 Agree on one and implement it. The container reuse documentation should be updated to reflect this change.
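
 The idea of leaving the Keep cache constraints out of reuse matching can be sketched as follows. This is a Python illustration only: the real lookup lives in @Container.find_reusable@ on the API server and queries the database, which this sketch does not model. @REUSE_IGNORED_KEYS@, @reusable_constraints@ and @find_reusable@ below are hypothetical names, and the in-memory list stands in for the containers table.

```python
# Hypothetical: constraints assumed not to affect a container's output.
REUSE_IGNORED_KEYS = frozenset({"keep_cache_ram", "keep_cache_disk"})


def reusable_constraints(runtime_constraints):
    """Return only the constraints that should take part in reuse matching."""
    return {
        key: value
        for key, value in runtime_constraints.items()
        if key not in REUSE_IGNORED_KEYS
    }


def find_reusable(candidates, wanted_constraints):
    """Return the first candidate whose reuse-relevant constraints match, else None.

    `candidates` is a list of dicts with a "runtime_constraints" entry; it is
    an in-memory stand-in for a database query.
    """
    wanted = reusable_constraints(wanted_constraints)
    for candidate in candidates:
        if reusable_constraints(candidate["runtime_constraints"]) == wanted:
            return candidate
    return None


old_container = {
    "uuid": "old-run",  # placeholder identifier, not a real UUID
    "runtime_constraints": {
        "keep_cache_disk": 0, "keep_cache_ram": 268435456,
        "ram": 10485760000, "vcpus": 2,
    },
}
new_request = {
    "keep_cache_disk": 10485760000, "keep_cache_ram": 0,
    "ram": 10485760000, "vcpus": 2,
}

match = find_reusable([old_container], new_request)
print(match["uuid"] if match else "no reusable container")
```

 With the Keep cache keys filtered out, the old run matches the new request, while a difference in @ram@ or @vcpus@ would still prevent reuse.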
