Bug #19981

Containers that used an old DefaultKeepCacheRAM no longer get reused after a configuration change

Added by Jiayong Li about 1 year ago. Updated about 1 year ago.

Status: New
Priority: Normal
Assigned To: -
Category: API
Target version:
Story points:
-

Description

The Bug

beagle.cwl has the resource requirement

  ResourceRequirement:
    coresMin: 2
    ramMin: 10000

A new run:
https://workbench.2xpu4.arvadosapi.com/container_requests/2xpu4-xvhdp-ph1xry8mxbsol3j

An old run:
https://workbench.2xpu4.arvadosapi.com/container_requests/2xpu4-xvhdp-p571e0xq4g85ac7

The resource requirement didn't change, and no keep_cache requirement was specified in either run. Even so, the recent run didn't reuse the old run's container, because their runtime_constraints differ as follows.

new runtime_constraints:

keep_cache_disk    10485760000
keep_cache_ram    0
ram    10485760000
vcpus    2

new node type:

"ProviderType": "m5.8xlarge",
"VCPUs": 32,
"RAM": 137438953472,
"IncludedScratch": 4000000000,
"AddedScratch": 100000000000,
"Price": 1.542,

old runtime_constraints:

keep_cache_disk    0
keep_cache_ram    268435456
ram    10485760000
vcpus    2

old node type:

"ProviderType": "m5.xlarge",
"VCPUs": 4,
"RAM": 17179869184,
"IncludedScratch": 4000000000,
"AddedScratch": 0,
"Price": 0.192,

The Fix

This happened because we changed the DefaultKeepCacheRAM setting on the cluster, to start using disk cache instead of memory. As a consequence, Container.find_reusable can no longer find containers that used the old default, because it searches for matching runtime_constraints with a hash match, and it doesn't know what the old value of DefaultKeepCacheRAM was to search for.
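
The effect of exact matching can be illustrated with a minimal Python sketch. This is not the Arvados implementation: the `constraints_digest` helper, the JSON serialization, and the use of MD5 are assumptions made only for illustration. The constraint values are copied from the two runs listed above.

```python
import hashlib
import json

# Values copied from the two container requests in this report.
old_constraints = {"keep_cache_disk": 0, "keep_cache_ram": 268435456,
                   "ram": 10485760000, "vcpus": 2}
new_constraints = {"keep_cache_disk": 10485760000, "keep_cache_ram": 0,
                   "ram": 10485760000, "vcpus": 2}


def constraints_digest(constraints):
    """Hypothetical helper: digest of a canonical serialization of all constraints."""
    canonical = json.dumps(constraints, sort_keys=True)
    return hashlib.md5(canonical.encode("utf-8")).hexdigest()


# Keys whose values differ between the old and the new run.
differing = sorted(k for k in old_constraints
                   if old_constraints[k] != new_constraints[k])
print(differing)

# An exact match on the digest fails even though ram and vcpus are identical.
print(constraints_digest(old_constraints) == constraints_digest(new_constraints))
```

Only the two Keep cache keys differ, yet any lookup that compares the full set of constraints treats the old container as a different one.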

Ideally we would like to exclude the Keep cache constraints from reuse matching entirely, but to do that we need to change the way we store runtime_constraints in the database. Right now it's just plain text. Ideas that have been suggested:

  • Convert the column to jsonb and do richer queries on it (Brett in note-14)
  • Add a column reusable_runtime_constraints that's limited to recording the constraints that affect reuse (Tom in note-15)

Agree on one and implement it.
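
As a rough sketch of the second idea, the following Python derives a reduced set of constraints so that Keep cache settings no longer influence matching. The function name `reusable_runtime_constraints` mirrors the proposed column, and the set of excluded keys is an assumption based on this report, not an agreed design.

```python
# Assumption: only the Keep cache settings are excluded from reuse matching.
KEEP_CACHE_KEYS = frozenset({"keep_cache_disk", "keep_cache_ram"})


def reusable_runtime_constraints(constraints):
    """Hypothetical helper: keep only the constraints that should affect reuse."""
    return {k: v for k, v in constraints.items() if k not in KEEP_CACHE_KEYS}


# Values copied from the old and new runs in this report.
old_constraints = {"keep_cache_disk": 0, "keep_cache_ram": 268435456,
                   "ram": 10485760000, "vcpus": 2}
new_constraints = {"keep_cache_disk": 10485760000, "keep_cache_ram": 0,
                   "ram": 10485760000, "vcpus": 2}

old_reusable = reusable_runtime_constraints(old_constraints)
new_reusable = reusable_runtime_constraints(new_constraints)

# With the Keep cache keys removed, the two runs compare equal.
print(old_reusable == new_reusable)
```

Under this assumption the old container would match the new request regardless of which DefaultKeepCacheRAM value was in effect when it ran.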


Subtasks 1 (0 open, 1 closed)

Task #20202: Review 19981-reuse-flex-keep-cache (Resolved, Brett Smith, 03/05/2023)

Related issues

Related to Arvados - Feature #18842: Local disk keep cache for Python SDK/arv-mount (Resolved, Peter Amstutz, 10/21/2022)
Related to Arvados - Bug #19884: keep_cache_disk runtime constraint is undocumented (Resolved, Brett Smith)