Bug #11561
[API] Limit number of lock/unlock cycles for a given container
100%
Description
Currently, if a container cannot be started due to an infrastructure problem (whether or not it is related to the specific container), it is retried indefinitely.
Proposed solution:
Add a site config knob (analogous to num_retries) that limits the number of times a container can be unlocked (moved from Locked to Queued state) before being automatically cancelled.
Add:
- Config key max_container_dispatch_attempts (default 5)
- DB column "lock_count" (do not include in API response)
- Increment lock_count during lock()
- When unlocking a container, if lock_count >= Rails.configuration.max_container_dispatch_attempts, change state to Cancelled instead of Queued (the unlock API should still respond 200 in this case) and update runtime_status[error] with an error message.
Write tests and update documentation.
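The proposed lock/unlock behavior can be sketched roughly as follows. This is a minimal in-memory illustration, not the actual API server code: SketchContainer is a hypothetical stand-in for the real ActiveRecord model, the constant MAX_CONTAINER_DISPATCH_ATTEMPTS stands in for Rails.configuration.max_container_dispatch_attempts, and the wording of the error message is an assumption.

```ruby
# Stand-in for Rails.configuration.max_container_dispatch_attempts (default 5).
MAX_CONTAINER_DISPATCH_ATTEMPTS = 5

# Hypothetical in-memory model; the real Container is a database-backed model.
class SketchContainer
  QUEUED = "Queued"
  LOCKED = "Locked"
  CANCELLED = "Cancelled"

  attr_reader :state, :lock_count, :runtime_status

  def initialize
    @state = QUEUED
    @lock_count = 0          # would be the new "lock_count" DB column
    @runtime_status = {}
  end

  # Queued -> Locked; every successful lock counts as one dispatch attempt.
  def lock
    raise "cannot lock a container in state #{@state}" unless @state == QUEUED
    @state = LOCKED
    @lock_count += 1
    self
  end

  # Locked -> Queued, or Locked -> Cancelled once the attempt limit is reached.
  # Neither branch raises, mirroring "the unlock API should still respond 200".
  def unlock
    raise "cannot unlock a container in state #{@state}" unless @state == LOCKED
    if @lock_count >= MAX_CONTAINER_DISPATCH_ATTEMPTS
      @state = CANCELLED
      # Assumed message text; the ticket only says "an error message".
      @runtime_status["error"] =
        "Container was locked #{@lock_count} times, reaching max_container_dispatch_attempts"
    else
      @state = QUEUED
    end
    self
  end
end
```

With the default limit of 5, a container that is locked and unlocked four times returns to Queued each time; the fifth unlock moves it to Cancelled and records the error in runtime_status.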
Associated revisions
History
#1
Updated by Tom Clegg over 3 years ago
- Description updated (diff)
#2
Updated by Tom Morris over 3 years ago
- Target version set to Arvados Future Sprints
#3
Updated by Peter Amstutz about 2 years ago
- Related to Bug #9688: [Crunch2] Limit number of dispatch attempts per container added
#4
Updated by Peter Amstutz about 2 years ago
- Target version changed from Arvados Future Sprints to To Be Groomed
#5
Updated by Tom Morris about 2 years ago
This is a near duplicate of #9688. We should probably just merge the two.
#6
Updated by Tom Morris about 2 years ago
- Related to Bug #14540: [API] Limit number of container lock/unlock cycles added
#7
Updated by Peter Amstutz about 2 years ago
- Related to deleted (Bug #14540: [API] Limit number of container lock/unlock cycles)
#8
Updated by Peter Amstutz about 2 years ago
- Has duplicate Bug #14540: [API] Limit number of container lock/unlock cycles added
#9
Updated by Peter Amstutz about 2 years ago
- Related to deleted (Bug #9688: [Crunch2] Limit number of dispatch attempts per container)
#10
Updated by Peter Amstutz about 2 years ago
- Is duplicate of Bug #9688: [Crunch2] Limit number of dispatch attempts per container added
#11
Updated by Peter Amstutz about 2 years ago
- Status changed from New to Duplicate
#12
Updated by Peter Amstutz about 2 years ago
- Status changed from Duplicate to New
- Priority changed from Normal to High
#14
Updated by Peter Amstutz about 2 years ago
- Description updated (diff)
#15
Updated by Peter Amstutz about 2 years ago
- Description updated (diff)
#16
Updated by Peter Amstutz about 2 years ago
- Story points set to 2.0
#17
Updated by Tom Morris almost 2 years ago
- Target version changed from To Be Groomed to 2019-02-27 Sprint
#18
Updated by Peter Amstutz almost 2 years ago
- Assigned To set to Peter Amstutz
#19
Updated by Peter Amstutz almost 2 years ago
11561-limit-container-locks @ 0f14b3456d2d3bdf95b78b65a1a41280a7416928
https://ci.curoverse.com/view/Developer/job/developer-run-tests/1074/
Added lock_count + migration
Updated lock/unlock
Added test
Added configuration parameter (and added it to the new cluster config design doc as well)
#20
Updated by Lucas Di Pentima almost 2 years ago
This LGTM, thanks!
#21
Updated by Peter Amstutz almost 2 years ago
- Status changed from New to Resolved
#22
Updated by Tom Morris almost 2 years ago
- Release set to 15
Merge branch '11561-limit-container-locks' refs #11561
Arvados-DCO-1.1-Signed-off-by: Peter Amstutz <pamstutz@veritasgenetics.com>