Bug #11561
closed
[API] Limit number of lock/unlock cycles for a given container
Description
Currently, if a container cannot be started due to some infrastructure problem (whether or not it's related to the specific container), it will be retried forever.
Proposed solution:
Add a site config knob (analogous to num_retries) that limits the number of times a container can be unlocked (moved from Locked to Queued state) before being automatically cancelled.
Add:
- Config key max_container_dispatch_attempts (default 5)
- DB column "lock_count" (do not include in API response)
- Increment lock_count during lock()
- When unlocking a container, if lock_count >= Rails.configuration.max_container_dispatch_attempts, change state to Cancelled instead of Queued (the unlock API should still respond 200 in this case) and update runtime_status[error] with an error message.
Write tests and update documentation.
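The proposed lock/unlock behaviour can be sketched roughly as follows. This is a minimal in-memory illustration, not the actual API server code: the class name ContainerSketch, the constant standing in for Rails.configuration.max_container_dispatch_attempts, and the error message wording are all assumptions made for the example.

```ruby
# Hypothetical stand-in for Rails.configuration.max_container_dispatch_attempts.
MAX_CONTAINER_DISPATCH_ATTEMPTS = 5

# Simplified in-memory model of a container (assumption: the real model is an
# ActiveRecord class with a lock_count DB column; none of that is used here).
class ContainerSketch
  attr_reader :state, :lock_count, :runtime_status

  def initialize
    @state = "Queued"
    @lock_count = 0
    @runtime_status = {}
  end

  # Queued -> Locked. Each lock counts as one dispatch attempt.
  def lock
    raise "cannot lock a container in state #{@state}" unless @state == "Queued"
    @lock_count += 1
    @state = "Locked"
  end

  # Locked -> Queued, or Locked -> Cancelled once the attempt limit is reached.
  # Returns the new state without raising, mirroring the idea that the unlock
  # API should still respond 200 when the container ends up cancelled.
  def unlock
    raise "cannot unlock a container in state #{@state}" unless @state == "Locked"
    if @lock_count >= MAX_CONTAINER_DISPATCH_ATTEMPTS
      @state = "Cancelled"
      # Hypothetical message text; the ticket only says an error message is set.
      @runtime_status["error"] = "Failed to start container after #{@lock_count} dispatch attempts"
    else
      @state = "Queued"
    end
    @state
  end
end
```

With the default limit of 5, the first four lock/unlock cycles return the container to Queued, and the fifth unlock cancels it and records the error in runtime_status.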
Updated by Tom Morris about 7 years ago
- Target version set to Arvados Future Sprints
Updated by Peter Amstutz almost 6 years ago
- Related to Bug #9688: [Crunch2] Limit number of dispatch attempts per container added
Updated by Peter Amstutz almost 6 years ago
- Target version changed from Arvados Future Sprints to To Be Groomed
Updated by Tom Morris almost 6 years ago
This is a near duplicate of #9688. We should probably just merge the two.
Updated by Tom Morris almost 6 years ago
- Related to Bug #14540: [API] Limit number of container lock/unlock cycles added
Updated by Peter Amstutz almost 6 years ago
- Related to deleted (Bug #14540: [API] Limit number of container lock/unlock cycles)
Updated by Peter Amstutz almost 6 years ago
- Has duplicate Bug #14540: [API] Limit number of container lock/unlock cycles added
Updated by Peter Amstutz almost 6 years ago
- Related to deleted (Bug #9688: [Crunch2] Limit number of dispatch attempts per container)
Updated by Peter Amstutz almost 6 years ago
- Is duplicate of Bug #9688: [Crunch2] Limit number of dispatch attempts per container added
Updated by Peter Amstutz almost 6 years ago
- Status changed from New to Duplicate
Updated by Peter Amstutz almost 6 years ago
- Status changed from Duplicate to New
- Priority changed from Normal to High
Updated by Tom Morris almost 6 years ago
- Target version changed from To Be Groomed to 2019-02-27 Sprint
Updated by Peter Amstutz almost 6 years ago
11561-limit-container-locks @ 0f14b3456d2d3bdf95b78b65a1a41280a7416928
https://ci.curoverse.com/view/Developer/job/developer-run-tests/1074/
Added lock_count + migration
Updated lock/unlock
Added test
Added configuration parameter (and added it to the new cluster config design doc as well)
Updated by Peter Amstutz almost 6 years ago
- Status changed from New to Resolved
Updated by Tom Clegg about 3 years ago
- Related to Bug #18102: max dispatch attempts error added