[API] Limit number of lock/unlock cycles for a given container
Currently, if a container cannot be started due to some infrastructure problem (whether or not it's related to the specific container), it is retried indefinitely.
Add a site config knob (analogous to num_retries) that limits the number of times a container can be unlocked (moved from Locked to Queued state) before being automatically cancelled.
Add:
- Config key max_container_dispatch_attempts (default 5)
- DB column "lock_count" (do not include in API response)
- Increment lock_count during lock()
- When unlocking a container, if lock_count >= Rails.configuration.max_container_dispatch_attempts, change state to Cancelled instead of Queued (the unlock API should still respond 200 in this case) and update runtime_status[error] with an error message.
Write tests and update documentation.
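The lock/unlock rules listed above can be sketched in plain Ruby. This is only an illustration of the intended state transitions, not the API server code: the SketchContainer class, the MAX_DISPATCH_ATTEMPTS constant (standing in for Rails.configuration.max_container_dispatch_attempts) and the wording of the error message are all hypothetical.

```ruby
# Plain-Ruby sketch of the proposed behaviour; no Rails or database involved.
# MAX_DISPATCH_ATTEMPTS is a stand-in (assumption) for
# Rails.configuration.max_container_dispatch_attempts, default 5.
MAX_DISPATCH_ATTEMPTS = 5

# Hypothetical model used only for this illustration.
class SketchContainer
  attr_reader :state, :lock_count, :runtime_status

  def initialize
    @state = "Queued"
    @lock_count = 0
    @runtime_status = {}
  end

  # Queued -> Locked. Every successful lock counts as one dispatch attempt.
  def lock
    raise "cannot lock a container in state #{@state}" unless @state == "Queued"
    @lock_count += 1
    @state = "Locked"
    self
  end

  # Locked -> Queued, or Locked -> Cancelled once the limit is reached.
  # Returns normally in both cases, mirroring "unlock still responds 200".
  def unlock
    raise "cannot unlock a container in state #{@state}" unless @state == "Locked"
    if @lock_count >= MAX_DISPATCH_ATTEMPTS
      @state = "Cancelled"
      # Error text is illustrative only.
      @runtime_status["error"] =
        "Cancelled after reaching max_container_dispatch_attempts " \
        "(#{MAX_DISPATCH_ATTEMPTS})"
    else
      @state = "Queued"
    end
    self
  end
end

# Usage: cycle a container through lock/unlock until the limit is hit.
container = SketchContainer.new
MAX_DISPATCH_ATTEMPTS.times { container.lock.unlock }
puts container.state                     # Cancelled
puts container.lock_count                # 5
```

With the default of 5, the first four unlocks return the container to Queued and the fifth cancels it, so a test should cover both sides of that boundary.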
#19 Updated by Peter Amstutz 2 months ago
11561-limit-container-locks @ 0f14b3456d2d3bdf95b78b65a1a41280a7416928
Added lock_count + migration
Added configuration parameter (and added it to the new cluster config design doc as well)