Bug #11561

closed

[API] Limit number of lock/unlock cycles for a given container

Added by Tom Clegg over 7 years ago. Updated over 5 years ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Category:
API
Target version:
Story points:
2.0
Release relationship:
Auto

Description

Currently, if a container cannot be started because of an infrastructure problem (whether or not the problem is related to that specific container), it is retried indefinitely.

Proposed solution:

Add a site config knob (analogous to num_retries) that limits the number of times a container can be unlocked (moved from Locked to Queued state) before being automatically cancelled.

Add:
  • Config key max_container_dispatch_attempts (default 5)
  • DB column "lock_count" (do not include in API response)
  • Increment lock_count during lock()
  • When unlocking a container, if lock_count >= Rails.configuration.max_container_dispatch_attempts, change state to Cancelled instead of Queued and update runtime_status[error] with an error message. The unlock API should still respond 200 in this case.

Write tests and update documentation.
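
The proposed lock/unlock behavior can be sketched roughly as follows. This is a minimal in-memory illustration, not the actual Arvados model: the Container class here, its constructor argument max_dispatch_attempts (standing in for the max_container_dispatch_attempts config key), and the wording of the error message are all assumptions made for the example.

```ruby
# Hypothetical stand-in for the Container model (assumption: plain Ruby,
# no database or Rails configuration involved).
class Container
  QUEUED = "Queued"
  LOCKED = "Locked"
  CANCELLED = "Cancelled"

  attr_reader :state, :lock_count, :runtime_status

  # max_dispatch_attempts mirrors the proposed config key (default 5).
  def initialize(max_dispatch_attempts: 5)
    @max_dispatch_attempts = max_dispatch_attempts
    @state = QUEUED
    @lock_count = 0
    @runtime_status = {}
  end

  # Queued -> Locked; every successful lock counts as one dispatch attempt.
  def lock
    raise "cannot lock container in state #{@state}" unless @state == QUEUED
    @state = LOCKED
    @lock_count += 1
    self
  end

  # Locked -> Queued, or Locked -> Cancelled once the attempt limit is reached.
  # The call itself succeeds either way, matching the "still respond 200" note.
  def unlock
    raise "cannot unlock container in state #{@state}" unless @state == LOCKED
    if @lock_count >= @max_dispatch_attempts
      @state = CANCELLED
      # Illustrative message only; the real wording is not given in this ticket.
      @runtime_status["error"] = "Failed to start container after #{@lock_count} dispatch attempts"
    else
      @state = QUEUED
    end
    self
  end
end

c = Container.new(max_dispatch_attempts: 2)
c.lock.unlock   # first cycle: lock_count is 1, container returns to Queued
c.lock.unlock   # second cycle: lock_count reaches the limit
puts c.state    # prints "Cancelled"
```

Counting in lock() and checking in unlock() means a container is cancelled only when a dispatcher gives it back, so a container that is locked and then starts successfully is not affected by the limit.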


Subtasks 1 (0 open, 1 closed)

Task #14837: Review 11561-limit-container-locks (Resolved, Peter Amstutz, 04/26/2017)

Related issues

Related to Arvados - Bug #11190: Containers seem to run more than once, which isn't supposed to happen (Resolved, Tom Clegg, 03/01/2017)
Related to Arvados - Bug #18102: max dispatch attempts error (Resolved, Tom Clegg, 09/07/2021)
Has duplicate Arvados - Bug #14540: [API] Limit number of container lock/unlock cycles (Duplicate)
Is duplicate of Arvados - Bug #9688: [Crunch2] Limit number of dispatch attempts per container (Resolved, 08/02/2016)
Actions #1

Updated by Tom Clegg over 7 years ago

  • Description updated (diff)
Actions #2

Updated by Tom Morris about 7 years ago

  • Target version set to Arvados Future Sprints
Actions #3

Updated by Peter Amstutz almost 6 years ago

  • Related to Bug #9688: [Crunch2] Limit number of dispatch attempts per container added
Actions #4

Updated by Peter Amstutz almost 6 years ago

  • Target version changed from Arvados Future Sprints to To Be Groomed
Actions #5

Updated by Tom Morris almost 6 years ago

This is a near duplicate of #9688. We should probably just merge the two.

Actions #6

Updated by Tom Morris almost 6 years ago

  • Related to Bug #14540: [API] Limit number of container lock/unlock cycles added
Actions #7

Updated by Peter Amstutz almost 6 years ago

  • Related to deleted (Bug #14540: [API] Limit number of container lock/unlock cycles)
Actions #8

Updated by Peter Amstutz almost 6 years ago

  • Has duplicate Bug #14540: [API] Limit number of container lock/unlock cycles added
Actions #9

Updated by Peter Amstutz almost 6 years ago

  • Related to deleted (Bug #9688: [Crunch2] Limit number of dispatch attempts per container)
Actions #10

Updated by Peter Amstutz almost 6 years ago

  • Is duplicate of Bug #9688: [Crunch2] Limit number of dispatch attempts per container added
Actions #11

Updated by Peter Amstutz almost 6 years ago

  • Status changed from New to Duplicate
Actions #12

Updated by Peter Amstutz almost 6 years ago

  • Status changed from Duplicate to New
  • Priority changed from Normal to High
Actions #14

Updated by Peter Amstutz almost 6 years ago

  • Description updated (diff)
Actions #15

Updated by Peter Amstutz almost 6 years ago

  • Description updated (diff)
Actions #16

Updated by Peter Amstutz almost 6 years ago

  • Story points set to 2.0
Actions #17

Updated by Tom Morris almost 6 years ago

  • Target version changed from To Be Groomed to 2019-02-27 Sprint
Actions #18

Updated by Peter Amstutz almost 6 years ago

  • Assigned To set to Peter Amstutz
Actions #19

Updated by Peter Amstutz almost 6 years ago

11561-limit-container-locks @ 0f14b3456d2d3bdf95b78b65a1a41280a7416928

https://ci.curoverse.com/view/Developer/job/developer-run-tests/1074/

Added lock_count + migration

Updated lock/unlock

Added test

Added configuration parameter (and added it to the new cluster config design doc as well)

Actions #20

Updated by Lucas Di Pentima almost 6 years ago

This LGTM, thanks!

Actions #21

Updated by Peter Amstutz almost 6 years ago

  • Status changed from New to Resolved
Actions #22

Updated by Tom Morris over 5 years ago

  • Release set to 15
Actions #23

Updated by Tom Clegg about 3 years ago

  • Related to Bug #18102: max dispatch attempts error added