Bug #19973

closed

Dispatcher responds to 503 errors by limiting container concurrency

Added by Tom Clegg almost 2 years ago. Updated almost 2 years ago.

Status: Resolved
Priority: Normal
Assigned To:
Category: Crunch
Target version:
Story points: 1.0
Release relationship: Auto

Description

When controller/RailsAPI returns 503, it is quite likely that the dispatcher itself is causing the overload, and retrying is counterproductive. This is especially true when the operation being retried is something like "lock": if it succeeds, it will be followed by a lot more load on controller/RailsAPI from the ensuing instances and crunch-run processes.

Proposed mitigation:
  • when dispatcher sees a 503 response from controller/RailsAPI, it reduces by 1/2 or 3/4 its internal upper limit on how many concurrent containers it should try to run
  • when dispatcher hasn't seen a 503 response for N seconds, it increases its upper limit by 1, up to a configured maximum

Subtasks 1 (0 open, 1 closed)

Task #20114: Review 19973-dispatch-throttle (Resolved, Tom Clegg, 02/16/2023)

Related issues 2 (0 open, 2 closed)

Related to Arvados - Feature #19972: Go arvados.Client retry with backoff (Resolved, Tom Clegg, 03/08/2023)
Related to Arvados - Feature #19984: Go arvados.Client responds to 503 errors by limiting outgoing connection concurrency (Resolved, Tom Clegg, 02/21/2023)
