Bug #20432

closed

Improve CWL runner handling 503 errors

Added by Peter Amstutz about 1 year ago. Updated 9 months ago.

Status: Resolved
Priority: Normal
Assigned To:
Category: CWL
Story points: -
Release relationship: Auto

Description

  1. recoverable errors, such as a failed request for container states, are not fatal and should only be logged as warnings
  2. requesting /arvados/v1/config isn't retried
  3. requesting /discovery/v1/apis/arvados/v1/rest isn't retried
    1. "_thread._local object has no attribute 'api'": throwing and handling AttributeError is intentional, but getattr might be better. In any event, it is the API object construction that is ultimately failing
  4. requesting /arvados/v1/containers/current isn't retried
  5. requesting /arvados/v1/users/current isn't retried
  6. everything should use 8-10 retries
  7. FUSE command.py also calls users.current without retries. We have seen a few instances of FUSE failing to start due to 503 errors when fetching the discovery doc or other endpoints required for startup
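
The getattr idea from item 3.1 could look roughly like the sketch below. This is only an illustration under assumptions: the class name ThreadSafeApiCache and the build_client callable are hypothetical and are not taken from the Arvados SDK. The point is that looking up the per-thread client with getattr and a default avoids raising AttributeError, so a failure during API object construction surfaces on its own instead of being chained to an unrelated "_thread._local object has no attribute 'api'" message.

```python
import threading


class ThreadSafeApiCache:
    """Hypothetical per-thread cache of API client objects (illustration only)."""

    def __init__(self, build_client):
        # build_client is an assumed zero-argument callable that constructs
        # a client; it may raise, e.g. on a 503 during startup requests.
        self._build_client = build_client
        self._local = threading.local()

    def client(self):
        # getattr with a default avoids throwing and catching AttributeError.
        api = getattr(self._local, "api", None)
        if api is None:
            # If construction fails here, the exception is not raised while
            # an AttributeError is being handled, so the traceback shows
            # only the construction failure.
            api = self._build_client()
            self._local.api = api
        return api
```

Each thread still gets its own client, built on first use; only the lookup mechanism differs from the try/except AttributeError approach.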

Related issues

Related to Arvados - Bug #12684: Let user specify a retry strategy on the client object, used for all API calls (Resolved, Brett Smith, 05/09/2023)
Actions #1

Updated by Peter Amstutz about 1 year ago

  • Status changed from New to In Progress
Actions #2

Updated by Peter Amstutz about 1 year ago

  • Description updated (diff)
Actions #3

Updated by Peter Amstutz about 1 year ago

  • Subject changed from CWL runner error handling to Improve CWL runner handling 503 errors
Actions #4

Updated by Peter Amstutz about 1 year ago

  • Description updated (diff)
Actions #6

Updated by Peter Amstutz about 1 year ago

  • Release set to 63
  • Status changed from In Progress to New
Actions #7

Updated by Peter Amstutz about 1 year ago

  • Target version changed from Future to Development 2023-05-10 sprint
Actions #8

Updated by Brett Smith about 1 year ago

  • Related to Bug #12684: Let user specify a retry strategy on the client object, used for all API calls added
Actions #10

Updated by Peter Amstutz about 1 year ago

  • Target version changed from Development 2023-05-10 sprint to Development 2023-05-24 sprint
Actions #11

Updated by Peter Amstutz about 1 year ago

  • Target version changed from Development 2023-05-24 sprint to Development 2023-06-07
Actions #12

Updated by Peter Amstutz about 1 year ago

  • Release deleted (63)
Actions #13

Updated by Peter Amstutz about 1 year ago

  • Target version changed from Development 2023-06-07 to To be scheduled
Actions #14

Updated by Brett Smith about 1 year ago

All of this is obsoleted by #12684. There should be no need to write anything separate for this anymore.

Actions #15

Updated by Brett Smith about 1 year ago

  • Target version changed from To be scheduled to Development 2023-05-24 sprint
  • Assigned To changed from Peter Amstutz to Brett Smith
  • Status changed from New to Resolved
Actions #16

Updated by Peter Amstutz about 1 year ago

Brett Smith wrote in #note-14:

All of this is obsoleted by #12684. There should be no need to write anything separate for this anymore.

Can you confirm if fetching the discovery document is also now retried by default?

Actions #17

Updated by Brett Smith about 1 year ago

Peter Amstutz wrote in #note-16:

Can you confirm if fetching the discovery document is also now retried by default?

It is. We now pass num_retries to googleapiclient.discovery.build, and that is used when fetching the discovery document.
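
As a rough illustration of the behavior described in this note, the sketch below shows one retry count, set on the client, covering both the startup discovery fetch and later requests. It is not the Arvados SDK or googleapiclient code: ServiceUnavailable, call_with_retries, SketchClient, and the fetch_discovery callable are all hypothetical names, and the backoff parameters are assumptions. Retried 503s are logged as warnings and only become an error once retries are exhausted.

```python
import logging
import random
import time

logger = logging.getLogger("retry_sketch")


class ServiceUnavailable(Exception):
    """Stand-in for an HTTP 503 response (hypothetical, for illustration)."""


def call_with_retries(func, num_retries=10, base_delay=0.5, sleep=time.sleep):
    """Call func(), retrying on ServiceUnavailable up to num_retries times."""
    for attempt in range(num_retries + 1):
        try:
            return func()
        except ServiceUnavailable as error:
            if attempt == num_retries:
                raise  # retries exhausted: report it as a real error
            # Exponential backoff with jitter (assumed policy).
            delay = base_delay * (2 ** attempt) * random.uniform(0.5, 1.5)
            logger.warning("503 on attempt %d of %d, retrying in %.1fs: %s",
                           attempt + 1, num_retries + 1, delay, error)
            sleep(delay)


class SketchClient:
    """Hypothetical client whose num_retries applies to every request."""

    def __init__(self, fetch_discovery, num_retries=10, sleep=time.sleep):
        self.num_retries = num_retries
        self._sleep = sleep
        # The discovery fetch at construction time uses the same setting.
        self.discovery = self.request(fetch_discovery)

    def request(self, func):
        return call_with_retries(func, self.num_retries, sleep=self._sleep)
```

With this shape, callers set num_retries once when building the client rather than wrapping individual calls such as users.current.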

Actions #18

Updated by Peter Amstutz 9 months ago

  • Release set to 66