Bug #21547
[Closed] Return certain database errors as 500 so they can be retried
Description
Certain database errors are transient. We should tell the client to retry the request by returning a 500 Internal Server Error instead of 422 (the default behavior).
#<ActiveRecord::Deadlocked: PG::TRDeadlockDetected: ERROR: deadlock detected>
Rationale: The deadlocks observed in Arvados are conflicts between two statements (a lock ordering issue), so rolling back and retrying is a reasonable solution.
#<ActiveRecord::StatementInvalid: PG::UnableToSend>
Rationale: This seems to be raised when the API server can't connect to the database.
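A minimal sketch of the proposed mapping, written in Python for illustration rather than as the API server's Ruby code. It assumes each error is identified by its class name as it appears in the log lines above, with the wrapped PG exception passed separately as the cause; the function name status_for_error and the string-based representation are hypothetical. ActiveRecord::StatementInvalid alone is not enough to decide, because it also wraps ordinary SQL errors caused by the client, so the wrapped cause is checked as well.

```python
from typing import Optional

# Hypothetical sets based on the two errors listed above.
TRANSIENT_ERRORS = {"ActiveRecord::Deadlocked"}
TRANSIENT_CAUSES = {"PG::TRDeadlockDetected", "PG::UnableToSend"}

DEFAULT_ERROR_STATUS = 422  # current default for database errors
RETRYABLE_STATUS = 500      # tells the client it may retry


def status_for_error(error_class: str, cause_class: Optional[str] = None) -> int:
    """Return the HTTP status for a database error (illustrative only)."""
    if error_class in TRANSIENT_ERRORS:
        return RETRYABLE_STATUS
    # StatementInvalid wraps many errors, including client mistakes,
    # so only treat it as transient when the wrapped PG error is transient.
    if cause_class in TRANSIENT_CAUSES:
        return RETRYABLE_STATUS
    return DEFAULT_ERROR_STATUS


print(status_for_error("ActiveRecord::Deadlocked", "PG::TRDeadlockDetected"))  # 500
print(status_for_error("ActiveRecord::StatementInvalid", "PG::UnableToSend"))  # 500
print(status_for_error("ActiveRecord::StatementInvalid", "PG::UndefinedColumn"))  # 422
```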
Here's the list of PostgreSQL errors known to the PG gem:
https://github.com/ged/ruby-pg/blob/daec80f91b9519509ca1694a231f11a75cb43f7f/ext/errorcodes.def#L598
https://github.com/ged/ruby-pg/blob/daec80f91b9519509ca1694a231f11a75cb43f7f/ext/pg_errors.c#L88
Some other possible Exceptions to retry:
ConnectionBad
ConnectionException
ConnectionDoesNotExist
ConnectionFailure
TooManyConnections
CannotConnectNow
IdleSessionTimeout
ObjectInUse
LockNotAvailable
AdminShutdown
CrashShutdown
(There are a lot of connection-related errors and I don't know the differences between them, but I included them all because they seem very likely to occur through no fault of the client.)
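An alternative sketch that classifies errors by PostgreSQL SQLSTATE code (the codes defined in errorcodes.def, linked above) instead of by exception class. The implementation described later in this ticket uses the generic ActiveRecord errors instead, so this is only an illustration, and the helper is_retryable_sqlstate is hypothetical. Only errors reported by the server carry a SQLSTATE; client-side failures such as PG::ConnectionBad and PG::UnableToSend typically do not, and would need a separate check by class.

```python
# SQLSTATE codes for the server-reported conditions listed above,
# taken from the PostgreSQL error code table.
RETRYABLE_SQLSTATES = {
    "08000",  # connection_exception       (PG::ConnectionException)
    "08003",  # connection_does_not_exist  (PG::ConnectionDoesNotExist)
    "08006",  # connection_failure         (PG::ConnectionFailure)
    "40P01",  # deadlock_detected          (PG::TRDeadlockDetected)
    "53300",  # too_many_connections       (PG::TooManyConnections)
    "55006",  # object_in_use              (PG::ObjectInUse)
    "55P03",  # lock_not_available         (PG::LockNotAvailable)
    "57P01",  # admin_shutdown             (PG::AdminShutdown)
    "57P02",  # crash_shutdown             (PG::CrashShutdown)
    "57P03",  # cannot_connect_now         (PG::CannotConnectNow)
    "57P05",  # idle_session_timeout       (PG::IdleSessionTimeout)
}


def is_retryable_sqlstate(sqlstate: str) -> bool:
    """Hypothetical helper: True if the SQLSTATE indicates a transient condition."""
    return sqlstate in RETRYABLE_SQLSTATES


print(is_retryable_sqlstate("40P01"))  # True  (deadlock)
print(is_retryable_sqlstate("23505"))  # False (unique_violation, a client error)
```

Matching exact codes keeps the list explicit; matching a whole class prefix such as "08" would be shorter but would also sweep in codes like 08P01 (protocol_violation) that are not obviously transient.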
Updated by Peter Amstutz about 1 year ago
- Related to Bug #21540: occasional container_requests deadlock added
Updated by Peter Amstutz about 1 year ago
- Target version changed from Development 2024-03-13 sprint to Development 2024-03-27 sprint
Updated by Peter Amstutz about 1 year ago
- Target version changed from Development 2024-03-27 sprint to Development 2024-04-10 sprint
Updated by Peter Amstutz 12 months ago
- Target version changed from Development 2024-04-10 sprint to Development 2024-04-24 sprint
Updated by Peter Amstutz 12 months ago
- Target version changed from Development 2024-04-24 sprint to Development 2024-05-08 sprint
Updated by Peter Amstutz 12 months ago
- Target version changed from Development 2024-05-08 sprint to Development 2024-05-22 sprint
Updated by Peter Amstutz 11 months ago
- Target version changed from Development 2024-05-22 sprint to Development 2024-06-05 sprint
Updated by Peter Amstutz 11 months ago
- Target version changed from Development 2024-06-05 sprint to Future
Updated by Peter Amstutz 2 months ago
- Target version changed from Future to Development 2025-01-29
Updated by Peter Amstutz 2 months ago
21547-retryable-db-error @ c95642a9e36e67c9f6d246cfed5391d11e149d71
Updated by Tom Clegg 2 months ago
21547-retryable-db-error @ 67ab367bcaa860e74af929bbdf3c5711bb8e8f76 -- developer-run-tests: #4622
Updated by Peter Amstutz 2 months ago
21547-retryable-db-error @ f24f6d7167c32dadc80f436fdbb4806d88808e0c
Updated by Peter Amstutz 2 months ago
21547-retryable-db-error @ f24f6d7167c32dadc80f436fdbb4806d88808e0c
- All agreed upon points are implemented / addressed. Describe changes from pre-implementation design.
- Retries database errors. Rather than inspecting the PostgreSQL error directly, I used the generic ActiveRecord errors. I believe that is good enough, and the implementation is much simpler.
- Anything not implemented (discovered or discussed during work) has a follow-up story.
- n/a
- Code is tested and passing, both automated and manual, what manual testing was done is described.
- Tom helpfully contributed a test.
- New or changed UX/UI and has gotten feedback from stakeholders.
- n/a
- Documentation has been updated.
- n/a
- Behaves appropriately at the intended scale (describe intended scale).
- Should improve behavior at scale by making Arvados more robust to certain types of transient database errors
- Considered backwards and forwards compatibility issues between client and server.
- Returns a 500 error, which is in our list of retryable errors: _HTTP_CAN_RETRY = set([408, 409, 423, 500, 502, 503, 504])
- Follows our coding standards and GUI style guidelines.
- yes
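A minimal client-side sketch of why the status code matters: a client that retries on the statuses in _HTTP_CAN_RETRY will resend a request that hit a deadlock once the server answers 500, whereas a 422 goes straight back to the caller. The helper request_with_retries, its backoff parameters, and the injected send callable (standing in for one HTTP request) are hypothetical; this is not the Arvados SDK's actual retry code.

```python
import time
from typing import Callable

# Status codes treated as retryable (copied from the _HTTP_CAN_RETRY set quoted above).
_HTTP_CAN_RETRY = set([408, 409, 423, 500, 502, 503, 504])


def request_with_retries(
    send: Callable[[], int],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call send() until it returns a non-retryable status or attempts run out.

    Hypothetical helper for illustration; send() performs one request and
    returns its HTTP status code.
    """
    status = send()
    for attempt in range(1, max_attempts):
        if status not in _HTTP_CAN_RETRY:
            break
        # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
        sleep(base_delay * 2 ** (attempt - 1))
        status = send()
    return status


# A deadlock reported as 500 is retried until the request succeeds.
responses = iter([500, 500, 200])
print(request_with_retries(lambda: next(responses), sleep=lambda s: None))  # 200

# The old 422 response is returned immediately, with no retry.
print(request_with_retries(lambda: 422, sleep=lambda s: None))  # 422
```

Injecting sleep keeps the sketch testable without real delays; a production client would usually also add jitter so that many clients retrying the same deadlock do not collide again.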
Updated by Peter Amstutz 2 months ago
- Status changed from In Progress to Resolved