Bug #17926

closed

[controller] lib/pq 1.3.0 does not handle stale db connections properly (Aurora RDS)

Added by Ward Vandewege over 2 years ago. Updated over 2 years ago.

Status: Resolved
Priority: Normal
Assigned To:
Category: -
Target version:
Story points: -
Release relationship: Auto

Description

Context: Arvados cluster with Aurora RDS as db backend.

Symptom: After the cluster has been idle for a while, a fresh login fails with a "broken pipe" error. The controller logs show:


{"PID":14505,"RequestID":"req-22mvdy7j9r6di9xzn6os","level":"info","msg":"response","remoteAddr":"127.0.0.1:47966","reqBytes":38,"reqForwardedFor":"1.2.3.4","reqHost":"somewhere.over.the.rainbow","reqMethod":"POST","reqPath":"arvados/v1/users/authenticate","reqQuery":"","respBody":"{\"errors\":[\"write tcp 9.1.2.3:57210-\\u003e5.6.7.8:5432: write: broken pipe\"]}\n","respBytes":91,"respStatus":"Internal Server Error","respStatusCode":500,"time":"2021-07-20T15:57:14.887346237Z","timeToStatus":0.177528,"timeTotal":0.177538,"timeWriteBody":0.000018}

Likely cause: a bug in `lib/pq`, as described here: https://blog.bossylobster.com/2020/12/broken-pipe.html

The fix has been merged and is available in version 1.10.0 and up, but we are on version 1.3.0.
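To illustrate the failure mode, here is a minimal application-level sketch in Go. It is not the lib/pq fix and not Arvados code: it only shows how an error shaped like the one in the log above (a `write: broken pipe` on a TCP connection) could be recognized and the operation retried. The helper names `isStaleConnErr`, `retryOnStale`, and `fakeBrokenPipe` are hypothetical, and the assumption that a stale connection surfaces as `syscall.EPIPE` or `driver.ErrBadConn` is mine, based on the log message.

```go
package main

import (
	"database/sql/driver"
	"errors"
	"fmt"
	"net"
	"os"
	"syscall"
)

// isStaleConnErr (hypothetical helper) reports whether err looks like a
// write on a connection that the peer has already closed.
func isStaleConnErr(err error) bool {
	return errors.Is(err, driver.ErrBadConn) || errors.Is(err, syscall.EPIPE)
}

// retryOnStale (hypothetical helper) runs op and retries it up to
// maxRetries more times, but only when the failure is a stale-connection error.
func retryOnStale(maxRetries int, op func() error) error {
	var err error
	for attempt := 0; attempt <= maxRetries; attempt++ {
		err = op()
		if err == nil || !isStaleConnErr(err) {
			return err
		}
	}
	return err
}

// fakeBrokenPipe builds an error with the same shape as the one in the log,
// so the sketch can run without a database.
func fakeBrokenPipe() error {
	return &net.OpError{Op: "write", Net: "tcp", Err: os.NewSyscallError("write", syscall.EPIPE)}
}

func main() {
	calls := 0
	err := retryOnStale(2, func() error {
		calls++
		if calls == 1 {
			return fakeBrokenPipe() // first attempt hits the stale connection
		}
		return nil // second attempt succeeds
	})
	fmt.Println(calls, err) // prints "2 <nil>"
}
```

A wrapper like this would only mask the symptom; upgrading the driver, as proposed in this ticket, addresses it at the source.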


Subtasks: 1 (0 open, 1 closed)

Task #17927: review 17962-bump-lib-pq (Resolved, Peter Amstutz, 07/20/2021)
Actions #1

Updated by Ward Vandewege over 2 years ago

  • Status changed from New to In Progress
Actions #2

Updated by Ward Vandewege over 2 years ago

  • Description updated (diff)
Actions #3

Updated by Peter Amstutz over 2 years ago

  • Release set to 41
Actions #4

Updated by Ward Vandewege over 2 years ago

  • Description updated (diff)
Actions #5

Updated by Ward Vandewege over 2 years ago

Ready for review at 004f220a006e4e9716ad6f229e5e3721090d44f0 on branch 17962-bump-lib-pq

Tests passed at developer-run-tests: #2598

Actions #6

Updated by Peter Amstutz over 2 years ago

Ward Vandewege wrote:

Ready for review at 004f220a006e4e9716ad6f229e5e3721090d44f0 on branch 17962-bump-lib-pq

Tests passed at developer-run-tests: #2598

LGTM

Actions #7

Updated by Ward Vandewege over 2 years ago

Fix is merged (though I typo'd the issue number in the git commits as 17962 instead of 17926...), waiting for confirmation that it fixes the problem.

Actions #8

Updated by Ward Vandewege over 2 years ago

  • Status changed from In Progress to Resolved

The fix appears to work: the bug has not been observed since. Resolving this ticket.
