Bug #17926
[controller] lib/pq 1.3.0 does not handle stale db connections properly (Aurora RDS)
Description
Context: an Arvados cluster with Aurora RDS as the database backend.
Symptom: After the cluster has been idle for a while, a fresh login fails with a "broken pipe" error. The logs show:
{"PID":14505,"RequestID":"req-22mvdy7j9r6di9xzn6os","level":"info","msg":"response","remoteAddr":"127.0.0.1:47966","reqBytes":38,"reqForwardedFor":"1.2.3.4","reqHost":"somewhere.over.the.rainbow","reqMethod":"POST","reqPath":"arvados/v1/users/authenticate","reqQuery":"","respBody":"{\"errors\":[\"write tcp 9.1.2.3:57210-\\u003e5.6.7.8:5432: write: broken pipe\"]}\n","respBytes":91,"respStatus":"Internal Server Error","respStatusCode":500,"time":"2021-07-20T15:57:14.887346237Z","timeToStatus":0.177528,"timeTotal":0.177538,"timeWriteBody":0.000018}
Likely cause: a bug in `lib/pq`, as described here: https://blog.bossylobster.com/2020/12/broken-pipe.html
The fix has been merged upstream and is available in lib/pq 1.10.0 and later, but we are still on version 1.3.0.
Updated by Ward Vandewege over 3 years ago
- Status changed from New to In Progress
Updated by Ward Vandewege over 3 years ago
Ready for review at 004f220a006e4e9716ad6f229e5e3721090d44f0 on branch 17962-bump-lib-pq
Tests passed at developer-run-tests: #2598
Updated by Peter Amstutz over 3 years ago
LGTM
Updated by Ward Vandewege over 3 years ago
The fix is merged (though I typo'd the issue number in the git commits as 17962 instead of 17926). Waiting for confirmation that it fixes the problem.
Updated by Ward Vandewege over 3 years ago
- Status changed from In Progress to Resolved
The fix appears to work; the bug has not been observed since. Resolving this ticket.