Bug #16134

closed

[controller] handle unreachable federation peer better

Added by Ward Vandewege about 4 years ago. Updated about 4 years ago.

Status: Resolved
Priority: Normal
Assigned To: -
Category: -
Target version:
Story points: -
Release relationship: Auto

Description

When an Arvados cluster is configured with an unreachable federation peer, things go south real fast: arvados-controller quickly consumes all the file descriptors it can get. A sample log entry:

Feb 05 22:00:45 9tee4.arvadosapi.com arvados-controller[22394]: {"PID":22394,"RequestID":"req-tuynvloji3hz9h42b16w","level":"info","msg":"response","remoteAddr":"127.0.0.1:33622","reqBytes":0,"reqForwardedFor":"10.100.32.5","reqHost":"9tee4.arvadosapi.com","reqMethod":"GET","reqPath":"arvados/v1/collections/9f26a86b6030a69ad222cf67d71c9502+65","reqQuery":"","respBody":"{\"errors\":[\"errors: [Get https://4xphq.arvadosapi.com/arvados/v1/collections/9f26a86b6030a69ad222cf67d71c9502+65: dial tcp 54.209.184.185:443: i/o timeout request failed: https://9tee4.arvadosapi.com/arvados/v1/collections/9f26a86b6030a69ad222cf67d71c9502+65: 502 Bad Gateway: errors: [request failed: https://c97qk.arvadosapi.com/arvados/v1/collections/9f26a86b6030a69ad222cf67d71c9502+65: 502 Bad Gateway: errors: [Get https://c97qk.arvadosapi.com/arvados/v1/collections/9f26a86b6030a69ad222cf67d71c9502+65: dial tcp 10.25.0.6:443: socket: too many open files Get https://4xphq.arvadosapi.com/arvados/v1/collections/9f26a86b6030a69ad222cf67d71c9502+65: dial tcp: lookup 4xphq.arvadosapi.com on 127.0.0.1:53: dial udp 127.0.0.1:53: socket: too many open files Get https://9tee4.arvadosapi.com/arvados/v1/collections/9f26a86b6030a69ad222cf67d71c9502+65: dial tcp: lookup 9tee4.arvadosapi.com on 127.0.0.1:53: dial udp 127.0.0.1:53: socket: too many open files] Get https://4xphq.arvadosapi.com/arvados/v1/collections/9f26a86","respBytes":4853,"respStatus":"Bad Gateway","respStatusCode":502,"time":"2020-02-05T22:00:45.131869763Z","timeToStatus":54.670812,"timeTotal":54.670929,"timeWriteBody":0.000117}
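One possible mitigation, sketched here purely for illustration, is a per-peer circuit breaker: after a few consecutive connection failures, the peer is skipped for a cooldown period instead of being dialed again on every request, so pending dials to a dead peer stop piling up. This is not the controller's actual code or the fix applied for this issue; `PeerBreaker`, `fetch_from_peers`, and their parameters are hypothetical names, and the `fetch` callable stands in for whatever performs the real request.

```python
import time


class PeerBreaker:
    """Hypothetical circuit breaker: skip a peer after repeated failures."""

    def __init__(self, max_failures=3, cooldown=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.cooldown = cooldown
        self.clock = clock          # injectable so tests can control time
        self.failures = {}          # peer ID -> consecutive failure count
        self.open_until = {}        # peer ID -> time when requests may resume

    def allow(self, peer):
        """Return True if the peer is not currently in its cooldown window."""
        return peer not in self.open_until or self.clock() >= self.open_until[peer]

    def record(self, peer, ok):
        """Record the outcome of one request to the peer."""
        if ok:
            self.failures[peer] = 0
            self.open_until.pop(peer, None)
            return
        self.failures[peer] = self.failures.get(peer, 0) + 1
        if self.failures[peer] >= self.max_failures:
            self.open_until[peer] = self.clock() + self.cooldown


def fetch_from_peers(peers, fetch, breaker):
    """Try each peer in order; return (result, errors).

    `fetch(peer)` is assumed to raise OSError (which includes
    TimeoutError and ConnectionError) when the peer is unreachable.
    """
    errors = []
    for peer in peers:
        if not breaker.allow(peer):
            errors.append(f"{peer}: skipped (recently unreachable)")
            continue
        try:
            result = fetch(peer)
        except OSError as exc:
            breaker.record(peer, ok=False)
            errors.append(f"{peer}: {exc}")
            continue
        breaker.record(peer, ok=True)
        return result, errors
    return None, errors
```

With this in place, a peer that keeps timing out costs a few failed dials per cooldown window rather than one per incoming request. A real fix would likely also need a short dial timeout and a cap on concurrent outgoing connections; those are left out of the sketch.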

Related issues

Related to Arvados - Bug #16133: [controller] add loop prevention to federation lookups in new code path (Resolved, Tom Clegg, 02/06/2020)