https://dev.arvados.org/https://dev.arvados.org/favicon.ico?15576888422020-02-06T15:06:20ZArvadosArvados - Bug #16133: [controller] add loop prevention to federation lookups in new code pathhttps://dev.arvados.org/issues/16133?journal_id=816252020-02-06T15:06:20ZWard Vandewegeward@curii.com
<ul><li><strong>Description</strong> updated (<a title="View differences" href="/journals/81625/diff?detail_id=78308">diff</a>)</li></ul> Arvados - Bug #16133: [controller] add loop prevention to federation lookups in new code pathhttps://dev.arvados.org/issues/16133?journal_id=816552020-02-06T16:12:11ZTom Cleggtom@curii.com
<ul><li><strong>Assigned To</strong> set to <i>Tom Clegg</i></li><li><strong>Status</strong> changed from <i>New</i> to <i>In Progress</i></li></ul><p>16133-federation-loop @ <a class="changeset" title="16133: Don't take multiple hops when getting collections. Arvados-DCO-1.1-Signed-off-by: Tom Cle..." href="https://dev.arvados.org/projects/arvados/repository/arvados/revisions/14bb7aa24ed7a06ce94e59f64ab32bdb44641168">14bb7aa24ed7a06ce94e59f64ab32bdb44641168</a> -- <a class="external" href="https://ci.arvados.org/view/Developer/job/developer-run-tests/1717/"<a href="https://ci.arvados.org/view/Developer/job/developer-run-tests/1717/">developer-run-tests: #1717 <img src="https://ci.arvados.org/buildStatus/icon?job=developer-run-tests&build=1717" alt="" /></a></a></p> Arvados - Bug #16133: [controller] add loop prevention to federation lookups in new code pathhttps://dev.arvados.org/issues/16133?journal_id=816842020-02-06T17:22:58ZPeter Amstutzpeter.amstutz@curii.com
<ul></ul><p>The old strategy was that controller wouldn't forward any request that said it was from another instance of controller. Is there a reason not to do that any more?</p>
<p>The new strategy is that specifically for collection GET requests, it adds a new query parameter that tracks which clusters the query has passed through - in practice this is only ever the originating cluster, because it stops propagating the query after that.</p>
<p>I'm a little confused this isn't a more general solution, if every federated method needs to separately account for loops that seems error prone.</p> Arvados - Bug #16133: [controller] add loop prevention to federation lookups in new code pathhttps://dev.arvados.org/issues/16133?journal_id=817762020-02-06T19:07:52ZTom Cleggtom@curii.com
<ul></ul><p>The HTTP header seems like the wrong layer (if we use a transport other than HTTP, we still want to detect cycles). It changes the meaning of the request so we'd probably need a "Vary" header to do it right. Putting a trail in the request options just seems more explicit and easier, and it can distinguish cycles from multi-hop.</p>
<p>"Never makes sense to take multiple hops" is specific to collections, because of blob signatures. OTOH we also want to avoid taking multiple paths to a single destination, like a→b→d and a→c→d... which isn't specific to collections. So (at least until we need / can handle the multi-hop case) this might as well be handled in tryLocalThenRemotes.</p>
<p>Most federation methods don't need to detect loops because the upstream requests only have UUIDs that match their next hop.</p>
<p>16133-federation-loop @ <a class="changeset" title="16133: Don't take multiple federation hops. Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@tomcle..." href="https://dev.arvados.org/projects/arvados/repository/arvados/revisions/d8861941ef704dd729252ee1592e48a082798b65">d8861941ef704dd729252ee1592e48a082798b65</a> -- <a class="external" href="https://ci.arvados.org/view/Developer/job/developer-run-tests/1719/"<a href="https://ci.arvados.org/view/Developer/job/developer-run-tests/1719/">developer-run-tests: #1719 <img src="https://ci.arvados.org/buildStatus/icon?job=developer-run-tests&build=1719" alt="" /></a></a></p> Arvados - Bug #16133: [controller] add loop prevention to federation lookups in new code pathhttps://dev.arvados.org/issues/16133?journal_id=818042020-02-06T19:22:42ZWard Vandewegeward@curii.com
<ul></ul><p>I built the binary from source and am testing it on 9tee4.</p>
<p>Both 9tee4 and c97qk are configured in a 3-way federation with 4xphq. The latter is powered off. c97qk is running controller version 1.5.0.pre20200115150419-1 (it is no longer auto-deployed).</p>
<p>9tee4 is running controller from the tip of the 16133-federation-loop branch at <a class="changeset" title="16133: Don't take multiple hops when getting collections. Arvados-DCO-1.1-Signed-off-by: Tom Cle..." href="https://dev.arvados.org/projects/arvados/repository/arvados/revisions/14bb7aa24ed7a06ce94e59f64ab32bdb44641168">14bb7aa24ed7a06ce94e59f64ab32bdb44641168</a></p>
<p>Searching for the block 9f26a86b6030a69ad222cf67d71c9502+65 (this is the one that caused the loop last night) on workbench.9tee4 eventually gave me a fiddlesticks with this error:</p>
<pre>
{
":errors":[
"errors: [request failed: https://9tee4.arvadosapi.com/arvados/v1/collections/9f26a86b6030a69ad222cf67d71c9502+65?forwarded_for=9tee4-&reader_tokens=%5B%22v2%2F<a href="https://arvadosapi.com/9tee4-gj3su-dbww194nbb2pcwb">9tee4-gj3su-dbww194nbb2pcwb</a>%2F30c99ba7191be91634d1439eab1ee4f969d12699%22%5D: 404 Not Found: not found request failed: https://c97qk.arvadosapi.com/arvados/v1/collections/9f26a86b6030a69ad222cf67d71c9502+65?forwarded_for=9tee4-&reader_tokens=%5B%22v2%2F<a href="https://arvadosapi.com/9tee4-gj3su-dbww194nbb2pcwb">9tee4-gj3su-dbww194nbb2pcwb</a>%2F2938ad3eb7eb5c1ff6b5bb868ac90cc8d54d06d4%22%5D: 401 Unauthorized: request failed: http://localhost:8000/arvados/v1/collections/9f26a86b6030a69ad222cf67d71c9502+65?reader_tokens=%5B%22v2%2F<a href="https://arvadosapi.com/9tee4-gj3su-dbww194nbb2pcwb">9tee4-gj3su-dbww194nbb2pcwb</a>%2F2938ad3eb7eb5c1ff6b5bb868ac90cc8d54d06d4%22%5D: 401 Unauthorized: Not logged in (req-2yf2393hw25yh35g10uo) Get https://4xphq.arvadosapi.com/arvados/v1/collections/9f26a86b6030a69ad222cf67d71c9502+65?forwarded_for=9tee4-&reader_tokens=%5B%22v2%2F<a href="https://arvadosapi.com/9tee4-gj3su-dbww194nbb2pcwb">9tee4-gj3su-dbww194nbb2pcwb</a>%2Fff9d848d98373dd58e3cecad335d52d4765d580f%22%5D: dial tcp 54.209.184.185:443: i/o timeout]"
]
}
</pre>
<p>This IP in the error message, 54.209.184.185, is 4xphq's api server IP. Cf. <a class="issue tracker-1 status-3 priority-4 priority-default closed" title="Bug: [controller] handle unreachable federation peer better (Resolved)" href="https://dev.arvados.org/issues/16134">#16134</a>.</p>
<p>I no longer see a loop looking for that block, so I think this patch works. There is no excessive use of file descriptors either.</p> Arvados - Bug #16133: [controller] add loop prevention to federation lookups in new code pathhttps://dev.arvados.org/issues/16133?journal_id=818072020-02-06T19:27:21ZWard Vandewegeward@curii.com
<ul><li><strong>Related to</strong> <i><a class="issue tracker-1 status-3 priority-4 priority-default closed" href="/issues/16134">Bug #16134</a>: [controller] handle unreachable federation peer better</i> added</li></ul> Arvados - Bug #16133: [controller] add loop prevention to federation lookups in new code pathhttps://dev.arvados.org/issues/16133?journal_id=818092020-02-06T19:55:56ZPeter Amstutzpeter.amstutz@curii.com
<ul></ul><p>Tom Clegg wrote:</p>
<blockquote>
<p>The HTTP header seems like the wrong layer (if we use a transport other than HTTP, we still want to detect cycles). It changes the meaning of the request so we'd probably need a "Vary" header to do it right. Putting a trail in the request options just seems more explicit and easier, and it can distinguish cycles from multi-hop.</p>
</blockquote>
<p>Got it, thanks for explaining the rationale.</p>
<blockquote>
<p>"Never makes sense to take multiple hops" is specific to collections, because of blob signatures. OTOH we also want to avoid taking multiple paths to a single destination, like a→b→d and a→c→d... which isn't specific to collections. So (at least until we need / can handle the multi-hop case) this might as well be handled in tryLocalThenRemotes.</p>
<p>Most federation methods don't need to detect loops because the upstream requests only have UUIDs that match their next hop.</p>
</blockquote>
<p>Right, it's the only one out of the current set of supported federation requests, but in the future I expect we'll have other open-ended queries that don't feature UUIDs directing them to a specific cluster.</p>
<p>Putting the check in tryLocalThenRemotes() does also have the minor benefit of not doing the extra work of starting, immediately failing, and gathering results from N extra goroutines.</p>
<blockquote>
<p>16133-federation-loop @ <a class="changeset" title="16133: Don't take multiple federation hops. Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@tomcle..." href="https://dev.arvados.org/projects/arvados/repository/arvados/revisions/d8861941ef704dd729252ee1592e48a082798b65">d8861941ef704dd729252ee1592e48a082798b65</a> -- <a class="external" href="https://ci.arvados.org/view/Developer/job/developer-run-tests/1719/"<a href="https://ci.arvados.org/view/Developer/job/developer-run-tests/1719/">developer-run-tests: #1719 <img src="https://ci.arvados.org/buildStatus/icon?job=developer-run-tests&build=1719" alt="" /></a></a></p>
</blockquote>
<p>This LGTM.</p> Arvados - Bug #16133: [controller] add loop prevention to federation lookups in new code pathhttps://dev.arvados.org/issues/16133?journal_id=818182020-02-06T22:52:01ZAnonymous
<ul><li><strong>% Done</strong> changed from <i>0</i> to <i>100</i></li><li><strong>Status</strong> changed from <i>In Progress</i> to <i>Resolved</i></li></ul><p>Applied in changeset <a class="changeset" title="Merge branch '16133-federation-loop' fixes #16133 Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom..." href="https://dev.arvados.org/projects/arvados/repository/arvados/revisions/118908c39c6ffa0ae8b62cddbdb610c51a461b6d">arvados|118908c39c6ffa0ae8b62cddbdb610c51a461b6d</a>.</p>