Bug #16683

Trouble sharing with federated users

Added by Peter Amstutz about 1 month ago. Updated 29 days ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Category:
API
Target version:
Start date:
08/13/2020
Due date:
% Done:

100%

Estimated time:
(Total: 0.00 h)
Story points:
-
Release relationship:
Auto

Description

Workbench 1

Share with group

Steps to reproduce:

  1. Create a LoginCluster federation with 2 instances
  2. Go to the 2nd instance and log in with a non-admin federated user
  3. Create a project
  4. Go to the sharing tab
  5. Share with "all users"

On reloading the sharing tab, it fails with:

request failed: https://172.17.0.3:8000/arvados/v1/users?cluster_id=&count=&filters=%5B%5B%22uuid%22%2C%22in%22%2C%5B%22x1ly6-j7d0g-fffffffffffffff%22%5D%5D%5D&limit=9223372036854775807&offset=0: 400 Bad Request: cannot execute federated list query unless count=="none" [API: 400]

Workbench 2

Sharing dialog

Fails to open with a similar error:

request failed: https://172.17.0.3:8000/arvados/v1/users?cluster_id=&count=&filters=%5B%5B%22uuid%22%2C%22in%22%2C%5B%22x1ly6-j7d0g-fffffffffffffff%22%5D%5D%5D&offset=0: 400 Bad Request: cannot execute federated list query unless count=="none"

Sharing with user

Also observed: attempting to share with a user gets an "unknown uuid" error. This is probably because, although the user can see other users from the login cluster, the permissions are checked on the federated cluster, which doesn't know about them. The validity check on the link tail for permission links should probably just suppress raising an error when the cluster id isn't the local one.

(Alternately, we could skip the check entirely. This would partially address the previously reported problem of curators being unable to share with groups they don't have access to, although they would need to add the permission link by uuid, possibly via the command line, which might not be acceptable.)
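The suggested relaxation can be sketched as a predicate: validate the link tail only when its cluster prefix is the local cluster id. (The real check lives in the Rails API server; this Go rendering and the function name are illustrative only.)

```go
package main

import (
	"fmt"
	"strings"
)

// shouldValidateTail reports whether the tail uuid of a permission link
// should be checked for existence locally. A uuid whose cluster prefix is
// not the local cluster id is accepted without raising an error, since
// the local cluster cannot know about users that exist only on the
// login cluster.
func shouldValidateTail(tailUUID, localClusterID string) bool {
	parts := strings.Split(tailUUID, "-")
	return len(parts) == 3 && parts[0] == localClusterID
}

func main() {
	// Local uuid: validate as usual.
	fmt.Println(shouldValidateTail("x1ly6-tpzed-000000000000001", "x1ly6"))
	// Remote (login cluster) uuid: skip the existence check.
	fmt.Println(shouldValidateTail("xrmjz-tpzed-000000000000001", "x1ly6"))
}
```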


Subtasks

Task #16691: Review 16683-fed-sharing (arvados) (Resolved, Peter Amstutz)

Task #16700: Review 16683-fed-sharing (workbench2) (Resolved, Peter Amstutz)


Related issues

Related to Arvados - Bug #16726: other cluster's special users (root and anonymous) can appear in user list (Resolved, 08/31/2020)

Is duplicate of Arvados - Bug #16681: Sharing dialog fails with "count must be none" when using LoginCluster feature (Duplicate)

Blocks Arvados - Story #16688: Release Arvados 2.0.4 (Resolved, 08/14/2020 to 08/17/2020)

Associated revisions

Revision 4901a3c5
Added by Peter Amstutz about 1 month ago

Merge branch '16683-fed-sharing' refs #16683

Arvados-DCO-1.1-Signed-off-by: Peter Amstutz <>

Revision d4349585
Added by Peter Amstutz about 1 month ago

Merge branch '16683-fed-sharing' refs #16683

Arvados-DCO-1.1-Signed-off-by: Peter Amstutz <>

History

#1 Updated by Peter Amstutz about 1 month ago

  • Description updated (diff)

#2 Updated by Peter Amstutz about 1 month ago

  • Is duplicate of Bug #16681: Sharing dialog fails with "count must be none" when using LoginCluster feature added

#3 Updated by Peter Amstutz about 1 month ago

  • Status changed from New to Duplicate

#4 Updated by Peter Amstutz about 1 month ago

  • Status changed from Duplicate to New
  • Description updated (diff)

#5 Updated by Peter Amstutz about 1 month ago

Adding

.with_count("none")
.fetch_multiple_pages(false)

seems like it should produce a working query but now it fails this way:

2020-08-11_19:41:34.01115 {"PID":1419,"RequestID":"req-29gqcx0h0hqobxo9efl6","level":"info","msg":"response","remoteAddr":"127.0.0.1:34258","reqBytes":0,"reqForwardedFor":"172.17.0.3","reqHost":"172.17.0.4:8000","reqMethod":"GET","reqPath":"arvados/v1/users","reqQuery":"cluster_id=\u0026count=none\u0026filters=%5B%5B%22uuid%22%2C%22in%22%2C%5B%22x1ly6-j7d0g-fffffffffffffff%22%5D%5D%5D\u0026offset=0","respBody":"{\"errors\":[\"Get https://172.17.0.3:8000/arvados/v1/users?cluster_id=\\u0026count=none\\u0026filters=%5B%5B%22uuid%22%2C%22in%22%2C%5B%22x1ly6-j7d0g-fffffffffffffff%22%5D%5D%5D\\u0026offset=0: EOF\"]}\n","respBytes":196,"respStatus":"Internal Server Error","respStatusCode":500,"time":"2020-08-11T19:41:34.011033813Z","timeToStatus":0.003099,"timeTotal":0.003152,"timeWriteBody":0.000053}

Retries also seem to be doing something weird: is it interpreting the '%' as a format character?

https://172.17.0.3:8000/arvados/v1/users?cluster_id=\\u0026count=none\\u0026filters=%5B%5B%22uuid%22%2C%22in%22%2C%5B%22x1ly6-j7d0g-fffffffffffffff%22%5D%5D%5D\\u0026offset=0: 502 Bad Gateway: request failed: https://172.17.0.4:8000/arvados/v1/users?cluster_id=\\u0026count=none\\u0026filters=%!B(MISSING)%!B(MISSING)%!u(MISSING)uid%2C%!i(MISSING)n%2C%!B(MISSING)%!x(MISSING)1ly6-j7d0g-fffffffffffffff%5D%!D(MISSING)%!D(MISSING)\\u0026offset=0: 500 Internal Server Error: Get https://172.17.0.3:8000/arvados/v1/users?cluster_id=\\u0026count=none\\u0026filters=%!B(MISSING)%!B(MISSING)%!u(MISSING)uid%2C%!i(MISSING)n%2C%!B(MISSING)%!x(MISSING)1ly6-j7d0g-fffffffffffffff%5D%!D(MISSING)%!D(MISSING)\\u0026offset=0: EOF\

#6 Updated by Peter Amstutz about 1 month ago

Response on LoginCluster

2020-08-11_20:13:41.04890 {"PID":1419,"RequestID":"req-79u0kwp3hsvwgb19cdci","level":"info","msg":"response","remoteAddr":"127.0.0.1:59428","reqBytes":0,"reqForwardedFor":"172.17.0.3","reqHost":"172.17.0.4:8000","reqMethod":"GET","reqPath":"arvados/v1/users","reqQuery":"cluster_id=\u0026count=none\u0026filters=%5B%5B%22uuid%22%2C%22in%22%2C%5B%22x1ly6-j7d0g-fffffffffffffff%22%5D%5D%5D\u0026offset=0","respBody":"{\"errors\":[\"Get https://172.17.0.3:8000/arvados/v1/users?cluster_id=\\u0026count=none\\u0026filters=%5B%5B%22uuid%22%2C%22in%22%2C%5B%22x1ly6-j7d0g-fffffffffffffff%22%5D%5D%5D\\u0026offset=0: EOF\"]}\n","respBytes":196,"respStatus":"Internal Server Error","respStatusCode":500,"time":"2020-08-11T20:13:41.048774508Z","timeToStatus":0.005508,"timeTotal":0.005535,"timeWriteBody":0.000027}

Response on federate:

2020-08-11_20:13:41.04952 {"PID":14215,"RequestID":"req-79u0kwp3hsvwgb19cdci","level":"info","msg":"response","remoteAddr":"127.0.0.1:59424","reqBytes":0,"reqForwardedFor":"172.17.0.4","reqHost":"172.17.0.3:8000","reqMethod":"GET","reqPath":"arvados/v1/users","reqQuery":"cluster_id=\u0026count=none\u0026filters=%5B%5B%22uuid%22%2C%22in%22%2C%5B%22x1ly6-j7d0g-fffffffffffffff%22%5D%5D%5D\u0026offset=0","respBody":"{\"errors\":[\"request failed: https://172.17.0.4:8000/arvados/v1/users?cluster_id=\\u0026count=none\\u0026filters=%!B(MISSING)%!B(MISSING)%!u(MISSING)uid%2C%!i(MISSING)n%2C%!B(MISSING)%!x(MISSING)1ly6-j7d0g-fffffffffffffff%5D%!D(MISSING)%!D(MISSING)\\u0026offset=0: 500 Internal Server Error: Get https://172.17.0.3:8000/arvados/v1/users?cluster_id=\\u0026count=none\\u0026filters=%!B(MISSING)%!B(MISSING)%!u(MISSING)uid%2C%!i(MISSING)n%2C%!B(MISSING)%!x(MISSING)1ly6-j7d0g-fffffffffffffff%5D%!D(MISSING)%!D(MISSING)\\u0026offset=0: EOF\"]}\n","respBytes":532,"respStatus":"Bad Gateway","respStatusCode":502,"time":"2020-08-11T20:13:41.049414288Z","timeToStatus":0.015050,"timeTotal":0.015071,"timeWriteBody":0.000021}

#7 Updated by Peter Amstutz about 1 month ago

The %(MISSING) part is just a silly logging bug. Fix pending.

What's really going on:

It's getting stuck in a request loop. The federate ("x1ly6") forwards the user list request to the login cluster ("xrmjz"). The login cluster sees this filter:

[["uuid", "in", ["x1ly6-j7d0g-fffffffffffffff"]]]

It recognizes the value as a uuid, so it forwards the request to the federate, which forwards it back to the login cluster, which forwards it back to the federate, and so forth until it gives up. (I don't know how it decides to break out of the loop, but at least it doesn't get stuck infinitely.)

Issues:

"x1ly6-j7d0g-fffffffffffffff" isn't a user. I think Workbench just simplistically sends the same "uuid in" list for both users and groups (should be harmless, right?). The intended behavior is to share with the "all users" group on the x1ly6 cluster.

Possible fixes:

  • Check the type portion of the uuid
  • Filter the "uuid in" list before sending the list request to the LoginCluster
  • Have the call chain set a "bypass federation" flag to suppress request propagation
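The first two options boil down to inspecting the 5-character type infix of each uuid ("tpzed" marks users, "j7d0g" groups) before forwarding the list request. A sketch, with made-up helper names:

```go
package main

import (
	"fmt"
	"strings"
)

// Arvados object uuids look like "x1ly6-tpzed-0123456789abcde":
// cluster id, a 5-char type infix, then a 15-char identifier.
// "tpzed" is the infix for users, "j7d0g" for groups.
const userInfix = "tpzed"

// isUserUUID reports whether a uuid's type portion marks it as a user.
func isUserUUID(uuid string) bool {
	parts := strings.Split(uuid, "-")
	return len(parts) == 3 && parts[1] == userInfix
}

// filterUserUUIDs drops non-user uuids (e.g. the "all users" group
// "x1ly6-j7d0g-fffffffffffffff") from a "uuid in" filter before the
// users list request is forwarded to the LoginCluster.
func filterUserUUIDs(uuids []string) []string {
	var out []string
	for _, u := range uuids {
		if isUserUUID(u) {
			out = append(out, u)
		}
	}
	return out
}

func main() {
	in := []string{"x1ly6-j7d0g-fffffffffffffff", "x1ly6-tpzed-000000000000001"}
	// Only the tpzed (user) uuid survives the filter.
	fmt.Println(filterUserUUIDs(in))
}
```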

#8 Updated by Peter Amstutz about 1 month ago

  • Release set to 34

#9 Updated by Peter Amstutz about 1 month ago

#10 Updated by Peter Amstutz about 1 month ago

  • Description updated (diff)

#11 Updated by Peter Amstutz about 1 month ago

  • Description updated (diff)

#12 Updated by Peter Amstutz about 1 month ago

  • Assigned To set to Peter Amstutz

#13 Updated by Peter Amstutz about 1 month ago

  • Status changed from New to In Progress

#16 Updated by Lucas Di Pentima about 1 month ago

Reviewing 877689f

  • File services/api/app/models/link.rb
    • Updates don’t check the cluster federation configuration; do you think it would be convenient to add that, at least to avoid some copy&paste errors?
  • File services/api/app/models/arvados_model.rb
    • Lines 761 & 762: shouldn’t these be added only to the Link model?
    • Are we already using links to/from PDHs? I haven’t found documentation about it; maybe it’ll need an update? For example, https://doc.arvados.org/v2.0/api/permission-model.html#links talks about tail_uuid coming only from a User or Group.
  • I think having tests for the above changes would be convenient.
  • Ran the fed migration tests on Jenkins and had a failure: https://ci.arvados.org/job/arv-federation-migrate-test/104/

#17 Updated by Lucas Di Pentima about 1 month ago

On second thought: maybe the fed-migrate tests on Jenkins don't start a federation from the provided commit hash. I was looking to avoid running them locally, which is usually slower.

#18 Updated by Peter Amstutz about 1 month ago

Lucas Di Pentima wrote:

Reviewing 877689f

  • File services/api/app/models/link.rb
    • Updates don’t check the cluster federation configuration; do you think it would be convenient to add that, at least to avoid some copy&paste errors?

Added a call to ApiClientAuthorization.remote_host to check for a valid cluster id.

  • File services/api/app/models/arvados_model.rb
    • Lines 761 & 762: shouldn’t these be added only to the Link model?
    • Are we already using links to/from PDHs? I haven’t found documentation about it; maybe it’ll need an update? For example, https://doc.arvados.org/v2.0/api/permission-model.html#links talks about tail_uuid coming only from a User or Group.

I removed the check since it was functionally useless and didn't exist in the old code.

As it stands, if something doesn't fit the uuid format it just won't be checked, so it doesn't matter if it is a PDH. That said, there are some legacy link classes that use PDHs, like 'docker_image_migration'.

  • I think having tests for the above changes would be convenient.

Added a test.

We discussed this at standup: the Jenkins test could (and probably should) do this, but it requires some fiddling and may make the job take longer. I don't know if we want to block on it.

16683-fed-sharing @ 78444f2fb480801787e486d4b65198d72ab4fe15

#19 Updated by Lucas Di Pentima about 1 month ago

This (78444f2fb480801787e486d4b65198d72ab4fe15) LGTM, thanks!!
Will take a look at wb2.

#20 Updated by Lucas Di Pentima about 1 month ago

Reviewing arvados-workbench2|63e2f0e

WB2 still fails when re-opening the sharing dialog after the project has been shared with all users. The following fixes the problem:

diff --git a/src/views-components/sharing-dialog/participant-select.tsx b/src/views-components/sharing-dialog/participant-select.tsx
index 5d062da0..5260743f 100644
--- a/src/views-components/sharing-dialog/participant-select.tsx
+++ b/src/views-components/sharing-dialog/participant-select.tsx
@@ -131,14 +131,14 @@ export const ParticipantSelect = connect()(
             const filterUsers = new FilterBuilder()
                 .addILike('any', value)
                 .getFilters();
-            const userItems: ListResults<any> = await userService.list({ filters: filterUsers, limit });
+            const userItems: ListResults<any> = await userService.list({ filters: filterUsers, limit, count: "none" });

             const filterGroups = new FilterBuilder()
                 .addNotIn('group_class', [GroupClass.PROJECT])
                 .addILike('name', value)
                 .getFilters();

-            const groupItems: ListResults<any> = await groupsService.list({ filters: filterGroups, limit });
+            const groupItems: ListResults<any> = await groupsService.list({ filters: filterGroups, limit, count: "none" });
             this.setState({
                 suggestions: this.props.onlyPeople
                     ? userItems.items

With that, it LGTM.

#21 Updated by Peter Amstutz about 1 month ago

Noticed during testing that "share public" doesn't work -- will fix

#22 Updated by Peter Amstutz about 1 month ago

I incorporated your fix and did a little more manual testing. I noticed the "share public" option failed, so I fixed that. I added {count: "none"} in a few more places.

16683-fed-sharing arvados-workbench2|b1250d13a43d4351c6cbca7990c996b3219693d0

https://ci.arvados.org/view/Developer/job/developer-tests-workbench2/73/

#23 Updated by Lucas Di Pentima about 1 month ago

Did some manual testing against ce8i5 and all seems to work correctly, looks good to merge, thanks!

#24 Updated by Peter Amstutz about 1 month ago

  • Status changed from In Progress to Resolved

#25 Updated by Peter Amstutz about 1 month ago

We set up a configuration with jutro (LoginCluster) and pirca (federate).

On pirca, the sharing dialog works. This is the case we tested for.

On jutro, the sharing dialog fails:

request failed: https://pirca.arvadosapi.com/arvados/v1/users?cluster_id=&count=none&filters=%5B%5B%22uuid%22%2C%22in%22%2C%5B%22pirca-tpzed-anonymouspublic%22%5D%5D%5D&forwarded_for=jutro-&offset=0&reader_tokens=%5B%22v2%2Fjutro-gj3su-e2o9x84aeg7q005%2Fc23929fe5eb184a81e0afda2cff8aed9d10e9ed2%22%5D: 401 Unauthorized: request failed: https://jutro.arvadosapi.com/arvados/v1/users?cluster_id=&count=none&filters=%5B%5B%22uuid%22%2C%22in%22%2C%5B%22pirca-tpzed-anonymouspublic%22%5D%5D%5D&forwarded_for=jutro-&offset=0&reader_tokens=%5B%22v2%2Fjutro-gj3su-e2o9x84aeg7q005%2Fc23929fe5eb184a81e0afda2cff8aed9d10e9ed2%22%5D: 401 Unauthorized: request failed: http://localhost:8004/arvados/v1/users?cluster_id=&count=none&filters=%5B%5B%22uuid%22%2C%22in%22%2C%5B%22pirca-tpzed-anonymouspublic%22%5D%5D%5D&forwarded_for=jutro-jutro-&offset=0&reader_tokens=%5B%22v2%2Fjutro-gj3su-e2o9x84aeg7q005%2Fc23929fe5eb184a81e0afda2cff8aed9d10e9ed2%22%5D: 401 Unauthorized: Not logged in (req-datocp8oyu00b24wspew) [API: 502]

On further research:

It is failing because it was mistakenly shared with the pirca anonymous user (pirca-tpzed-anonymouspublic). This causes some weird federation queries that don't work.

This happened because on jutro, there are two anonymous users to choose from, the jutro one and the pirca one. That shouldn't happen, and it is very confusing.

a) I'm not sure how the pirca one ended up on jutro. Maybe when a federated collection was accessed as the anonymous user, starting from pirca, pirca's anonymous user credentials were used to access the collection on jutro?

b) I can't delete the user from jutro using the API, because the controller redirects "user delete" API calls to the cluster that owns the user (pirca, of course). I tried to delete the user on jutro, and I'm pretty sure I deleted pirca's anonymous user instead. So now the anonymous user on pirca is gone.

c) When LoginCluster is set, every cluster should use the LoginCluster's anonymous user and token.
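Point (c) maps to cluster configuration: the federate should delegate logins to the LoginCluster and use that cluster's anonymous token rather than minting its own. A hedged sketch of the relevant config fragment (key names follow the cluster config schema; the token value is a placeholder):

```yaml
# Hypothetical fragment for the federate (pirca); the token value below
# is a placeholder, not a real credential.
Clusters:
  pirca:
    Login:
      # Delegate all logins (and user records) to jutro.
      LoginCluster: jutro
    Users:
      # Per (c): use the LoginCluster's anonymous token instead of a
      # locally minted one, so both clusters agree on a single
      # anonymous user.
      AnonymousUserToken: "v2/jutro-gj3su-anonymouspublic/placeholder"
```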

#26 Updated by Peter Amstutz about 1 month ago

  • Status changed from Resolved to Feedback

#27 Updated by Javier Bértoli about 1 month ago

Applied the changes for the anonymous user in {pirca,jutro} through salt (commit b7de27c@saltstack). The automated runs are disabled at the moment while we make some manual changes on the hosts, so I deployed with these steps on the salt master:

cd /data/saltstack/pillarstack && git pull
salt '*jutro*' saltutil.refresh_pillar
salt '*jutro*' state.apply custom_files,packages,services,exim 2>&1 |tee /tmp/salida_jutro
salt '*pirca*' saltutil.refresh_pillar
salt '*pirca*' state.apply custom_files,packages,services,exim 2>&1 |tee /tmp/salida_pirca

#28 Updated by Peter Amstutz 29 days ago

  • Related to Bug #16726: other cluster's special users (root and anonymous) can appear in user list added

#29 Updated by Peter Amstutz 29 days ago

  • Status changed from Feedback to Resolved
