Bug #16007

Permission graph update is slow with large numbers of groups

Added by Peter Amstutz 3 months ago. Updated 5 days ago.

Status:
In Progress
Priority:
Normal
Assigned To:
Category:
API
Target version:
Start date:
Due date:
% Done:

0%

Estimated time:
(Total: 0.00 h)
Story points:
5.0

Description

400 users
5k groups
3.5k permission links

May be due to the large number of permission links (one project is described as having 20 sharing links).

Investigate, try to recreate

Propose an update strategy that is more efficient that current one (which recomputes all permissions any time any permission changes).


Subtasks

Task #16185: ReviewNewLucas Di Pentima


Related issues

Related to Arvados Epics - Story #16156: Observability and ScalabilityNew03/11/202005/27/2020

History

#1 Updated by Peter Amstutz 3 months ago

  • Description updated (diff)
  • Subject changed from Permission graph update is slow with large numbers of groups to Permission graph update is slow

#2 Updated by Peter Amstutz 3 months ago

  • Category set to API
  • Subject changed from Permission graph update is slow to Permission graph update is slow with large numbers of groups

#3 Updated by Peter Amstutz about 1 month ago

  • Story points set to 2.0
  • Description updated (diff)

#4 Updated by Peter Amstutz about 1 month ago

  • Target version set to 2020-03-11 Sprint

#5 Updated by Tom Clegg about 1 month ago

If building the materialized permission view is unavoidably slow, perhaps we can compute a per-user subset of it while it's being updated, instead of blocking on the full update.

#6 Updated by Stanislaw Adaszewski about 1 month ago

@Just thinking loud here - maybe more indirection levels could help. For example:

links:
tail    permission    head
Group_1 --can_read--> Group_2
Group_1 --can_read--> Project_3
Group_2 --can_read--> User_4
Group_2 --can_read--> User_5

materialized_group_permissions (no users here, hence the big speed win)
tail    permission    head
Group_1 --can_read--> Group_2
Group_1 --can_read--> Project_3
Group_2 --can_read--> Project_3 # this is a derived link

then to query user-accessible collections one could

SELECT DISTINCT c.* FROM collections AS c
  LEFT JOIN materialized_group_permissions AS gp ON (gp.head=c.owner_uuid)
  LEFT JOIN links AS ug ON(ug.tail=gp.tail AND ug.head=:current_user)
  WHERE c.owner_uuid=:current_user OR ug.permission IS NOT NULL

If it gets properly planned by the SQL engine it shouldn't be a very heavy query. Am I missing something here?@

#7 Updated by Stanislaw Adaszewski about 1 month ago

One could even take it a step further:

links:
tail    permission    head
Group_1 --can_read--> Group_2
Group_2 --can_read--> Group_3
Group_1 --can_read--> Project_3
Group_3 --can_read--> User_4
Group_3 --can_read--> User_5

Project_3 --can_read--> Project_4
Project_4 --can_read--> Project_5

materialized_group_permissions (no users OR projects here!)
tail    permission    head
Group_1 --can_read--> Group_2
Group_2 --can_read--> Group_3
Group_1 --can_read--> Group_3 # derived

materialized_project_permissions (no users here)
tail      permission    head
Project_3 --can_read--> Project_4
Project_4 --can_read--> Project_5
Project_3 --can_read--> Project_5 # derived

SELECT DISTINCT c.* FROM collections AS c
  LEFT JOIN materialized_project_permissions AS pp ON(pp.head=c.owner_uuid)
  LEFT JOIN links AS ll ON(ll.head=pp.tail)
  LEFT JOIN materialized_group_permissions AS gp ON (gp.head=ll.tail)
  LEFT JOIN links AS ug ON(ug.tail=gp.tail AND ug.head=:current_user)
  LEFT JOIN links AS dp ON(dp.tail=:current_user AND db.head=pp.tail)
  WHERE c.owner_uuid=:current_user # direct ownership
    OR ug.permission IS NOT NULL   # permission via group
    OR dp.permission IS NOT NULL   # direct permission for user

#8 Updated by Stanislaw Adaszewski about 1 month ago

IF I am not mistaken you would have to refresh the materialized tables only on some particular changes, e.g. adding a Group->User link wouldn't entail a refresh at all. And those refreshes should be much much faster. Unless I missed something, the only question would be whether it is detrimental to SELECT queries.

#9 Updated by Stanislaw Adaszewski about 1 month ago

And on top of everything you could also follow the idea in #16155 and instead of refreshes just insert/remove the relevant rows manually as it would be probably easier to conceptualize in the above setup. I think this could be done only for materialized_project_permissions because projects are for sure the most numerous among groups and they are constantly created, it is a fundamental operation in organizing data. Much more frequent than Group/User operations.

#10 Updated by Peter Amstutz about 1 month ago

  • Assigned To set to Peter Amstutz

#11 Updated by Peter Amstutz 23 days ago

  • Target version changed from 2020-03-11 Sprint to 2020-03-25 Sprint

#12 Updated by Peter Amstutz 9 days ago

Current strategy under consideration:

Materialized view becomes a regular table. Perform incremental updates. Added permissions can be inserted immediately, removing permissions requires removing affected subgraph from the permission table and recomputing from remaining permissions.

Scaling will generally be (number of users that can see a group) * (1 + number of subprojects under a group).

Enforce the following constraints:

  • group_class must be one of 'role' or 'project' and cannot change group_class after creation
  • permission links cannot be changed after creation
  • valid permission links are
    • from 'user' to 'role'
    • from 'role' to 'project'
    • from 'role' to 'user'
    • from 'user' to 'project'
    • from 'user' to 'collection' (??? traditionally allowed but adds complexity)
  • owner_uuid for most objects only 'user' or 'project'
  • 'role' can only be owned by 'user'

When a new project is created

  1. Select where target_uuid = owner_uuid, get a list of (user_uuid, perm_level)
  2. Insert new rows for each (user_uuid, new project uuid, perm_level from parent project row)

When a project is trashed

  1. Starting from project_uuid, traverse down projects owned by the starting project
  2. Update trashed = 1

When a project is untrashed

  1. Starting from project_uuid, traverse down projects owned by the starting project
  2. Update trashed = 0

When a project is deleted

  1. Starting from project_uuid, traverse down projects owned by the starting project
  2. Delete all rows for target_uuid of project or child project

When a link to a group (role or project) is added

  1. Select where target_uuid = tail_uuid, get a list of (user_uuid, perm_level)
  2. Starting from head_uuid, traverse permission links and subprojects to get list of targets, compute perm_level
  3. Insert rows for each (user_uuid, target uuid, min(link perm, target perm_level))

When a link to a group (role or project) is removed

  1. Starting from head_uuid, traverse permission links and subprojects to get list of targets
  2. Delete all rows for target_uuid of targets
  3. Recompute ownership
    1. Select where target_uuid = owner_uuid, get a list of (user_uuid, perm_level)
    2. Insert new rows for each (user_uuid, target_uuid, perm_level)
  4. Recompute links
    1. Find links with head_uuid in list of targets
    2. Re-add permissions associated with links to targets

When project is moved (owner_uuid changes)

Treat like link remove behavior + link add behavior.

#13 Updated by Peter Amstutz 8 days ago

  • Target version changed from 2020-03-25 Sprint to 2020-04-08 Sprint

#14 Updated by Peter Amstutz 8 days ago

  • Story points changed from 2.0 to 5.0

#15 Updated by Peter Amstutz 8 days ago

  • Status changed from New to In Progress

#16 Updated by Stanislaw Adaszewski 5 days ago

Sounds great, very much looking forward to it.

If this turns out not to be sufficient I was doing some reading and developed a conviction that https://debezium.io/ + Apache Kafka + (Neo4j or Dgraph) would surely do the trick. It could complicate the architecture of Arvados a bit but an in-memory graph database would be able to query the permissions in real time, thus we would eliminate all delays for good (as long as we don't absolutely need joins, which from a brief look at the code we don't).

Hope the simpler enhancement is enough.

#17 Updated by Peter Amstutz about 15 hours ago

  • Related to Story #16156: Observability and Scalability added

Also available in: Atom PDF