Bug #21030


Permission table slowness

Added by Peter Amstutz 5 months ago. Updated 4 months ago.

Assigned To:
Story points:
Release relationship:


Reported that permission updates on a large / busy cluster get slower and slower over time. Doing a "VACUUM FULL" fixes it but the slowness comes back after a few days.

Currently speculating that it may be due to the fact that every request that updates permission, trashed projects, or frozen projects currently creates a temporary table.

I think the queries that use temporary tables can be rewritten to use a CTE fairly easily, so that's worth a try.

Subtasks 2 (0 open2 closed)

Task #21065: InvestigateResolvedPeter Amstutz10/13/2023Actions
Task #21073: Review 21030-update-perm-cteResolvedPeter Amstutz10/13/2023Actions
Actions #1

Updated by Peter Amstutz 5 months ago

  • Status changed from New to In Progress
Actions #2

Updated by Peter Amstutz 5 months ago

  • Category set to API
Actions #3

Updated by Peter Amstutz 5 months ago

  • Category deleted (API)
  • Description updated (diff)
Actions #4

Updated by Peter Amstutz 5 months ago

  • Category set to API
  • Description updated (diff)
Actions #6

Updated by Peter Amstutz 5 months ago

  • Assigned To set to Peter Amstutz
Actions #7

Updated by Peter Amstutz 5 months ago

  • Target version changed from Development 2023-10-11 sprint to Development 2023-10-25 sprint
Actions #9

Updated by Peter Amstutz 5 months ago

Winner winner chicken dinner

create index index_materialized_permissions_target_is_not_user3 on materialized_permissions (target_uuid, traverse_owned, (user_uuid = target_uuid or target_uuid not like '_____-tpzed-_______________'));

Actions #10

Updated by Peter Amstutz 5 months ago

I'm abandoning this branch but recording it here just for history

21030-target-is-user 2e14158bd0066efa48cb971cde7f8bf69de44651

Actions #11

Updated by Peter Amstutz 5 months ago

21030-update-perm-cte @ 4727334c7ef28cdb7fb2df89211c09eb5d51bef4

developer-run-tests: #3863

Note each item completed with additional detail if necessary. If an item is irrelevant to a specific branch, briefly explain why.

  • All agreed upon points are implemented / addressed.
    • The main bottleneck on permission updates has been identified, and the solution has been confirmed to improve performance on real-world user clusters.
  • Anything not implemented (discovered or discussed during work) has a follow-up story.
    • n/a
  • Code is tested and passing, both automated and manual, what manual testing was done is described
    • The changes to remove temporary tables are pass all tests
    • The new index has been manually confirmed to improve performance by looking at query plans & query execution times
  • Documentation has been updated.
    • n/a
  • Behaves appropriately at the intended scale (describe intended scale).
    • By avoiding a sequential scan, this both greatly improves performance and should hopefully also avoid the performance deterioration of accumulated dead rows between table vacuums.
  • Considered backwards and forwards compatibility issues between client and server.
    • n/a
  • Follows our coding standards and GUI style guidelines.
    • yes

This branch has two improvements in response to reported performance problems.

The first improvement is to eliminate the creation of temporary tables. This turned out to not be the main source of performance problems, however research indicated that it is not best practice to use temporary tables for frequently executed queries, and confirmed that the pg_attributes table was accumulating dead rows due to the use of temporary tables. The temporary tables have been replaced by Computed Table Expressions (CTEs) which behave equivalently for our purposes but apparently have less overhead.

The second improvement, which does make a large performance improvement, is a new index on materialized permissions. Here's the comment in the code:

    # The inner part of compute_permission_subgraph has a query clause like this:
    #    where u.perm_origin_uuid = m.target_uuid AND m.traverse_owned
    #         AND (m.user_uuid = m.target_uuid or m.target_uuid not like '_____-tpzed-_______________')
    # This will end up doing a sequential scan on
    # materialized_permissions, which can easily have millions of
    # rows, unless we fully index the table for this query.  In one test,
    # this brought the compute_permission_subgraph query from over 6
    # seconds down to 250ms.

A third detail is that we confirmed that the "SET LOCAL enable_mergejoin to false;" workaround is still needed on Postgres 14, so you'll see in the branch history that I took it out and then put it back.

Actions #12

Updated by Peter Amstutz 5 months ago

  • Release set to 67
Actions #13

Updated by Brett Smith 5 months ago

Peter Amstutz wrote in #note-11:

21030-update-perm-cte @ 4727334c7ef28cdb7fb2df89211c09eb5d51bef4

LGTM, thanks.

Actions #14

Updated by Peter Amstutz 4 months ago

  • Status changed from In Progress to Resolved

Also available in: Atom PDF