Story #9502

Updated by Brett Smith over 4 years ago

Today the API server holds its expanded permissions graph in the Rails cache. Any request that needs the permissions graph when it isn't in the cache generates it. Writes that affect the permissions graph invalidate the cached copy.

Move to a model where we generate the graph at startup, and again after each write that affects it, before returning the result of that write request. Functional requirements:

* The API server always has an expanded permissions graph cached. It generates one at startup. When it handles a write request that changes the permissions graph, before it returns the result of the request, it generates a new permissions graph that atomically replaces the old one.
* When the async_permissions_update setting is true, incoming requests use whatever copy of the permissions graph is currently complete, without waiting for updates. When the setting is false, if the request needs to use the permissions graph while an update is being prepared, the request waits for that update to finish, then uses the new graph. This wait happens at most once per request.
* Only one update should run at a time, and each update should unblock any write requests that made their underlying database update before the graph rebuild started. For illustration, the implementation should allow this timeline of events:
*# API server receives write request A.
*# Write request A updates the database.
*# API server begins updating the permissions graph.
*# Write request B comes in and updates the database.
*# Write request C comes in and updates the database.
*# Permissions graph update finishes.
*# Send the result for write request A.
*# API server begins updating the permissions graph.
*# Permissions graph update finishes.
*# Send the result for write requests B and C.
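
The requirements above can be sketched with two generation counters and a condition variable. This is only an in-process illustration, not the planned implementation: the class name @PermissionsGraphCache@, its methods, and the @build_graph@ callable are all hypothetical, and error handling for a failed rebuild is left out.

```python
import threading


class PermissionsGraphCache:
    """Hypothetical in-process sketch of the cached permissions graph."""

    def __init__(self, build_graph, async_permissions_update=False):
        self._build_graph = build_graph
        self._async = async_permissions_update
        self._cond = threading.Condition()
        self._started = 0    # number of rebuilds started so far
        self._finished = 0   # number of rebuilds finished so far
        self._graph = build_graph()  # a graph is always available after startup

    def read(self):
        """Return the graph a request should use."""
        with self._cond:
            if not self._async and self._started > self._finished:
                # Synchronous mode: wait once for the rebuild in progress.
                target = self._started
                self._cond.wait_for(lambda: self._finished >= target)
            return self._graph

    def after_write(self):
        """Call after the database update; returns once a rebuild that
        started after this call has replaced the graph."""
        with self._cond:
            # A rebuild already in progress may not see our write, so we
            # need the next rebuild to start after this point.
            target = self._started + 1
        while True:
            with self._cond:
                if self._finished >= target:
                    return
                if self._started > self._finished:
                    # Another request is rebuilding; only one runs at a time.
                    self._cond.wait()
                    continue
                self._started += 1
                generation = self._started
            # Build outside the lock so readers are not blocked by it.
            graph = self._build_graph()
            with self._cond:
                self._graph = graph          # swap in the new graph atomically
                self._finished = generation
                self._cond.notify_all()
```

In the timeline above, requests B and C would both compute the same target generation while A's rebuild is running, so a single second rebuild unblocks both of them.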

The primary motivation for this branch is to make #9186 more practical. Right now we know it will break clients that make a permissions change, then make an API request that relies on that change having taken effect. The idea here is that blocking writes on permission updates should let those clients continue working unmodified, without blocking large numbers of readers on the graph update.

We believe this change will also provide performance improvements when async permissions updates are not enabled, just by avoiding redundant graph rebuilds, but that's less of a priority.

Implementation plan for the writer:

# Commit all changes to the database. It doesn't matter whether this takes one transaction or several.
# Get the current time.
# Subscribe to notifications on the graph updater timestamp.
# In a blocking loop, keep reading notifications from the subscription until the timestamp is >= the time we remembered in step 2.
# Return the API response.
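
A minimal sketch of the writer steps, assuming the subscription can be modeled as a queue of graph timestamps. The function name @finish_write@ and its @commit@, @subscribe@, and @now@ parameters are hypothetical stand-ins, not existing API server code. The sketch does not address timeouts or notifications that arrive before the subscription exists.

```python
import queue
import time


def finish_write(commit, subscribe, now=time.time):
    """Run the writer steps; return the graph timestamp that unblocked us.

    commit:    hypothetical callable that commits the database changes
    subscribe: hypothetical callable returning a queue.Queue-like object
               that yields graph updater timestamps
    now:       clock function, injectable so the sketch can be tested
    """
    commit()                      # Step 1: commit all database changes
    committed_at = now()          # Step 2: remember the current time
    notifications = subscribe()   # Step 3: subscribe to graph timestamps
    while True:                   # Step 4: block until the graph is new enough
        graph_time = notifications.get()
        if graph_time >= committed_at:
            # Step 5: the caller can now return the API response.
            return graph_time
```

For example, with a remembered time of 10 and notifications 5, 9, 12, the loop skips the first two timestamps and returns on 12.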

Implementation plan for the updater:

# Subscribe to relevant changes to the API server database. Remember the times of events that come in from the subscription.
# When a new event comes in, if its timestamp is older than the most recent mtime of the permissions graph, discard it.
# Otherwise, in a single database transaction, get the current time, generate the expanded permissions graph, write it to the graph table, and write the time obtained at the start of the transaction as the new graph's mtime.
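
The updater steps might look roughly like the sketch below. It assumes the graph's mtime is the time read just before the rebuild, which is one reading of the plan rather than something the plan states outright. @GraphStore@ and @handle_event@ are hypothetical names, and the in-memory store stands in for the graph table and its transaction.

```python
class GraphStore:
    """Hypothetical in-memory stand-in for the graph table."""

    def __init__(self):
        self.graph = None
        self.mtime = 0.0

    def replace(self, graph, mtime):
        # In a real database, both writes would share one transaction.
        self.graph = graph
        self.mtime = mtime


def handle_event(event_time, store, build_graph, now):
    """Process one change event; return True if the graph was rebuilt."""
    if event_time < store.mtime:
        # The current graph was generated after this change: discard it.
        return False
    started_at = now()       # read the clock before generating the graph
    graph = build_graph()    # expand the permissions graph
    store.replace(graph, started_at)
    return True
```

With this logic, several events that arrive during one rebuild are covered by at most one further rebuild, since the later events then look older than the new mtime and are discarded.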