Concurrent writes to a single collection

Background:

Currently, if a client uses concurrent WebDAV PUT requests to write many files to a single collection, the resulting collection is guaranteed to contain all of the uploaded files only if all of the overlapping requests are handled by the same keep-web process. In a load-balanced scenario where some of the requests are handled by different keep-web processes, race conditions are not handled, so some of the uploaded files might not be preserved, even though the client receives successful response codes.

Additionally, within a given keep-web process, overlapping write requests for a single collection are processed sequentially, which largely defeats the performance advantage of doing multi-threaded uploads.

Design goal:

When processing concurrent file uploads to a single collection via WebDAV, Arvados should:
  • Accept file data and write through to Keep concurrently, rather than sequentially.
  • Ensure all uploaded files are preserved, even if upload requests are distributed across multiple keep-web, controller, and railsapi processes/hosts.

Proposal for concurrent upload handling:

When processing an upload, keep-web should:
  1. Write the uploaded data to Keep using a temporary in-memory collection.
  2. Explicitly lock the target collection to prevent races with other keep-web processes/goroutines (see below).
  3. Get the target collection's current manifest, splice the uploaded file into it, and save it.
  4. Unlock the target collection.

Steps 2-4 could be done either by keep-web itself, or by using the "replace_files" controller feature.
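
For illustration, here is a rough sketch of this flow in Go for the case where keep-web does steps 2-4 itself. Every helper function is a hypothetical placeholder standing in for existing keep-web/SDK machinery (or, for lockUUID, the locking mechanism proposed below); the names exist only to make the sequence concrete.

package keepweb

import (
	"context"
	"io"
)

// Hypothetical placeholders, named only to make the sequence concrete.
func writeToKeep(ctx context.Context, name string, data io.Reader) (manifest string, err error) {
	return "", nil // step 1: write data to Keep via a temporary in-memory collection
}
func lockUUID(ctx context.Context, uuid string) (unlock func(), err error) {
	return func() {}, nil // step 2: explicit per-UUID lock (see below)
}
func getManifest(ctx context.Context, uuid string) (string, error) {
	return "", nil // fetch the target collection's current manifest
}
func spliceFile(current, name, uploaded string) string {
	return current // splice the uploaded file into the manifest
}
func saveManifest(ctx context.Context, uuid, manifest string) error {
	return nil // save the updated manifest
}

func handleUpload(ctx context.Context, targetUUID, filename string, data io.Reader) error {
	// Step 1: store the file data in Keep without touching the
	// target collection's database record.
	uploaded, err := writeToKeep(ctx, filename, data)
	if err != nil {
		return err
	}
	// Step 2: lock out other keep-web processes/goroutines that
	// want to update the same target collection.
	unlock, err := lockUUID(ctx, targetUUID)
	if err != nil {
		return err
	}
	// Step 4: the lock is released once the update has been saved.
	defer unlock()
	// Step 3: get the current manifest, splice in the uploaded
	// file, and save the result.
	current, err := getManifest(ctx, targetUUID)
	if err != nil {
		return err
	}
	return saveManifest(ctx, targetUUID, spliceFile(current, filename, uploaded))
}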

If "replace_files" is used, it will need two new features:
  • Ability to express "file X from a manifest supplied in this update request" (to avoid the overhead of creating and deleting a separate temporary collection just for the sake of referring to it in a "replace_files" request).
  • Ability to express "file X from the current version of the target collection" (to avoid the overhead and race potential of retrieving the target collection's current PDH ahead of time just for the sake of referring to it in a "replace_files" request).

Proposed changes to replace_files API:

(1) The UUID of the target collection can be used as a source. Example: atomically renaming foo to bar in the collection with UUID zzzzz-4zz18-abcdefghijklmno:

uuid: zzzzz-4zz18-abcdefghijklmno
replace_files:
  foo: "" 
  bar: current/foo
  • The existing API would reject the above request because the source for "bar" does not begin with "{pdh}/".
  • The existing API can already do a non-atomic rename, but it is racy: if two clients perform the request sequences "get pdh; move pdh/foo to bar1" and "get pdh; move pdh/foo to bar2", and another collection with the old pdh happens to exist (so the second request doesn't simply fail), then the final collection may contain both "bar1" and "bar2". This seems likely to cause problems.
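
For example, once (1) is implemented, a client could issue the atomic rename with the Go SDK's existing RequestAndDecode call. This is a sketch: the "current/" source prefix is the proposed feature, not current API behavior.

package main

import (
	"fmt"
	"log"

	"git.arvados.org/arvados.git/sdk/go/arvados"
)

func main() {
	// NewClientFromEnv reads ARVADOS_API_HOST and ARVADOS_API_TOKEN.
	client := arvados.NewClientFromEnv()
	var coll arvados.Collection
	err := client.RequestAndDecode(&coll, "PATCH",
		"arvados/v1/collections/zzzzz-4zz18-abcdefghijklmno", nil,
		map[string]interface{}{
			"replace_files": map[string]string{
				"foo": "",            // delete the old name
				"bar": "current/foo", // proposed: refer to the target's current version
			},
		})
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(coll.PortableDataHash)
}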

(2) A manifest supplied with the update request can be used as a source. Example: atomically adding/replacing a file named foo with content "foo" in the collection with UUID zzzzz-4zz18-abcdefghijklmno:

uuid: zzzzz-4zz18-abcdefghijklmno
manifest_text: ". acbd18db4cc2f85cedef654fccc4a4d8+3 0:3:uploaded-file\n" 
replace_files:
  foo: manifest_text/uploaded-file
  • The existing API would reject the above request because both the replace_files and manifest_text parameters are provided, and because the source for "foo" does not begin with "{pdh}/".
  • In this example the filename in the supplied manifest is "uploaded-file" to make the API behavior clear, but a real application doing this would probably use the real destination filename "foo" instead.
  • Using the existing manifest_text field makes the collections#update API behave like one mode with a default (if replace_files is not provided, its implicit value is {"/": "manifest_text/"}) rather than two different modes (where, if replace_files is provided, manifest_text is ignored and a separate alternate_manifest_text parameter is used instead).
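
Together with the locking below, this would let keep-web do step 3 of the upload flow as a single update request. A sketch, where spliceUploadedFile is a hypothetical helper, the manifest argument is the temporary manifest produced in step 1, and the "manifest_text/" source prefix is the proposed feature:

package keepweb

import "git.arvados.org/arvados.git/sdk/go/arvados"

// spliceUploadedFile (hypothetical) splices one uploaded file into the
// target collection in a single update request, relying on the
// proposed "manifest_text/" source prefix.
func spliceUploadedFile(client *arvados.Client, targetUUID, filename, manifest string) error {
	var coll arvados.Collection
	return client.RequestAndDecode(&coll, "PATCH",
		"arvados/v1/collections/"+targetUUID, nil,
		map[string]interface{}{
			"collection":    map[string]interface{}{"manifest_text": manifest},
			"replace_files": map[string]string{filename: "manifest_text/" + filename},
		})
}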

Proposal for locking collections:

Normal PostgreSQL row locks ("select from collections for update") are not suitable for this situation, because the relevant row needs to be updated by a Ruby program (RailsAPI) while the lock is held by a Go program (controller).

However, if we create a separate table for the sole purpose of locking collections by UUID, Go programs can use row locks in that table to prevent overlapping update requests from getting through to RailsAPI.

For example, given the following setup:

create table uuid_locks (uuid varchar primary key, n integer default 0);

The following SQL statements function as an exclusive lock:

begin;
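-- the "do update" arm is a dummy update whose only purpose is to
-- take the row lock when a row for this uuid already exists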
insert into uuid_locks (uuid) values ('zzzzz-tpzed-123451234512345') on conflict (uuid) do update set n=uuid_locks.n+1;
-- (lock is held here)
commit;
-- (lock is released by commit, rollback, or disconnect)

The following SQL statement safely removes unused locks without blocking or deleting any in-use locks:

delete from uuid_locks where uuid in (select uuid from uuid_locks for update skip locked);

The table can be created with the "unlogged" option to improve performance. This accepts the risk of data loss on a server crash, which is acceptable here because the table is only used for its locking semantics. With or without "unlogged", if the above commit ("release lock") fails, then controller must assume the lock was not held long enough to protect the update from being overwritten by a different update, and return a 500 error to the caller.
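
To make the intended usage concrete, here is a rough sketch of how controller might hold the lock around an update, using Go's database/sql. withUUIDLock is a hypothetical name; note that update() is expected to perform the collection update on a different connection (e.g. by proxying the request to RailsAPI) while this transaction holds the lock.

package controller

import (
	"context"
	"database/sql"
)

// withUUIDLock (hypothetical) holds the uuid_locks row lock while
// update() runs; the lock transaction itself only touches uuid_locks.
func withUUIDLock(ctx context.Context, db *sql.DB, uuid string, update func() error) error {
	tx, err := db.BeginTx(ctx, nil)
	if err != nil {
		return err
	}
	defer tx.Rollback() // no-op if Commit already succeeded
	// The insert (or dummy update on conflict) takes an exclusive
	// row lock; competing transactions block here until we commit.
	if _, err := tx.ExecContext(ctx,
		`insert into uuid_locks (uuid) values ($1)
		 on conflict (uuid) do update set n=uuid_locks.n+1`, uuid); err != nil {
		return err
	}
	if err := update(); err != nil {
		return err
	}
	// If Commit fails, assume the lock was not held long enough to
	// protect the update, and report a 500 error to the caller.
	return tx.Commit()
}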

Regardless of whether keep-web calls replace_files or implements locking itself, the "replace_files" feature should also use this locking mechanism. That way, any number of overlapping keep-web uploads and other uses of "replace_files" will be handled safely.

Once this feature is in place, arv-mount and arvados-client mount can, in principle, be updated to use "replace_files" to improve their behavior in concurrent write scenarios.
