Concurrent writes to a single collection » History » Version 6

Tom Clegg, 05/07/2024 06:02 PM

h1. Concurrent writes to a single collection

h2. Background:

Currently, if a client uses concurrent WebDAV PUT requests to write many files to a single collection, the resulting collection is guaranteed to contain all of the uploaded files *only if* all of the overlapping requests are handled by the same keep-web process. In a load-balancing scenario where some of the requests are handled by different keep-web processes, race conditions are not handled, so some of the uploaded files might not be preserved (despite successful response codes sent to the client).

Additionally, within a given keep-web process, overlapping write requests for a single collection are processed sequentially, which largely defeats the performance advantage of doing multi-threaded uploads.

h2. Design goal:

When processing concurrent file uploads to a single collection via WebDAV, Arvados should:

* Accept file data and write through to Keep concurrently, rather than sequentially.
* Ensure all uploaded files are preserved, even if upload requests are distributed across multiple keep-web, controller, and railsapi processes/hosts.

h2. Proposal for concurrent upload handling:

When processing an upload, keep-web should:

# Write the uploaded data to Keep using a temporary in-memory collection.
# Explicitly lock the target collection to prevent races with other keep-web processes/goroutines (see below).
# Get the target collection's current manifest, splice the uploaded file into it, and save it.
# Unlock the target collection.

Steps 2-4 could be done either by keep-web itself, or using the "replace_files" controller feature.
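
The four steps above can be sketched as a small in-process model. This is only an illustration under stated assumptions: the @collections@ dict, @lock_for@, @write_to_keep@, @upload@ and @upload_all@ are hypothetical stand-ins, a @threading.Lock@ per UUID takes the place of the cross-process lock proposed later in this page, and a real implementation would live in keep-web (Go) and talk to Keep and the API server.

```python
import hashlib
import threading

TARGET = "zzzzz-4zz18-abcdefghijklmno"

# Hypothetical in-memory stand-ins for the API server and the lock table.
collections = {TARGET: {}}        # uuid -> {filename: block locator}
_locks = {}                       # uuid -> threading.Lock
_locks_guard = threading.Lock()   # protects the _locks dict itself


def lock_for(uuid):
    """Return the lock for a collection UUID, creating it on first use."""
    with _locks_guard:
        return _locks.setdefault(uuid, threading.Lock())


def write_to_keep(data):
    """Step 1 (stand-in): pretend to store data, return an md5+size locator."""
    return f"{hashlib.md5(data).hexdigest()}+{len(data)}"


def upload(uuid, filename, data):
    locator = write_to_keep(data)         # step 1: no lock held, runs concurrently
    with lock_for(uuid):                  # step 2 (lock) and step 4 (unlock)
        files = dict(collections[uuid])   # step 3: read the current content,
        files[filename] = locator         #         splice in the uploaded file,
        collections[uuid] = files         #         and save the result


def upload_all(uuid, files):
    """Upload every {filename: data} entry from its own thread."""
    threads = [threading.Thread(target=upload, args=(uuid, name, data))
               for name, data in files.items()]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

Only the short read-splice-save section is serialized; the data transfer in step 1 happens outside the lock, which is what lets concurrent uploads overlap.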

If "replace_files" is used, it will need two new features:

* Ability to express "file X from a manifest supplied in this update request" (to avoid the overhead of creating and deleting a separate temporary collection just for the sake of referring to it in a "replace_files" request).
* Ability to express "file X from the current version of the target collection" (to avoid the overhead and race potential of retrieving the target collection's current PDH ahead of time just for the sake of referring to it in a "replace_files" request).

h3. Proposed changes to replace_files API

*(1)* The current version of the target collection can be used as a source, using the prefix @current/@. For example, to atomically rename @foo@ to @bar@ in the collection with UUID @zzzzz-4zz18-abcdefghijklmno@:

<pre><code class="yaml">
uuid: zzzzz-4zz18-abcdefghijklmno
replace_files:
  foo: ""
  bar: current/foo
</code></pre>

* The existing API would reject the above request because the source for "bar" does not begin with "{pdh}/".
* The existing API can already do a _non-atomic_ rename, but it is prone to races: if two clients perform the request sequences "get pdh; move pdh/foo to bar1" and "get pdh; move pdh/foo to bar2", and another collection with the old pdh happens to exist, then the final collection may contain both "bar1" and "bar2". This seems likely to cause problems.
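
As a rough illustration of the intended @current/@ semantics, the rename request can be modeled on a plain dict of file names. The function @apply_replace_files@ is hypothetical and is not the controller's implementation; it handles only flat file names, deletions (empty source) and @current/@ sources, and it assumes all sources are resolved against the content as it was when the update started.

```python
def apply_replace_files(current, replace_files):
    """Return the new {path: content} mapping for a collection.

    Hypothetical model: every source is looked up in `current` (the
    content before the update), never in the partially updated result,
    so the listed changes take effect together.
    """
    result = dict(current)
    for target, source in replace_files.items():
        if source == "":
            result.pop(target, None)          # empty source: delete the target
        elif source.startswith("current/"):
            name = source[len("current/"):]
            if name not in current:
                raise KeyError(f"source not found: {source}")
            result[target] = current[name]    # resolved against the snapshot
        else:
            raise ValueError(f"source not supported in this sketch: {source}")
    return result


before = {"foo": "file data", "baz": "other data"}
after = apply_replace_files(before, {"foo": "", "bar": "current/foo"})
```

Because @bar@ is resolved from the pre-update snapshot, the outcome does not depend on whether the deletion of @foo@ or the creation of @bar@ is processed first.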

*(2)* A manifest supplied with the update request can be used as a source, using the prefix @manifest_text/@. For example, to atomically add or replace a file named @foo@ with content @foo@ in the collection with UUID @zzzzz-4zz18-abcdefghijklmno@:

<pre><code class="yaml">
uuid: zzzzz-4zz18-abcdefghijklmno
manifest_text: ". acbd18db4cc2f85cedef654fccc4a4d8+3 0:3:uploaded-file\n"
replace_files:
  foo: manifest_text/uploaded-file
</code></pre>

* The existing API would reject the above request because both @replace_files@ and @manifest_text@ parameters are provided, and because the source for "foo" does not begin with "{pdh}/".
* In this example the filename in the supplied manifest is "uploaded-file" to be clear about API behavior, but a real application doing this would probably use the real destination filename "foo" instead.
* Using the existing @manifest_text@ field makes the collections#update API behave more like one mode with a default (if @replace_files@ is not provided, its implicit value is @{"/":"manifest_text/"}@) rather than two different modes (if @replace_files@ is provided, then we ignore @manifest_text@ and use @alternate_manifest_text@ instead).
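
The "one mode with a default" behavior can be sketched as follows. Both @parse_manifest@ and @update_collection@ are hypothetical helpers written for this illustration. The parser is deliberately simplified: it assumes unescaped names and one segment per file, and it records every block in the stream for each file.

```python
import re

FILE_TOKEN = re.compile(r"^(\d+):(\d+):(.+)$")   # position:size:name


def parse_manifest(manifest_text):
    """Map each file path to (block locators, offset, size). Simplified."""
    files = {}
    for line in manifest_text.splitlines():
        tokens = line.split(" ")
        stream, rest = tokens[0], tokens[1:]
        blocks = tuple(t for t in rest if not FILE_TOKEN.match(t))
        for t in rest:
            m = FILE_TOKEN.match(t)
            if m:
                name = m.group(3)
                path = name if stream == "." else f"{stream[2:]}/{name}"
                files[path] = (blocks, int(m.group(1)), int(m.group(2)))
    return files


def update_collection(current, manifest_text=None, replace_files=None):
    """Hypothetical model of collections#update with the proposed change."""
    if replace_files is None:
        # Implicit default {"/": "manifest_text/"}: a supplied manifest
        # replaces the whole content; with no manifest nothing changes.
        if manifest_text is None:
            return dict(current)
        return parse_manifest(manifest_text)
    supplied = parse_manifest(manifest_text or "")
    result = dict(current)
    for target, source in replace_files.items():
        if source == "":
            result.pop(target, None)
        elif source.startswith("manifest_text/"):
            result[target] = supplied[source[len("manifest_text/"):]]
        else:
            raise ValueError(f"source not supported in this sketch: {source}")
    return result
```

With the request shown above, the existing files are kept and @foo@ is added from the supplied manifest. The block locator in that manifest is the MD5 digest of the three bytes @foo@ followed by the size, which is why the file content is @foo@.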

h2. Proposal for locking collections:

Normal PostgreSQL row locks (@"select from collections for update"@) are not suitable for this situation: the relevant row needs to be updated by a Ruby program while the lock is held by a Go program, and a row lock held in the Go program's transaction would block the Ruby program's update.

However, if we create a separate table for the sole purpose of locking collections by UUID, Go programs can use row locks in that table to prevent overlapping update requests from getting through to RailsAPI.

For example, given the following setup:

<pre>
create table uuid_locks (uuid varchar primary key, n integer);
</pre>

The following SQL statements function as an exclusive lock:

<pre>
begin;
insert into uuid_locks (uuid) values ('zzzzz-tpzed-123451234512345') on conflict (uuid) do update set n=uuid_locks.n+1;
-- (lock is held here)
commit;
-- (lock is released by commit, rollback, or disconnect)
</pre>

The following SQL statement safely removes unused locks without blocking or deleting any in-use locks:

<pre>
delete from uuid_locks where uuid in (select uuid from uuid_locks for update skip locked);
</pre>

The table can be created with the "unlogged" option to improve performance. This risks losing the table's contents if the server crashes, which is acceptable here because the table is only used for its locking semantics anyway. With or without "unlogged", if the above @commit@ ("release lock") fails, then controller needs to assume the lock was not held long enough to protect the update from being overwritten by a different update, and return a 500 error to the caller.
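
The lock/unlock sequence and the commit-failure rule can be sketched as a context manager. This is an illustration only, since the real code would be Go in controller or keep-web. The names @uuid_lock@ and @LockLostError@ are hypothetical, and @conn@ is assumed to be a Python DB-API connection in non-autocommit mode that uses @%s@ placeholders (as psycopg2 does), so the first statement implicitly opens the transaction.

```python
from contextlib import contextmanager


class LockLostError(Exception):
    """Commit failed, so the lock may not have covered the whole update."""


@contextmanager
def uuid_lock(conn, uuid):
    """Hold an exclusive lock on `uuid` for the duration of the with-block."""
    cur = conn.cursor()
    try:
        cur.execute(
            "insert into uuid_locks (uuid) values (%s) "
            "on conflict (uuid) do update set n=uuid_locks.n+1",
            (uuid,),
        )
        yield                    # lock is held while the with-block runs
    except BaseException:
        conn.rollback()          # releases the lock after a failure
        raise
    try:
        conn.commit()            # releases the lock
    except Exception as e:
        # The caller cannot tell whether the lock was held long enough,
        # so it should report the update as failed (HTTP 500).
        raise LockLostError(uuid) from e
```

A caller would perform the collection update inside @with uuid_lock(conn, uuid):@ and translate @LockLostError@ into a 500 response.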

Regardless of whether keep-web calls replace_files or implements locking itself, the "replace_files" feature should also use this locking mechanism. That way, any number of overlapping keep-web uploads _and_ other uses of "replace_files" will be handled safely.

Once this feature is in place, @arv-mount@ and @arvados-client mount@ can, in principle, be updated to use "replace_files" to improve their behavior in concurrent write scenarios.