Bug #19362
closedkeep-web notices when collections accessed by uuid have changed
Description
Users reported issues when opening HTML reports via Browser after updating the collection content via S3. Updated HTML pages are not immediately displayed and require a forced reload.
a) if they are uploading via S3, keep-web itself should be able to know that something changed
b) it seems like it would be a good idea to have keep-web either poll for changes or listen to websockets to find out when collections change instead of waiting for the cache to expire
Updated by Peter Amstutz over 2 years ago
- Target version changed from 2022-08-31 sprint to 2022-08-17 sprint
Updated by Peter Amstutz over 2 years ago
- Assigned To set to Tom Clegg
- Category set to Keep
Updated by Tom Clegg over 2 years ago
- Status changed from New to In Progress
phase 1: Retire the old per-collection code path in favor of using the per-token session/sitefs for all keep-web requests. That will give better cache behavior, and address the case where a user uploads via S3 and then downloads via web using the same token.
phase 2: Mechanism for sitefs to invalidate cached collections on websocket events. That will also be an important piece of the Go-based fuse mount.
Updated by Peter Amstutz over 2 years ago
- Target version changed from 2022-08-17 sprint to 2022-08-31 sprint
Updated by Tom Clegg over 2 years ago
- Target version changed from 2022-08-31 sprint to 2022-09-14 sprint
Updated by Tom Clegg over 2 years ago
19362-all-webdav-via-sitefs @ 25162c18b4290f7763cd545f2d94d96650ebc9d9 -- developer-run-tests: #3287
Merges the sitefs and non-sitefs code paths, so now all requests are done with a sitefs on the backend, using the per-token session cache.
Includes some needed improvements to the filesystem code in sdk/go/arvados:- Fix deadlock in Rename (when renaming to/from root dir of a collectionfs inside a sitefs)
- Sync() in a collectionfs checks for upstream changes before trying to save local changes (if both have changes, upstream wins)
- Unexpected errors (forbidden/unauthorized) are returned to caller with HTTPStatus() interface instead of being converted to os.ErrNotExist
Happily there is a net improvement in LOC: "14 files changed, 523 insertions(+), 777 deletions(-)"
(note this doesn't actually fix/test the collection-out-of-sync issue yet)
Updated by Tom Clegg over 2 years ago
- Target version changed from 2022-09-14 sprint to 2022-09-28 sprint
Updated by Lucas Di Pentima over 2 years ago
Some minor comments & questions:
- File
sdk/go/arvados/fs_site.go
- Lines 158-169: The GET documentation at https://doc.arvados.org/v2.4/api/methods/collections.html already say that only
pdh
&manifest_text
are returned when requesting a collection by PDH, has that behavior changed? (that is, do we need to update our docs?) - Changes on
mountByID()
&mountCollection()
signatures didn't require any update on thefs_site_test.go
file, so I'm assuming we're not testing them. Especially the return error code change. Do you think it would be an excellent opportunity to do so?
- Lines 158-169: The GET documentation at https://doc.arvados.org/v2.4/api/methods/collections.html already say that only
- If we're removing
keepweb_collectioncache
prometheus counters/metrics, should we say so on the upgrading docs to warn admins that may be relying on them? - Are
Collections.WebDAVCache.MaxUUIDEntries
andCollections.WebDAVCache.MaxCollectionEntries
still relevant?
The rest LGTM
Updated by Tom Clegg over 2 years ago
Lucas Di Pentima wrote in #note-10:
- Lines 158-169: The GET documentation at https://doc.arvados.org/v2.4/api/methods/collections.html already say that only
pdh
&manifest_text
are returned when requesting a collection by PDH, has that behavior changed? (that is, do we need to update our docs?)
I think the docs are correct, but test stubs don't always behave that way (also, belt+suspenders). I've updated the comment "Even if the controller returns more fields..."
- Changes on
mountByID()
&mountCollection()
signatures didn't require any update on thefs_site_test.go
file, so I'm assuming we're not testing them. Especially the return error code change. Do you think it would be an excellent opportunity to do so?
Those are used as (*vdirnode).create methods. They're exercised in tests, but mountByID is only called from (*vdirnode)Child() via vn.create(vn, name)
, which you can't find by searching for mountByID(
.
- If we're removing
keepweb_collectioncache
prometheus counters/metrics, should we say so on the upgrading docs to warn admins that may be relying on them?
Ah, good call. Added a release note.
- Are
Collections.WebDAVCache.MaxUUIDEntries
andCollections.WebDAVCache.MaxCollectionEntries
still relevant?
Oops, no. Removed from config struct & config.default.yml, and noted that in the release note too.
19362-all-webdav-via-sitefs @ f1ec6199fbe9b6abdba3a9eba95eda7b46eb5265 -- developer-run-tests: #3296
19362-all-webdav-via-sitefs @ a88aa282a8da52589053244f4ef41283d51f629a (f1ec6199 plus api doc fix)
Updated by Tom Clegg over 2 years ago
19362-s3-webdav-sync @ 6289b570a8eb6aea964c14c8e8c5bf11443f726d -- developer-run-tests: #3301
This changes the sitefs implementation so the different paths to any given project or collection dir (e.g., by_id/$coll_uuid, by_id/$proj_uuid/$coll_name/, home/$proj_name/$coll_name) behave like hardlinks to the same collection: if you create/write/delete a file in one of those paths, it is immediately visible in all others too.
In the context of webdav/s3 this means that during a write operation, we can identify the collection that was modified, and splice the updated version into by_id/$coll_uuid in the current token's persistent readonly sitefs, so the update will be visible in a following read operation without a reload.
This is much more efficient than throwing out the entire cached session after each write, which is how we previously achieved read-after-write consistency.
Updated by Lucas Di Pentima over 2 years ago
Some comments/questions:
- File
sdk/go/arvados/fs_site.go
- Lines 161 & 164: Could those
Unlock()
calls be replaced with just one just after then
var assignment? - Lines 245, 247, 250, 263, 264: I find so much locking/unlocking calls from the same lock somewhat confusing. Not sure if it can be expressed in simpler terms, but just wanted to mention it.
- Lines 161 & 164: Could those
- I think some of the above lock/unlock ops can be removed, I'm reading that although map ops are not atomic, doing map reading operations concurrently is safe (https://go.dev/doc/faq#atomic_maps), wdyt?
The rest LGTM.
Updated by Tom Clegg over 2 years ago
- Lines 161 & 164: Could those
Unlock()
calls be replaced with just one just after then
var assignment?
Yes, good point.
- Lines 245, 247, 250, 263, 264: I find so much locking/unlocking calls from the same lock somewhat confusing. Not sure if it can be expressed in simpler terms, but just wanted to mention it.
I've cleaned up and commented these.
- I think some of the above lock/unlock ops can be removed, I'm reading that although map ops are not atomic, doing map reading operations concurrently is safe (https://go.dev/doc/faq#atomic_maps), wdyt?
That's true, but I don't think there's any opportunity to avoid locking on read ops here, because it's always possible another goroutine is writing.
19362-s3-webdav-sync @ bf6539ec9d5f207be4be363720ef118e25e7abbe -- developer-run-tests: #3302
Updated by Lucas Di Pentima over 2 years ago
Thanks for the clarification, LGTM, thanks!
Updated by Tom Clegg over 2 years ago
- % Done changed from 50 to 100
- Status changed from In Progress to Resolved
Applied in changeset arvados|a9be3117466506dffc39617be1c58406c5914e4b.