Bug #22420
closedfile contents in cached arv-mount directory don't update
Description
When a collection is updated upstream, it updates the file contents using this method:
def replace_contents(self, other):
This ensures the inode is preserved.
However, this method does not call notify
, as a result the event doesn't propagate to FUSE to send cache invalidation to the kernel.
Updated by Peter Amstutz about 1 month ago
- Position changed from -930828 to -930827
- Status changed from New to In Progress
Updated by Peter Amstutz about 1 month ago
- Target version changed from Development 2025-01-08 to Development 2025-01-29
Updated by Peter Amstutz about 1 month ago
- Status changed from In Progress to New
- Category set to FUSE
Updated by Peter Amstutz about 1 month ago
- Status changed from New to In Progress
Updated by Peter Amstutz about 1 month ago
Have this partially working, but there's still some circumstance where this happens -- this is just a guess but I think it might happen when the collection is created after the mount has been created.
Updated by Peter Amstutz 24 days ago
22420-file-update @ 6243e782a5cf458421c939bebc7b30c1e0007683
Updated by Peter Amstutz 24 days ago
- Related to Bug #22465: arv-mount may not notice permission changes granting/removing write access added
Updated by Peter Amstutz 24 days ago
- All agreed upon points are implemented / addressed. Describe changes from pre-implementation design.
- Behaves better in terms of picking up upstream change.
- Anything not implemented (discovered or discussed during work) has a follow-up story.
- From inspecting the code, it looks like there's probably an issue responding to permission changes. I didn't want to get into it, filed #22465
- Code is tested and passing, both automated and manual, what manual testing was done is described.
- Passes existing python SDK and FUSE tests
- Manual testing consists of setting up arv-mount on two different systems and using one system to make changes to files and then confirming that those changes are visible on the other system.
- New or changed UX/UX and has gotten feedback from stakeholders.
- n/a
- Documentation has been updated.
- n/a
- Behaves appropriately at the intended scale (describe intended scale).
- I don't think this changes the scale.
- Considered backwards and forwards compatibility issues between client and server.
- I made one change to start using
can_write
instead ofwritable_by
to determine permissions, this flag has been present for at least a couple of major releases prior to 3.0 so I think it's safe to use.
- I made one change to start using
- Follows our coding standards and GUI style guidelines.
- yes
This branch addresses various ways a file exposed in the FUSE mount may become stale.
The first change was to make sure that when we know a file has changed (a "MOD" event from the collection) we tell the kernel to invalidate it.
The next change was to determine polling time (this might be more accurately described as a TTL on the cached collection content) based on whether it was mounted by PDH or UUID. Previously it was setting a very long TTL (1/2 of the blob signature TTL) for all cases.
The next change was to rework how the Collection "update" method works, so that it is properly comparing local contents against the upstream contents to detect changes. As a bonus, this gets rid of the "known_past_version" tracking, which I was never very happy about as it potentially could have an unbounded history of past versions, but also probably buggy if a collection was deliberately reverted to an earlier PDH.
Finally, I adjusted flush and fsync to flush the parent directory as well, which should ensure the collection is committed.
I noticed that, by how FUSE works, it appears that it sends a flush() before a release(). I believe this means when a file is closed, it is saved back to the upstream collection immediately.
Updated by Tom Clegg 23 days ago
Changes LGTM.
Is it feasible to (automated-)test any of this?
In sdk/python/arvados/collection.py → RichCollectionBase.notify, if I'm following correctly, this docstring should mention that item is actually (olditem, newitem) in the MOD case:
* item: arvados.arvfile.ArvadosFile | arvados.collection.Subcollection --- The new contents at `name` within `collection`.
Updated by Peter Amstutz 17 days ago
22420-file-update @ 512944c236ad20afb3027fa988d3a8f8fbf0fe1f
Despite the previous improvements to managing collection updates, I was still intermittently observing stale content in files.
I eventually determined that the problem was the file data being sticky in the kernel cache -- forcing the kernel to flush caches (https://stackoverflow.com/questions/161885/how-to-evict-file-from-system-cache-on-linux) allowed me to get the correct up-to-date file content.
When a file is updated on the backend, FUSE is supposed to send an invalidation to the kernel. Sometimes, the invalidation doesn't happen. I have been unable to reproduce this reliably enough to work out the precisely why kernel invalidation sometimes doesn't happen. However, intuitively, we're operating on model of polling for changes and we expect the file content to be re-validated anyway. So the most recent set of changes have it check on file open()
whether the file is "stale", and send another invalidation to the kernel if so.
Finally, because updating collections should now be working as intended, I added a --poll-time
option to arv-mount which lets you control the eventual consistency tradeoff between having up-to-date results vs local metadata caching and not having to wait for results. The default is 15 seconds.
(Note: I'm calling it "polling" in the code but the behavior is actually: when thing X is accessed, and it hasn't been refreshed in Y seconds, then update first. It doesn't refresh things if they are not accessed).
Updated by Peter Amstutz 11 days ago
Tom Clegg wrote in #note-14:
I agree
--poll-time
is a misleading name, and I suppose right now is the only good time to come up with a better one. Perhaps something along the lines of--refresh-stale
?
How about --refresh-time
?
Updated by Peter Amstutz 11 days ago
22420-file-update @ f343b01918d5fa8f50b46c768bb1d01b041f46e6
Renamed --poll-time
to --refresh-time
.
Updated by Peter Amstutz 10 days ago
- Target version changed from Development 2025-01-29 to Development 2025-02-12
Updated by Peter Amstutz 10 days ago
Tom Clegg wrote in #note-19:
Code LGTM.
Only nit: the "refresh on next access" idea doesn't come across for me in the docstring for
--refresh-time
-- maybe just adding the words "on next access" at the end would improve it?
I agree, that is clearer. Will make that addition and merge.
Updated by Peter Amstutz 10 days ago
- Status changed from In Progress to Resolved