Bug #10008
open
[SDKs] [Python] When reading data through Collection et al., signatures should refresh automatically when needed
Added by Tom Clegg over 8 years ago.
Updated 10 months ago.
Release relationship:
Auto
Description
Background¶
TokenExpiryTest in source:services/fuse/tests/test_mount.py is slow and unreliable because it relies on sleep(). (fixed)
There is signature-refreshing logic ("re-fetch collection from API server before signatures expire") in arv-mount. However:
- It checks expiry times only when a file is opened. If a file is left open for a long time, reading from the file will eventually stop working.
- Other programs using the Python SDK need the same auto-refresh behavior for the same reasons as arv-mount, but no facility is provided.
Proposed fix¶
Implement manifest-refreshing logic in the Python SDK. Reading from a file should always work:
- regardless of how much time passes between instantiating the Collection, opening a file, writing, and reading.
- regardless of whether the collection is writable or not, and regardless of whether it is referenced by PDH or UUID.
This will make the arv-mount token-refresh mechanism (added in notes 4-6 here) redundant, so that can be reverted.
WIP: 10008-flaky-token-test
Unfortunately, the new test seems to indicate the token-refresh behavior is actually broken:
- time is mocked so it appears to advance by 13 days each time open() is called
- open() is called 3 times in a collection directory
- API server logs show a single "get collection" request (expect 3)
- Status changed from New to In Progress
- Assigned To set to Tom Clegg
- Story points set to 0.5
- Target version set to 2016-10-26 sprint
- File
services/fuse/arvados_fuse/__init__.py
:
- Line 504: Could you please explain in a comment why is the parent traversal needed on
open()
?
- Line 595: Is the method
opendir()
needing the same parent traversal to touch()
all of its ancestors too?
- Is there any test covering the case when the token is expired? If I’m understanding this test correctly, it checks that the token is refreshed if the fuse is accessed every 13 days, right?
Lucas Di Pentima wrote:
- File
services/fuse/arvados_fuse/__init__.py
:
- Line 504: Could you please explain in a comment why is the parent traversal needed on
open()
?
Added.
- Line 595: Is the method
opendir()
needing the same parent traversal to touch()
all of its ancestors too?
The passage of time (and the resulting invalidation of the signature tokens) is the only kind of event we don't expect websocket to tell us about. So opendir() doesn't really need this.
Of course, I suppose that means this really belongs in read(): if a file is read after being left open for 2 weeks, even open() won't have a chance to refresh.
- Is there any test covering the case when the token is expired? If I’m understanding this test correctly, it checks that the token is refreshed if the fuse is accessed every 13 days, right?
Yes, that's right. There test covers only "old enough to be nearly expired", not "actually expired". I'm not too worried about it. This would be a good integration test, but for that we'd also have to figure out how to convince keepstore to expire tokens immediately...
7bf89e17945bb6610d6339409a6a411803851434
Notes from meeting:
- Checking in read() shouldn't wreck performance
- Even checking in read() might not be enough
- we will initiate an update, but will the read progress before the updated response arrives?
- if we have a new collection when we're fulfilling the read(), but we created the collection, subdir, and file inodes back when we had the old collection, will we use the new collection's block locators, or will the cached inodes still cause us to use the old ones?
- It would be better to fix this (re-fetch the manifest when reading data, if we notice the tokens expire soon) in the Python SDK so arv-mount wouldn't have to do anything special. However, if it's not too hard to fix arv-mount, we might as well do that in the meantime.
A few notes:
Operations.lookup()
calls p[name]
which turns into Directory.__getitem__
which is wrapped by @check_update
. This means when a lookup happens, it is supposed to synchronously refresh the underlying directory if it is stale.
- FUSE directory entry caching behavior has changed over kernel versions, so it is possible that
lookup()
used to always occur before open()
, but now sometimes does not happen if the kernel has the directory entry in cache.
CollectionDirectory.update
has an early exit on self.collection_record is not None and portable_data_hash_pattern.match(self.collection_locator)
which suggests that collections referenced by portable data hash never get refreshed, even if they need to be in order to refresh tokens.
- ArvadosFiles which are the same except for token updates are applied as
TOK
changes (in other words, Collection.update() should be presumed to work, and the problem is that it isn't being called when it is supposed to.)
Updating is synchronous and it will hold a lock on the collection which will prevent any other activity on the collection while it is updating, so there should not be an update race.
Suggested fix:
CollectionDirectory.update
needs to force update when poll time is exceeded, even for read only collections.
FuseArvadosFile.readfrom()
should be wrapped with @check_update
. When created, FuseArvadosFile
should have poll time set to match the parent.
FuseArvadosFile
needs to have a reference to either the parent directory object or the inodes table (to look up parent_inode) so that it can initiate an update()
.
- Subject changed from [FUSE] [Tests] Fix flaky test TokenExpiryTest to [FUSE] [SDKs] When reading data through Collection et al., signatures should refresh automatically when needed
- Target version deleted (
2016-10-26 sprint)
- Category changed from Tests to FUSE
- Assigned To deleted (
Tom Clegg)
- Assigned To set to Tom Clegg
- Target version set to 2017-03-01 sprint
- Description updated (diff)
- Subject changed from [FUSE] [SDKs] When reading data through Collection et al., signatures should refresh automatically when needed to [SDKs] [Python] When reading data through Collection et al., signatures should refresh automatically when needed
- Category changed from FUSE to SDKs
- Status changed from In Progress to New
- Assigned To deleted (
Tom Clegg)
- Target version deleted (
2017-03-01 sprint)
- Story points deleted (
0.5)
- Target version set to Arvados Future Sprints
- Target version deleted (
Arvados Future Sprints)
- Target version set to Future
Also available in: Atom
PDF