Bug #10008

[SDKs] [Python] When reading data through Collection et al., signatures should refresh automatically when needed

Added by Tom Clegg about 1 year ago. Updated about 1 month ago.

Status:NewStart date:09/12/2016
Priority:NormalDue date:
Assignee:-% Done:

100%

Category:SDKs
Target version:Arvados Future Sprints
Story points-Remaining (hours)0.00 hour
Velocity based estimate-

Description

Background

TokenExpiryTest in source:services/fuse/tests/test_mount.py is slow and unreliable because it relies on sleep(). (fixed)

There is signature-refreshing logic ("re-fetch collection from API server before signatures expire") in arv-mount. However:
  • It checks expiry times only when a file is opened. If a file is left open for a long time, reading from the file will eventually stop working.
  • Other programs using the Python SDK need the same auto-refresh behavior for the same reasons as arv-mount, but no facility is provided.

Proposed fix

Implement manifest-refreshing logic in the Python SDK. Reading from a file should always work:
  • regardless of how much time passes between instantiating the Collection, opening a file, writing, and reading.
  • regardless of whether the collection is writable or not, and regardless of whether it is referenced by PDH or UUID.

This will make the arv-mount token-refresh mechanism (added in notes 4-6 here) redundant, so that can be reverted.


Subtasks

Task #10330: Review 10008-check-token-exp-on-openResolvedTom Clegg


Related issues

Related to Arvados - Bug #6833: [FUSE] arv-mount caches manifests too long, ends up using... Resolved 07/30/2015

Associated revisions

Revision c4e25267
Added by Tom Clegg about 1 year ago

10008: Disable flaky test. refs #10008

Revision 0066dc77
Added by Tom Clegg 12 months ago

Merge branch '10008-check-token-exp-on-open' refs #10008

History

#1 Updated by Tom Clegg about 1 year ago

WIP: 10008-flaky-token-test

Unfortunately, the new test seems to indicate the token-refresh behavior is actually broken:
  • time is mocked so it appears to advance by 13 days each time open() is called
  • open() is called 3 times in a collection directory
  • API server logs show a single "get collection" request (expect 3)

#2 Updated by Tom Clegg about 1 year ago

  • Status changed from New to In Progress
  • Assignee set to Tom Clegg
  • Story points set to 0.5

#3 Updated by Tom Clegg about 1 year ago

  • Target version set to 2016-10-26 sprint

#4 Updated by Tom Clegg 12 months ago

10008-check-token-exp-on-open

test b73985d8a0c9173aec57f6a81fe540b2813a5bff

#5 Updated by Lucas Di Pentima 12 months ago

  • File services/fuse/arvados_fuse/__init__.py:
    • Line 504: Could you please explain in a comment why is the parent traversal needed on open()?
    • Line 595: Is the method opendir() needing the same parent traversal to touch() all of its ancestors too?
  • Is there any test covering the case when the token is expired? If I’m understanding this test correctly, it checks that the token is refreshed if the fuse is accessed every 13 days, right?

#6 Updated by Tom Clegg 12 months ago

Lucas Di Pentima wrote:

  • File services/fuse/arvados_fuse/__init__.py:
    • Line 504: Could you please explain in a comment why is the parent traversal needed on open()?

Added.

  • Line 595: Is the method opendir() needing the same parent traversal to touch() all of its ancestors too?

The passage of time (and the resulting invalidation of the signature tokens) is the only kind of event we don't expect websocket to tell us about. So opendir() doesn't really need this.

Of course, I suppose that means this really belongs in read(): if a file is read after being left open for 2 weeks, even open() won't have a chance to refresh.

  • Is there any test covering the case when the token is expired? If I’m understanding this test correctly, it checks that the token is refreshed if the fuse is accessed every 13 days, right?

Yes, that's right. There test covers only "old enough to be nearly expired", not "actually expired". I'm not too worried about it. This would be a good integration test, but for that we'd also have to figure out how to convince keepstore to expire tokens immediately...

7bf89e17945bb6610d6339409a6a411803851434

Notes from meeting:
  • Checking in read() shouldn't wreck performance
  • Even checking in read() might not be enough
    • we will initiate an update, but will the read progress before the updated response arrives?
    • if we have a new collection when we're fulfilling the read(), but we created the collection, subdir, and file inodes back when we had the old collection, will we use the new collection's block locators, or will the cached inodes still cause us to use the old ones?
  • It would be better to fix this (re-fetch the manifest when reading data, if we notice the tokens expire soon) in the Python SDK so arv-mount wouldn't have to do anything special. However, if it's not too hard to fix arv-mount, we might as well do that in the meantime.

#7 Updated by Peter Amstutz 12 months ago

A few notes:

  1. Operations.lookup() calls p[name] which turns into Directory.__getitem__ which is wrapped by @check_update. This means when a lookup happens, it is supposed to synchronously refresh the underlying directory if it is stale.
  2. FUSE directory entry caching behavior has changed over kernel versions, so it is possible that lookup() used to always occur before open(), but now sometimes does not happen if the kernel has the directory entry in cache.
  3. CollectionDirectory.update has an early exit on self.collection_record is not None and portable_data_hash_pattern.match(self.collection_locator) which suggests that collections referenced by portable data hash never get refreshed, even if they need to be in order to refresh tokens.
  4. ArvadosFiles which are the same except for token updates are applied as TOK changes (in other words, Collection.update() should be presumed to work, and the problem is that it isn't being called when it is supposed to.)

Updating is synchronous and it will hold a lock on the collection which will prevent any other activity on the collection while it is updating, so there should not be an update race.

#8 Updated by Peter Amstutz 12 months ago

Suggested fix:

CollectionDirectory.update needs to force update when poll time is exceeded, even for read only collections.

FuseArvadosFile.readfrom() should be wrapped with @check_update. When created, FuseArvadosFile should have poll time set to match the parent.

FuseArvadosFile needs to have a reference to either the parent directory object or the inodes table (to look up parent_inode) so that it can initiate an update().

#9 Updated by Tom Clegg 12 months ago

  • Subject changed from [FUSE] [Tests] Fix flaky test TokenExpiryTest to [FUSE] [SDKs] When reading data through Collection et al., signatures should refresh automatically when needed

#10 Updated by Tom Clegg 12 months ago

  • Target version deleted (2016-10-26 sprint)

#11 Updated by Tom Clegg 12 months ago

  • Category changed from Tests to FUSE
  • Assignee deleted (Tom Clegg)

#12 Updated by Tom Clegg 8 months ago

  • Assignee set to Tom Clegg
  • Target version set to 2017-03-01 sprint

#13 Updated by Tom Clegg 8 months ago

  • Description updated (diff)

#14 Updated by Tom Clegg 8 months ago

  • Subject changed from [FUSE] [SDKs] When reading data through Collection et al., signatures should refresh automatically when needed to [SDKs] [Python] When reading data through Collection et al., signatures should refresh automatically when needed

#15 Updated by Tom Clegg 8 months ago

  • Category changed from FUSE to SDKs
  • Status changed from In Progress to New
  • Assignee deleted (Tom Clegg)
  • Target version deleted (2017-03-01 sprint)
  • Story points deleted (0.5)

#16 Updated by Tom Morris about 1 month ago

  • Target version set to Arvados Future Sprints

Also available in: Atom PDF