Bug #14012

arv-put does not check cached signatures when resuming

Added by Lucas Di Pentima 4 months ago. Updated 1 day ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Category:
-
Target version:
Start date:
12/03/2018
Due date:
% Done:

100%

Estimated time:
(Total: 0.00 h)
Story points:
-

Description

When resuming an upload, the cached manifest may contain block signatures made with a different user token.
This would generate a PermissionDenied error when trying to create the collection after finishing the upload.


Subtasks

Task #14542: Review 14012-arvput-check-cacheResolvedPeter Amstutz

Associated revisions

Revision 8ff3fd06
Added by Lucas Di Pentima 1 day ago

Merge branch '14012-arvput-check-cache'
Closes #14012

Arvados-DCO-1.1-Signed-off-by: Lucas Di Pentima <>

History

#1 Updated by Lucas Di Pentima 16 days ago

  • Assigned To set to Lucas Di Pentima
  • Target version changed from To Be Groomed to 2018-12-12 Sprint

#2 Updated by Tom Clegg 16 days ago

Suggestion: Do a HEAD request for the first cached block locator. If that doesn't work, consider the cached blocks invalid.

#3 Updated by Lucas Di Pentima 15 days ago

  • Status changed from New to In Progress

#4 Updated by Lucas Di Pentima 11 days ago

Updates at 1e02903b90dbaf1f0e9fac222f65e3969b5f0352 - branch 14012-arvput-check-cache
Test run: https://ci.curoverse.com/job/developer-run-tests/995/

Replaced block expiration check on every file with a block signature check by making a HEAD request as suggested by Tom at early stages of the execution so it can fail fast.
When encountering an invalid manifest, provide options to the user: either start a new cache or skip it entirely.

#5 Updated by Peter Amstutz 10 days ago

  • loc = self._state['manifest'][m.start():m.end()]
    

    Isn't this the same as m.group(0) ?
  • There's a note about "do something different when we have a token refresh API". Well, we have that now (HEAD request and X-Keep-Locator response header), so it seems like the most robust solution would be to go through and refresh every token. This would both check that the all the tokens are valid (not just the first one), and reset the time-to-live timestamps. The main concern is probably performance, refreshing 1000s of tokens up front might take some time.

#6 Updated by Lucas Di Pentima 10 days ago

As discussed on chat, we should be checking the block signature with the lowest expiration timestamp.

#7 Updated by Peter Amstutz 8 days ago

(12/04/2018 05:21:39 PM) tom: hm. If we are capable of "re-upload the affected files if signatures are expired" then I think we should keep that capability.
(12/04/2018 05:23:29 PM) tom: Seems like maybe that should continue to work as it did before... maybe the initial check should be "find oldest still-valid signature by iterating and comparing the exp timestamps, do a HEAD req on it, and if it's invalid, warn & clear/don't use the cache"?
(12/04/2018 05:27:58 PM) tom: There's no point checking the already-expired signatures since a) we wouldn't have tried to use them anyway and b) of course they won't validate... I think the only thing we really care about here is the most obvious case where all signatures are invalid despite having exp times in the future -- we want to detect right away that something's wrong.

#8 Updated by Lucas Di Pentima 8 days ago

Updates at 6fc7bd062
Test run: https://ci.curoverse.com/job/developer-run-tests/1002/

  • Before loading the cached manifest for resuming, search for the oldest non-expired block signature and do a HEAD request to check if it's valid.
  • Restored the per-file expiration check when deciding if each file should be uploaded.
  • Added tests for all cases.

#9 Updated by Peter Amstutz 8 days ago

  • Could you add a test case to get some better coverage on _cache_manifest_valid(), I'm thinking a matrix of test cases with two blocks where the 1st block is expired/older/newer/fails head() and the 2nd block is expired/older/newer/fails head().
  • Suggest tweaking error message:

    "arv-put: Cache seems to contain invalid data: it may have expired",
    " or been created with another arvados user's credentials.",
    " Use --no-resume to start a new cache file.",
    " Use --no-cache to skip current cache file usage."]))

    "arv-put: Resume cache contains invalid signature: it may have expired",
    " or been created with another Arvados user's credentials.",
    " Switch user or use one of the following options to restart upload:"
    " --no-resume to start with a new resume cache.",
    " --no-cache to disable resume cache."]))

#10 Updated by Lucas Di Pentima 4 days ago

Updates at f1dfa6a02
Test run: https://ci.curoverse.com/job/developer-run-tests/1008/

  • Updated the error message as requested
  • Took some time to figure out the correct way of testing _cached_manifest_valid() method, as it's part of the entire ArvPutUploadJob initialization process: I created a mock subclass that does just the minimum to be able to call the method and test its return value.

#11 Updated by Lucas Di Pentima 2 days ago

  • Target version changed from 2018-12-12 Sprint to 2018-12-21 Sprint

#12 Updated by Peter Amstutz 1 day ago

Lucas Di Pentima wrote:

Updates at f1dfa6a02
Test run: https://ci.curoverse.com/job/developer-run-tests/1008/

  • Updated the error message as requested
  • Took some time to figure out the correct way of testing _cached_manifest_valid() method, as it's part of the entire ArvPutUploadJob initialization process: I created a mock subclass that does just the minimum to be able to call the method and test its return value.

This LGTM, thanks.

#13 Updated by Lucas Di Pentima 1 day ago

  • Status changed from In Progress to Resolved

Also available in: Atom PDF