Project

General

Profile

Actions

Idea #3706

closed

[SDKs] Remove fallback-to-keep warning from python SDK if block hash has a permission signature

Added by Tom Clegg over 10 years ago. Updated about 10 years ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Category:
SDKs
Target version:
Start date:
07/31/2014
Due date:
Story points:
0.5

Description

Background: Crunch job tasks can use Keep as a cache for manifest fragments, which crunch-job retrieves at the end of the job using arv-get {block-locator}. The arv-get program does not know at this point whether the supplied block-locator is a reference to a block to retrieve from Keep or a collection to retrieve from the API server.

Current behavior: arv-get issues a warning.
  • Sat Sep 27 02:00:42 2014 2014-09-27 02:00:41 arvados.collection[7714] WARNING: \
    API server did not return Collection '4de3386d281fec68fdc91c31d5ffd3fe+51502+Axxx@54380491'. Trying to fetch directly from Keep (deprecated).
    
Desired behavior:
  • If arv-get is asked to retrieve something that looks like a block hash and has a permission signature hint, it should try fetching it from Keep first, then issue a warning and fall back on API server.
  • If arv-get is asked to retrieve something that looks like a block hash and has no permission signature hint, it should try fetching it from API server, then fall back on Keep. If both fail, the warning should indicate that both methods were attempted and both failed. The existing warning ("plan A failed, trying plan B now") should be removed, i.e., if Keep succeeds, no warning is printed.
    • This scenario is useful only when keepstore services are configured to permit anonymous unauthenticated read access. Otherwise, it wastes an HTTP request but is otherwise harmless.

Subtasks 3 (0 open3 closed)

Task #4367: Review 3706-keep-warningResolvedTom Clegg07/31/2014Actions
Task #4346: Write testsResolvedTom Clegg07/31/2014Actions
Task #4345: Do not warn unless both methods failResolvedTom Clegg07/31/2014Actions

Related issues 2 (0 open2 closed)

Related to Arvados - Bug #5298: [SDKs] Should CollectionReader.all_streams() iterate lines in the manifest, or "logical" streams?Closed02/13/2015Actions
Blocks Arvados - Support #4013: [SDKs] Pipeline instance logging emits misleading warning: API server did not return Collection "...."Closed09/26/2014Actions
Actions #1

Updated by Tom Clegg about 10 years ago

  • Subject changed from Remove fallback-to-keep code from arv-mount to Remove fallback-to-keep warning from python SDK if block hash has a permission signature
  • Assigned To set to Tom Clegg
Actions #2

Updated by Tom Clegg about 10 years ago

  • Subject changed from Remove fallback-to-keep warning from python SDK if block hash has a permission signature to [SDKs] Remove fallback-to-keep warning from python SDK if block hash has a permission signature
  • Description updated (diff)
Actions #3

Updated by Tom Clegg about 10 years ago

  • Parent task deleted (#3261)
Actions #4

Updated by Tom Clegg about 10 years ago

  • Tracker changed from Task to Idea
  • Target version changed from 2014-10-08 sprint to Arvados Future Sprints
  • Story points set to 0.5
Actions #5

Updated by Tom Clegg about 10 years ago

  • Estimated time deleted (2.00 h)
Actions #6

Updated by Tom Clegg about 10 years ago

  • Category set to SDKs
Actions #7

Updated by Ward Vandewege about 10 years ago

  • Target version changed from Arvados Future Sprints to 2014-10-29 sprint
Actions #8

Updated by Brett Smith about 10 years ago

  • Assigned To changed from Tom Clegg to Brett Smith

Stealing the ticket because it's easy and I'm working in this space anyway for #4248.

Actions #9

Updated by Brett Smith about 10 years ago

  • Assigned To changed from Brett Smith to Tom Clegg

Un-stealing this ticket because #4248 isn't even a bug, so I'm not working on this anymore (although working on the code helped me realize my mistake). Sorry for the noise.

Actions #10

Updated by Tom Clegg about 10 years ago

  • Status changed from New to In Progress
Actions #11

Updated by Tom Clegg about 10 years ago

  • Target version changed from 2014-10-29 sprint to 2014-11-19 sprint
Actions #12

Updated by Tom Clegg about 10 years ago

Along the way I noticed a few things were being broken by CollectionReader and CollectionWriter automatically applying normalize() in various places.
  • "arv-get {manifesthash} -" makes a CollectionReader and spits out its manifest_text(). This could (should?) be done differently -- this form is handy for hashes of non-manifest data blocks too, which we shouldn't be [coming so close to] trying to parse -- but in the meantime at least it's easy to make CollectionReader return the manifest just as it was retrieved.
  • The --max-manifest-depth argument was ineffective. We actually had tests confirming that max_manifest_depth was being ignored!
  • Generally, CollectionReader and CollectionWriter would accept A and behave as if they had been given B -- hashes don't match, files/streams are reordered, etc. -- and there was no way for the caller to avoid this short of re-implementing them.

I fixed this by adding a normalize() method, and calling it from arv-normalize and (when --normalize is given) arv-put. (The arv-put usage message mentions that --normalize makes --max-manifest-depth ineffective.)

Incidentally, normalize() was masking a CollectionWriter bug where calling finish_current_file() followed by finish() or manifest_text() would produce an extra zero-length file with the same name as the previous file.

Other stuff:
  • CollectionReader's stripped_manifest() method was (incorrectly) stripping only permission hints. It was also stripping +A[stuff] from any filenames (but the last file in the stream was safe). Meanwhile, CollectionWriter's manifest_text(strip=True) path was (correctly) stripping all non-size hints, using Stream's "stripped_manifest" method. Now, both CollectionReader and CollectionWriter use a common stripped_manifest() method which doesn't have those two bugs.
  • I moved some regular expressions from CollectionReader to the util package.

I noticed that the Keep client's error message, when it can't find a block, is just the block hash it was looking for. This is a bit confusing and not especially helpful, but I decided not to address that here too. The "normalize" adventure was quite enough.

Actions #13

Updated by Brett Smith about 10 years ago

First, mea culpa on the bad tests for max_manifest_depth. I wrote those when I was adding retry support to arv-put, which was my first big foray into Keep. I think I saw how CollectionWriter put files above the limit in the same block, and figured that was the intended behavior. This is an excellent lesson in the dangers of writing tests after the fact.

Main comments on 0e8d9b4:

  • Would it make more sense to move the populate try/except work into the individual _populate methods? They could return any exception raised, or else None. Then the main _populate can just say error_via_keep = self._populate_from_keep(). It would at least cut down on some redundant code.
  • Two things about except e (although this might just be a typo). First, this does not do what you might think: rather than catching every exception and saving it to e, it tries to match the raised exception against e. For example, e could be a tuple of exception classes defined at runtime. This code as-is will raise a NameError trying to find e if an exception is raised.
    Second, a bare except in Python is almost always a mistake, because it will catch BaseException, which includes KeyboardInterrupt, MemoryError, and other exceptions you almost certainly don't want to catch. Yes, it's unfortunate that this is the exact opposite of the situation in Ruby. Them's the breaks.
    Prefer except Exception as e, as elsewhere in this code.
  • Having normalize() return self is really un-Pythonic. Methods that primarily mutate internal state conventionally return None (a.k.a. don't have a return) to signal exactly this to the caller. If we want our SDK to feel familiar to people familiar with Python, we should follow suit.
    What if we added a normalize=False argument to manifest_text()? That seems to be the primary use case.
  • Since --normalize and --max-manifest-depth are incompatible, let's put them into a mutually_exclusive_group in the parser, so arv-put will generate a nice error message if the user specifies both.
  • I'm not sure I see where the API server is mocked to return the wrong data in test_try_keep_first_if_permission_hint. e.g., you could call mock_get_collection with a collection other than foo_file.
  • test_read_nonnormalized_manifest_with_collection_reader can just check against nonnormal in the first assertEqual. Because Python's strings are immutable, there's no risk that the tested function could warp the test.
  • The Python style guide calls for two blank lines after a class definition; this got lost after CollectionBase.

I wrote some additional style comments while I was going through, before I realized that these applied to code that you merely moved around, and didn't actually touch. Brownie points if you're up for addressing them, but no need to hold up the merge if you're not:

  • The sorting part of normalize() could be one-lined: self._streams = [normalize_stream(s, streams[s]) for s in sorted(streams)]
  • Empty collections are falsy in Python, so if list is preferred over if len(list) > 0.
  • String concatenation is relatively expensive in Python. It's preferable to build an array and turn it into a string when you're done with str.join (e.g., in stripped_manifest()).

Thanks.

Actions #14

Updated by Tom Clegg about 10 years ago

Brett Smith wrote:

First, mea culpa on the bad tests for max_manifest_depth. I wrote those when I was adding retry support to arv-put, which was my first big foray into Keep. I think I saw how CollectionWriter put files above the limit in the same block, and figured that was the intended behavior. This is an excellent lesson in the dangers of writing tests after the fact.

Yeah, I had assumed this was a "make sure this continues doing whatever it did before I touched it"...

  • Would it make more sense to move the populate try/except work into the individual _populate methods? They could return any exception raised, or else None. Then the main _populate can just say error_via_keep = self._populate_from_keep(). It would at least cut down on some redundant code.

That had occurred to me but I didn't get as far as convincing myself it was that much better. But now that you mention it, I've done it and I think it was worth it.

Prefer except Exception as e, as elsewhere in this code.

Whoops. Fixed, thank you.

  • Having normalize() return self is really un-Pythonic. Methods that primarily mutate internal state conventionally return None (a.k.a. don't have a return) to signal exactly this to the caller. If we want our SDK to feel familiar to people familiar with Python, we should follow suit.
    What if we added a normalize=False argument to manifest_text()? That seems to be the primary use case.

I've rearranged this as

  • foo.normalize() → mutate and return None.
  • foo.manifest_text(normalize=True) → don't mutate, just return a normalized manifest_text. (Under the hood, create a new CollectionReader and normalize() it.)
  • Since --normalize and --max-manifest-depth are incompatible, let's put them into a mutually_exclusive_group in the parser, so arv-put will generate a nice error message if the user specifies both.

Good call. Done.

  • I'm not sure I see where the API server is mocked to return the wrong data in test_try_keep_first_if_permission_hint. e.g., you could call mock_get_collection with a collection other than foo_file.

Indeed, apparently it wasn't. I've added an ALT_COLLECTION (using the "bar_file" fixture) and now it really does test what it says it tests. (This time I tried breaking the code and making sure the test failed.)

  • test_read_nonnormalized_manifest_with_collection_reader can just check against nonnormal in the first assertEqual. Because Python's strings are immutable, there's no risk that the tested function could warp the test.

If I'm following you correctly: Those strings aren't identical. I'm guessing I confused you by testing two things at once here: (1) the files don't get rearranged, (2) the permission signature gets stripped. I split this into two assertions, and used re.sub() instead of giving another literal to make it (hopefully) more clear what's happening.

ce2362b

  • The Python style guide calls for two blank lines after a class definition; this got lost after CollectionBase.

Fixed.

I wrote some additional style comments while I was going through, before I realized that these applied to code that you merely moved around, and didn't actually touch. Brownie points if you're up for addressing them, but no need to hold up the merge if you're not:

Here I go, apparently I want those brownie points...

  • The sorting part of normalize() could be one-lined: self._streams = [normalize_stream(s, streams[s]) for s in sorted(streams)]

Done.

  • Empty collections are falsy in Python, so if list is preferred over if len(list) > 0.

Fixed ×3.

  • String concatenation is relatively expensive in Python. It's preferable to build an array and turn it into a string when you're done with str.join (e.g., in stripped_manifest()).

55a0e06 better? It does 1+Nstreams joins. It also avoids making an exception for the last token: although it's true that the last token is always a file token, the old code hinted that only the last token was exempt, which is a bit of a red herring.

Now at ce2362b

Actions #15

Updated by Brett Smith about 10 years ago

Reviewing ce2362b

Tom Clegg wrote:

Brett Smith wrote:

  • Would it make more sense to move the populate try/except work into the individual _populate methods? They could return any exception raised, or else None. Then the main _populate can just say error_via_keep = self._populate_from_keep(). It would at least cut down on some redundant code.

That had occurred to me but I didn't get as far as convincing myself it was that much better. But now that you mention it, I've done it and I think it was worth it.

Great, thanks. One small thing: in _populate, please prefer is None over == None and is not None over != None.

  • Having normalize() return self is really un-Pythonic. Methods that primarily mutate internal state conventionally return None (a.k.a. don't have a return) to signal exactly this to the caller. If we want our SDK to feel familiar to people familiar with Python, we should follow suit.
    What if we added a normalize=False argument to manifest_text()? That seems to be the primary use case.

I've rearranged this…

Thank you, I think this is a much more natural fit. You missed one small spot: a change in the FUSE driver is still expecting normalize() to return the object, making the driver crash and tests fail.

  • Since --normalize and --max-manifest-depth are incompatible, let's put them into a mutually_exclusive_group in the parser, so arv-put will generate a nice error message if the user specifies both.

Good call. Done.

This is making me wonder, should --normalize be the default behavior for arv-put, since it's been doing that all along? Someone who wanted neither normalization nor a max manifest depth could get that by saying --max-manifest-depth $BIGNUM, or we could provide --no-normalize. But I'm honestly not sure what the right answer is here, policy-wise.

A couple other small things:

  • This branch removes the line self.mock_get_collection(client, 404, None) from test_locator_init_fallback_to_keep. However, the default setup method mocks the API client to return a 200 for a requested collection. If I'm following correctly, without this line, CollectionReader could request the collection from the API server and pass, which I think thwarts the test's intent.
  • The condition at the end of CollectionWriter.manifest_text() is now a noop and can be simplified to return manifest.

Thanks.

Actions #16

Updated by Tom Clegg about 10 years ago

Brett Smith wrote:

Great, thanks. One small thing: in _populate, please prefer is None over == None and is not None over != None.

Fixed. Also fixed a few other existing uses of != and ==, including one where the result of the comparison was thrown away anyway:

  •          while now <= timeout:
    -            os.kill(server_pid, signal.SIGTERM) == None
    +            os.kill(server_pid, signal.SIGTERM)
    

Thank you, I think this is a much more natural fit. You missed one small spot: a change in the FUSE driver is still expecting normalize() to return the object, making the driver crash and tests fail.

Whoops. Fixed now. Maybe I'll run all the tests this time. :)

This is making me wonder, should --normalize be the default behavior for arv-put, since it's been doing that all along? Someone who wanted neither normalization nor a max manifest depth could get that by saying --max-manifest-depth $BIGNUM, or we could provide --no-normalize. But I'm honestly not sure what the right answer is here, policy-wise.

I've updated work_directory_tree so it always sorts entries. Unless I'm forgetting something, this is nearly1 equivalent to making arv-put default to --normalize, except that the blocks now get written in the same order as the files. Previously, when relying on normalize(), although the files appeared in sorted order, the data blocks were written based on the arbitrary order in which the directory entries were returned by os.listdir().

This change means readers who read all_files() in the order given in the resulting manifest will read blocks sequentially, which in general will perform better.

1 I suppose the remaining difference is that "arv-put --normalize" allows you to specify multiple overlapping paths and get a manifest that (more or less) hides the fact that you did that: e.g., "arv-put foo/bar.txt foo/baz.txt" would have one stream instead of two, but it would have two data blocks even if the files are tiny.

  • This branch removes the line self.mock_get_collection(client, 404, None) from test_locator_init_fallback_to_keep. However, the default setup method mocks the API client to return a 200 for a requested collection. If I'm following correctly, without this line, CollectionReader could request the collection from the API server and pass, which I think thwarts the test's intent.

Huh. I think I was expecting tutil.mock_responses(self.DEFAULT_MANIFEST, 404, 200) to return 404 for collections.get and then 200 for the "get" from the Keep server. In any case, I've changed it back to do mock_get_collection(...404...) followed by mock_responses(...200) since there seems to have been no reason to change this in the first place.

  • The condition at the end of CollectionWriter.manifest_text() is now a noop and can be simplified to return manifest.

Indeed, thanks. Fixed.

All tests passed before that very last thing. Re-running now just in case. But I think it will be fine. :)

Now at 411305d

Actions #17

Updated by Brett Smith about 10 years ago

Tom Clegg wrote:

Brett Smith wrote:

  • This branch removes the line self.mock_get_collection(client, 404, None) from test_locator_init_fallback_to_keep. However, the default setup method mocks the API client to return a 200 for a requested collection. If I'm following correctly, without this line, CollectionReader could request the collection from the API server and pass, which I think thwarts the test's intent.

Huh. I think I was expecting tutil.mock_responses(self.DEFAULT_MANIFEST, 404, 200) to return 404 for collections.get and then 200 for the "get" from the Keep server.

It might have that effect if the API client wasn't itself a mock. Maybe we use that strategy in another test suite?

411305d is good to merge, thanks.

Actions #18

Updated by Brett Smith about 10 years ago

I recanted on IRC. For posterity: I think _work_trees needs to say: max_depth = (None if max_manifest_depth < 0 else max_manifest_depth). You may also need to put some guards on CollectionWriter's internal handling of max_manifest_depth, to make sure it doesn't go from 0 to -1. Because listdir_recursive never returned directory entries before, that was never an issue.

Actions #19

Updated by Anonymous about 10 years ago

  • Status changed from In Progress to Resolved
  • % Done changed from 80 to 100

Applied in changeset arvados|commit:68dae835a2f8bebd5664211a60fc76d50e5ecc23.

Actions

Also available in: Atom PDF