Project

General

Profile

Actions

Bug #5692

closed

[SDK] Can't get signed manifest locator from new Collection API.

Added by Peter Amstutz about 9 years ago. Updated almost 9 years ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Category:
-
Target version:
Story points:
0.5

Subtasks 2 (0 open2 closed)

Task #5710: Review 5692-pysdk-manifest-text-flushResolvedPeter Amstutz04/10/2015Actions
Task #5693: Review 5692-pysdk-signed-manifest-locatorClosedPeter Amstutz04/10/2015Actions
Actions #1

Updated by Peter Amstutz about 9 years ago

  • Assigned To set to Peter Amstutz
  • Target version changed from Bug Triage to 2015-04-29 sprint
Actions #2

Updated by Brett Smith about 9 years ago

Reviewing dbafbd69, and just one comment: please add tests for the return values in different cases. If users are going to rely on this behavior, it's important enough to write tests for. Thanks.

Actions #3

Updated by Tom Clegg about 9 years ago

What's the use case for this?

A manifest without block signatures (like the one we've been storing, and dbafbd6 now returns a locator for) is not a particularly useful thing: it could be used to verify data you already have locally, but we don't have any tools for that (do we?), and it doesn't enable you to get any data.

Even a manifest with block signatures is rarely worth storing in Keep, because the signatures expire. The only use case I know of is Crunch tasks, and the only reason they do it is to avoid creating a bunch of extra partial collections, so getting it through save() seems like the wrong API. It seems a bit wrong to write all of these useless bits to Keep every time anyone hits save(). They'll get deleted by garbage collection eventually, but it's still rather inefficient. How about just let crunch tasks call manifest_text() and keep put() themselves, since they're trying to work outside the normal framework anyway (and they won't be able to do this at all when they become reusable)?

So, unless I'm missing something, "write the manifest to Keep" shouldn't be a side effect of save() (even with the signatures that could potentially make it useful for Crunch tasks), and of course that means "returns a locator for a useless chunk of data" shouldn't be part of the save() API.

Actions #4

Updated by Sarah Guthrie about 9 years ago

The current way to end a task using the arvados SDK is to use current_task().set_output() which requires the signed manifest locator as an input. CollectionWriter.save() returned this, so I just got used to passing in the result of CollectionWriter.save() to set_output, which meant that when Collection.save() or save_new() didn't return anything, none of my results were saved at the end of the job.

Actions #5

Updated by Peter Amstutz about 9 years ago

The use case we're trying to fix is to be able to use the new Collection API the same way as CollectionWriter to produce output for a task that will be collated when the job finished.

However, you're right, I didn't think this one through. If it saves the manifest without tokens, it is useless for this use case.

From discussion on IRC, we will adopt the CollectionWriter.manifest_text() behavior and commit any outstanding blocks when calling Collection.manifest_text(strip=False). The return value from manifest_text() can be assigned to the task output.

Actions #6

Updated by Tom Clegg about 9 years ago

For the sake of completeness there are two ways to tell crunch-job your task output:
  • current_task().set_output(locator) where locator is a signed locator for a data block containing an unstripped manifest (or a fragment of one)
  • current_task().set_output(manifest_text) where manifest_text is an unstripped manifest (or a fragment of one)

In both cases you don't need to save() (in fact, you shouldn't even bother cluttering your workspace by creating a collection with a uuid, which means save() wouldn't work anyway, but you don't want save_new() either) -- you just need to get c.manifest_text() and give it (or keep.KeepClient().put(c.manifest_text())) to set_output().

Actions #7

Updated by Brett Smith about 9 years ago

Reviewing 88f5731

  • Giving ArvadosFile.flush() a default num_retries=None is a little dicey. Since it isn't (and can't be) a @retry_method, the None will eventually make its way to the googleapiclient boundary, and then who knows what happens. Suggest 0 instead.
  • The changes to the BufferBlock tests manipulate internal state (bufferblock.owner._current_bblock). Instead of mocking Keep and building from there, what if you mocked the owner instead? You could instantiate the BufferBlock directly with that, and assert that committing calls owner.flush(). I think it'll actually make the tests shorter and sweeter.
  • Am I right to think that a lot of the strip=True additions to the tests were added during the previous implementation? If so, reverting these changes would help improve confidence that the SDK changes in this branch don't have undesired side effects.

Thanks.

Actions #8

Updated by Peter Amstutz about 9 years ago

Brett Smith wrote:

Reviewing 88f5731

  • Giving ArvadosFile.flush() a default num_retries=None is a little dicey. Since it isn't (and can't be) a @retry_method, the None will eventually make its way to the googleapiclient boundary, and then who knows what happens. Suggest 0 instead.

Fixed.

  • The changes to the BufferBlock tests manipulate internal state (bufferblock.owner._current_bblock). Instead of mocking Keep and building from there, what if you mocked the owner instead? You could instantiate the BufferBlock directly with that, and assert that committing calls owner.flush(). I think it'll actually make the tests shorter and sweeter.

Done.

  • Am I right to think that a lot of the strip=True additions to the tests were added during the previous implementation? If so, reverting these changes would help improve confidence that the SDK changes in this branch don't have undesired side effects.

Instead of strip=True, those calls have been changed to use portable_manifest_text() instead, because the affected tests were written on the assumption that .manifest_text() doesn't produce side effects (which is now true for portable_manifest_text() but no longer true for manifest_text()). I did a general search-and-replace manifest_text() -> portable_manifest_text() to avoid accidental test breakage in the future.

Actions #9

Updated by Brett Smith about 9 years ago

Reviewing c162a42

Is it necessary to add another behavior-controlling boolean knob to the list of manifest_text() arguments? #4823 envisioned a method that took no arguments; we already have two for backward compatibility, and adding a third takes us even farther away from the ideal. If it's workable, I would prefer to see the current manifest_text() implementation turned into a private method; then portable_manifest_text() can call that with all the right arguments, and the public manifest_text() can always flush then call the private method. I believe that would give us the functionality we need without drifting away from the design in #4823.

There are a few typos in the new docstrings:

  • In portable_manifest_text(): "This method does not flush outstanding to Keep." is missing a word (blocks?).
  • In manifest_text(): "By default, this method will flush outstanding blocksto Keep. By default it will not normalize the manifest or strip access tokens." "blocksto" is missing a space. This text is not consistent about whether or not a comma appears after "By default." Either style is fine by me, but please be consistent.

Thanks.

Actions #10

Updated by Peter Amstutz about 9 years ago

Brett Smith wrote:

Reviewing c162a42

Is it necessary to add another behavior-controlling boolean knob to the list of manifest_text() arguments? #4823 envisioned a method that took no arguments; we already have two for backward compatibility, and adding a third takes us even farther away from the ideal. If it's workable, I would prefer to see the current manifest_text() implementation turned into a private method; then portable_manifest_text() can call that with all the right arguments, and the public manifest_text() can always flush then call the private method. I believe that would give us the functionality we need without drifting away from the design in #4823.

Refactored into _get_manifest_text().

There are a few typos in the new docstrings:

  • In portable_manifest_text(): "This method does not flush outstanding to Keep." is missing a word (blocks?).
  • In manifest_text(): "By default, this method will flush outstanding blocksto Keep. By default it will not normalize the manifest or strip access tokens." "blocksto" is missing a space. This text is not consistent about whether or not a comma appears after "By default." Either style is fine by me, but please be consistent.

Fixed.

Thanks.

Actions #11

Updated by Peter Amstutz about 9 years ago

  • Status changed from New to In Progress
Actions #12

Updated by Brett Smith about 9 years ago

  • Status changed from In Progress to New

At 679eef4, test_diff_apply_with_token is calling the private _get_manifest_text() method. This test suite runs a Keep server without permissions enforcement, so I think you can just change this to call manifest_text() (and that passes for me). If there's some nuance of the test I'm missing here, let me know.

Actions #13

Updated by Peter Amstutz about 9 years ago

Brett Smith wrote:

At 679eef4, test_diff_apply_with_token is calling the private _get_manifest_text() method. This test suite runs a Keep server without permissions enforcement, so I think you can just change this to call manifest_text() (and that passes for me). If there's some nuance of the test I'm missing here, let me know.

You're right, it doesn't actually have any blocks to flush in that test so I can just use the regular manifest_text() method. The tests pass for me as well. Good to merge?

Actions #14

Updated by Brett Smith about 9 years ago

ee0411e is good to merge, thanks.

Actions #15

Updated by Peter Amstutz about 9 years ago

  • Status changed from New to Resolved
  • % Done changed from 0 to 100

Applied in changeset arvados|commit:ec8879aa56810dfc6475ad8cdc56770ff91f84f2.

Actions

Also available in: Atom PDF