Bug #5692
closed[SDK] Can't get signed manifest locator from new Collection API.
Added by Peter Amstutz over 9 years ago. Updated over 9 years ago.
Updated by Peter Amstutz over 9 years ago
- Assigned To set to Peter Amstutz
- Target version changed from Bug Triage to 2015-04-29 sprint
Updated by Brett Smith over 9 years ago
Reviewing dbafbd69, and just one comment: please add tests for the return values in different cases. If users are going to rely on this behavior, it's important enough to write tests for. Thanks.
Updated by Tom Clegg over 9 years ago
What's the use case for this?
A manifest without block signatures (like the one we've been storing, and dbafbd6 now returns a locator for) is not a particularly useful thing: it could be used to verify data you already have locally, but we don't have any tools for that (do we?), and it doesn't enable you to get any data.
Even a manifest with block signatures is rarely worth storing in Keep, because the signatures expire. The only use case I know of is Crunch tasks, and the only reason they do it is to avoid creating a bunch of extra partial collections, so getting it through save()
seems like the wrong API. It seems a bit wrong to write all of these useless bits to Keep every time anyone hits save(). They'll get deleted by garbage collection eventually, but it's still rather inefficient. How about just let crunch tasks call manifest_text() and keep put() themselves, since they're trying to work outside the normal framework anyway (and they won't be able to do this at all when they become reusable)?
So, unless I'm missing something, "write the manifest to Keep" shouldn't be a side effect of save()
(even with the signatures that could potentially make it useful for Crunch tasks), and of course that means "returns a locator for a useless chunk of data" shouldn't be part of the save()
API.
Updated by Sarah Guthrie over 9 years ago
The current way to end a task using the arvados SDK is to use current_task().set_output() which requires the signed manifest locator as an input. CollectionWriter.save() returned this, so I just got used to passing in the result of CollectionWriter.save() to set_output, which meant that when Collection.save() or save_new() didn't return anything, none of my results were saved at the end of the job.
Updated by Peter Amstutz over 9 years ago
The use case we're trying to fix is to be able to use the new Collection API the same way as CollectionWriter to produce output for a task that will be collated when the job finished.
However, you're right, I didn't think this one through. If it saves the manifest without tokens, it is useless for this use case.
From discussion on IRC, we will adopt the CollectionWriter.manifest_text()
behavior and commit any outstanding blocks when calling Collection.manifest_text(strip=False)
. The return value from manifest_text()
can be assigned to the task output.
Updated by Tom Clegg over 9 years ago
current_task().set_output(locator)
wherelocator
is a signed locator for a data block containing an unstripped manifest (or a fragment of one)current_task().set_output(manifest_text)
where manifest_text is an unstripped manifest (or a fragment of one)
In both cases you don't need to save()
(in fact, you shouldn't even bother cluttering your workspace by creating a collection with a uuid, which means save() wouldn't work anyway, but you don't want save_new()
either) -- you just need to get c.manifest_text()
and give it (or keep.KeepClient().put(c.manifest_text())
) to set_output()
.
Updated by Brett Smith over 9 years ago
Reviewing 88f5731
- Giving ArvadosFile.flush() a default
num_retries=None
is a little dicey. Since it isn't (and can't be) a @retry_method, the None will eventually make its way to the googleapiclient boundary, and then who knows what happens. Suggest 0 instead. - The changes to the BufferBlock tests manipulate internal state (
bufferblock.owner._current_bblock
). Instead of mocking Keep and building from there, what if you mocked the owner instead? You could instantiate the BufferBlock directly with that, and assert that committing callsowner.flush()
. I think it'll actually make the tests shorter and sweeter. - Am I right to think that a lot of the
strip=True
additions to the tests were added during the previous implementation? If so, reverting these changes would help improve confidence that the SDK changes in this branch don't have undesired side effects.
Thanks.
Updated by Peter Amstutz over 9 years ago
Brett Smith wrote:
Reviewing 88f5731
- Giving ArvadosFile.flush() a default
num_retries=None
is a little dicey. Since it isn't (and can't be) a @retry_method, the None will eventually make its way to the googleapiclient boundary, and then who knows what happens. Suggest 0 instead.
Fixed.
- The changes to the BufferBlock tests manipulate internal state (
bufferblock.owner._current_bblock
). Instead of mocking Keep and building from there, what if you mocked the owner instead? You could instantiate the BufferBlock directly with that, and assert that committing callsowner.flush()
. I think it'll actually make the tests shorter and sweeter.
Done.
- Am I right to think that a lot of the
strip=True
additions to the tests were added during the previous implementation? If so, reverting these changes would help improve confidence that the SDK changes in this branch don't have undesired side effects.
Instead of strip=True, those calls have been changed to use portable_manifest_text() instead, because the affected tests were written on the assumption that .manifest_text() doesn't produce side effects (which is now true for portable_manifest_text() but no longer true for manifest_text()). I did a general search-and-replace manifest_text() -> portable_manifest_text() to avoid accidental test breakage in the future.
Updated by Brett Smith over 9 years ago
Reviewing c162a42
Is it necessary to add another behavior-controlling boolean knob to the list of manifest_text()
arguments? #4823 envisioned a method that took no arguments; we already have two for backward compatibility, and adding a third takes us even farther away from the ideal. If it's workable, I would prefer to see the current manifest_text()
implementation turned into a private method; then portable_manifest_text()
can call that with all the right arguments, and the public manifest_text()
can always flush then call the private method. I believe that would give us the functionality we need without drifting away from the design in #4823.
There are a few typos in the new docstrings:
- In
portable_manifest_text()
: "This method does not flush outstanding to Keep." is missing a word (blocks?). - In
manifest_text()
: "By default, this method will flush outstanding blocksto Keep. By default it will not normalize the manifest or strip access tokens." "blocksto" is missing a space. This text is not consistent about whether or not a comma appears after "By default." Either style is fine by me, but please be consistent.
Thanks.
Updated by Peter Amstutz over 9 years ago
Brett Smith wrote:
Reviewing c162a42
Is it necessary to add another behavior-controlling boolean knob to the list of
manifest_text()
arguments? #4823 envisioned a method that took no arguments; we already have two for backward compatibility, and adding a third takes us even farther away from the ideal. If it's workable, I would prefer to see the currentmanifest_text()
implementation turned into a private method; thenportable_manifest_text()
can call that with all the right arguments, and the publicmanifest_text()
can always flush then call the private method. I believe that would give us the functionality we need without drifting away from the design in #4823.
Refactored into _get_manifest_text()
.
There are a few typos in the new docstrings:
- In
portable_manifest_text()
: "This method does not flush outstanding to Keep." is missing a word (blocks?).- In
manifest_text()
: "By default, this method will flush outstanding blocksto Keep. By default it will not normalize the manifest or strip access tokens." "blocksto" is missing a space. This text is not consistent about whether or not a comma appears after "By default." Either style is fine by me, but please be consistent.
Fixed.
Thanks.
Updated by Peter Amstutz over 9 years ago
- Status changed from New to In Progress
Updated by Brett Smith over 9 years ago
- Status changed from In Progress to New
At 679eef4, test_diff_apply_with_token
is calling the private _get_manifest_text()
method. This test suite runs a Keep server without permissions enforcement, so I think you can just change this to call manifest_text()
(and that passes for me). If there's some nuance of the test I'm missing here, let me know.
Updated by Peter Amstutz over 9 years ago
Brett Smith wrote:
At 679eef4,
test_diff_apply_with_token
is calling the private_get_manifest_text()
method. This test suite runs a Keep server without permissions enforcement, so I think you can just change this to callmanifest_text()
(and that passes for me). If there's some nuance of the test I'm missing here, let me know.
You're right, it doesn't actually have any blocks to flush in that test so I can just use the regular manifest_text() method. The tests pass for me as well. Good to merge?
Updated by Peter Amstutz over 9 years ago
- Status changed from New to Resolved
- % Done changed from 0 to 100
Applied in changeset arvados|commit:ec8879aa56810dfc6475ad8cdc56770ff91f84f2.