Bug #5298

[SDKs] Should CollectionReader.all_streams() iterate lines in the manifest, or "logical" streams?

Added by Brett Smith almost 7 years ago. Updated about 2 years ago.

Status:
Closed
Priority:
Normal
Assigned To:
-
Category:
SDKs
Target version:
-
Start date:
02/13/2015
Due date:
% Done:
100%

Estimated time:
(Total: 0.00 h)
Story points:
0.5

Description

A user just encountered an issue where their Crunch script had surprising behavior because an input manifest defined multiple files in the same stream with one file per line, like this:

. [locator] 0:3:foo
. [locator] 0:6:bar
…

Currently, CollectionReader.all_streams() iterates lines in the manifest. Using this method in a for loop, the user expected to find all of the files listed above in a single iteration. Instead, all_streams() yielded one stream object per manifest line, so each iteration exposed only the file defined on that line.

I believe our general expectation is that the SDK handles all the work of presenting manifests logically, so I think the method should be changed to iterate over "logical" streams rather than physical lines in the manifest. As long as the final list of files is correct, I believe this would be backward compatible: since writing the manifest on one or multiple lines is functionally indistinguishable, presenting it as such to the SDK client should be indistinguishable, too.
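To illustrate the difference, here is a minimal sketch of the proposed "logical" grouping. It does not use the Arvados SDK: `logical_streams` is a hypothetical helper, the manifest parsing is deliberately simplified (any token shaped like position:size:filename is treated as a file, and everything else after the stream name is assumed to be a block locator), and the locators and file sizes in the sample are made up for illustration.

```python
import re
from collections import OrderedDict

# A file token has the form position:size:filename.
# Block locators contain no colons, so they never match this pattern.
FILE_TOKEN = re.compile(r"^(\d+):(\d+):(\S+)$")


def logical_streams(manifest_text):
    """Group file names by stream name across all manifest lines.

    Hypothetical helper for illustration only; not part of the SDK.
    """
    streams = OrderedDict()
    for line in manifest_text.splitlines():
        tokens = line.split()
        if not tokens:
            continue  # skip blank lines
        stream_name = tokens[0]
        # Lines that share a stream name are merged into one entry.
        files = streams.setdefault(stream_name, [])
        for token in tokens[1:]:
            match = FILE_TOKEN.match(token)
            if match:
                files.append(match.group(3))
    return streams


# Sample manifest: two physical lines for stream ".", one for "./sub".
# The locators below are illustrative placeholders.
manifest = (
    ". acbd18db4cc2f85cedef654fccc4a4d8+3 0:3:foo\n"
    ". 37b51d194a7513e45b56f6524f2d51f2+3 0:3:bar\n"
    "./sub acbd18db4cc2f85cedef654fccc4a4d8+3 0:3:baz\n"
)

for name, files in logical_streams(manifest).items():
    print(name, files)
```

Iterating physical lines visits stream "." twice, once per file. Grouping by stream name visits it once with both files, which is what the user in this report expected.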


Subtasks

Task #5449: Update examples to use new Python Collection SDK and add deprecation notes to old APIs (Resolved)


Related issues

Related to Arvados - Story #3706: [SDKs] Remove fallback-to-keep warning from python SDK if block hash has a permission signature (Resolved) 07/31/2014

History

#1 Updated by Brett Smith almost 7 years ago

  • Description updated (diff)

#2 Updated by Tom Clegg almost 7 years ago

It's intended to preserve the streams as given in the manifest. It certainly makes sense that that's not the desired behavior in many cases, but this API is deprecated -- best if the code in question can be updated to either
  1. use the new Collection API, or
  2. normalize() the collection before calling all_streams().

#3 Updated by Peter Amstutz almost 7 years ago

Will update documentation with the existing behavior and note that the method is deprecated (#5449)

#4 Updated by Peter Amstutz almost 7 years ago

  • Target version changed from Bug Triage to 2015-04-01 sprint

#5 Updated by Tom Clegg almost 7 years ago

  • Status changed from New to Feedback

#6 Updated by Peter Amstutz almost 7 years ago

  • Assigned To set to Peter Amstutz

#7 Updated by Peter Amstutz almost 7 years ago

  • Target version changed from 2015-04-01 sprint to 2015-04-29 sprint

#8 Updated by Peter Amstutz almost 7 years ago

  • Story points set to 0.5

#9 Updated by Tom Clegg almost 7 years ago

  • Target version deleted (2015-04-29 sprint)

#10 Updated by Peter Amstutz over 5 years ago

  • Assigned To deleted (Peter Amstutz)

#11 Updated by Peter Amstutz about 2 years ago

  • Status changed from Feedback to Closed
