[SDKs] Should CollectionReader.all_streams() iterate lines in the manifest, or "logical" streams?
A user just encountered an issue where their Crunch script had surprising behavior because an input manifest defined multiple files in the same stream with one file per line, like this:
. [locator] 0:3:foo
. [locator] 0:6:bar
…
Currently, CollectionReader.all_streams() iterates lines in the manifest. Using this method in a for loop, the user expected to find all of the files listed above in a single iteration. Instead, all_streams() yielded each manifest line as a separate stream, so the files were spread across several iterations.
I believe our general expectation is that the SDK handles all the work of presenting manifests logically, so I think the method should be changed to iterate over "logical" streams rather than physical lines in the manifest. As long as the final list of files is correct, I believe this would be backward compatible: since writing the manifest on one or multiple lines is functionally indistinguishable, presenting it as such to the SDK client should be indistinguishable, too.
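To make the difference concrete, here is a minimal sketch of grouping manifest lines into "logical" streams by stream name. It is plain Python and does not use the SDK; the helper name logical_streams() and the sample block locators are assumptions made for illustration, not the SDK's implementation. It only collects file names and ignores the block offsets that a real merge would have to rewrite.

```python
import re
from collections import OrderedDict

# File tokens in a manifest line look like "position:size:name".
FILE_TOKEN = re.compile(r"^(\d+):(\d+):(.+)$")

# Hypothetical sample: two physical lines that both describe stream ".".
SAMPLE_MANIFEST = (
    ". acbd18db4cc2f85cedef654fccc4a4d8+3 0:3:foo\n"
    ". 37b51d194a7513e45b56f6524f2d51f2+3 0:3:bar\n"
)


def logical_streams(manifest_text):
    """Map each stream name to the file names found on all of its lines.

    Streams are kept in the order they first appear in the manifest.
    """
    streams = OrderedDict()
    for line in manifest_text.splitlines():
        tokens = line.split()
        if not tokens:
            continue  # skip blank lines
        stream_name = tokens[0]
        files = streams.setdefault(stream_name, [])
        for token in tokens[1:]:
            match = FILE_TOKEN.match(token)
            if match:  # anything else is treated as a block locator
                files.append(match.group(3))
    return streams


if __name__ == "__main__":
    # Iterating physical lines visits stream "." twice;
    # iterating logical streams visits it once, with both files.
    for name, files in logical_streams(SAMPLE_MANIFEST).items():
        print(name, files)
```

With this grouping, a for loop over the result sees foo and bar together in one iteration, which is what the user expected from all_streams().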
#2 Updated by Tom Clegg almost 7 years ago
- use the new Collection API, or
- normalize() the collection before calling all_streams().