[SDKs] Python CollectionReader should return at least one byte to caller per block read from Keep.
- "foo.txt consists of the first 4 bytes of 64M blob A, followed by the first 4 bytes of 64M blob B, ..."
- (Big improvement in a future story) Retrieve partial content from Keep. Doing an HTTP request per 4-byte segment will be slow, but much faster than doing an HTTP request and 64MB of disk and network traffic per 4-byte segment!
- (Small improvement in this story) After fetching a full (or partial) block from Keep, there is always at least one byte ready to return to the caller. Return the available data to the caller right away. Don't fetch the next block until the next read() call.
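To put numbers on the foo.txt scenario, here is a back-of-the-envelope calculation. It assumes that every 4-byte segment comes from a different 64 MiB block and that nothing is cached. `whole_block_cost()` is a hypothetical helper written for this illustration, not SDK code.

```python
# Back-of-the-envelope cost model for the foo.txt example. Assumes each
# 4-byte segment comes from a different 64 MiB Keep block and that no block
# is cached; whole_block_cost() is a hypothetical helper, not SDK code.
SEGMENT_SIZE = 4                # bytes of foo.txt taken from each block
BLOCK_SIZE = 64 * 1024 * 1024   # full Keep block (64 MiB)


def whole_block_cost(file_bytes):
    """Return (http_requests, bytes_transferred) when every segment
    forces a fetch of its entire block."""
    segments = -(-file_bytes // SEGMENT_SIZE)  # ceiling division
    return segments, segments * BLOCK_SIZE


n_requests, transferred = whole_block_cost(1024 * 1024)  # read 1 MiB of foo.txt
print(n_requests, transferred // 2**40)  # 262144 16  (i.e. 16 TiB moved)
```

With partial-content retrieval (the future story), the request count stays the same but each request moves only 4 bytes. The change in this story does not reduce total traffic. It lets the caller receive data after the first block instead of after all of them.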
This is the offending code in
    def read(self, size):
        """Read up to 'size' bytes from the stream, starting at the current file position"""
        if size == 0:
            return ''

        data = ''
        # Every segment overlapping the requested range is read before
        # returning, so one read() call may fetch many Keep blocks.
        for locator, blocksize, segmentoffset, segmentsize in locators_and_ranges(self.segments, self._filepos, size):
            data += self._stream.readfrom(locator+segmentoffset, segmentsize)
        self._filepos += len(data)
        return data
Rather than looping through locators_and_ranges, it should get as much data as it can from the first element of locators_and_ranges, and return that.
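Here is a minimal, self-contained sketch of that change. `FakeStream`, the `(stream_position, length)` segment layout, `toy_locators_and_ranges()` and `SketchFileReader` are hypothetical stand-ins invented for illustration. They are not the Arvados SDK's classes or the real `locators_and_ranges()`. They only mimic the tuple shape used in the loop above. The sketch also uses `bytes` where the original code uses Python 2 `str`.

```python
# Hypothetical, self-contained sketch: FakeStream, the (stream_position, length)
# segment layout, and toy_locators_and_ranges() are stand-ins invented for
# illustration; they are not the Arvados SDK's classes or functions.


class FakeStream:
    """Serves byte ranges from an in-memory buffer and counts the calls."""

    def __init__(self, data):
        self.data = data
        self.reads = 0

    def readfrom(self, position, size):
        self.reads += 1
        return self.data[position:position + size]


def toy_locators_and_ranges(segments, pos, size):
    """Yield (locator, blocksize, segmentoffset, segmentsize) for each
    segment overlapping the file range [pos, pos + size)."""
    seg_start = 0
    end = pos + size
    for locator, length in segments:
        seg_end = seg_start + length
        if seg_end > pos and seg_start < end:
            start, stop = max(pos, seg_start), min(end, seg_end)
            yield locator, length, start - seg_start, stop - start
        seg_start = seg_end
        if seg_start >= end:
            break


class SketchFileReader:
    def __init__(self, stream, segments):
        self._stream = stream
        self.segments = segments
        self._filepos = 0

    def read(self, size):
        """Read up to 'size' bytes, but from at most one segment, so the
        caller gets data back after a single block fetch."""
        if size == 0:
            return b''
        for locator, blocksize, segmentoffset, segmentsize in toy_locators_and_ranges(
                self.segments, self._filepos, size):
            data = self._stream.readfrom(locator + segmentoffset, segmentsize)
            self._filepos += len(data)
            return data  # stop after the first segment; the caller asks again
        return b''  # nothing left in range: end of file


# File content "AAAABBBB" assembled from two segments of the stream.
stream = FakeStream(b'AAAAxxxxBBBBxxxx')
f = SketchFileReader(stream, [(0, 4), (8, 4)])
print(f.read(8), stream.reads)  # b'AAAA' 1
print(f.read(8), stream.reads)  # b'BBBB' 2
print(f.read(8))                # b''
```

The `return` inside the loop is the whole fix. The rest of the range is left for the next call, so each `read()` costs at most one block fetch.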
This mimics the behavior of io.read(), except that we don't [yet] support the "unspecified size" and "
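Under this contract, as with Python's `io.RawIOBase.read()`, a call may return fewer than `size` bytes, and only an empty result means end of file. A caller that needs exactly `size` bytes therefore has to loop. This is a minimal sketch of that loop. `OneChunkRaw` and `read_fully` are hypothetical names, and `OneChunkRaw` only imitates the proposed one-block-per-call behavior by returning one in-memory chunk per call.

```python
import io


class OneChunkRaw(io.RawIOBase):
    """Hypothetical raw stream that, like the proposed read(), hands back
    at most one chunk (standing in for one Keep block) per call."""

    def __init__(self, chunks):
        super().__init__()
        self._chunks = list(chunks)

    def readable(self):
        return True

    def readinto(self, b):
        if not self._chunks:
            return 0  # EOF
        chunk = self._chunks[0]
        n = min(len(b), len(chunk))
        b[:n] = chunk[:n]
        if n < len(chunk):
            self._chunks[0] = chunk[n:]  # keep the unread tail for the next call
        else:
            self._chunks.pop(0)
        return n


def read_fully(f, size):
    """Call f.read() until 'size' bytes are collected or it returns b'' (EOF)."""
    parts = []
    remaining = size
    while remaining > 0:
        chunk = f.read(remaining)
        if not chunk:
            break
        parts.append(chunk)
        remaining -= len(chunk)
    return b''.join(parts)


print(OneChunkRaw([b'AAAA', b'BBBB']).read(8))        # b'AAAA' (short read)
print(read_fully(OneChunkRaw([b'AAAA', b'BBBB']), 8))  # b'AAAABBBB'
```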
Related: flush outfile after each