Bug #11510
closed[SDK] Support writes to offsets beyond end of file
Description
Two distinct but closely related issues encountered by a customer:
- Using s3.download_fileobj() with an ArvadosFileWriter target: files over some threshold (likely 8MB) are arbitrarily truncated.
- Using "aws s3 cp" to download to a writable keep mount: a 15 GB file results in only an 8 MB file appearing it keep(!)
I believe the common theme here is that they both rely on the Python SDK and are using multipart download (request and commit chunks 8 MB at a time) which suggests some issue with the way it writes chunks non-sequentially (possibly relating to seek() or ftruncate()).
Related issues
Updated by Peter Amstutz over 7 years ago
- Subject changed from S3 multipart downloads fail truncated writing to keep to S3 multipart download truncated writing to keep
- Description updated (diff)
Updated by Peter Amstutz over 7 years ago
$ python Python 2.7.9 (default, Jun 29 2016, 13:08:31) [GCC 4.9.2] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> f = open("ttt", "wb") >>> f.seek(10, 1) >>> f.tell() 10 >>> f.write("blub") >>> f.close() $ ls -l ttt -rw-r--r-- 1 peter peter 14 Apr 17 16:54 ttt
Oh dear.
Updated by Peter Amstutz over 7 years ago
So apparently this is POSIX behavior for creating sparse files, and the Arvados Python SDK (and hence FUSE) currently don't support it (both writeto() and truncate() will throw exceptions if you try to increase the size of a file or start writing at a point past the end of the file).
This messes up tools which rely on it this behavior that we need to support. We should implement it.
Updated by Peter Amstutz over 7 years ago
- Subject changed from S3 multipart download truncated writing to keep to [SDK] Support writes to offsets beyond end of file
- Status changed from New to In Progress
- Assigned To set to Peter Amstutz
- Target version set to 2017-04-26 sprint
Updated by Tom Clegg over 7 years ago
- remove debugging print statements in arvfile.py → ArvadosFile → writeto()
- remove unused PADDING_BLOCK_LOCATOR in config.py (or perhaps move it into arvados_testutil and use it in the test cases)
- I don't have any particular reason to suspect this won't work, and it doesn't become any more necessary now than before, but... it seems like we should test truncating a file in such a way that one or more blocks become unreferenced.
- test_sparse_write3 might be even more edge-casey if it left out i=3, so
self.assertEqual(writer.read(), "000011112222\x00\x00\x00\x004444")
...? - test seek to a negative offset
- test seek past end of file, read (returns ''), seek back to non-sparse portion using SEEK_CUR, check file size did not change
- test return value of seek() is new file position
Updated by Peter Amstutz over 7 years ago
Tom Clegg wrote:
11510-sdk-extend-files @ f5638851ddb5a859a54a16fe963ee98591af17a9
- remove debugging print statements in arvfile.py → ArvadosFile → writeto()
Fixed.
- remove unused PADDING_BLOCK_LOCATOR in config.py (or perhaps move it into arvados_testutil and use it in the test cases)
Moved it into a docstring.
- I don't have any particular reason to suspect this won't work, and it doesn't become any more necessary now than before, but... it seems like we should test truncating a file in such a way that one or more blocks become unreferenced.
Added test_truncate3() which does this.
- test_sparse_write3 might be even more edge-casey if it left out i=3, so
self.assertEqual(writer.read(), "000011112222\x00\x00\x00\x004444")
...?
Added test_sparse_write4() which does this.
- test seek to a negative offset
test_seek_min_zero() tests this.
- test seek past end of file, read (returns ''), seek back to non-sparse portion using SEEK_CUR, check file size did not change
- test return value of seek() is new file position
Also tested in test_truncate3().
Updated by Tom Clegg over 7 years ago
Peter Amstutz wrote:
- test return value of seek() is new file position
Also tested in test_truncate3().
seek return value is still untested and seems to be incorrect. If I add a check:
Traceback (most recent call last): File "/home/tom/src/arvados/sdk/python/tests/test_arvfile.py", line 169, in test_write_to_end self.assertEqual(5, writer.seek(5, os.SEEK_SET)) AssertionError: 5 != None
https://docs.python.org/2/library/io.html#io.IOBase.seek
Rest LGTM, thanks
Updated by Peter Amstutz over 7 years ago
- Status changed from In Progress to Resolved