Bug #10035
[FUSE] Determine why bcl2fastq doesn't work with writable keep mount
Status: Closed
Description
I'm not sure there's anything we can do about this other than to make a private copy of blobxfer which plays more nicely with writable keep mounts, so this is primarily a reminder to document the current behavior. The file space preallocation code fragment below fails with an "I/O error" on the file close. The full output is below that.
    if allocatesize > 0:
        filedesc.seek(allocatesize - 1)
        filedesc.write(b'\0')
    filedesc.close()
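The same preallocation pattern can be reproduced standalone. On a regular local filesystem it succeeds (seeking past EOF and writing one byte creates a sparse file of the requested size); the close() only fails when the target file lives on the FUSE keep mount. A minimal sketch with a hypothetical 4 MiB allocation size:

```python
import os
import tempfile

allocatesize = 4 * 1024 * 1024  # hypothetical 4 MiB preallocation

fd, path = tempfile.mkstemp()
os.close(fd)

# Same sequence as the blobxfer fragment: seek past EOF, write one
# zero byte, close. On the keep mount the close() raises
# IOError: [Errno 5] Input/output error.
filedesc = open(path, "wb")
if allocatesize > 0:
    filedesc.seek(allocatesize - 1)
    filedesc.write(b"\0")
filedesc.close()

size = os.path.getsize(path)
print(size)  # reports the full preallocated size on a local filesystem
os.remove(path)
```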
$ time blobxfer rawseqdata 160819-e00504-0013-ahnht2ccxx keep/home/foo --saskey=$SASKEY --download --remoteresource=. --disable-urllib-warnings
!!! WARNING: DISABLING URLLIB3 WARNINGS !!!
=====================================
azure blobxfer parameters [v0.11.4]
=====================================
platform: Linux-3.19.0-49-generic-x86_64-with-Ubuntu-14.04-trusty
python interpreter: CPython 2.7.6
package versions: az.common=1.1.4 az.sml=0.20.4 az.stor=0.33.0 crypt=1.5 req=2.11.1
subscription id: None
management cert: None
transfer direction: Azure->local
local resource: keep/home/foo
include pattern: None
remote resource: .
max num of workers: 6
timeout: None
storage account: rawseqdata
use SAS: True
upload as page blob: False
auto vhd->page blob: False
upload to file share: False
container/share name: 160819-e00504-0013-ahnht2ccxx
container/share URI: https://rawseqdata.blob.core.windows.net/160819-e00504-0013-ahnht2ccxx
compute block MD5: False
compute file MD5: True
skip on MD5 match: True
chunk size (bytes): 4194304
create container: False
keep mismatched MD5: False
recursive if dir: True
component strip on up: 1
remote delete: False
collate to: disabled
local overwrite: True
encryption mode: disabled
RSA key file: disabled
RSA key type: disabled
=======================================
script start time: 2016-09-13 22:18:11
attempting to copy entire container 160819-e00504-0013-ahnht2ccxx to keep/home/foo
generating local directory structure and pre-allocating space
created local directory: keep/home/foo/HiSeq/160819_E00504_0013_AHNHT2CCXX/Data/Intensities/BaseCalls/L005/C163.1
remote blob: HiSeq/160819_E00504_0013_AHNHT2CCXX/Data/Intensities/BaseCalls/L005/C163.1/s_5_2215.bcl.gz length: 2300498 bytes, md5: DWpLFJdWfz1sdF0LG2bOkg==
Traceback (most recent call last):
File "/home/tfmorris/venv/bin/blobxfer", line 11, in <module>
sys.exit(main())
File "/home/tfmorris/venv/local/lib/python2.7/site-packages/blobxfer.py", line 2525, in main
localfile, blob, False, blobdict[blob])
File "/home/tfmorris/venv/local/lib/python2.7/site-packages/blobxfer.py", line 1858, in generate_xferspec_download
filedesc.close()
IOError: [Errno 5] Input/output error
real 12m16.266s
user 6m56.856s
sys 0m2.144s
Updated by Tom Morris about 8 years ago
Does seek() need to be added to the list of unsupported operations documented here, or is something else going on?
http://doc.arvados.org/user/tutorials/tutorial-keep-mount.html
The failing code snippet quoted in the original report is here:
https://github.com/Azure/blobxfer/blob/3222c6047e73cbc01adac987d399eb49fb6573fc/blobxfer.py#L1858
Updated by Peter Amstutz about 8 years ago
I suspect the problem is that blobxfer (and probably also the Aspera downloader) calls truncate(2) or ftruncate(2) to preallocate space for the file. If the size specified in these syscalls is larger than the current file size, the file is supposed to be padded with zeros. However, the "truncate" operation implemented in ArvadosFile only supports shrinking the file, so when blobxfer tries to seek() to the end position, it fails because the file wasn't successfully resized.
I recommend we improve ArvadosFile.truncate to support increasing the file size, and then retest blobxfer and Aspera.
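For reference, these are the POSIX truncate semantics the fix would need to match: growing a file via truncate pads the new region with zero bytes. A sketch of the expected behavior on an ordinary local file (not ArvadosFile):

```python
import os
import tempfile

fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"abc")          # file is 3 bytes

os.truncate(path, 8)         # grow: POSIX requires zero-fill of the gap

with open(path, "rb") as f:
    data = f.read()
print(data)                  # b'abc\x00\x00\x00\x00\x00'
os.remove(path)
```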
Updated by Peter Amstutz about 8 years ago
We should confirm that it is calling truncate() before we go and write new code, though.
Updated by Tom Morris about 8 years ago
I'm not sure this is ever going to work well for programs like blobxfer. After it creates a file with 64 MB (or whatever) of 0s, it's then going to fill it in one 4 MB block at a time, in random order, as the many read requests that it has outstanding complete. Each write operation consists of an open, seek, write, close sequence. This seems like it would cause chaos with the content addressing.
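The access pattern described above can be sketched as follows, with hypothetical block and file sizes. Each block is filled by an independent open/seek/write/close cycle, in shuffled order, mimicking the downloader's concurrent workers:

```python
import os
import random
import tempfile

BLOCK = 4 * 1024 * 1024      # hypothetical 4 MiB transfer chunk
NBLOCKS = 16                 # hypothetical 64 MiB file

fd, path = tempfile.mkstemp()
os.close(fd)

# Preallocate the full file with zeros, as blobxfer does.
with open(path, "wb") as f:
    f.truncate(BLOCK * NBLOCKS)

# Fill blocks in random order; each write is its own
# open/seek/write/close sequence.
order = list(range(NBLOCKS))
random.shuffle(order)
for i in order:
    with open(path, "r+b") as f:
        f.seek(i * BLOCK)
        f.write(bytes([i % 256]) * BLOCK)

complete = os.path.getsize(path) == BLOCK * NBLOCKS
print(complete)
os.remove(path)
```

On a content-addressed store, every one of these out-of-order writes potentially rewrites block boundaries, which is why this workload is a stress case for the keep mount.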
Updated by Tom Clegg almost 8 years ago
- Subject changed from [FUSE] Azure blobxfer doesn't work with writable keep mount to [FUSE] Determine why Azure blobxfer doesn't work with writable keep mount
- Story points set to 1.0
Updated by Tom Morris almost 8 years ago
- Target version set to 2017-03-01 sprint
Updated by Tom Morris almost 8 years ago
- Subject changed from [FUSE] Determine why Azure blobxfer doesn't work with writable keep mount to [FUSE] Determine why bcl2fastq doesn't work with writable keep mount
Updated by Peter Amstutz over 7 years ago
- Status changed from Resolved to In Progress
Updated by Peter Amstutz over 7 years ago
- Target version changed from 2017-03-01 sprint to Arvados Future Sprints
Updated by Peter Amstutz over 7 years ago
- Status changed from In Progress to Resolved
This works now & is used in production.
Updated by Tom Morris over 6 years ago
- Target version deleted (Arvados Future Sprints)