Bug #10035

[FUSE] Determine why bcl2fastq doesn't work with writable keep mount

Added by Tom Morris over 7 years ago. Updated over 5 years ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Peter Amstutz
Category:
-
Target version:
-
Story points:
1.0

Description

I'm not sure there's anything we can do about this other than making a private copy of blobxfer that plays more nicely with writable keep mounts, so this is primarily a reminder to document the current behavior. The file space preallocation code fragment below fails with an "I/O error" on the file close. The full output follows it.

        if allocatesize > 0:
            # pre-allocate space by seeking to the last byte of the
            # target size and writing a single NUL byte
            filedesc.seek(allocatesize - 1)
            filedesc.write(b'\0')
        filedesc.close()  # on a writable keep mount this close raises IOError
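
A minimal, self-contained sketch of the same pattern, for reproducing against a writable keep mount (the path and size below are illustrative, not taken from the transfer output):

    # Illustrative repro sketch: preallocate a file on a writable keep mount
    # the same way blobxfer does (seek past EOF, write one NUL byte, close).
    allocatesize = 2300498  # size of the blob being preallocated (example value)
    filedesc = open('keep/home/foo/preallocate-test.bin', 'wb')  # hypothetical path
    if allocatesize > 0:
        filedesc.seek(allocatesize - 1)
        filedesc.write(b'\0')
    filedesc.close()  # this close is where the IOError is reported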

$ time blobxfer rawseqdata 160819-e00504-0013-ahnht2ccxx keep/home/foo --saskey=$SASKEY --download --remoteresource=. --disable-urllib-warnings
!!! WARNING: DISABLING URLLIB3 WARNINGS !!!
=====================================
azure blobxfer parameters [v0.11.4]
=====================================
platform: Linux-3.19.0-49-generic-x86_64-with-Ubuntu-14.04-trusty
python interpreter: CPython 2.7.6
package versions: az.common=1.1.4 az.sml=0.20.4 az.stor=0.33.0 crypt=1.5 req=2.11.1
subscription id: None
management cert: None
transfer direction: Azure->local
local resource: keep/home/foo
include pattern: None
remote resource: .
max num of workers: 6
timeout: None
storage account: rawseqdata
use SAS: True
upload as page blob: False
auto vhd->page blob: False
upload to file share: False
container/share name: 160819-e00504-0013-ahnht2ccxx
container/share URI: https://rawseqdata.blob.core.windows.net/160819-e00504-0013-ahnht2ccxx
compute block MD5: False
compute file MD5: True
skip on MD5 match: True
chunk size (bytes): 4194304
create container: False
keep mismatched MD5: False
recursive if dir: True
component strip on up: 1
remote delete: False
collate to: disabled
local overwrite: True
encryption mode: disabled
RSA key file: disabled
RSA key type: disabled
=======================================

script start time: 2016-09-13 22:18:11
attempting to copy entire container 160819-e00504-0013-ahnht2ccxx to keep/home/foo
generating local directory structure and pre-allocating space
created local directory: keep/home/foo/HiSeq/160819_E00504_0013_AHNHT2CCXX/Data/Intensities/BaseCalls/L005/C163.1
remote blob: HiSeq/160819_E00504_0013_AHNHT2CCXX/Data/Intensities/BaseCalls/L005/C163.1/s_5_2215.bcl.gz length: 2300498 bytes, md5: DWpLFJdWfz1sdF0LG2bOkg==
Traceback (most recent call last):
  File "/home/tfmorris/venv/bin/blobxfer", line 11, in <module>
    sys.exit(main())
  File "/home/tfmorris/venv/local/lib/python2.7/site-packages/blobxfer.py", line 2525, in main
    localfile, blob, False, blobdict[blob])
  File "/home/tfmorris/venv/local/lib/python2.7/site-packages/blobxfer.py", line 1858, in generate_xferspec_download
    filedesc.close()
IOError: [Errno 5] Input/output error

real 12m16.266s
user 6m56.856s
sys 0m2.144s


Subtasks 1 (1 open, 0 closed)

Task #11115: Review    New    09/13/2016

Related issues

Related to Arvados - Bug #11510: [SDK] Support writes to offsets beyond end of file    Resolved    Peter Amstutz    04/20/2017
#1

Updated by Tom Morris over 7 years ago

Does seek() need to be added to the list of unsupported operations documented here, or is something else going on?

http://doc.arvados.org/user/tutorials/tutorial-keep-mount.html

The failing code snippet quoted in the original report is here:

https://github.com/Azure/blobxfer/blob/3222c6047e73cbc01adac987d399eb49fb6573fc/blobxfer.py#L1858

#2

Updated by Peter Amstutz over 7 years ago

I suspect the problem is that blobxfer (and probably also the Aspera downloader) calls truncate(2) or ftruncate(2) to allocate space for the resize. If the size specified in these syscalls is larger than the current file size, the file is supposed to be padded with zeros.

However, the "truncate" operation implemented in ArvadosFile only supports shrinking the file. So when blobxfer tries to seek() to the end position, it fails because the file wasn't successfully resized.

I recommend we improve ArvadosFile.truncate to support increasing the file size and then retest blobxfer and Aspera.
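
For reference, the ftruncate(2) grow-and-zero-pad behavior described above can be demonstrated on an ordinary local filesystem with a short sketch (the file name is illustrative):

    import os

    # On a POSIX filesystem, ftruncate(2) to a size larger than the current
    # file length extends the file and the new region reads back as zeros.
    fd = os.open('/tmp/ftruncate-demo', os.O_RDWR | os.O_CREAT, 0o644)
    os.ftruncate(fd, 4 * 1024 * 1024)     # grow to 4 MiB
    print(os.fstat(fd).st_size)           # 4194304
    os.close(fd)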

#3

Updated by Peter Amstutz over 7 years ago

We should confirm that it is calling truncate() before we go and write new code, though.

#4

Updated by Tom Morris over 7 years ago

I'm not sure this is ever going to work well for programs like blobxfer. After it creates a file with 64 MB (or whatever) of zeros, it fills it in one 4 MB block at a time, in random order, as its many outstanding read requests complete. Each write operation consists of an open, seek, write, close sequence. This seems like it would cause chaos with the content addressing.
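
Roughly, the write pattern being described looks like the sketch below (the chunk size comes from the log above; the path and write_block helper are illustrative):

    import random

    CHUNK = 4 * 1024 * 1024  # blobxfer's 4 MB chunk size (from the log above)

    def write_block(path, offset, data):
        # each completed download is written with its own
        # open / seek / write / close cycle
        f = open(path, 'r+b')
        f.seek(offset)
        f.write(data)
        f.close()

    # assumes the ~64 MB file was already created and preallocated;
    # blocks then complete in effectively random order
    offsets = list(range(0, 64 * 1024 * 1024, CHUNK))
    random.shuffle(offsets)
    for off in offsets:
        write_block('keep/home/foo/example.bin', off, b'\0' * CHUNK)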

#5

Updated by Tom Clegg about 7 years ago

  • Subject changed from [FUSE] Azure blobxfer doesn't work with writable keep mount to [FUSE] Determine why Azure blobxfer doesn't work with writable keep mount
  • Story points set to 1.0
#6

Updated by Tom Morris about 7 years ago

  • Target version set to 2017-03-01 sprint
#7

Updated by Tom Morris about 7 years ago

  • Subject changed from [FUSE] Determine why Azure blobxfer doesn't work with writable keep mount to [FUSE] Determine why bcl2fastq doesn't work with writable keep mount
#8

Updated by Peter Amstutz about 7 years ago

  • Assigned To set to Peter Amstutz
#9

Updated by Peter Amstutz about 7 years ago

  • Status changed from New to Resolved
#10

Updated by Peter Amstutz about 7 years ago

  • Status changed from Resolved to In Progress
#11

Updated by Peter Amstutz about 7 years ago

  • Target version changed from 2017-03-01 sprint to Arvados Future Sprints
#12

Updated by Peter Amstutz over 6 years ago

  • Status changed from In Progress to Resolved

This works now & is used in production.

#13

Updated by Tom Morris over 5 years ago

  • Target version deleted (Arvados Future Sprints)