Feature #5778
Status: open
[FUSE] Support efficient copy at command line
Description
Keep can perform efficient copy-on-write of files and directories, but POSIX doesn't provide an API for this. We've decided not to abuse standard hardlinks: while similar (in the "fast copy" sense), hardlinks offer incompatible semantics ("two filenames refer to the same data; writes to either file are reflected in both").
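To make the semantic difference concrete, here is a small sketch using only the Python standard library. It runs against an ordinary local file system, not arv-mount, and the function name `demo` and the file names are just for illustration. A write through one name of a hard-linked file is visible through the other, whereas a copy (which is what a COW clone looks like to the user) stays unchanged.

```python
import os
import shutil
import tempfile


def demo():
    """Return (content seen via hard link, content seen via copy) after an edit."""
    with tempfile.TemporaryDirectory() as d:
        orig = os.path.join(d, "orig.txt")
        hard = os.path.join(d, "hardlink.txt")
        copy = os.path.join(d, "copy.txt")

        with open(orig, "w") as f:
            f.write("v1")

        os.link(orig, hard)          # second name for the same data
        shutil.copyfile(orig, copy)  # independent data, like a COW clone

        # Modify the file through its original name.
        with open(orig, "a") as f:
            f.write("+edit")

        with open(hard) as f:
            hard_content = f.read()
        with open(copy) as f:
            copy_content = f.read()
        return hard_content, copy_content


if __name__ == "__main__":
    print(demo())  # ('v1+edit', 'v1')
```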
Possible approaches for exposing COW capability through arv-mount:
- Use BTRFS clone ioctl() (requires support for handling ioctl() in llfuse). User can use `cp --reflink`.
- Use s3fs approach of writing a special xattr() to a special place to request a COW link. User uses a custom command to communicate with the file system.
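For reference, the clone request behind the first approach can be sketched from user space as below. This is a rough illustration of the ioctl itself, not of anything arv-mount does today. `FICLONE` is the Linux request number from `<linux/fs.h>` (originally named `BTRFS_IOC_CLONE`); the helper name `try_reflink` is made up for this example. On a file system that doesn't implement the request, the call simply fails.

```python
import fcntl
import os

# FICLONE is _IOW(0x94, 9, int) in <linux/fs.h>; same number as the
# original BTRFS_IOC_CLONE request.
FICLONE = 0x40049409


def try_reflink(src_path, dst_path):
    """Try to make dst_path a copy-on-write clone of src_path.

    Returns True if the file system accepted the clone request, False if
    it refused (in which case the empty destination file is removed).
    """
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        try:
            # The destination fd receives the ioctl; the source fd is the argument.
            fcntl.ioctl(dst.fileno(), FICLONE, src.fileno())
            return True
        except OSError:
            pass
    os.unlink(dst_path)
    return False
```

Whether this succeeds depends entirely on the underlying file system, so a caller would normally fall back to a regular copy when it returns False.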
Meanwhile, the following workaround is possible without modifying the FUSE driver (and could be provided as a "copy" CLI program):
- Determine source and target collections, perform the operation using Arvados SDK. Results show up in target directory on refresh.
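A minimal sketch of that workaround with the Python SDK might look like the following. It assumes `arvados.collection.Collection` with its `copy()` and `save()` methods; the function name `cow_copy` and the `open_collection` parameter are hypothetical, the latter existing only so the logic can be exercised without a cluster. The collection identifiers and paths are supplied by the caller.

```python
def cow_copy(src_id, src_path, dst_uuid, dst_path, open_collection=None):
    """Copy src_path from one collection into another without moving data.

    src_id may be a UUID or a portable data hash; dst_uuid must refer to
    a writable collection. Only manifest references are copied.
    """
    if open_collection is None:
        # Imported lazily so the sketch can be tried without the SDK installed.
        import arvados.collection
        open_collection = arvados.collection.Collection

    src = open_collection(src_id)
    dst = open_collection(dst_uuid)

    # Copy the file or directory entry from the source collection.
    dst.copy(src_path, dst_path, source_collection=src)

    # Commit the updated manifest; the mount shows it on refresh.
    dst.save()
    return dst
```

A "copy" CLI program would additionally have to map paths under the mount point back to collection identifiers, which is not shown here.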
Updated by Tom Clegg over 9 years ago
- Description updated (diff)
- Target version set to Arvados Future Sprints
Updated by Ward Vandewege over 3 years ago
- Target version deleted (Arvados Future Sprints)
Updated by Joshua Randall about 3 years ago
For a limited use-case in which you want to use arv-mount to drive the actual copying (i.e. which file(s) to copy from one collection to another), I guess the (partial) workaround might be:
- duplicate input collections using the CLI or SDK into temporary collections
- use arv-mount read-write with the input collections mounted by ID
- mv (rename) the files of interest from the input collections to the output collection
- (optionally) delete the temporary duplicated collections
Does this make sense, or is a simpler workaround possible today?
It seems like another option to consider to enable this use-case without the external duplication step might be to have some sort of flag for arv-mount that allows renames to succeed against sources on read-only collections (i.e. when the input is specified by PDH)?
Currently an attempt to do that fails with "Operation not permitted". That makes sense, as the PDH mount point is read-only even when using `--read-write`, and clearly that is the correct default behaviour. Still, I thought a compromise might be an arv-mount option that lets a user opt in to allowing an `mv` command to succeed against a fundamentally read-only source without actually modifying that source (obviously).
I guess of the other options mentioned in this story, the one that enables `cp --reflink` seems the most user-friendly. Is it possible with llfuse today?
Updated by Peter Amstutz about 3 years ago
It's been a rather long time since we looked into this, but the issue at the time was that the way `cp --reflink` was communicated to the file system wasn't propagated to FUSE.
I don't know if that was a limitation of the FUSE kernel interface, libfuse, or llfuse (probably not the last one). It is quite possible the situation has improved at some point in the last 5 years.
My preferred solution is still to reinterpret hard link requests as copy-on-write. It seems like a program that relies on POSIX semantics that closely is going to run into other, more fundamental problems running on top of arv-mount before "expecting modifications made to a hard-linked file to show up in both files" becomes a problem.
Do you have a use case for this?