Idea #17849

open

FUSE driver v2

Added by Peter Amstutz over 2 years ago. Updated 12 days ago.

Status: New
Priority: Normal
Assigned To: -
Target version:
Story points: -
Release:
Release relationship: Auto

Description

Background:

Python+llfuse was expedient and has done lots of good work for us, but it's not promising as a long-term (fast+reliable+maintainable) solution.

Implementation (TBD):
  • Approach for handling websocket "update" events
  • Selectable mechanisms/options for syncing to server (fflush, fsync, close) (on a shell node, flush-on-close, flush-periodically, or flush-after-idle-time might be best; in crunch-run, flush-on-exit might be best)
  • Desired behavior when updates conflict (write error? clobber? create "oops,clobbered" file?)
Other current bugs/limitations:
  • Old keep block signatures don't get refreshed, so reading a collection that's been cached for too long returns an I/O error
  • Not command-line compatible with arv-mount
  • Logging is not great
  • No docs
  • No way to control overall cache size (currently collectionfs can use lots of RAM in certain non-sequential write scenarios; we need the ability to trade speed for space efficiency in memory-constrained environments)
  • No warnings given when cache is thrashing
  • No application level instrumentation (just optional Go pprof)
  • Special .arvados#collection file is incomplete (has manifest_text but not uuid or pdh)
  • No automatic flush on sigint/sigterm
  • No warning given when trying to exit but filesystem can't be unmounted yet (filehandle is open, or a process's cwd is in the mount)
  • Mac port has a race bug (see notes below)
  • Windows port is untested
  • Cross-compiling recipe for Mac/Windows ports is fragile
  • chmod is a no-op (chmod 0700 succeeds, but the file mode will still be 0755)

Related issues

Related to Arvados - Bug #16727: [FUSE] [cgofuse] Refresh signatures / reload collection instead of using expired blob signatures (Resolved, Tom Clegg, 01/27/2022)
Related to Arvados - Idea #12308: [FUSE] Golang-based fuse driver (Resolved, Tom Clegg)
Related to Arvados - Feature #18960: Config option to make crunch-run use Go FUSE driver when all mounts are read-only (New)
Related to Arvados - Feature #18961: Go FileSystem / FUSE mount supports block prefetch (In Progress, Tom Clegg)
Related to Arvados - Feature #21578: Add debug logging option to arvados-client mount (In Progress, Tom Clegg, 03/18/2024)
Actions #2

Updated by Peter Amstutz over 2 years ago

  • Related to Bug #16727: [FUSE] [cgofuse] Refresh signatures / reload collection instead of using expired blob signatures added
Actions #3

Updated by Peter Amstutz over 2 years ago

  • Related to Idea #12308: [FUSE] Golang-based fuse driver added
Actions #4

Updated by Peter Amstutz over 2 years ago

  • Start date set to 01/01/2022
  • Due date set to 03/31/2022
Actions #5

Updated by Peter Amstutz over 2 years ago

  • Start date changed from 01/01/2022 to 04/01/2022
  • Due date changed from 03/31/2022 to 07/31/2022
Actions #6

Updated by Peter Amstutz about 2 years ago

  • Description updated (diff)
Actions #7

Updated by Peter Amstutz about 2 years ago

  • Start date changed from 04/01/2022 to 05/01/2022
  • Due date changed from 07/31/2022 to 08/31/2022
Actions #8

Updated by Peter Amstutz almost 2 years ago

  • Related to Feature #18960: Config option to make crunch-run use Go FUSE driver when all mounts are read-only added
Actions #9

Updated by Peter Amstutz almost 2 years ago

  • Related to Feature #18961: Go FileSystem / FUSE mount supports block prefetch added
Actions #10

Updated by Peter Amstutz almost 2 years ago

  • Start date changed from 05/01/2022 to 06/01/2022
  • Due date changed from 08/31/2022 to 09/30/2022
Actions #11

Updated by Peter Amstutz almost 2 years ago

  • Start date changed from 06/01/2022 to 08/31/2022
  • Due date changed from 09/30/2022 to 11/30/2022
Actions #12

Updated by Peter Amstutz almost 2 years ago

  • Start date changed from 08/31/2022 to 09/01/2022
Actions #13

Updated by Peter Amstutz over 1 year ago

  • Start date changed from 09/01/2022 to 10/01/2022
  • Due date changed from 11/30/2022 to 12/31/2022
Actions #14

Updated by Peter Amstutz over 1 year ago

  • Start date changed from 10/01/2022 to 12/01/2022
  • Due date changed from 12/31/2022 to 02/28/2023
Actions #15

Updated by Peter Amstutz about 1 year ago

  • Start date changed from 12/01/2022 to 03/01/2023
  • Due date changed from 02/28/2023 to 07/31/2023
Actions #16

Updated by Peter Amstutz about 1 year ago

  • Start date changed from 03/01/2023 to 06/01/2023
  • Due date changed from 07/31/2023 to 10/31/2023
Actions #17

Updated by Peter Amstutz 11 months ago

  • Due date changed from 10/31/2023 to 09/30/2023
Actions #18

Updated by Peter Amstutz 11 months ago

  • Due date changed from 09/30/2023 to 07/31/2023
Actions #19

Updated by Peter Amstutz 9 months ago

  • Start date changed from 06/01/2023 to 10/01/2023
  • Due date changed from 07/31/2023 to 12/31/2023
Actions #20

Updated by Peter Amstutz 6 months ago

  • Start date changed from 10/01/2023 to 01/01/2024
  • Due date changed from 12/31/2023 to 03/31/2024
Actions #21

Updated by Peter Amstutz 2 months ago

  • Start date changed from 01/01/2024 to 05/01/2024
  • Due date changed from 03/31/2024 to 10/31/2024
Actions #22

Updated by Peter Amstutz 2 months ago

  • Start date changed from 05/01/2024 to 04/01/2024
  • Due date changed from 10/31/2024 to 08/31/2024
Actions #23

Updated by Peter Amstutz 18 days ago

  • Target version set to Future
Actions #24

Updated by Peter Amstutz 12 days ago

  • Description updated (diff)
Actions #25

Updated by Brett Smith 12 days ago

Notes from recent discussions, especially the 2024-03-06 engineering meeting

Reasons to start prioritizing this:

  • A heavier user of arv-mount is consistently hitting concurrency and performance issues
  • python-llfuse is in maintenance mode. They did a release in November 2023 that should keep us set for a while, but the writing's on the wall.

Potential milestones:

  1. Go mount can serve crunch-run's purposes
  2. Go mount can serve users' purposes for read-only mounts
    • Support as many command line options as practical (see below)
    • Bounded memory use
    • Disk caching of data
  3. Go mount adds --read-write support
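
As a rough illustration of the "bounded memory use" item in milestone 2, a block cache can be capped by total bytes and evict least-recently-used entries, with an eviction counter that could later feed a thrashing warning. This is a minimal single-threaded sketch; boundedCache and its methods are hypothetical names, and the real collectionfs cache is not assumed to work this way.

```go
package main

import (
	"container/list"
	"fmt"
)

type entry struct {
	key  string
	data []byte
}

// boundedCache is a hypothetical LRU cache limited by total bytes stored.
// It is not safe for concurrent use; a real cache would need locking.
type boundedCache struct {
	maxBytes  int
	curBytes  int
	order     *list.List // front = most recently used
	items     map[string]*list.Element
	Evictions int // could drive a "cache is thrashing" warning
}

func newBoundedCache(maxBytes int) *boundedCache {
	return &boundedCache{
		maxBytes: maxBytes,
		order:    list.New(),
		items:    make(map[string]*list.Element),
	}
}

// Get returns the cached data and marks the entry as recently used.
func (c *boundedCache) Get(key string) ([]byte, bool) {
	el, ok := c.items[key]
	if !ok {
		return nil, false
	}
	c.order.MoveToFront(el)
	return el.Value.(*entry).data, true
}

// Put stores data, then evicts least-recently-used entries until the cache
// fits in maxBytes. The newest entry is always kept, even if oversized.
func (c *boundedCache) Put(key string, data []byte) {
	if el, ok := c.items[key]; ok {
		c.curBytes -= len(el.Value.(*entry).data)
		c.order.Remove(el)
		delete(c.items, key)
	}
	c.items[key] = c.order.PushFront(&entry{key: key, data: data})
	c.curBytes += len(data)
	for c.curBytes > c.maxBytes && c.order.Len() > 1 {
		oldest := c.order.Back()
		e := oldest.Value.(*entry)
		c.order.Remove(oldest)
		delete(c.items, e.key)
		c.curBytes -= len(e.data)
		c.Evictions++
	}
}

func main() {
	c := newBoundedCache(10) // 10-byte budget, tiny on purpose
	c.Put("a", []byte("aaaa"))
	c.Put("b", []byte("bbbb"))
	c.Get("a")                 // "a" is now more recent than "b"
	c.Put("c", []byte("cccc")) // 12 bytes > 10, so "b" is evicted
	_, hasA := c.Get("a")
	_, hasB := c.Get("b")
	fmt.Println(hasA, hasB, c.Evictions, c.curBytes)
}
```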

Quoting lib/crunchrun/crunchrun.go, here are the exact options crunch-run can currently call arv-mount with:

  • --foreground
  • --read-write (I think because of --mount-tmp)
  • --storage-classes
  • --crunchstat-interval
  • --allow-other (since the compute work may run as another user inside the container)
  • --disk-cache
  • --disk-cache-dir
  • --file-cache
  • --ram-cache
  • --mount-tmp
  • --mount-by-pdh
  • --disable-event-listening (not totally clear why, just trying to reduce network traffic?)
  • --mount-by-id
  • --unmount-timeout, --unmount (cleanup after the job is done; IMO we could implement this with standard tools, see below)
  • --version (basic "can run" check)

Implementation notes about specific options:

  • Please use a GNU-style argument parser so --long-options still work. Our users use Linux, not Plan 9; there's no good reason to force them to s/--/-/g in all their tooling.
  • By the time we get to phase #2 and start offering this to users, we expect to support everything except possibly the following, for some combination of "it's required," "it's too useful not to have," or "it's very low effort to implement after you have the previous options":
    • --read-write (you'll need to support writable tmp mounts for crunch-run, but updating projects and collections is the next phase)
    • --foreground: IMO the new mount should always run in the foreground. This should become a no-op option for compatibility, and we tell users "if you want it to daemonize, run it with systemd-run"
    • --replace, --subtype, --unmount-*: I think the main reason these options exist is because the code exists to support --exec so we might as well expose it. If we end up reimplementing that same pattern, that's fine. But if there's no need, I think it's low risk to remove these options and tell users to use fusermount -u, findmnt, and other existing tools. I don't think we need to reimplement those just for compatibility.
  • Consider this alternative, more ergonomic spelling for --exec: arvados-client mount MOUNTDIR [COMMAND ARG ...]
  • Consider #19934 for --exec (start the process with the mount point as the working directory)
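
To make the proposed spelling arvados-client mount MOUNTDIR [COMMAND ARG ...] concrete, here is a sketch using only Go's standard flag package, which accepts both -name and --name and stops parsing at the first non-flag argument, so the wrapped command's own options are left untouched. The mountArgs type and parseMountArgs function are hypothetical, only three of the options from the list above are wired up, and this is not a full GNU-style parser (no bundled short options, no options after MOUNTDIR).

```go
package main

import (
	"errors"
	"flag"
	"fmt"
	"io"
)

// mountArgs holds the parsed command line (hypothetical type for illustration).
type mountArgs struct {
	ReadWrite  bool
	AllowOther bool
	MountDir   string
	Command    []string // optional command to run against the mount
}

// parseMountArgs parses "[OPTIONS] MOUNTDIR [COMMAND ARG ...]".
func parseMountArgs(args []string) (mountArgs, error) {
	var m mountArgs
	fs := flag.NewFlagSet("mount", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // keep the sketch quiet on parse errors
	fs.BoolVar(&m.ReadWrite, "read-write", false, "allow writes")
	fs.BoolVar(&m.AllowOther, "allow-other", false, "let other users access the mount")
	// Accepted for compatibility and ignored, as suggested in the notes above.
	fs.Bool("foreground", true, "no-op; the mount always runs in the foreground")

	if err := fs.Parse(args); err != nil {
		return m, err
	}
	rest := fs.Args() // everything from the first non-flag argument onward
	if len(rest) == 0 {
		return m, errors.New("missing MOUNTDIR")
	}
	m.MountDir = rest[0]
	m.Command = rest[1:]
	return m, nil
}

func main() {
	// The path and command here are made-up example values.
	m, err := parseMountArgs([]string{"--allow-other", "--foreground", "/mnt/arv", "ls", "-l", "by_id"})
	fmt.Println(m.MountDir, m.Command, m.AllowOther, err)
}
```

Because parsing stops at MOUNTDIR, the "-l" above is passed through to the command rather than being rejected as an unknown mount option.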
Actions #26

Updated by Tom Clegg 8 days ago

  • Related to Feature #21578: Add debug logging option to arvados-client mount added