Story #12308

[FUSE] Golang-based fuse driver

Added by Tom Clegg about 3 years ago. Updated 5 months ago.

Status: In Progress
Priority: Normal
Assigned To:
Category: FUSE
Target version: -
Start date: 09/22/2017
Due date:
% Done: 100%
Estimated time: (Total: 0.00 h)
Story points: -

Description

Background:

Python+llfuse was expedient and has done lots of good work for us, but it's not promising as a long term (fast+reliable+maintainable) solution.

Implementation: TBD:
  • Approach for handling websocket "update" events
  • Selectable mechanisms/options for syncing to server (fflush, fsync, close) (on a shell node, flush-on-close, flush-periodically, or flush-after-idle-time might be best; in crunch-run, flush-on-exit might be best)
  • Desired behavior when updates conflict (write error? clobber? create "oops,clobbered" file?)
Other current bugs/limitations:
  • Old keep block signatures don't get refreshed, so reading a collection that's been cached for too long returns an I/O error
  • Not command-line compatible with arv-mount
  • Logging is not great
  • No docs
  • No way to control overall cache size (currently collectionfs can use lots of RAM in certain non-sequential write scenarios; we need the ability to trade speed for space efficiency in memory-constrained environments)
  • No warnings given when cache is thrashing
  • No application level instrumentation (just optional Go pprof)
  • Special .arvados#collection file is incomplete (has manifest_text but not uuid, pdh)
  • No automatic flush on sigint/sigterm
  • No warning given when trying to exit but filesystem can't be unmounted yet (filehandle is open, or a process's cwd is in the mount)
  • Mac port has a race bug (see notes below)
  • Windows port is untested
  • Cross-compiling recipe for Mac/Windows ports is fragile
  • chmod is a no-op (chmod 0700 succeeds, but the file mode will still be 0755)

Subtasks

Task #16098: Review 12308-cgofuse (Resolved, Tom Clegg)


Related issues

Related to Arvados - Feature #12876: [CLI] arvados-client command-line tool (Resolved)

Related to Arvados Epics - Story #16082: Port client tools to Go (New, 01/01/2021 to 03/31/2021)

Related to Arvados - Bug #16727: [FUSE] [cgofuse] Refresh signatures / reload collection instead of using expired blob signatures (In Progress)

Blocked by Arvados - Story #10996: [SDK] Writable file-like interface for collections in Go SDK (Resolved)

Blocked by Arvados - Bug #11249: [SDKs] Writable collection files returned by Go SDK should be seekable (Resolved)

Associated revisions

Revision db791b7a
Added by Tom Clegg 7 months ago

Merge branch '12308-cgofuse'

refs #12308

Arvados-DCO-1.1-Signed-off-by: Tom Clegg <>

Revision 7b4082f9 (diff)
Added by Tom Clegg 7 months ago

Use new version tag to avoid tag whose checksum changed.

refs #12308

Arvados-DCO-1.1-Signed-off-by: Tom Clegg <>

History

#1 Updated by Tom Clegg almost 3 years ago

  • Description updated (diff)

#2 Updated by Tom Clegg almost 3 years ago

  • Status changed from New to In Progress

work in progress here

12308-go-fuse @ aa18bbe2333f293d329efdae4a13ff79b03a1d8c

go get -d git.curoverse.com/arvados.git/lib/crunchstat                              ;# clone arvados to your gopath 
(cd $GOPATH/src/git.curoverse.com/arvados.git && git checkout origin/12308-go-fuse)
go get git.curoverse.com/arvados.git/cmd/arvados-client

# (set up your ARVADOS_API_HOST and ARVADOS_API_TOKEN env vars)

$GOPATH/bin/arvados-client mount --experimental /tmp/mnt &

cd /tmp/mnt/by_id/$some_existing_collection_uuid/ && git clone file:///home/path/to/arvados.git && sync .

#3 Updated by Tom Clegg almost 3 years ago

  • Description updated (diff)

#4 Updated by Tom Clegg almost 3 years ago

  • Description updated (diff)

#5 Updated by Tom Clegg over 2 years ago

  • Related to Feature #12876: [CLI] arvados-client command-line tool added

#6 Updated by Abram Connelly over 2 years ago

Using the new arv mount functionality, I created a new keep mount via:

$ mkdir keepgo
$ arvados-client mount --experimental keepgo/ -d 2> arv-mount-experimental.log

Going into keepgo/home and running some ls commands intermittently gives the following error message:

abram@lightning-dev1:~/keepgo/home$ ls
ls: reading directory '.': Input/output error
00-example-shell.cwl input
Saved at 2015-05-06 21:13:44 UTC by crunch@01b0dfdb2f15  
...

When the error occurs, ls still prints a partial listing of the directory.

I don't see anything of note in the log so I haven't provided it here.

#7 Updated by Tom Clegg over 2 years ago

The non-fuse-related code here has been extracted and used in #13111.

Fuse parts are rebased against #13111, now 12308-cgofuse @ c5633c850d664d2f78e0efccf9ec9734b4e32de5.

#8 Updated by Peter Amstutz 8 months ago

  • Target version set to 2020-02-12 Sprint

#9 Updated by Peter Amstutz 8 months ago

#10 Updated by Peter Amstutz 8 months ago

  • Assigned To set to Peter Amstutz

#11 Updated by Peter Amstutz 8 months ago

  • Assigned To changed from Peter Amstutz to Tom Clegg

#12 Updated by Tom Clegg 8 months ago

  • Description updated (diff)

#13 Updated by Tom Clegg 8 months ago

  • Description updated (diff)

#14 Updated by Tom Clegg 8 months ago

12308-cgofuse @ 9a4fcabed1adeff0044d419977d5136c5cb1db3e -- https://ci.arvados.org/view/Developer/job/developer-run-tests/1715/

Adds a "mount" subcommand to arvados-client, with limitations noted in the issue description above.

Our currently supported platforms (linux/amd64) work fine with the usual build process; source:cmd/arvados-client/Makefile has a recipe for cross-compiling binaries for linux/macos/windows on i386/amd64 using their respective fuse/fuse-like libraries.

#15 Updated by Lucas Di Pentima 8 months ago

Gave it a light first pass look, a couple of issues:

#16 Updated by Tom Clegg 8 months ago

#17 Updated by Lucas Di Pentima 8 months ago

I have been testing a binary compiled for OSX, using osxfuse version 3.10.4 and OSX 10.14.6

The test consists of cloning arvados' repository from github.

  • Against ce8i5 (5 Mbps uplink): fails, crashing the program without debug info
  • Using arvbox
    • Unbounded virtual network: OK, 30secs
    • 50 Mbps up/down: OK, 36 secs
    • 25 Mbps up/down: OK, 50 secs
    • 12 Mbps up/down: OK, 65 secs
    • 5 Mbps up/down: FAILED (crash with no debug info) after 2min14secs; it wrote 43 of the 73 MB before crashing.

#18 Updated by Lucas Di Pentima 8 months ago

Repeating the test of cloning arvados' repo, I got one crash with debug information after 6 seconds. The error reported by the 'git' command was 'fatal: write error: Device not configured.':

fatal error: concurrent map read and map write

goroutine 50 [running, locked to thread]:
runtime.throw(0x468fd7a, 0x21)
    /usr/local/go/src/runtime/panic.go:774 +0x72 fp=0xc0020d6a58 sp=0xc0020d6a28 pc=0x4031912
runtime.mapaccess1_faststr(0x45c5d80, 0xc0003ca180, 0xc002270081, 0x5, 0xc00228c0e0)
    /usr/local/go/src/runtime/map_faststr.go:21 +0x44f fp=0xc0020d6ac8 sp=0xc0020d6a58 pc=0x40152df
git.arvados.org/arvados.git/sdk/go/arvados.(*treenode).Child(0xc000104300, 0xc002270081, 0x5, 0xc002269c50, 0x0, 0x40306ba, 0x44cd4fc, 0xc000104360)
    /ext-go/2/src/git.arvados.org/arvados.git/sdk/go/arvados/fs_base.go:259 +0x56 fp=0xc0020d6b38 sp=0xc0020d6ac8 pc=0x44b3256
git.arvados.org/arvados.git/sdk/go/arvados.(*vdirnode).Child(0xc0000c2cc0, 0xc002270081, 0x5, 0x0, 0xc0020d6c00, 0x40f1883, 0xc0022700b1, 0xa)
    /ext-go/2/src/git.arvados.org/arvados.git/sdk/go/arvados/fs_site.go:191 +0x9a fp=0xc0020d6b88 sp=0xc0020d6b38 pc=0x44c452a
git.arvados.org/arvados.git/sdk/go/arvados.rlookup.func1(0xc0020d6cc0, 0xc0020d6c80, 0x0, 0x0, 0x0, 0x0)
    /ext-go/2/src/git.arvados.org/arvados.git/sdk/go/arvados/fs_base.go:606 +0xc1 fp=0xc0020d6c10 sp=0xc0020d6b88 pc=0x44c78d1
git.arvados.org/arvados.git/sdk/go/arvados.rlookup(0x4758e00, 0xc0000c2cc0, 0xc002270080, 0x3b, 0x4758e00, 0xc0000c2cc0, 0xc0003c0288, 0x88)
    /ext-go/2/src/git.arvados.org/arvados.git/sdk/go/arvados/fs_base.go:607 +0x1bd fp=0xc0020d6ca0 sp=0xc0020d6c10 pc=0x44b5e7d
git.arvados.org/arvados.git/sdk/go/arvados.(*fileSystem).Stat(0xc00012a9a0, 0xc002270080, 0x3b, 0x0, 0x0, 0x0, 0x0)
    /ext-go/2/src/git.arvados.org/arvados.git/sdk/go/arvados/fs_base.go:439 +0x4f fp=0xc0020d6cf0 sp=0xc0020d6ca0 pc=0x44b492f
git.arvados.org/arvados.git/lib/mount.(*keepFS).Getattr(0xc0003c0240, 0xc002270080, 0x3b, 0xc00226fd40, 0xffffffffffffffff, 0x0)
    /ext-go/2/src/git.arvados.org/arvados.git/lib/mount/fs.go:240 +0x16d fp=0xc0020d6d70 sp=0xc0020d6cf0 pc=0x4545f2d
github.com/arvados/cgofuse/fuse.hostGetattr(0x7002b60, 0x70000ffd2be0, 0x0)
    /go/pkg/mod/github.com/arvados/cgofuse@v1.2.0/fuse/host.go:119 +0x123 fp=0xc0020d6e18 sp=0xc0020d6d70 pc=0x453a2b3
github.com/arvados/cgofuse/fuse.go_hostGetattr(...)
    /go/pkg/mod/github.com/arvados/cgofuse@v1.2.0/fuse/host_cgo.go:717
github.com/arvados/cgofuse/fuse._cgoexpwrap_ecb3c7988e70_go_hostGetattr(0x7002b60, 0x70000ffd2be0, 0x12800028)
    _cgo_gotypes.go:592 +0x35 fp=0xc0020d6e40 sp=0xc0020d6e18 pc=0x45406a5
runtime.call32(0x0, 0x70000ffd2a10, 0x70000ffd2aa0, 0x18)
    /usr/local/go/src/runtime/asm_amd64.s:539 +0x3b fp=0xc0020d6e70 sp=0xc0020d6e40 pc=0x405fa3b
runtime.cgocallbackg1(0x0)
    /usr/local/go/src/runtime/cgocall.go:314 +0x1b7 fp=0xc0020d6f58 sp=0xc0020d6e70 pc=0x4005a37
runtime.cgocallbackg(0x0)
    /usr/local/go/src/runtime/cgocall.go:191 +0xc1 fp=0xc0020d6fc0 sp=0xc0020d6f58 pc=0x40057e1
runtime.cgocallback_gofunc(0x0, 0x0, 0x0, 0x0)
    /usr/local/go/src/runtime/asm_amd64.s:793 +0x9b fp=0xc0020d6fe0 sp=0xc0020d6fc0 pc=0x406100b
runtime.goexit()
    /usr/local/go/src/runtime/asm_amd64.s:1357 +0x1 fp=0xc0020d6fe8 sp=0xc0020d6fe0 pc=0x4061731

goroutine 1 [syscall, 5 minutes]:
github.com/arvados/cgofuse/fuse._Cfunc_hostMount(0x3, 0xc0003bc6c0, 0x4f01310, 0x0)
    _cgo_gotypes.go:515 +0x4d
github.com/arvados/cgofuse/fuse.c_hostMount.func1(0xc000000003, 0xc0003bc6c0, 0x4f01310, 0x4f01310)
    /go/pkg/mod/github.com/arvados/cgofuse@v1.2.0/fuse/host_cgo.go:701 +0x97
github.com/arvados/cgofuse/fuse.c_hostMount(0xc000000003, 0xc0003bc6c0, 0x4f01310, 0xc0001a52c0)
    /go/pkg/mod/github.com/arvados/cgofuse@v1.2.0/fuse/host_cgo.go:701 +0x3d
github.com/arvados/cgofuse/fuse.(*FileSystemHost).Mount(0xc000168e80, 0x0, 0x0, 0xc00015f190, 0x1, 0x1, 0x0)
    /go/pkg/mod/github.com/arvados/cgofuse@v1.2.0/fuse/host.go:666 +0x4ca
git.arvados.org/arvados.git/lib/mount.(*cmd).RunCommand(0x4b38dd0, 0xc000184120, 0x28, 0xc00015f180, 0x2, 0x2, 0x4743220, 0xc0000b0000, 0x4743240, 0xc0000b0008, ...)
    /ext-go/2/src/git.arvados.org/arvados.git/lib/mount/command.go:81 +0x5c4
git.arvados.org/arvados.git/lib/cmd.Multi.RunCommand(0xc00015f0b0, 0x7ffeefbff4b0, 0x22, 0xc00015f170, 0x3, 0x3, 0x4743220, 0xc0000b0000, 0x4743240, 0xc0000b0008, ...)
    /ext-go/2/src/git.arvados.org/arvados.git/lib/cmd/cmd.go:89 +0x280
main.main()
    /ext-go/2/src/git.arvados.org/arvados.git/cmd/arvados-client/cmd.go:65 +0xe1

goroutine 6 [select]:
git.arvados.org/arvados.git/sdk/go/keepclient.(*cachedSvcList).poll(0xc00000ebc0)
    /ext-go/2/src/git.arvados.org/arvados.git/sdk/go/keepclient/discover.go:94 +0x154
created by git.arvados.org/arvados.git/sdk/go/keepclient.(*KeepClient).discoverServices
    /ext-go/2/src/git.arvados.org/arvados.git/sdk/go/keepclient/discover.go:150 +0x542

goroutine 35 [syscall, 5 minutes]:
os/signal.signal_recv(0x0)
    /usr/local/go/src/runtime/sigqueue.go:144 +0x96
os/signal.loop()
    /usr/local/go/src/os/signal/signal_unix.go:23 +0x22
created by os/signal.init.0
    /usr/local/go/src/os/signal/signal_unix.go:29 +0x41

goroutine 48 [chan receive, 5 minutes]:
github.com/arvados/cgofuse/fuse.(*FileSystemHost).Mount.func3(0xc000168e80, 0xc0001a52c0)
    /go/pkg/mod/github.com/arvados/cgofuse@v1.2.0/fuse/host.go:653 +0x41
created by github.com/arvados/cgofuse/fuse.(*FileSystemHost).Mount
    /go/pkg/mod/github.com/arvados/cgofuse@v1.2.0/fuse/host.go:652 +0x44b

goroutine 42 [select]:
git.arvados.org/arvados.git/sdk/go/keepclient.(*cachedSvcList).poll.func1(0xc0001a4d80, 0xc0001a4de0, 0xc00000ebc0)
    /ext-go/2/src/git.arvados.org/arvados.git/sdk/go/keepclient/discover.go:77 +0x1a9
created by git.arvados.org/arvados.git/sdk/go/keepclient.(*cachedSvcList).poll
    /ext-go/2/src/git.arvados.org/arvados.git/sdk/go/keepclient/discover.go:73 +0xaf

goroutine 51 [runnable, locked to thread]:
github.com/arvados/cgofuse/fuse._Cfunc_hostCstatFromFusestat(0x70000f9aebc0, 0x0, 0x0, 0x1000041ed, 0x14000001f6, 0x0, 0x8, 0x5e3c87a4, 0x3441ab98, 0x5e3c87a4, ...)
    _cgo_gotypes.go:430 +0x45
github.com/arvados/cgofuse/fuse.c_hostCstatFromFusestat(...)
    /go/pkg/mod/github.com/arvados/cgofuse@v1.2.0/fuse/host_cgo.go:669
github.com/arvados/cgofuse/fuse.copyCstatFromFusestat(0x70000f9aebc0, 0xc0022d2000)
    /go/pkg/mod/github.com/arvados/cgofuse@v1.2.0/fuse/host.go:80 +0x151
github.com/arvados/cgofuse/fuse.hostGetattr(0x6910210, 0x70000f9aebc0, 0x0)
    /go/pkg/mod/github.com/arvados/cgofuse@v1.2.0/fuse/host.go:120 +0x14b
github.com/arvados/cgofuse/fuse.go_hostGetattr(...)
    /go/pkg/mod/github.com/arvados/cgofuse@v1.2.0/fuse/host_cgo.go:717
github.com/arvados/cgofuse/fuse._cgoexpwrap_ecb3c7988e70_go_hostGetattr(0x6910210, 0x70000f9aebc0, 0x4f7c3f0)
    _cgo_gotypes.go:592 +0x35

goroutine 15611 [IO wait]:
internal/poll.runtime_pollWait(0x6d79f38, 0x72, 0xffffffffffffffff)
    /usr/local/go/src/runtime/netpoll.go:184 +0x55
internal/poll.(*pollDesc).wait(0xc000104118, 0x72, 0x800, 0x83c, 0xffffffffffffffff)
    /usr/local/go/src/internal/poll/fd_poll_runtime.go:87 +0x45
internal/poll.(*pollDesc).waitRead(...)
    /usr/local/go/src/internal/poll/fd_poll_runtime.go:92
internal/poll.(*FD).Read(0xc000104100, 0xc0000e4900, 0x83c, 0x83c, 0x0, 0x0, 0x0)
    /usr/local/go/src/internal/poll/fd_unix.go:169 +0x22b
net.(*netFD).Read(0xc000104100, 0xc0000e4900, 0x83c, 0x83c, 0x203000, 0x580020000000000, 0x0)
    /usr/local/go/src/net/fd_unix.go:202 +0x4f
net.(*conn).Read(0xc000154058, 0xc0000e4900, 0x83c, 0x83c, 0x0, 0x0, 0x0)
    /usr/local/go/src/net/net.go:184 +0x68
crypto/tls.(*atLeastReader).Read(0xc000476180, 0xc0000e4900, 0x83c, 0x83c, 0xc0020d38c0, 0x4019f3e, 0xc0020d38a0)
    /usr/local/go/src/crypto/tls/conn.go:780 +0x60
bytes.(*Buffer).ReadFrom(0xc0000a8958, 0x47429c0, 0xc000476180, 0x400c9d5, 0x45dd5a0, 0x465c1a0)
    /usr/local/go/src/bytes/buffer.go:204 +0xb4
crypto/tls.(*Conn).readFromUntil(0xc0000a8700, 0x6d7a0d8, 0xc000154058, 0x5, 0xc000154058, 0x12)
    /usr/local/go/src/crypto/tls/conn.go:802 +0xec
crypto/tls.(*Conn).readRecordOrCCS(0xc0000a8700, 0x0, 0x0, 0x3)
    /usr/local/go/src/crypto/tls/conn.go:609 +0x124
crypto/tls.(*Conn).readRecord(...)
    /usr/local/go/src/crypto/tls/conn.go:577
crypto/tls.(*Conn).Read(0xc0000a8700, 0xc0017ee000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
    /usr/local/go/src/crypto/tls/conn.go:1255 +0x161
net/http.(*persistConn).Read(0xc0022cc5a0, 0xc0017ee000, 0x1000, 0x1000, 0xc000096120, 0xc0020d3c20, 0x40075a5)
    /usr/local/go/src/net/http/transport.go:1752 +0x75
bufio.(*Reader).fill(0xc0001419e0)
    /usr/local/go/src/bufio/bufio.go:100 +0x103
bufio.(*Reader).Peek(0xc0001419e0, 0x1, 0x0, 0x0, 0x1, 0xc008c5e800, 0x0)
    /usr/local/go/src/bufio/bufio.go:138 +0x4f
net/http.(*persistConn).readLoop(0xc0022cc5a0)
    /usr/local/go/src/net/http/transport.go:1905 +0x1d6
created by net/http.(*Transport).dialConn
    /usr/local/go/src/net/http/transport.go:1574 +0xafe

goroutine 15612 [select]:
net/http.(*persistConn).writeLoop(0xc0022cc5a0)
    /usr/local/go/src/net/http/transport.go:2204 +0x123
created by net/http.(*Transport).dialConn
    /usr/local/go/src/net/http/transport.go:1575 +0xb23

#19 Updated by Lucas Di Pentima 8 months ago

Running 3 concurrent 'git clone' commands was enough to make it crash again with the same error.

#20 Updated by Tom Clegg 8 months ago

  • Description updated (diff)

#21 Updated by Lucas Di Pentima 8 months ago

My latest 'black box' tests from last week were done with the linux binary, and the issues found on the Mac version didn't happen: running multiple concurrent git clone operations worked great, and writing through a simulated slow link didn't crash the client.

Will give the code another look.

#22 Updated by Lucas Di Pentima 8 months ago

Code review:

  • lib/mount/command.go
    • Line 29: Typo on comment
    • Line 60: Could that be simplified by using arvadosclient.MakeArvadosClient()?
  • lib/mount/fs.go
    • Line 65: Why should lookupFH() lock the filesystem while just reading data? — NOTE: As I’m learning, concurrent reads AND writes are not possible on maps, but could we use RWMutex to allow concurrent reads?
    • Line 255: Does this conditional make the return value “Not implemented” when the fh is a regular file or dir? A comment would help future readers.
    • Line 257: The else clause isn’t necessary. In which case would the flow reach this? Asking because if we’re not implementing this, returning 0 tells the OS that the operation succeeded.

#23 Updated by Tom Clegg 8 months ago

  • Target version changed from 2020-02-12 Sprint to 2020-02-26 Sprint

#24 Updated by Tom Clegg 8 months ago

  • Description updated (diff)
  • lib/mount/command.go
    • Line 29: Typo on comment

Fixed

  • Line 60: Could that be simplified by using arvadosclient.MakeArvadosClient()?

Not sure I follow. We could use MakeArvadosClient(), but we'd need to call NewClientFromEnv() anyway to get the client var, and once we have that, it seems cleaner to derive the ac/kc clients from it...

  • lib/mount/fs.go
    • Line 65: Why should lookupFH() lock the filesystem while just reading data? — NOTE: As I’m learning, concurrent reads AND writes are not possible on maps, but could we use RWMutex to allow concurrent reads?

Yes, good point -- reads probably outnumber writes, so RWMutex is probably better / less blocking. Changed.

  • Line 255: Does this conditional make the return value “Not implemented” when the fh is a regular file or dir? A comment would help future readers.

Added comments. It's ENOSYS when changing mode from file to dir, or vice versa.

  • Line 257: The else clause isn’t necessary. In which case would the flow reach this? Asking because if we’re not implementing this, returning 0 tells the OS that the operation succeeded.

Added comments. This is the case where chmod is expected to succeed (it's only changing permission bits). It's a no-op because we don't save permission bits. We could return ENOSYS if the mode isn't 0755. I suspect that would make lots of things (like tar xzf) fail instead of doing the obvious thing, though. Perhaps we should have "strict/loose" modes? Meanwhile I've added this to the list of shortcomings.

12308-cgofuse @ 3a2006d29fc38596a4dfb19b331bf2c86a9185ae -- https://ci.arvados.org/view/Developer/job/developer-run-tests/1728/

#25 Updated by Lucas Di Pentima 8 months ago

Thanks for the clarifications and comments, they'll be helpful.
About arvadosclient.MakeArvadosClient(): I missed that the client var was used elsewhere, sorry.

This LGTM, thanks!

#26 Updated by Tom Clegg 7 months ago

  • Target version deleted (2020-02-26 Sprint)

#27 Updated by Tom Clegg 5 months ago

  • Description updated (diff)

#28 Updated by Tom Clegg about 1 month ago

  • Related to Bug #16727: [FUSE] [cgofuse] Refresh signatures / reload collection instead of using expired blob signatures added
