Idea #12308

Updated by Tom Clegg almost 5 years ago

Background: 

 Python+llfuse was expedient and has done lots of good work for us, but it's not promising as a long-term (fast+reliable+maintainable) solution. 

 Implementation: 
 Replacement strategy: 

 * use collection-backed filesystem from #12483, plus #12483 
 * add a more general arvados-backed filesystem ("by_id" directory, etc., same as the one exported via webdav) from #13111 etc. 
 * present as fuse using a library like https://godoc.org/github.com/hanwen/go-fuse/fuse or https://godoc.org/bazil.org/fuse or https://godoc.org/github.com/billziss-gh/cgofuse/fuse 

 The arvados-to-filesystem mapping should be implemented as a native Go interface, with a separate thin layer attaching that to the FUSE library. This way we can export the same filesystem behavior through other interfaces. In particular, we will want at least: 
 * bazil.org/fuse is a popular choice for doing fuse with go 
 * billziss-gh/cgofuse cross-compiles to Linux, Windows, and OSX (but is probably not as good as bazil on linux) 
 * keep-web should export the same hierarchy via webdav 
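
 The layering described above can be sketched with the Go standard library alone. This is only an illustration under stated assumptions: `io/fs.FS` stands in for the native filesystem interface, an in-memory `fstest.MapFS` stands in for the Arvados-backed implementation, and a plain HTTP file server stands in for the webdav and FUSE front ends. The helper names (`newDemoFS`, `listNames`, `fetch`) and the UUID-like directory name are made up for the example.

```go
package main

import (
	"fmt"
	"io"
	"io/fs"
	"net/http"
	"net/http/httptest"
	"testing/fstest"
)

// newDemoFS returns an in-memory stand-in for the Arvados-backed
// filesystem. The directory name and file content are invented.
func newDemoFS() fs.FS {
	return fstest.MapFS{
		"by_id/zzzzz-4zz18-0123456789abcde/hello.txt": &fstest.MapFile{Data: []byte("hello\n")},
	}
}

// listNames is a consumer that only knows about fs.FS, the way a thin
// FUSE adapter would only know about the native interface.
func listNames(fsys fs.FS, dir string) ([]string, error) {
	ents, err := fs.ReadDir(fsys, dir)
	if err != nil {
		return nil, err
	}
	names := make([]string, 0, len(ents))
	for _, e := range ents {
		names = append(names, e.Name())
	}
	return names, nil
}

// fetch exposes the same fs.FS through a second, independent front end
// (a temporary local HTTP server) and reads one path from it.
func fetch(fsys fs.FS, path string) (string, error) {
	srv := httptest.NewServer(http.FileServer(http.FS(fsys)))
	defer srv.Close()
	resp, err := http.Get(srv.URL + path)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("unexpected status %s", resp.Status)
	}
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	return string(body), nil
}

func main() {
	fsys := newDemoFS()
	names, err := listNames(fsys, "by_id")
	fmt.Println(names, err)
	body, err := fetch(fsys, "/by_id/zzzzz-4zz18-0123456789abcde/hello.txt")
	fmt.Printf("%q %v\n", body, err)
}
```

 Both consumers see the same hierarchy because neither contains any mapping logic of its own; a FUSE adapter built on one of the libraries above would sit in the same position as `fetch`.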

 Rather than add a separately packaged client program, we should package this as the first built-in subcommand ("mount") of the source:cmd/arvados-client program, a new eventually-all-encompassing CLI tool ("arvados"). [[CLI client]] 

 TBD: 
 * Approach for handling websocket "update" events 
 * Selectable mechanisms/options for syncing to server (fflush, fsync, close) (on a shell node, flush-on-close, flush-periodically, or flush-after-idle-time might be best; in crunch-run, flush-on-exit might be best) 
 * Desired behavior when updates conflict (write error? clobber? create "oops,clobbered" file?) 

 Other current bugs/limitations: 
 * Not command-line compatible with arv-mount 
 * Logging is not great 
 * Appears to leak memory when using large collections 
 * No docs 
 * No way to control overall cache size (currently collectionfs can use lots of RAM in certain non-sequential write scenarios; we need the ability to trade speed for space efficiency in memory-constrained environments) 
 * No warnings given when cache is thrashing 
 * No application level instrumentation (just optional Go pprof) 
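
 To make the cache-size and thrashing items above concrete, here is a minimal sketch of a byte-bounded LRU cache that counts how often a previously evicted key has to be stored again. It is not how collectionfs manages memory; the `Cache` type, its methods, and the `Refetches` counter are hypothetical names, and using re-insertion of evicted keys as the thrashing signal is an assumption made for the example.

```go
package main

import (
	"container/list"
	"fmt"
)

type entry struct {
	key  string
	data []byte
}

// Cache is a hypothetical LRU cache bounded by total bytes stored.
type Cache struct {
	maxBytes  int
	curBytes  int
	order     *list.List               // front = most recently used
	items     map[string]*list.Element // key -> element in order
	evicted   map[string]bool          // keys evicted at least once (unbounded in this sketch)
	Refetches int                      // puts of keys that had been evicted earlier
}

func NewCache(maxBytes int) *Cache {
	return &Cache{
		maxBytes: maxBytes,
		order:    list.New(),
		items:    map[string]*list.Element{},
		evicted:  map[string]bool{},
	}
}

// Get returns the cached data and marks the key as recently used.
func (c *Cache) Get(key string) ([]byte, bool) {
	el, ok := c.items[key]
	if !ok {
		return nil, false
	}
	c.order.MoveToFront(el)
	return el.Value.(*entry).data, true
}

// Put stores data, then evicts least recently used entries until the
// total size fits. The newest entry is kept even if it alone is too big.
func (c *Cache) Put(key string, data []byte) {
	if c.evicted[key] {
		c.Refetches++ // a high rate here suggests the cache is thrashing
	}
	if el, ok := c.items[key]; ok {
		c.curBytes -= len(el.Value.(*entry).data)
		c.order.Remove(el)
		delete(c.items, key)
	}
	c.items[key] = c.order.PushFront(&entry{key, data})
	c.curBytes += len(data)
	for c.curBytes > c.maxBytes && c.order.Len() > 1 {
		old := c.order.Remove(c.order.Back()).(*entry)
		delete(c.items, old.key)
		c.curBytes -= len(old.data)
		c.evicted[old.key] = true
	}
}

func main() {
	c := NewCache(8)
	for _, k := range []string{"a", "b", "c", "a"} {
		c.Put(k, []byte("1234"))
	}
	fmt.Println("bytes:", c.curBytes, "refetches:", c.Refetches)
}
```

 A smaller `maxBytes` trades speed for space, and a counter like `Refetches` could feed both a warning log line and application-level metrics.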
