Feature #22320

Updated by Tom Clegg 5 months ago

From [[Efficient block packing for small WebDAV uploads]] 
 * filehandle method only needs to be supported when target is a dirnode (repacking a file could be useful, e.g., fuse driver, but not needed for webdav) 
 * traverse dir/filesystem, finding opportunities to merge small (<32MiB) blocks into larger (>=32MiB) blocks 
 * optionally (opts.Underutilized) merge segments from underutilized blocks into [larger] fully-utilized blocks -- note this shouldn't be used for single-directory repacking, because the unreferenced portions of blocks might be referenced by files elsewhere in the collection 
 * optionally (opts.CachedOnly) skip blocks that aren't in the local cache; see diskCacheProber below 
 * optionally (opts.Full) generate optimal repacking based on assumption that no further files will be written (we might postpone implementing this at first, since it's not needed for webdav) 
 * optionally (opts.DryRun) don't apply changes, just report what would happen (for tests and possibly a future Workbench feature that hints when explicit repack is advisable) 
 * remember which segments got remapped, so the changes can be pushed later; see Sync below 
 * repacking algorithm performance goal: reasonable amortized cost & reasonably well-packed collection when called after each file in a set of sequential/concurrent small file writes 
 ** e.g., after writing 64 100-byte files, there should be fewer than 64 blocks, but the first file's data should have been rewritten far fewer than 64 times 
 ** test suite should confirm decent performance in some pathological cases 
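
One way to meet the amortized-cost goal above is a "binary counter" merge rule: after each write, merge the last two blocks only while both are smaller than 32 MiB and the newer one is at least as large as the older one. The sketch below is an illustration of that idea, not the algorithm this ticket settled on. The names @block@, @appendAndRepack@ and @targetSize@ are hypothetical, and it tracks only sizes and rewrite counts rather than real segment data.

```go
package main

import "fmt"

// targetSize is the threshold from the list above: blocks of 32 MiB or more are left alone.
const targetSize = 32 << 20

// block is a hypothetical stand-in for one stored data block.
type block struct {
	size     int64 // bytes of data in the block
	rewrites int   // times the most-copied data in this block has been rewritten
}

// appendAndRepack (hypothetical) records a newly written block, then merges
// the last two blocks for as long as both are small and the last one is at
// least as large as its predecessor.
func appendAndRepack(blocks []block, size int64) []block {
	blocks = append(blocks, block{size: size})
	for len(blocks) >= 2 {
		prev, last := blocks[len(blocks)-2], blocks[len(blocks)-1]
		if prev.size >= targetSize || last.size >= targetSize || last.size < prev.size {
			break
		}
		merged := block{size: prev.size + last.size, rewrites: prev.rewrites}
		if last.rewrites > merged.rewrites {
			merged.rewrites = last.rewrites
		}
		merged.rewrites++ // merging copies the data of both blocks once more
		blocks = append(blocks[:len(blocks)-2], merged)
	}
	return blocks
}

func main() {
	var blocks []block
	for i := 0; i < 64; i++ {
		blocks = appendAndRepack(blocks, 100) // 64 sequential 100-byte files
	}
	fmt.Printf("%d block(s), oldest data rewritten %d times\n", len(blocks), blocks[0].rewrites)
}
```

With equal-sized writes this behaves like incrementing a binary counter: 64 100-byte files end up in one block and the first file's data is copied 6 times, while 63 files leave 6 blocks. Mixed file sizes and concurrent writers would need the pathological-case tests mentioned above.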

 Add @diskCacheProber@ type that allows caller to efficiently check whether a block is in local cache 
 * copy an existing DiskCache and change its KeepGateway to a gateway that fails reads/writes 
 * to check whether a block is in cache, ask the DiskCache to read 0 bytes 
 * avoids the cost of transferring any data or connecting to a backend 
 * edge case: this will also return true for a block that is currently being read from a backend into the cache. Such a block is arguably not really "in cache", and reading its data could still be slow or return a backend error, but it should be OK to treat it as available for repacking purposes. 
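
The probing approach above can be sketched as follows. This is a self-contained illustration under stated assumptions, not the real SDK code: @gateway@, @memCache@, @failGateway@ and @countingGateway@ are hypothetical stand-ins for KeepGateway, DiskCache and their backends, and the toy cache does not model in-progress reads (the edge case noted above).

```go
package main

import (
	"errors"
	"fmt"
)

// gateway is a hypothetical stand-in for a backend interface such as KeepGateway.
type gateway interface {
	ReadAt(locator string, dst []byte, offset int) (int, error)
}

var errNotCached = errors.New("block not in local cache")

// failGateway refuses every read, so a cache miss surfaces as an error
// instead of triggering a backend fetch.
type failGateway struct{}

func (failGateway) ReadAt(string, []byte, int) (int, error) { return 0, errNotCached }

// countingGateway is a toy backend that counts how often it is contacted.
type countingGateway struct{ calls int }

func (g *countingGateway) ReadAt(string, []byte, int) (int, error) {
	g.calls++
	return 0, errors.New("toy backend holds no data")
}

// memCache is a toy stand-in for DiskCache: it serves cached blocks locally
// and falls through to its backend on a miss.
type memCache struct {
	backend gateway
	blocks  map[string][]byte
}

func (c *memCache) ReadAt(locator string, dst []byte, offset int) (int, error) {
	if data, ok := c.blocks[locator]; ok {
		if offset < 0 || offset > len(data) {
			return 0, errors.New("offset out of range")
		}
		return copy(dst, data[offset:]), nil
	}
	return c.backend.ReadAt(locator, dst, offset)
}

// diskCacheProber wraps a copy of the cache whose backend always fails.
type diskCacheProber struct{ cache *memCache }

func newDiskCacheProber(c *memCache) diskCacheProber {
	probe := *c // shallow copy: shares the cached blocks, not the backend
	probe.backend = failGateway{}
	return diskCacheProber{cache: &probe}
}

// InCache reads 0 bytes; success means the block is available locally.
func (p diskCacheProber) InCache(locator string) bool {
	_, err := p.cache.ReadAt(locator, nil, 0)
	return err == nil
}

func main() {
	backend := &countingGateway{}
	cache := &memCache{backend: backend, blocks: map[string][]byte{"loc-a": []byte("foo")}}
	prober := newDiskCacheProber(cache)
	fmt.Println(prober.InCache("loc-a"), prober.InCache("loc-b"), backend.calls)
}
```

Because the prober only swaps the backend on a copy, the original cache keeps working normally and the probe never transfers data or opens a backend connection.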

 Update (collectionFileSystem)Sync() to invoke @replace_segments@ if the collection has been repacked
