Bug #21120

Updated by Brett Smith 7 months ago

@arv-mount --help@ says: 

 <pre>
  --file-cache FILE_CACHE
                        File data cache size, in bytes (default 8 GiB for disk-based cache or 256 MiB with RAM-only cache)
 </pre>

 But what the code actually does for disk-based cache (in @sdk/python/arvados/keep.py@) is: 

 <pre><code class="python">
# Each block uses two file descriptors, one used to
# open it initially and hold the flock(), and a second
# hidden one used by mmap().
#
# Set max slots to 1/8 of maximum file handles.  This
# means we'll use at most 1/4 of total file handles.
#
# NOFILE typically defaults to 1024 on Linux so this
# is 128 slots (256 file handles), which means we can
# cache up to 8 GiB of 64 MiB blocks.  This leaves
# 768 file handles for sockets and other stuff.
#
# When we want the ability to have more cache (e.g. in
# arv-mount) we'll increase rlimit before calling
# this.
self._max_slots = int(resource.getrlimit(resource.RLIMIT_NOFILE)[0] / 8)
[...]
fs = os.statvfs(self._disk_cache_dir)
# Calculation of available space incorporates existing cache usage
existing_usage = arvados.diskcache.DiskCacheSlot.cache_usage(self._disk_cache_dir)
avail = (fs.f_bavail * fs.f_bsize + existing_usage) / 4
maxdisk = int((fs.f_blocks * fs.f_bsize) * 0.10)
# pick smallest of:
# 10% of total disk size
# 25% of available space
# max_slots * 64 MiB
self.cache_max = min(min(maxdisk, avail), (self._max_slots * 64 * 1024 * 1024))
 </code></pre>

 This means there are a lot of situations where the default size is not "8 GiB." If there's not much space on the filesystem, it'll be smaller. If the administrator has bumped up the @NOFILE@ limit (which I suspect is common-ish in HPC-type environments), it'll be larger. 
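
 To make the variation concrete, here is a rough standalone sketch of the same calculation. The function name @default_disk_cache_max@, its parameters, and the @probe@ helper are made up for illustration and are not part of the SDK; the sketch also uses integer division where the original uses float division, so results can differ by a byte or so.

```python
import os
import resource

# 64 MiB per cache slot, as assumed by the comments in keep.py
BLOCK_SIZE = 64 * 1024 * 1024


def default_disk_cache_max(nofile_soft, total_bytes, avail_bytes, existing_usage=0):
    """Hypothetical helper mirroring the default disk cache sizing.

    Returns the smallest of: 10% of total disk size, 25% of available
    space (counting existing cache usage as available), and
    max_slots * 64 MiB, where max_slots is 1/8 of the NOFILE soft limit.
    """
    max_slots = nofile_soft // 8
    avail = (avail_bytes + existing_usage) // 4
    maxdisk = int(total_bytes * 0.10)
    return min(maxdisk, avail, max_slots * BLOCK_SIZE)


def probe(cache_dir, existing_usage=0):
    """Hypothetical helper: apply the calculation to the running system."""
    nofile_soft = resource.getrlimit(resource.RLIMIT_NOFILE)[0]
    fs = os.statvfs(cache_dir)
    return default_disk_cache_max(
        nofile_soft,
        fs.f_blocks * fs.f_bsize,
        fs.f_bavail * fs.f_bsize,
        existing_usage,
    )


if __name__ == "__main__":
    GiB = 1024 ** 3
    # NOFILE=1024 on a large, mostly empty disk: the documented 8 GiB
    print(default_disk_cache_max(1024, 1024 * GiB, 500 * GiB) // GiB)
    # NOFILE=65536 on a 10 TiB disk: the slot limit allows 512 GiB
    print(default_disk_cache_max(65536, 10240 * GiB, 8192 * GiB) // GiB)
```

 With @NOFILE@ at 1024 and plenty of disk, this gives the documented 8 GiB; with @NOFILE@ at 65536 on a 10 TiB filesystem it gives 512 GiB; and on a 20 GiB filesystem the 10% rule caps it at about 2 GiB regardless of @NOFILE@.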

 Expand the help documentation to reflect this.
