Bug #19368

Updated by Tom Clegg over 1 year ago

While investigating slow HEAD requests, I noticed a few inefficiencies in logUploadOrDownload() that could be causing substantial delays:
 * If the request method is anything other than PUT, POST, or GET, logUploadOrDownload() doesn't log anything, but it still does all the work of determining the collection ID and properties that would be logged if the method were different.
 * Similarly, if Collections.WebDAVLogEvents is disabled, it spends time determining the collection ID and properties to log before noticing that logging is disabled.
 * determineCollection() walks the filesystem tree from the root to the requested target, looking for a special @.arvados#collection@ file at each level. This may incur several unnecessary API calls (e.g., @/by_id/$projectid/.arvados#collection@ will try to look up a subproject or collection with that name).
 * (Most importantly?) Generating the magic @.arvados#collection@ file involves writing a new manifest for the entire directory tree, which (a) is not actually needed here, (b) can be very large and therefore slow to generate, and (c) due to the root-to-leaf approach, is always generated for the entire collection, even though reading the magic file from the same directory as the requested target would often generate a much smaller manifest and still return the correct collection UUID.
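
To illustrate the root-to-leaf lookup pattern described above, here is a minimal Go sketch that only lists the magic-file paths such a walk would try. It is not the actual determineCollection() code: @rootToLeafProbes@ is a hypothetical helper, and it assumes the last path component is the requested target and is not probed itself.

```go
package main

import (
	"fmt"
	"strings"
)

// magicFile is the special file name mentioned in this report.
const magicFile = ".arvados#collection"

// rootToLeafProbes is a hypothetical helper (not part of Arvados). It returns,
// in order, the magic-file paths a root-to-leaf walk would try for target.
// Assumption: the last path component is the target itself and is not probed.
func rootToLeafProbes(target string) []string {
	parts := strings.Split(strings.Trim(target, "/"), "/")
	var probes []string
	dir := ""
	for _, part := range parts[:len(parts)-1] {
		if part == "" {
			continue // skip empty components such as those from "//"
		}
		dir += "/" + part
		probes = append(probes, dir+"/"+magicFile)
	}
	return probes
}

func main() {
	// Each printed path stands for one lookup made before reaching the target.
	for _, p := range rootToLeafProbes("/by_id/$projectid/subdir/file.txt") {
		fmt.Println(p)
	}
}
```

In this illustration every level above the target costs one probe, and the levels nearest the root are the ones least likely to be inside a collection.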

Proposed improvements:
 * Return early from logUploadOrDownload() if nothing will be logged because of the request method or config.
 * Introduce an @.arvados#uuid@ special file that just returns the UUID of the collection or project represented by that directory.
 * Start from the target and walk up towards the root, instead of the other way around.
