Story #2752

Updated by Tom Clegg over 8 years ago

Two main approaches are available. (Eventually we want both. For now, we want whichever is easier and more efficient to develop.)

 HEAD-before-PUT mode 
 * Simple to implement reliably 
 * Depends on HEAD actually working (which might not yet be true when the proxy use case is otherwise ready to go) 
 * Does not depend on local filesystem features like inode/ctime 
 * Does not depend on arv-put running in the same user account (or even host) each time 
 * Still requires re-reading (and re-hashing) the local data 
 * Interacts non-trivially with the Keep permission mechanism (needs some combination of storing partial manifests and caching permission signatures locally) 
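A minimal sketch of HEAD-before-PUT, assuming blocks are identified by Keep's @md5hex+size@ locator form. @FakeKeep@, @read_blocks@, @upload@ and the 64 MiB @BLOCK_SIZE@ are hypothetical illustrations, not the Arvados SDK API, and permission signatures are ignored entirely. It shows both trade-offs listed above: no local state is needed, but every block is still read and hashed before the HEAD.

```python
import hashlib
from typing import BinaryIO, Dict, Iterable, Iterator, List

BLOCK_SIZE = 64 * 1024 * 1024  # assumption: 64 MiB blocks


class FakeKeep:
    """Hypothetical in-memory stand-in for a Keep server (not the real SDK)."""

    def __init__(self) -> None:
        self.blobs: Dict[str, bytes] = {}
        self.put_count = 0

    def head(self, locator: str) -> bool:
        # True if the blob is already stored.
        return locator in self.blobs

    def put(self, data: bytes) -> str:
        locator = f"{hashlib.md5(data).hexdigest()}+{len(data)}"
        self.blobs[locator] = data
        self.put_count += 1
        return locator


def read_blocks(f: BinaryIO, block_size: int = BLOCK_SIZE) -> Iterator[bytes]:
    # Re-reading the local data is unavoidable in this mode.
    while True:
        data = f.read(block_size)
        if not data:
            return
        yield data


def upload(blocks: Iterable[bytes], keep: FakeKeep) -> List[str]:
    locators = []
    for data in blocks:
        locator = f"{hashlib.md5(data).hexdigest()}+{len(data)}"
        # HEAD first; only PUT blocks the server does not already have.
        if not keep.head(locator):
            locator = keep.put(data)
        locators.append(locator)
    return locators
```

Running @upload@ a second time over the same data issues HEAD requests only, so an interrupted transfer resumes without any local checkpoint and regardless of which account or host runs it.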

 Local checkpoint 
 * Save state (in @$HOME/.cache@?) while running. 
 ** Separate cache per arvados_api_host (so uploads to two different sites don't get confused) 
 ** Be attentive to race conditions (e.g., refuse to run two resumable transfers that would use the same cached data) 
 ** List of files, in the order written 
 ** For each file (at minimum) store name, inode, ctime, size 
 ** List of blobs successfully written to Keep (including size) 
 * When resuming, skip what's already done (unless @--no-resume@ is given). 
 ** If blob locators have permission signatures, check their expiry times before deciding to reuse them. Warn the user (once per arv-put run) if blobs are being re-uploaded for this reason. 
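A sketch of the local checkpoint bookkeeping, under several assumptions: a JSON state file per API host under @~/.cache/arv-put/@ (a hypothetical layout), a POSIX @flock@ on a companion lock file to refuse two transfers sharing one cache, and signed locators of the form @md5+size+A<signature>@<hex expiry>@. Every function name here is hypothetical, and the one-hour safety @margin@ is an arbitrary choice.

```python
import fcntl
import hashlib
import json
import os
import sys
from pathlib import Path
from typing import Dict, List, Optional, TextIO


def cache_path(api_host: str, cache_root: Optional[Path] = None) -> Path:
    # Hypothetical layout: one checkpoint file per arvados_api_host.
    root = cache_root if cache_root is not None else Path.home() / ".cache" / "arv-put"
    return root / hashlib.md5(api_host.encode()).hexdigest()


def lock_cache(path: Path) -> TextIO:
    # Refuse to run if another transfer already holds this cache's lock.
    path.parent.mkdir(parents=True, exist_ok=True)
    f = open(path.with_name(path.name + ".lock"), "w")
    try:
        fcntl.flock(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        f.close()
        raise RuntimeError(f"another resumable transfer is using {path}")
    return f  # keep open for the whole run; closing it releases the lock


def fingerprint(path: str) -> Dict:
    # Minimum per-file identity: name, inode, ctime, size.
    st = os.stat(path)
    return {"name": path, "inode": st.st_ino, "ctime": st.st_ctime, "size": st.st_size}


def load_state(path: Path) -> Dict:
    try:
        return json.loads(path.read_text())
    except FileNotFoundError:
        return {"files": []}


def save_state(path: Path, state: Dict) -> None:
    # Write then rename, so a crash never leaves a half-written checkpoint.
    path.parent.mkdir(parents=True, exist_ok=True)
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text(json.dumps(state))
    os.replace(tmp, path)


def signature_expiry(locator: str) -> Optional[int]:
    # Assumed hint format: +A<signature>@<expiry as hex Unix timestamp>.
    for hint in locator.split("+")[2:]:
        if hint.startswith("A") and "@" in hint:
            return int(hint.split("@", 1)[1], 16)
    return None  # unsigned locator: nothing to expire


def plan_resume(state: Dict, files: List[str], now: float,
                margin: float = 3600) -> Dict[str, List[str]]:
    # Map each unchanged file to the saved blob locators that are safe to reuse.
    saved = {e["fingerprint"]["name"]: e for e in state.get("files", [])}
    reuse: Dict[str, List[str]] = {}
    warned = False
    for path in files:
        entry = saved.get(path)
        if entry is None or entry["fingerprint"] != fingerprint(path):
            continue  # new or modified file: upload it from scratch
        expiries = [signature_expiry(b) for b in entry["blobs"]]
        if all(e is None or e > now + margin for e in expiries):
            reuse[path] = entry["blobs"]
        elif not warned:
            # Warn only once per run (hypothetical message text).
            print("warning: re-uploading blobs with expiring permission signatures",
                  file=sys.stderr)
            warned = True
    return reuse
```

The caller would skip @plan_resume@ entirely when @--no-resume@ is given, and otherwise continue each file after its reused blobs, calling @save_state@ after every successful blob write. Comparing the whole fingerprint (not just name and size) is what makes a file modified in place between runs fall back to a fresh upload.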
