Idea #3761

Updated by Tom Clegg about 9 years ago

Currently, when receiving its first pull list, keepstore sets up a WorkQueue instance called pullq. At the same time it should also start a pull worker goroutine: <pre><code class="go">go RunPullWorker(pullq.NextItem)</code></pre> 

 The resulting goroutine will run forever, processing pull requests on the WorkQueue one at a time. 

 "RunPullWorker" will: 
 * Get the next pull request. 
 * For each server, try Pull(). Stop when one succeeds. 
 * Repeat. 

 "Pull" will: 
 * Generate a random API token[1]. 
 * Generate a permission signature using the random token, an expiry timestamp ~60 seconds in the future, and the desired block hash. 
 * Using this token & signature, retrieve the given block from the given keepstore server. 
 * Verify checksum and write to storage, just as if it had been provided by a client in a PUT transaction. I.e., PutBlock(). 

 RunPullWorker() and Pull() will look something like this: 

 <pre><code class="go">
func RunPullWorker(nextItem <-chan interface{}) {
	pw := &PullWorker{}
	for item := range nextItem {
		pullReq := item.(PullRequest)
		for _, addr := range pullReq.Servers {
			// Stop at the first server that succeeds.
			if err := pw.Pull(addr, pullReq.Locator); err == nil {
				break
			}
		}
	}
}

func (pw *PullWorker) Pull(addr string, locator string) (err error) {
	log.Printf("Pull %s/%s starting", addr, locator)
	defer func() {
		if err == nil {
			log.Printf("Pull %s/%s success", addr, locator)
		} else {
			log.Printf("Pull %s/%s error: %s", addr, locator, err)
		}
	}()
	// (will also need to set auth headers and add a signature token to the locator here)
	resp, err := http.Get(fmt.Sprintf("http://%s/%s", addr, locator))
	if err != nil {
		return
	}
	defer resp.Body.Close()
	data, err := ioutil.ReadAll(resp.Body)
	if err != nil {
		return
	}
	err = PutBlock(data, locator)
	return
}
</code></pre> 

 PullWorker doesn't need to worry about: 
 * Retrying (Data Manager will tell us to do the pull again, if it's still needed) 
 * Concurrency (we can add concurrency safely & easily by starting multiple PullWorkers) 
 * Noticing when the pull list changes, or is empty (WorkQueue already does all this: we just read from the channel, and something will arrive when there's something for this thread to do) 
 * Detecting whether a given pull request is useless, e.g., data already present, before pulling (instead, trust Data Manager to give us useful pull lists, and be OK with an occasional superfluous GET) 

 fn1. Currently, Keep doesn't actually verify API tokens, just the permission signature, so a random token is just as effective as a real one. 
