Bug #17293

Updated by Peter Amstutz about 3 years ago

A customer reported that a long-lived (multi-day) workflow failed and was automatically restarted. The failure turned out to be caused by 404 errors when trying to update the log collection.

 It was determined that the log collection had been put in the trash. There appears to be a bug: the @trash_at@/@delete_at@ times were intended to be pushed into the future each time the log collection is updated, but that doesn't actually work:

 crunchrun.go: 

 <pre> 
 func (runner *ContainerRunner) saveLogCollection(final bool) (response arvados.Collection, err error) { 
 ... 
	 if final { 
		 updates["is_trashed"] = true 
	 } else { 
		 exp := time.Now().Add(crunchLogUpdatePeriod * 24) 
		 updates["trash_at"] = exp 
		 updates["delete_at"] = exp 
	 } 
 </pre> 


 trashable.rb: 

 <pre> 
   def default_trash_interval 
     ... 

     elsif delete_at_changed? && delete_at >= trash_at 
       # Fix delete_at if needed, so it's not earlier than the expiry 
       # time on any permission tokens that might have been given out. 

       # In any case there are no signatures expiring after now+TTL. 
       # Also, if the existing trash_at time has already passed, we 
       # know we haven't given out any signatures since then. 
       earliest_delete = [ 
         @validation_timestamp, 
         trash_at_was, 
       ].compact.min + Rails.configuration.Collections.BlobSigningTTL 

       # The previous value of delete_at is also an upper bound on the 
       # longest-lived permission token. For example, if TTL=14, 
       # trash_at_was=now-7, delete_at_was=now+7, then it is safe to 
       # set trash_at=now+6, delete_at=now+8. 
       earliest_delete = [earliest_delete, delete_at_was].compact.min 

       # If delete_at is too soon, use the earliest possible time. 
       if delete_at < earliest_delete 
         self.delete_at = earliest_delete 
       end 
     end 
 </pre> 
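
To make the clamp in the excerpt above easier to reason about, here is a minimal standalone sketch of just the @earliest_delete@ arithmetic. It is not the Arvados model code: @clamp_delete_at@ and its keyword arguments are hypothetical names, times are plain numbers (days relative to "now") instead of timestamps, and the surrounding @delete_at_changed? && delete_at >= trash_at@ condition is not modelled.

```ruby
# Hypothetical helper mirroring the arithmetic in the quoted excerpt.
# All times are plain numbers, e.g. days relative to "now".
def clamp_delete_at(now:, ttl:, trash_at_was:, delete_at_was:, delete_at:)
  # No signature expires after now + TTL; if the old trash_at has already
  # passed, no signatures were handed out after it.
  earliest_delete = [now, trash_at_was].compact.min + ttl

  # The previous delete_at is also an upper bound on token lifetime.
  earliest_delete = [earliest_delete, delete_at_was].compact.min

  # If the requested delete_at is too soon, use the earliest possible time.
  delete_at < earliest_delete ? earliest_delete : delete_at
end

# Values from the comment in the excerpt: TTL=14, trash_at_was=now-7,
# delete_at_was=now+7, requested delete_at=now+8.
puts clamp_delete_at(now: 0, ttl: 14, trash_at_was: -7, delete_at_was: 7, delete_at: 8)

# A new record (no previous values) asking for delete_at=now+3.
puts clamp_delete_at(now: 0, ttl: 14, trash_at_was: nil, delete_at_was: nil, delete_at: 3)
```

In the first call the requested value is already past the bound and is returned unchanged; in the second it is raised to @now + TTL@. The sketch only shows that this branch moves @delete_at@ later, never earlier; it does not by itself identify why the log collection ended up in the trash.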