Bug #21891

Updated by Peter Amstutz about 1 month ago

Very slow copying of files from collection to collection was reported in the output phase (~15s per file).

 Example log: 

 <pre> 
 2024-06-11T20:59:18.902328206Z copying "PT001379858_A.dedup.flagstat" from d75eba3924f3985128d7a2dbb7f59cf6+1786208/PT001379858_A.dedup.flagstat 
 2024-06-11T20:59:35.158068026Z copying "PT003617020_A.insertsizemetricalgo.metrics.txt" from d75eba3924f3985128d7a2dbb7f59cf6+1786208/PT003617020_A.insertsizemetricalgo.metrics.txt 
 2024-06-11T20:59:50.741935364Z copying "PT005899462_A.insertsizemetricalgo.metrics.txt" from d75eba3924f3985128d7a2dbb7f59cf6+1786208/PT005899462_A.insertsizemetricalgo.metrics.txt 
 </pre> 

 We've had performance issues with the copying phase in the past, but I assumed the cause was slow loading of collections from the API server. Maybe the problem is actually the efficiency of the @Extract()@ method?

 Also, appending to a string repeatedly, the way the code below does, is a notorious anti-pattern. Go strings are immutable, so each @+=@ has to allocate a new buffer and copy the whole accumulated string in order to add text to the end, and that copying tends to dominate runtime as the string gets larger.

 <pre> 
cp.logger.Printf("copying %q from %v/%v", outputRelPath, srcMount.PortableDataHash, strings.TrimPrefix(srcRelPath, "./"))
mft, err := cp.getManifest(srcMount.PortableDataHash)
if err != nil {
	return err
}
cp.manifest += mft.Extract(srcRelPath, dest).Text
 </pre> 
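
 For comparison, here is a minimal sketch of the two accumulation styles, not a patch for the copier itself. The function names and the @parts@ slice are hypothetical stand-ins for the successive @mft.Extract(...).Text@ values. Only @strings.Builder@ is a real standard library type. Whether this append actually accounts for the ~15s per file seen in the log is unverified and would need profiling.

```go
package main

import "strings"

// appendNaive mirrors the pattern in the snippet above: every += allocates a
// new string and copies everything accumulated so far, so the total work
// grows quadratically with the size of the final manifest.
// (Hypothetical helper; parts stands in for the extracted manifest texts.)
func appendNaive(parts []string) string {
	manifest := ""
	for _, p := range parts {
		manifest += p
	}
	return manifest
}

// appendWithBuilder accumulates into a strings.Builder, which appends to a
// growable internal buffer, so the total copying stays roughly linear.
// (Hypothetical helper, shown only to illustrate the alternative.)
func appendWithBuilder(parts []string) string {
	var b strings.Builder
	for _, p := range parts {
		b.WriteString(p)
	}
	return b.String()
}
```

 Both functions return the same text for the same input. Only the allocation and copying behavior differs, which is why the difference shows up only once the accumulated manifest gets large.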
