Bug #21891
Updated by Peter Amstutz 10 months ago
Reported very slow copying files from collections to collections in output phase (~15s per file).
Example log:
<pre>
2024-06-11T20:59:18.902328206Z copying "PT001379858_A.dedup.flagstat" from d75eba3924f3985128d7a2dbb7f59cf6+1786208/PT001379858_A.dedup.flagstat
2024-06-11T20:59:35.158068026Z copying "PT003617020_A.insertsizemetricalgo.metrics.txt" from d75eba3924f3985128d7a2dbb7f59cf6+1786208/PT003617020_A.insertsizemetricalgo.metrics.txt
2024-06-11T20:59:50.741935364Z copying "PT005899462_A.insertsizemetricalgo.metrics.txt" from d75eba3924f3985128d7a2dbb7f59cf6+1786208/PT005899462_A.insertsizemetricalgo.metrics.txt
</pre>
We've had performance issues with the copying phase in the past, but I assumed it was because it was loading collections from the API server slowly. But maybe the problem is actually the efficiency of the @Extract()@ method?
Also, appending strings repeatedly the way it is being done here is a notorious anti-pattern -- unless Go strings have some special buffer behavior, string appends like this require allocating a new buffer and copying the whole string in order to append text to the end, which tends to dominate runtime as the string gets larger.
<pre>
cp.logger.Printf("copying %q from %v/%v", outputRelPath, srcMount.PortableDataHash, strings.TrimPrefix(srcRelPath, "./"))
mft, err := cp.getManifest(srcMount.PortableDataHash)
if err != nil {
return err
}
cp.manifest += mft.Extract(srcRelPath, dest).Text
</pre>