Project

General

Profile

Actions

Bug #21891

closed

Very slow copying from collection into output in crunch-run copy phase

Added by Peter Amstutz 5 months ago. Updated 5 months ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Category:
Keep
Story points:
-
Release:
Release relationship:
Auto

Description

Reported very slow copying files from collections to collections in output phase (~15s per file).

Example log:

2024-06-11T20:59:18.902328206Z copying "PT001379858_A.dedup.flagstat" from d75eba3924f3985128d7a2dbb7f59cf6+1786208/PT001379858_A.dedup.flagstat
2024-06-11T20:59:35.158068026Z copying "PT003617020_A.insertsizemetricalgo.metrics.txt" from d75eba3924f3985128d7a2dbb7f59cf6+1786208/PT003617020_A.insertsizemetricalgo.metrics.txt
2024-06-11T20:59:50.741935364Z copying "PT005899462_A.insertsizemetricalgo.metrics.txt" from d75eba3924f3985128d7a2dbb7f59cf6+1786208/PT005899462_A.insertsizemetricalgo.metrics.txt

We've had other reports of performance issues with the copying phase where it is copying from collections in the past, but I assumed it was because it was loading collections from the API server, and that was slow. In this case, these are from the same manifest, so it should be using manifestCache (unless that's not working, for some reason, but monitoring also doesn't indicate substantial network traffic, so it doesn't appear to be spending most of its time fetching stuff).

Maybe the problem is actually the efficiency of the Extract() method?

Also, appending strings repeatedly the way it is being done here is a notorious anti-pattern -- unless Go strings have some special buffer behavior, string appends like this require allocating a new buffer and copying the whole string in order to append text to the end, which tends to dominate runtime as the string gets larger.

        cp.logger.Printf("copying %q from %v/%v", outputRelPath, srcMount.PortableDataHash, strings.TrimPrefix(srcRelPath, "./"))
        mft, err := cp.getManifest(srcMount.PortableDataHash)
        if err != nil {
            return err
        }
        cp.manifest += mft.Extract(srcRelPath, dest).Text

Files

flame-a60b48f5c1.png (51.7 KB) flame-a60b48f5c1.png Tom Clegg, 06/21/2024 12:46 PM
flame-bb806a2834.png (98.2 KB) flame-bb806a2834.png Tom Clegg, 06/21/2024 12:46 PM
flame-2914826a9a.png (143 KB) flame-2914826a9a.png Tom Clegg, 06/23/2024 01:14 PM
flame-acc75dd485.png (124 KB) flame-acc75dd485.png Tom Clegg, 06/23/2024 01:14 PM
flame-4f883aa130.png (149 KB) flame-4f883aa130.png Tom Clegg, 06/23/2024 01:35 PM

Subtasks 1 (0 open1 closed)

Task #21914: Review 21891-output-copying-performanceResolvedTom Clegg06/25/2024Actions

Related issues

Related to Arvados - Bug #22226: crunch-run segfault+panic when output directory includes a file mounted from a collectionResolvedTom Clegg10/28/2024Actions
Actions

Also available in: Atom PDF