Bug #20826

closed

Long arv-mount Queue time for large input jobs

Added by Alex Coleman 9 months ago. Updated 9 months ago.

Status: Resolved
Priority: Normal
Assigned To: -
Category: -
Target version: -
Story points: -

Description

I ran two instances of the same job, and both had a subworkflow step that stayed queued for over an hour, stuck at the Running [arv-mount --foreground --read-write ...] stage.

Container request UUIDs: pirca-xvhdp-rynp7k752rvbebv, pirca-xvhdp-ne70ju4ksqxjvyu.

Possibly relevant: the job was running on 270 GiB of input data.

Expected behavior: I would not expect the step to stay queued this long.

Actions #1

Updated by Peter Amstutz 9 months ago

  • Target version set to Future

Actions #2

Updated by Peter Amstutz 9 months ago

in run_agc.cwl

  InitialWorkDirRequirement:
    listing:
    - entry: $(inputs.fileList)
      writable: true
    - entry: $(inputs.refFile)
      writable: true
    - entry: $(inputs.outAgc)
      writable: true

Why are you doing this? The 1-hour startup time is because it's downloading all the files so that they can be locally writable.

The inputs shouldn't need to be writable, and shouldn't even need to be in InitialWorkDirRequirement?
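
For illustration, a minimal sketch of the same listing without the writable flag. This is a hypothetical fragment, not the actual run_agc.cwl; the enclosing `requirements:` key is assumed, and the input names are taken from the snippet above. In CWL, `writable` defaults to false, so entries staged this way are expected to be made available read-only rather than copied:

```yaml
# Hypothetical fragment; the rest of run_agc.cwl is not shown in this ticket.
requirements:
  InitialWorkDirRequirement:
    listing:
      # "writable" is omitted, so it takes the CWL default of false and the
      # files should not need to be copied into the working directory.
      - entry: $(inputs.fileList)
      - entry: $(inputs.refFile)
      - entry: $(inputs.outAgc)
```

If the tool can take its input paths directly on the command line, the InitialWorkDirRequirement could likely be dropped altogether.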

Actions #3

Updated by Peter Amstutz 9 months ago

  • Target version deleted (Future)
  • Status changed from New to Resolved

We established that this is working as designed. The data is staged to the working directory as "writable", which causes all of it to be copied so that the process can modify it. All of that time is spent copying data in preparation to run. The writable staging was due to some issues with the upstream agc tool that Alex is addressing.

In theory, crunch-run could do this more efficiently (by preparing the collection via manifest manipulation) but that requires more substantial effort and should be written as a separate ticket.
