Project

General

Profile

Actions

Bug #12606

closed

Symlink in output points to invalid location -- no such file or directory

Added by Brad Chapman over 6 years ago. Updated almost 5 years ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Category:
-
Target version:
-
Story points:
-

Description

Hi all;
I'm running into an issue on runs with symlinked outputs, specifically for strelka2 variant calling. The run finishes cleanly and then fails when uploading results:

017-11-17T11:31:37.852090800Z Uploading strelka2/chr19/NA24385-chr19_0_16089283-block-ploidy.vcf.gz (403 bytes) 2017-11-17T11:31:37.854757100Z Uploading strelka2/chr19/NA24385-chr19_0_16089283-block-ploidy.vcf.gz.tbi (385 bytes) 2017-11-17T11:31:37.854791000Z Uploading strelka2/chr19/NA24385-chr19_0_16089283-block-regions-merged.bed (285572 bytes) 2017-11-17T11:31:37.854875100Z Uploading strelka2/chr19/NA24385-chr19_0_16089283-block-regions-merged.bed.gz (94913 bytes) 2017-11-17T11:31:37.854917900Z Uploading strelka2/chr19/NA24385-chr19_0_16089283-block-regions-merged.bed.gz.tbi (13964 bytes) 2017-11-17T11:31:37.854941700Z Uploading strelka2/chr19/NA24385-chr19_0_16089283-block-regions.bed (285572 bytes) 2017-11-17T11:31:37.855083700Z Uploading strelka2/chr19/NA24385-chr19_0_16089283-block-work/results/stats/runStats.tsv (170 bytes) 2017-11-17T11:31:37.856529200Z Uploading strelka2/chr19/NA24385-chr19_0_16089283-block-work/results/stats/runStats.xml (451 bytes) 2017-11-17T11:31:37.856914000Z Uploading strelka2/chr19/NA24385-chr19_0_16089283-block-work/results/variants/genome.S1.vcf.gz (24603795 bytes) 2017-11-17T11:31:37.868911500Z Uploading strelka2/chr19/NA24385-chr19_0_16089283-block-work/results/variants/genome.S1.vcf.gz.tbi (34744 bytes) 2017-11-17T11:31:37.868975600Z Uploading strelka2/chr19/NA24385-chr19_0_16089283-block-work/results/variants/genome.vcf.gz (24603795 bytes) 2017-11-17T11:31:37.878724000Z Uploading strelka2/chr19/NA24385-chr19_0_16089283-block-work/results/variants/genome.vcf.gz.tbi (34744 bytes) 2017-11-17T11:31:37.878796700Z Uploading strelka2/chr19/NA24385-chr19_0_16089283-block-work/results/variants/variants.vcf.gz (4136633 bytes) 2017-11-17T11:31:37.880521500Z Uploading strelka2/chr19/NA24385-chr19_0_16089283-block-work/results/variants/variants.vcf.gz.tbi (59036 bytes) 2017-11-17T11:31:37.880587900Z Uploading strelka2/chr19/NA24385-chr19_0_16089283-block-work/runWorkflow.py (7795 bytes) 2017-11-17T11:31:37.882003000Z Uploading strelka2/chr19/NA24385-chr19_0_16089283-block-work/runWorkflow.py.config.pickle (4785 bytes) 2017-11-17T11:31:37.882031100Z Uploading strelka2/chr19/NA24385-chr19_0_16089283-block-work/workflow.error.log.txt (0 bytes) 2017-11-17T11:31:37.882054800Z Uploading strelka2/chr19/NA24385-chr19_0_16089283-block-work/workflow.exitcode.txt (2 bytes) 2017-11-17T11:31:37.882076800Z Uploading strelka2/chr19/NA24385-chr19_0_16089283-block-work/workflow.warning.log.txt (0 bytes) 2017-11-17T11:31:37.882133000Z While uploading output files: Symlink in output "/strelka2/chr19/NA24385-chr19_0_16089283-block.vcf.gz" points to invalid location "genome.S1.vcf.gz": lstat /tmp/603101599/strelka2/chr19/genome.S1.vcf.gz: no such file or directory 2017-11-17T11:31:37.882144900Z Cancelled

This is an example project and run that show the problem:

https://cloud.curoverse.com/container_requests/qr1hi-xvhdp-j10b0yribnprohf
https://cloud.curoverse.com/container_requests/qr1hi-xvhdp-ewo8ck7owxbuyud#Log

It appears to move files, then fail when a symlink that points to those files is uploaded later. Here is what the directory structure for one of those strelka2 runs looks like:

├── Test1-chrM_0_1000-block-ploidy.vcf.gz
├── Test1-chrM_0_1000-block-ploidy.vcf.gz.tbi
├── Test1-chrM_0_1000-block-regions.bed
├── Test1-chrM_0_1000-block-regions-merged.bed
├── Test1-chrM_0_1000-block-regions-merged.bed.gz
├── Test1-chrM_0_1000-block-regions-merged.bed.gz.tbi
├── Test1-chrM_0_1000-block.vcf.gz -> Test1-chrM_0_1000-block-work/results/variants/genome.vcf.gz
├── Test1-chrM_0_1000-block.vcf.gz.tbi -> Test1-chrM_0_1000-block-work/results/variants/genome.vcf.gz.tbi
└── Test1-chrM_0_1000-block-work
    ├── results
    │   ├── stats
    │   │   ├── runStats.tsv
    │   │   └── runStats.xml
    │   └── variants
    │       ├── genome.S1.vcf.gz
    │       ├── genome.S1.vcf.gz.tbi
    │       ├── genome.vcf.gz -> genome.S1.vcf.gz
    │       ├── genome.vcf.gz.tbi -> genome.S1.vcf.gz.tbi
    │       ├── variants.vcf.gz
    │       └── variants.vcf.gz.tbi
    ├── runWorkflow.py
    ├── runWorkflow.py.config.pickle
    ├── workflow.error.log.txt
    ├── workflow.exitcode.txt
    └── workflow.warning.log.txt

Thanks for any tips or clues on how to work around or fix.


Subtasks 2 (1 open1 closed)

Task #12637: ReviewClosedLucas Di Pentima11/17/2017Actions
Task #12609: DiagnoseNewPeter Amstutz11/17/2017Actions

Related issues

Related to Arvados - Bug #12183: [crunch-run] Handle symlinks with absolute paths into output directoryResolvedPeter Amstutz09/28/2017Actions
Related to Arvados - Bug #13100: [crunch-run] Replace custom manifest-writing code with collectionFSResolvedTom Clegg03/15/2018Actions
Actions #1

Updated by Brad Chapman over 6 years ago

Sorry the error message got formatted all on one line for some reason. Here it is in a more readable format:

2017-11-17T11:31:37.852090800Z Uploading strelka2/chr19/NA24385-chr19_0_16089283-block-ploidy.vcf.gz (403 bytes)
2017-11-17T11:31:37.854757100Z Uploading strelka2/chr19/NA24385-chr19_0_16089283-block-ploidy.vcf.gz.tbi (385 bytes)
2017-11-17T11:31:37.854791000Z Uploading strelka2/chr19/NA24385-chr19_0_16089283-block-regions-merged.bed (285572 bytes)
2017-11-17T11:31:37.854875100Z Uploading strelka2/chr19/NA24385-chr19_0_16089283-block-regions-merged.bed.gz (94913 bytes)
2017-11-17T11:31:37.854917900Z Uploading strelka2/chr19/NA24385-chr19_0_16089283-block-regions-merged.bed.gz.tbi (13964 bytes)
2017-11-17T11:31:37.854941700Z Uploading strelka2/chr19/NA24385-chr19_0_16089283-block-regions.bed (285572 bytes)
2017-11-17T11:31:37.855083700Z Uploading strelka2/chr19/NA24385-chr19_0_16089283-block-work/results/stats/runStats.tsv (170 bytes)
2017-11-17T11:31:37.856529200Z Uploading strelka2/chr19/NA24385-chr19_0_16089283-block-work/results/stats/runStats.xml (451 bytes)
2017-11-17T11:31:37.856914000Z Uploading strelka2/chr19/NA24385-chr19_0_16089283-block-work/results/variants/genome.S1.vcf.gz (24603795 bytes)
2017-11-17T11:31:37.868911500Z Uploading strelka2/chr19/NA24385-chr19_0_16089283-block-work/results/variants/genome.S1.vcf.gz.tbi (34744 bytes)
2017-11-17T11:31:37.868975600Z Uploading strelka2/chr19/NA24385-chr19_0_16089283-block-work/results/variants/genome.vcf.gz (24603795 bytes)
2017-11-17T11:31:37.878724000Z Uploading strelka2/chr19/NA24385-chr19_0_16089283-block-work/results/variants/genome.vcf.gz.tbi (34744 bytes)
2017-11-17T11:31:37.878796700Z Uploading strelka2/chr19/NA24385-chr19_0_16089283-block-work/results/variants/variants.vcf.gz (4136633 bytes)
2017-11-17T11:31:37.880521500Z Uploading strelka2/chr19/NA24385-chr19_0_16089283-block-work/results/variants/variants.vcf.gz.tbi (59036 bytes)
2017-11-17T11:31:37.880587900Z Uploading strelka2/chr19/NA24385-chr19_0_16089283-block-work/runWorkflow.py (7795 bytes)
2017-11-17T11:31:37.882003000Z Uploading strelka2/chr19/NA24385-chr19_0_16089283-block-work/runWorkflow.py.config.pickle (4785 bytes)
2017-11-17T11:31:37.882031100Z Uploading strelka2/chr19/NA24385-chr19_0_16089283-block-work/workflow.error.log.txt (0 bytes)
2017-11-17T11:31:37.882054800Z Uploading strelka2/chr19/NA24385-chr19_0_16089283-block-work/workflow.exitcode.txt (2 bytes)
2017-11-17T11:31:37.882076800Z Uploading strelka2/chr19/NA24385-chr19_0_16089283-block-work/workflow.warning.log.txt (0 bytes)
2017-11-17T11:31:37.882133000Z While uploading output files: Symlink in output "/strelka2/chr19/NA24385-chr19_0_16089283-block.vcf.gz" points to invalid location "genome.S1.vcf.gz": lstat /tmp/603101599/strelka2/chr19/genome.S1.vcf.gz: no such file or directory
2017-11-17T11:31:37.882144900Z Cancelled

Actions #2

Updated by Peter Amstutz over 6 years ago

I think what it is trying to tell you that the output file strelka2/chr19/NA24385-chr19_0_16089283-block.vcf.gz is a symlink to genome.S1.vcf.gz, but genome.S1.vcf.gz doesn't exist.

It looks like it might be resolving the relative symlink incorrectly, which would be a bug. I'll have to investigate.

Actions #3

Updated by Brad Chapman over 6 years ago

Thanks Peter. I'll work around this by avoiding the symlinks and copying the files out of the strelka2 specific directory into the out directory. If you end up digging into it, I'd be interested to know if it's the relative symlinks or the nested double symlink that causes issues, in case I do any symlinking elsewhere. I'll report back after testing with the copies output.

Actions #4

Updated by Peter Amstutz over 6 years ago

  • Assigned To set to Peter Amstutz
  • Target version set to 2017-11-22 Sprint
Actions #5

Updated by Peter Amstutz over 6 years ago

  • Target version changed from 2017-11-22 Sprint to 2017-12-06 Sprint
Actions #6

Updated by Peter Amstutz over 6 years ago

  • Target version changed from 2017-12-06 Sprint to 2017-12-20 Sprint
Actions #7

Updated by Peter Amstutz over 6 years ago

  • Target version changed from 2017-12-20 Sprint to Arvados Future Sprints
Actions #8

Updated by Tom Clegg about 6 years ago

  • Related to Bug #13100: [crunch-run] Replace custom manifest-writing code with collectionFS added
Actions #9

Updated by Peter Amstutz about 6 years ago

Here's a minimal CWL test case:

class: CommandLineTool
cwlVersion: v1.0
inputs: []
outputs:
  out:
    type: File
    outputBinding:
      glob: Test1-chrM_0_1000-block.vcf.gz
requirements:
  ShellCommandRequirement: {}
arguments:
  - shellQuote: false
    valueFrom: |
      mkdir -p Test1-chrM_0_1000-block-work/results/variants
      echo genome | gzip -c > Test1-chrM_0_1000-block-work/results/variants/genome.S1.vcf.gz
      cd Test1-chrM_0_1000-block-work/results/variants
      ln -s genome.S1.vcf.gz genome.vcf.gz
      cd ../../..
      ln -s Test1-chrM_0_1000-block-work/results/variants/genome.vcf.gz Test1-chrM_0_1000-block.vcf.gz

This fails on current master but passes on #13100 branch.

Actions #10

Updated by Tom Clegg about 6 years ago

  • Status changed from New to Resolved
  • % Done changed from 0 to 100
Actions #11

Updated by Tom Morris over 5 years ago

  • Target version deleted (Arvados Future Sprints)
Actions

Also available in: Atom PDF