Project

General

Profile

Actions

Bug #14723

closed

[CWL] File that should appear multiple times in output collection only appears once

Added by Peter Amstutz about 5 years ago. Updated almost 5 years ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Eric Biagiotti
Category:
-
Target version:
Story points:
-
Release relationship:
Auto

Description

A user has reported a situation where, in producing CWL output, there are directories that should contain the same file, however the file only appears in one directory. The theory is that the source files get mapped to exactly one destination in the output collection (picked arbitrarily) instead of copied to every location in the destination collection.

Specifically, make_output_collection uses NoFollowPathMapper() to map each file to exactly one destination. It probably ought to work the other way, mapping each destination to an input file.


Subtasks 1 (0 open1 closed)

Task #15037: Review 14723-cwl-multiple-file-targetsResolvedPeter Amstutz04/11/2019Actions
Actions #1

Updated by Peter Amstutz about 5 years ago

  • Status changed from New to In Progress
Actions #2

Updated by Peter Amstutz about 5 years ago

  • Status changed from In Progress to New
Actions #3

Updated by Peter Amstutz about 5 years ago

  • Description updated (diff)
Actions #4

Updated by Peter Amstutz about 5 years ago

  • Description updated (diff)
Actions #5

Updated by Peter Amstutz about 5 years ago

  • Description updated (diff)
Actions #6

Updated by Peter Amstutz about 5 years ago

  • Target version set to To Be Groomed
Actions #7

Updated by Tom Morris about 5 years ago

  • Assigned To set to Eric Biagiotti
  • Target version changed from To Be Groomed to 2019-04-10 Sprint
Actions #8

Updated by Eric Biagiotti almost 5 years ago

  • Status changed from New to In Progress
Actions #9

Updated by Eric Biagiotti almost 5 years ago

Am able to reproduce with the following workflow:

class: ExpressionTool
cwlVersion: v1.0

inputs:
  inputfile: File
outputs:
  out: Directory[]
requirements:
  InlineJavascriptRequirement: {}
expression: |
  ${
    var dirs = [];
    dirs.push({"class": "Directory",
               "basename": "testdir1",
               "listing": [inputs.inputfile]});
    dirs.push({"class": "Directory",
               "basename": "testdir2",
               "listing": [inputs.inputfile]});
    return {"out": dirs};
  }

and input file:

inputfile:
  class: File
  path: test.txt
Actions #10

Updated by Eric Biagiotti almost 5 years ago

  • Target version changed from 2019-04-10 Sprint to 2019-04-24 Sprint
Actions #11

Updated by Eric Biagiotti almost 5 years ago

The 14723-cwl-multiple-file-targets branch at 810fc3e7aab72aa5ae951141955b56f6d4e47cfd now maps files from their target destination instead. The exception is for literals, which store file contents instead of a resolved location.

I also added a test for this and found a bug where an ExpressionTool with a file array output of literals with the same name would not use name conflict resolution. I fixed this and also added a test.

Unit Tests: https://ci.curoverse.com/view/Developer/job/developer-run-tests/1185/
C/I Tests: https://ci.curoverse.com/view/CWL/job/arvados-cwl-conformance-tests/89/

Actions #13

Updated by Peter Amstutz almost 5 years ago

Please add a comment about how StagingPathMapper, instead of internally mapping source → target, maps targets → source.

Also how mapper() is overridden to maintain the behavior of mapping source → target

Also how literals are a special case, they are still mapped by source (identifier) → target

                if v.resolved == src or (v.type == "CreateFile" and k == src):
                    return v

Let's make this logic exclusive:

if (v.type != "CreateFile" and v.resolved == src) or (v.type == "CreateFile" and k == src):

The code blocks at arvcontainer.py:150 and executor.py:444 are very similar if not identical in function, I wonder if they could be refactored to use the same logic.

Unrelated to this immediate bug, but I noticed this in executor.py:464

            except IOError as e:
                logger.warning("While preparing output collection: %s", e)

Seems like that should be logger.error() and/or re-raise the exception.

Actions #14

Updated by Eric Biagiotti almost 5 years ago

Peter Amstutz wrote:

Please add a comment about how StagingPathMapper, instead of internally mapping source → target, maps targets → source.

Also how mapper() is overridden to maintain the behavior of mapping source → target

Also how literals are a special case, they are still mapped by source (identifier) → target

Done

[...]

Let's make this logic exclusive:

[...]

Fixed

The code blocks at arvcontainer.py:150 and executor.py:444 are very similar if not identical in function, I wonder if they could be refactored to use the same logic.

I don't see a huge benefit to refactoring this at the moment, at least as a part of this bug. Maybe we can add a ticket for this?

Unrelated to this immediate bug, but I noticed this in executor.py:464

[...]

Seems like that should be logger.error() and/or re-raise the exception.

Fixed

Latest at 454ee2b8f0385c542b6f1165a3baf2820425e1a3.
Unit Tests: https://ci.curoverse.com/view/Developer/job/developer-run-tests/1195/
C/I Tests: Run and passed on my machine.

Actions #15

Updated by Peter Amstutz almost 5 years ago

Eric Biagiotti wrote:

Latest at 454ee2b8f0385c542b6f1165a3baf2820425e1a3.
Unit Tests: https://ci.curoverse.com/view/Developer/job/developer-run-tests/1195/
C/I Tests: Run and passed on my machine.

LGTM

Actions #16

Updated by Eric Biagiotti almost 5 years ago

  • Status changed from In Progress to Resolved
Actions #17

Updated by Tom Morris almost 5 years ago

  • Release set to 15
Actions

Also available in: Atom PDF