Bug #10978

[CWL] Avoid using "+" char in mount paths

Added by Peter Amstutz almost 2 years ago. Updated over 1 year ago.

Status: New
Priority: Normal
Assigned To: -
Category: -
Target version:
Start date:
Due date:
% Done: 0%
Estimated time:
Story points: 0.5

Description

The "+" character is misinterpreted by some tools such as older versions of Picard. Consider changing it to "-" in keep mount paths for improved compatibility with poorly behaved tools.
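As a rough illustration of the proposed substitution, the sketch below rewrites only the "+" inside a portable data hash path component and leaves any other "+" in the path alone. The function name `plus_free_mount_path` and the regular expression are assumptions made for this example; they are not taken from the Arvados code base.

```python
import re

# Hypothetical helper, not part of Arvados. Matches a portable data hash
# (32 hex digits, "+", manifest size) that forms a whole path component.
_PDH_COMPONENT = re.compile(r"(?<![0-9a-f])([0-9a-f]{32})\+(\d+)(?=/|$)")


def plus_free_mount_path(path):
    """Replace the "+" in a portable data hash path component with "-"."""
    return _PDH_COMPONENT.sub(r"\1-\2", path)


print(plus_free_mount_path("/keep/d41d8cd98f00b204e9800998ecf8427e+0/sample.bam"))
# prints /keep/d41d8cd98f00b204e9800998ecf8427e-0/sample.bam
```

A "+" that appears in an ordinary file name is not touched, because only a 32-digit hex component followed by a size is rewritten.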

History

#1 Updated by Peter Amstutz almost 2 years ago

  • Description updated (diff)

#2 Updated by Tom Clegg almost 2 years ago

We could also consider naming the mount dirs according to purpose instead of content -- e.g., mount at "/mnt/bamFile" instead of "/keep/d41d8cd98f00b204e9800998ecf8427e+0".

That might also make container logs easier to read: "/mnt/tumor/sample1234.bam" and "/mnt/normal/sample5678.bam" instead of "/keep/pdh78af66a/sample1234.bam" and "/keep/pdh83b2f9a/sample5678.bam".
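A minimal sketch of purpose-based naming, assuming the code that picks mount points can see a mapping from input parameter name to collection identifier. The function `purpose_mount_points`, the `/mnt` root, and the sanitising rule are hypothetical; the identifiers are the abbreviated placeholders used in the comment above.

```python
import posixpath
import re
from typing import Dict


def purpose_mount_points(inputs: Dict[str, str], root: str = "/mnt") -> Dict[str, str]:
    """Map each input parameter name to a mount point named after it.

    `inputs` maps a parameter name (e.g. "tumor") to a collection identifier.
    Returns {mount point: collection identifier}.
    """
    mounts = {}
    for name in sorted(inputs):  # sorted so the result is deterministic
        # Replace anything outside a conservative character set.
        safe = re.sub(r"[^A-Za-z0-9_-]", "_", name)
        target = posixpath.join(root, safe)
        if target in mounts:
            raise ValueError("mount point collision: " + target)
        mounts[target] = inputs[name]
    return mounts


print(purpose_mount_points({"tumor": "pdh78af66a", "normal": "pdh83b2f9a"}))
```

Two parameter names that sanitise to the same string raise an error rather than silently sharing a mount point.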

#3 Updated by Tom Morris almost 2 years ago

[Tom replied after I'd composed this reply, but before I submitted it. I like this idea, but still think the questions below are valid.]

Am I wrong in thinking that any change to the "containers API" automatically invalidates all jobs which have been run to date from a reusability point of view?

Can you expand on the pros and cons that you perceive for such a change?

#4 Updated by Peter Amstutz almost 2 years ago

Tom Clegg wrote:

> We could also consider naming the mount dirs according to purpose instead of content -- e.g., mount at "/mnt/bamFile" instead of "/keep/d41d8cd98f00b204e9800998ecf8427e+0".
>
> That might also make container logs easier to read: "/mnt/tumor/sample1234.bam" and "/mnt/normal/sample5678.bam" instead of "/keep/pdh78af66a/sample1234.bam" and "/keep/pdh83b2f9a/sample5678.bam".

Yes, we could take the input parameter name into account when determining the mount path. However, it would require a bit of work, since that information currently isn't easily available to the part of the code that decides where to mount things.

#5 Updated by Peter Amstutz almost 2 years ago

Tom Morris wrote:

> [Tom replied after I'd composed this reply, but before I submitted it. I like this idea, but still think the questions below are valid.]
>
> Am I wrong in thinking that any change to the "containers API" automatically invalidates all jobs which have been run to date from a reusability point of view?

You are not wrong. This would change the mount points, which would invalidate job reuse.

> Can you expand on the pros and cons that you perceive for such a change?

Pros: solves a user problem with a commonly used tool.

Cons: accommodating badly behaved tools is a slippery slope, the latest version of the tool doesn't have the problem, and reasonable workarounds (e.g. putting the code in Docker instead of Keep) are available.
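To illustrate why a mount path change defeats reuse, the sketch below hashes a hypothetical container request that contains the command line and the mount table. This is not how Arvados matches containers for reuse; `request_fingerprint` and the request layout are assumptions that only show that any comparison covering the mount paths sees a different request once "+" becomes "-".

```python
import hashlib
import json


def request_fingerprint(command, mounts):
    """Hash the parts of a hypothetical request that a reuse check compares."""
    canonical = json.dumps({"command": command, "mounts": mounts}, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


pdh = "d41d8cd98f00b204e9800998ecf8427e+0"
old_dir = "/keep/" + pdh
new_dir = "/keep/" + pdh.replace("+", "-")

old = request_fingerprint(["md5sum", old_dir + "/sample1234.bam"], {old_dir: pdh})
new = request_fingerprint(["md5sum", new_dir + "/sample1234.bam"], {new_dir: pdh})

print(old == new)  # prints False: same data, different mount path
```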

#6 Updated by Tom Clegg almost 2 years ago

Just a thought: if it's too much work under the hood to use symbolic names like "bamFile", we could consider "/mnt/d41d8cd9".

(Having "-" and "+" PDH forms in various places seems like a confusing road to go down. Keep-web is forced into it because it really needs to communicate the whole PDH, but in this case we don't actually need it to be a PDH at all: it just has to be deterministic so re-use works, and it's best if it doesn't make the logs too hard for a human to read.)
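One way the "/mnt/d41d8cd9" idea could look, as a sketch under stated assumptions: the mount name is a prefix of the hash part of the portable data hash, and the prefix is lengthened only when two inputs of the same container would otherwise share a name. The function `short_mount_points`, the `/mnt` root, and the 8-character default are hypothetical. The result depends only on the set of inputs, so it stays deterministic.

```python
def short_mount_points(pdhs, root="/mnt", min_len=8):
    """Assign each portable data hash a short mount point such as /mnt/d41d8cd9.

    Returns {portable data hash: mount point}. The name is a prefix of the
    hash part, lengthened only if two hashes in `pdhs` share that prefix.
    """
    # Drop the "+size" suffix so no "+" ends up in the path.
    hashes = sorted({p.split("+", 1)[0] for p in pdhs})
    length = min_len
    # Grow the prefix until every distinct hash gets a distinct name.
    while len({h[:length] for h in hashes}) < len(hashes):
        length += 1
    return {p: root + "/" + p.split("+", 1)[0][:length] for p in pdhs}


print(short_mount_points(["d41d8cd98f00b204e9800998ecf8427e+0"]))
# prints {'d41d8cd98f00b204e9800998ecf8427e+0': '/mnt/d41d8cd9'}
```

The short name is readable in logs, but unlike a full portable data hash it does not identify the collection on its own.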

#7 Updated by Tom Clegg almost 2 years ago

  • Subject changed from [CWL] Consider changing keep mounts to use "-" instead of "+" in containers API to [CWL] Avoid using "+" char in mount paths

#8 Updated by Tom Morris over 1 year ago

  • Target version set to Arvados Future Sprints
