Feature #6012

Finding locator information for files in keep mounts is more difficult than it should be

Added by Abram Connelly almost 9 years ago. Updated about 4 years ago.

Status:
Closed
Priority:
Normal
Assigned To:
-
Category:
-
Target version:
-
Story points:
-

Description

I have a Keep mount with 'shared' and 'home' directories. Finding the portable data hash for files in those directories takes more work than I think it should.

For example, the following script lists the relevant files in a sub-project of my home project in the format I want, with each path prefixed by the portable data hash of its collection:

#!/bin/bash

bd="$HOME/keep/home/69 1kg CGI Var files"

pushd "$bd" > /dev/null 2>&1 || exit 1

# For each matching file, read the portable data hash of the collection it
# lives in (the first path component) and print "<pdh>/<path>".
while IFS= read -r line
do
  d=${line%%/*}
  pdh=$(jq -r .portable_data_hash "$d/.arvados#collection")
  echo "$pdh/$line"
done < <( find . -type f -name 'var-*-ASM.tsv.bz2' | sed 's;^\./;;' )

popd > /dev/null 2>&1

Which produces output of the form:

c195c45507bc5c12b1d15c10a6ca9e9e+2492/GS12891-1100-37-ASM/GS00362-DNA_G02/ASM/var-GS12891-1100-37-ASM.tsv.bz2
1929f04d0c04d6c66d636fa01dee87b4+3469/NA18501-200-37-ASM/GS00392-DNA_D02/ASM/var-NA18501-200-37-ASM.tsv.bz2
5e5a19fce70619915fae3db2de4a1270+2491/GS12880-1100-37-ASM/GS00362-DNA_D01/ASM/var-GS12880-1100-37-ASM.tsv.bz2
96343fcc8f3ede03e54daf55b4d76472+3469/NA18517-200-37-ASM/GS00392-DNA_B03/ASM/var-NA18517-200-37-ASM.tsv.bz2
...

The use case may be too specific to warrant further development, but this is becoming a common idiom for me. When I have projects that contain sub-projects or lists of collections to iterate over, I often have to back out and reconstruct the portable data hash to use in my pipelines. Needing project UUIDs instead runs into the same issue.
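For the "iterate over sub-projects and collections" case, a minimal sketch of a workaround in the same style as the script above is a recursive shell function that prints one line per collection. It assumes, as that script does, that every collection directory in the mount contains a ".arvados#collection" file holding the collection record as JSON, and that jq is installed. The name list_collection_pdhs is hypothetical and not part of any Arvados tool.

```bash
# Hypothetical helper (not an Arvados tool): for every collection directory at
# or below a project directory in a Keep mount, print
# "<portable_data_hash><TAB><collection dir>".
# Assumes collection directories are marked by a ".arvados#collection" JSON
# file and that jq is installed.
list_collection_pdhs() {
  local dir=${1:-.} sub
  for sub in "$dir"/*/; do
    # Skip the literal pattern left behind when the glob matches nothing.
    [ -d "$sub" ] || continue
    sub=${sub%/}
    if [ -f "$sub/.arvados#collection" ]; then
      # A collection: report it and do not descend into its files.
      printf '%s\t%s\n' \
        "$(jq -r .portable_data_hash "$sub/.arvados#collection")" "$sub"
    else
      # Not a collection, so treat it as a sub-project and recurse into it.
      list_collection_pdhs "$sub"
    fi
  done
}
```

For example, list_collection_pdhs "$HOME/keep/home/69 1kg CGI Var files" would print one line per collection in that project and its sub-projects. Stopping at collection directories avoids walking every file inside each collection.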

Some possible fixes that come to mind are:

  • Provide a "magic file", like .arvados#collection, in each subdirectory that holds the relevant parent project UUID or portable data hash.
  • Provide a helper tool that returns the project UUID or portable data hash for a file, for example arv get_keep_file_portable_data_hash my_filename.
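Until such a tool exists, a minimal sketch of the second suggestion can be written as a shell function. It walks up from a file until it finds the enclosing collection's ".arvados#collection" file and prints the file's locator as "<portable_data_hash>/<path within collection>". It assumes, as the script above does, that this magic file holds the collection record as JSON, and that jq and realpath are installed. The name keep_locator is hypothetical; it is not the proposed arv subcommand.

```bash
# Hypothetical helper (not an Arvados tool): print
# "<portable_data_hash>/<path within collection>" for a file in a Keep mount.
# Assumes each collection directory contains a ".arvados#collection" file
# with the collection record as JSON, and that jq and realpath are installed.
keep_locator() {
  local file dir rel pdh
  # Resolve to an absolute path so we can walk up the directory tree.
  file=$(realpath "$1") || return 1
  dir=$(dirname "$file")
  rel=$(basename "$file")
  # Walk upward until we find the collection's magic file or reach "/".
  while [ "$dir" != "/" ]; do
    if [ -f "$dir/.arvados#collection" ]; then
      pdh=$(jq -r .portable_data_hash "$dir/.arvados#collection") || return 1
      printf '%s/%s\n' "$pdh" "$rel"
      return 0
    fi
    # Move one level up, prepending this directory to the relative path.
    rel="$(basename "$dir")/$rel"
    dir=$(dirname "$dir")
  done
  echo "keep_locator: $1 is not inside a Keep collection" >&2
  return 1
}
```

With this function sourced, the loop in the script above reduces to find . -type f -name 'var-*-ASM.tsv.bz2' | while IFS= read -r f; do keep_locator "$f"; done, and it no longer assumes the collection is the first path component.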

#1

Updated by Peter Amstutz about 4 years ago

  • Status changed from New to Closed