Idea #6380
closed
Reading a new collection from keep takes extra time
Description
While testing how samtools merge reads its input files, I ran the following script:
#!/bin/bash
echo starting local
time ./samtools merge 22.bam *22.bam
rm 22.bam
echo starting arv-mount
time ~/keep/by_id/0b5dd5ad3fd555dbb9ef81a027b69dec+18147/samtools merge 22.bam *22.bam
rm 22.bam
echo starting read-keep
time ~/keep/by_id/0b5dd5ad3fd555dbb9ef81a027b69dec+18147/samtools merge 22.bam \
  ~/keep/by_id/ff037b7792b5f287b8553db679714717+185949/xaa.22.bam \
  ~/keep/by_id/ff037b7792b5f287b8553db679714717+185949/xab.22.bam \
  ~/keep/by_id/ff037b7792b5f287b8553db679714717+185949/xac.22.bam \
  ~/keep/by_id/ff037b7792b5f287b8553db679714717+185949/xad.22.bam \
  ~/keep/by_id/ff037b7792b5f287b8553db679714717+185949/xae.22.bam \
  ~/keep/by_id/ff037b7792b5f287b8553db679714717+185949/xaf.22.bam \
  ~/keep/by_id/ff037b7792b5f287b8553db679714717+185949/xag.22.bam \
  ~/keep/by_id/ff037b7792b5f287b8553db679714717+185949/xah.22.bam \
  ~/keep/by_id/ff037b7792b5f287b8553db679714717+185949/xai.22.bam
rm 22.bam
echo starting new-coll
time ~/keep/by_id/0b5dd5ad3fd555dbb9ef81a027b69dec+18147/samtools merge 22.bam \
  ~/keep/by_id/84585b846972161cd8b106226bc1ba0a+817/*
... and got these results.
starting local
real 0m22.562s
user 0m20.788s
sys 0m0.416s
starting arv-mount
real 0m22.754s
user 0m20.796s
sys 0m0.380s
starting read-keep
real 0m22.560s
user 0m20.580s
sys 0m0.416s
starting new-coll
real 2m35.678s
user 0m25.852s
sys 0m1.392s
Here, 84585b846972161cd8b106226bc1ba0a+817 is a new collection I created using Workbench, and ff037b7792b5f287b8553db679714717+185949 is a collection that had been accessed previously. (All commands operate on the same files.)
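To put the gap in numbers, the timings above can be converted to seconds and compared. The snippet below is only a sketch of that arithmetic; the helper name `to_seconds` is made up for illustration, and the values are copied from the read-keep and new-coll runs.

```shell
#!/bin/bash
# Convert a `time`-style value such as 2m35.678s to seconds.
# (to_seconds is a hypothetical helper, not part of any tool used above.)
to_seconds() {
  echo "$1" | awk -F'[ms]' '{ printf "%.3f\n", $1 * 60 + $2 }'
}

warm=$(to_seconds 0m22.560s)   # read-keep, real
cold=$(to_seconds 2m35.678s)   # new-coll, real
# new-coll CPU time: user + sys
cold_cpu=$(awk 'BEGIN { printf "%.3f", 25.852 + 1.392 }')

awk -v w="$warm" -v c="$cold" -v cpu="$cold_cpu" 'BEGIN {
  printf "extra wall time vs read-keep: %.3f s\n", c - w
  printf "new-coll wall time not spent on CPU: %.3f s\n", c - cpu
}'
```

This prints 133.118 s and 128.434 s. Roughly 128 s of the new-coll run was not spent on CPU, so most of the extra time appears to have gone to waiting rather than to the merge itself.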
I am concerned that when a job runs in a Docker image that has never accessed a collection, the time to load that collection will scale with the number of files in it.
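One way to check this would be to time the first and second access to a collection that has not been touched yet, and repeat for collections with different file counts. The sketch below makes several assumptions: the helper `elapsed` and the `COLL` variable are invented for illustration, GNU `date` is needed for `%N`, and it is not established here that a directory listing incurs the same first-access cost as the merge did.

```shell
#!/bin/bash
# Print the wall-clock seconds a command takes (GNU date assumed for %N).
# elapsed is a hypothetical helper; the command's own output is discarded.
elapsed() {
  local start end
  start=$(date +%s.%N)
  "$@" > /dev/null 2>&1
  end=$(date +%s.%N)
  awk -v s="$start" -v e="$end" 'BEGIN { printf "%.3f\n", e - s }'
}

# COLL must point at a collection directory under the mount that has not
# been accessed yet, e.g. COLL=~/keep/by_id/<portable data hash>.
# Nothing touches the path before the first timed listing.
if [ -n "${COLL:-}" ]; then
  echo "first listing:  $(elapsed ls -l "$COLL") s"
  echo "second listing: $(elapsed ls -l "$COLL") s"
  echo "files: $(ls "$COLL" | wc -l)"
fi
```

If the first listing grows with the file count while the second stays flat, that would support the concern above; if both are fast, the delay is more likely in reading the file data for the first time.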