Bug #8383

closed

Some download links are broken

Added by Abram Connelly about 8 years ago. Updated about 8 years ago.

Status: Resolved
Priority: Normal
Assigned To: -
Target version: -
Story points: -

Description

The following public file:

http://evidence.pgp-hms.org/genome_download.php?download_genome_id=546fcaf82dff949832160d7969ae7d55aa024c21&download_nickname=Microbiome+data+for+PGP+kit+%232182+%22Goddu%22+-+Goddu.fna.gz

reports an error of:

Error: Unable to open file for download!
Actions #1

Updated by Abram Connelly about 8 years ago

Here are two full links that are failing:

http://evidence.pgp-hms.org/genome_download.php?download_genome_id=546fcaf82dff949832160d7969ae7d55aa024c21&download_nickname=Microbiome+data+for+PGP+kit+%232182+%22Goddu%22+-+Goddu.fna.gz

http://evidence.pgp-hms.org/genome_download.php?download_genome_id=546fcaf82dff949832160d7969ae7d55aa024c21&download_nickname=Microbiome+data+for+PGP+kit+%232182+%22Goddu%22+-+Goddu.txt

From looking at the source, I believe this comes from an assumption that the manifest for an old-style locator (for example, one with a '+K@ant' suffix at the end) contains only one file and/or block. The collection behind the above links has multiple files in its manifest, and the regex that extracts the filename from the manifest does not match.

Consulting https://github.com/curoverse/get-evidence/blob/2aa78e2913c02ffb8c376a9c026e956a3a6475b7/public_html/genome_download.php#L94:

      if (preg_match('/^(\.[^\s]*) .* 0:(\d+):(\S+)$/', $manifest, $regs)) {
        //$passthru_command = "whget ".escapeshellarg("$locator/**/$regs[2]");
        $subdir = preg_replace( '/^\.\/?/', '', $regs[1] );
        if ( $subdir != "" ) { $subdir = $subdir . "/"; }
        $passthru_command = "arv-get --no-progress ".escapeshellarg("$pdh/$subdir$regs[3]");
        $fsize = $regs[2];
        $ext = preg_replace ('/^.*?((\.\w{3})?(\.[bg]z2?)?)$/', '\1', $regs[2]);
      }

The manifest for the above collection is:

. 7d1d6dcad72711dfff71c79e1d380c1e+2286661+K@ant 0:2286229:Goddu.fna.gz 2286229:432:Goddu.txt

I believe the regex fails to find the first file because it is designed to match a single file entry: the entry must start at file offset 0, and the trailing '(\S+)$' requires that no more characters follow that first file/block entry. Here a second entry ('2286229:432:Goddu.txt') follows the first, so the match fails.
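To check this reading of the pattern, here is a minimal sketch that transcribes the quoted regex into Python and runs it against the manifest above. The single-file manifest is a hypothetical variant made up for comparison, and `first_file` is an illustrative helper name, not something from genome_download.php.

```python
import re

# Python transcription of the pattern quoted from genome_download.php.
SINGLE_FILE_PATTERN = re.compile(r'^(\.[^\s]*) .* 0:(\d+):(\S+)$')

# Manifest of the failing collection, as quoted in this issue.
multi_file_manifest = (
    ". 7d1d6dcad72711dfff71c79e1d380c1e+2286661+K@ant "
    "0:2286229:Goddu.fna.gz 2286229:432:Goddu.txt"
)

# Hypothetical single-file variant of the same line, for comparison only.
single_file_manifest = (
    ". 7d1d6dcad72711dfff71c79e1d380c1e+2286661+K@ant "
    "0:2286229:Goddu.fna.gz"
)


def first_file(manifest):
    """Return (stream, size, filename) if the pattern matches, else None."""
    m = SINGLE_FILE_PATTERN.match(manifest)
    if m is None:
        return None
    return (m.group(1), int(m.group(2)), m.group(3))


print(first_file(single_file_manifest))
print(first_file(multi_file_manifest))
```

The single-file line matches, while the two-file line does not: `(\S+)` cannot cross the space before the second file entry, so the `$` anchor is never reached.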

So I think this is a combination of the files being in the 'old' style (that is, having something like a '+K@ant' suffix) and a faulty regex that doesn't recognize manifests that have more than one file or block.
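One way to avoid the single-file assumption is to split the manifest line into tokens and classify each one. This is only a sketch in Python, not a patch to genome_download.php. It assumes a manifest line has the form "stream name, block locators, file entries", with file entries written as offset:size:name. The names `parse_manifest_line` and `find_file` are hypothetical. It does not decode escaped characters in filenames, and it returns only the first entry when a file is listed in several segments.

```python
import re

# Assumed token shapes: a block locator is a 32-hex-digit hash followed by
# optional "+hint" parts; a file entry is "offset:size:name".
BLOCK_RE = re.compile(r'^[0-9a-f]{32}(\+\S+)*$')
FILE_RE = re.compile(r'^(\d+):(\d+):(\S+)$')


def parse_manifest_line(line):
    """Split one manifest line into (stream, block locators, file entries)."""
    tokens = line.split()
    if not tokens:
        raise ValueError("empty manifest line")
    stream, blocks, files = tokens[0], [], []
    for tok in tokens[1:]:
        file_match = FILE_RE.match(tok)
        if BLOCK_RE.match(tok):
            blocks.append(tok)
        elif file_match:
            files.append({
                "offset": int(file_match.group(1)),
                "size": int(file_match.group(2)),
                "name": file_match.group(3),
            })
        else:
            raise ValueError("unrecognized manifest token: %r" % tok)
    return stream, blocks, files


def find_file(line, wanted_name):
    """Return the first file entry named wanted_name, or None."""
    _, _, files = parse_manifest_line(line)
    for entry in files:
        if entry["name"] == wanted_name:
            return entry
    return None


manifest = (
    ". 7d1d6dcad72711dfff71c79e1d380c1e+2286661+K@ant "
    "0:2286229:Goddu.fna.gz 2286229:432:Goddu.txt"
)
print(find_file(manifest, "Goddu.txt"))
```

With this approach the requested filename (for example, the tail of download_nickname) selects the entry, instead of relying on the manifest containing exactly one file.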

Actions #2

Updated by Abram Connelly about 8 years ago

I was mistaken about how some of these things work on the backend. A "new style" link is a symlink on the file system, in the /home/trait/upload/ID directory, that points to the 'fully qualified' location, meaning it includes the subdirectory and filename. For example, the following is the symlink for the 'input.locator' file for this link: http://evidence.pgp-hms.org/genome_download.php?download_genome_id=8e2fb8975d5a05735c56505e1697ad1fa1df73ab&download_nickname=CGI+sample%3A+GS03052-DNA_B01 :

input.locator -> 5236ab958ba6dbe909796ddafce8e570+32508/ASM/var-GS000037338-ASM.tsv.bz2

Whereas the 'old style' link does not have the subdirectory and filename after it. For example, the above symlink in the 'old style' might look like:

input.locator -> 5236ab958ba6dbe909796ddafce8e570+32508
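The difference between the two styles can be detected from the symlink target alone. The following is a small Python sketch under that assumption. `split_locator` is a hypothetical helper, and it takes the target as a string (for instance, what `os.readlink` would return for input.locator) rather than touching the file system.

```python
def split_locator(target):
    """Split a symlink target into (collection locator, path inside it).

    The path is None for an 'old style' target that names only the
    collection, with no subdirectory or filename after it.
    """
    locator, sep, path = target.partition("/")
    if not sep or not path:
        return locator, None
    return locator, path


new_style = "5236ab958ba6dbe909796ddafce8e570+32508/ASM/var-GS000037338-ASM.tsv.bz2"
old_style = "5236ab958ba6dbe909796ddafce8e570+32508"

print(split_locator(new_style))
print(split_locator(old_style))
```

When the path comes back as None, the filename has to be recovered some other way, such as from the manifest, which is where the multiple-file case becomes ambiguous.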

It's unclear to me how to handle multiple filenames. Should we create a separate subdirectory in the /home/trait/upload directory for each file of interest, and change the links in Tapestry and GET-Evidence to point to these individual subdirectories?

Actions #3

Updated by Ward Vandewege about 8 years ago

  • Status changed from New to Resolved
  • % Done changed from 0 to 100