Bug #2937

Updated by Tom Clegg over 5 years ago

Last sprint, we made a collections download page that's meant to be wget-friendly: it requires no authorization, and it doesn't link to any resources other than the files in the Collection. This is #2764.

Unfortunately, we just discovered that the page is not friendly when the collection contains files in directories. wget is not automatically creating destination subdirectories, but it does attempt to save files under them, which fails:

--2014-05-29 17:34:50-- https://workbench.4xphq.arvadosapi.com/collections/download/3bcb4a087ce4f1db3126b81204f16eef+92/5dvoynjlty21p41ts0yno9g0izrins8delmexuh9wuvndvhcmw/testcoll/alice.txt
Reusing existing connection to workbench.4xphq.arvadosapi.com:443.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/plain]
5dvoynjlty21p41ts0yno9g0izrins8delmexuh9wuvndvhcmw/testcoll: Not a directory
5dvoynjlty21p41ts0yno9g0izrins8delmexuh9wuvndvhcmw/testcoll/alice.txt: Not a directory

Cannot write to `5dvoynjlty21p41ts0yno9g0izrins8delmexuh9wuvndvhcmw/testcoll/alice.txt' (Not a directory).

We need to figure out a solution to this. So far there doesn't seem to be a wget command-line switch that will do it. Based on a mirror of my personal site, it seems like wget does create the directories if they're linked to as directories, so adding empty links on the page that point to each directory might solve the issue. Maybe those links can return 404, so we trick wget into making the directories without saving anything else that would clutter the Collection download? This needs more investigation/testing.

# Offer a sensible "subdirectory view" at @/collections/download/{uuid}/{token}/foo/@, i.e., showing links to only files whose path starts with @./foo/@
# At any given level, show contents of the collection/subtree like this:
#* foo/
#* foo/bar.txt
#* foo/baz/
#* foo/baz/waz.txt
#* ...
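
The listing described above could be generated roughly as follows. This is only a sketch in Python, not Workbench code: the function name @subtree_listing@ and its arguments are hypothetical, and it assumes the collection's files are available as a flat list of paths such as @./foo/bar.txt@.

```python
def subtree_listing(file_paths, subdir=""):
    """Return sorted link targets for the view at `subdir`, relative to it.

    Hypothetical helper: `file_paths` is assumed to be a flat list of file
    paths in the collection (with or without a leading "./").
    """
    prefix = subdir.strip("/")
    prefix = prefix + "/" if prefix else ""
    entries = set()
    for path in file_paths:
        # Normalize "./foo/bar.txt" to "foo/bar.txt".
        if path.startswith("./"):
            path = path[2:]
        # Keep only files that live under the requested subdirectory.
        if not path.startswith(prefix):
            continue
        rel = path[len(prefix):]
        if not rel:
            continue
        parts = rel.split("/")
        # Emit every ancestor directory as its own entry, with a trailing
        # slash, so a crawler sees it linked to as a directory.
        for i in range(1, len(parts)):
            entries.add("/".join(parts[:i]) + "/")
        entries.add(rel)
    return sorted(entries)


if __name__ == "__main__":
    files = ["./foo/bar.txt", "./foo/baz/waz.txt"]
    for entry in subtree_listing(files):
        print(entry)
```

With the two example files, the top-level view lists @foo/@, @foo/bar.txt@, @foo/baz/@ and @foo/baz/waz.txt@, and the view at @foo/@ lists @bar.txt@, @baz/@ and @baz/waz.txt@. Whether linking the directories this way is enough for wget to create them is still the open question noted above.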