Feature #22076
[open] keep-web can create a zipfile on the fly of a collection
Description
Accessed by making a POST request to the root of the collection's WebDAV endpoint on keep-web. The collection can be addressed by PDH or UUID.
Should work the same whether using the "inline" or "download only" endpoint. Must be the collection root, not a subdirectory.
Indicate that the response should be a zipfile by providing the header "Accept: application/zip" (confirm that is the right MIME type).
The POST body is either empty (get the whole collection), or a JSON array of strings which are paths within the collection to be included in the zip.
Each path refers to a file or a directory. If a path refers to a directory, the zip includes the entire contents of that directory. If the list references both a subdirectory and a specific file within that subdirectory, the zip includes the whole subdirectory (the file reference is redundant).
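A minimal client-side sketch of the request described above, using only the Python standard library. The function name build_zip_request, the example URL, and the Content-Type header on non-empty bodies are assumptions for illustration, not part of this ticket. Authentication is left out.

```python
import json
import urllib.request


def build_zip_request(collection_root_url, paths=None):
    """Build (but do not send) a POST request asking keep-web for a zip.

    collection_root_url: root of the collection's WebDAV endpoint (by PDH or UUID).
    paths: optional list of files/directories inside the collection;
           None or an empty list means "the whole collection" (empty body).
    """
    body = json.dumps(list(paths)).encode("utf-8") if paths else b""
    headers = {"Accept": "application/zip"}
    if body:
        # Assumption: the ticket does not say which Content-Type to send.
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(
        collection_root_url, data=body, headers=headers, method="POST"
    )


# Hypothetical URL; substitute a real keep-web host and collection UUID or PDH.
req = build_zip_request(
    "https://collections.example.com/c=zzzzz-4zz18-xxxxxxxxxxxxxxx/",
    ["a.txt", "subdir"],
)
```

Passing the request to urllib.request.urlopen would send it. The response body could then be copied to a local file in chunks.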
The list of files/directories should be sorted so they always download in the same order.
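The redundancy and ordering rules above could be sketched as follows. The function name normalize_selection is hypothetical, and the sketch assumes slash-separated, non-empty paths. It works on the request strings alone, so it cannot tell files from directories. That is harmless here: if "sub" is a file, then "sub/b.txt" cannot exist anyway.

```python
def normalize_selection(paths):
    """Return requested paths sorted, de-duplicated, and with any path
    that lies inside another requested path removed."""
    # Strip leading/trailing slashes so "/sub/" and "sub" compare equal,
    # then sort so the output order is deterministic.
    cleaned = sorted({p.strip("/") for p in paths})
    kept = []
    for p in cleaned:
        # A parent always sorts before its descendants, so checking the
        # already-kept entries is enough to detect redundancy.
        if any(p.startswith(k + "/") for k in kept):
            continue
        kept.append(p)
    return kept
```

For example, normalize_selection(["sub/b.txt", "sub", "a.txt"]) keeps only "a.txt" and "sub".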
The zip file should be streamed to avoid excessive copying or use of staging storage.
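To illustrate the streaming idea (this is not keep-web's implementation), the sketch below uses Python's zipfile module. That module can write to a non-seekable sink, emitting data descriptors instead of seeking back to patch headers. ChunkSink and stream_zip are hypothetical names. ChunkSink stands in for an HTTP response body and only collects chunks so the result can be inspected. ZIP_STORED (no compression) is assumed, in line with the tentative "no compression" note in this ticket.

```python
import io
import zipfile


class ChunkSink:
    """Write-only, non-seekable sink standing in for an HTTP response body."""

    def __init__(self):
        self.chunks = []

    def write(self, data):
        self.chunks.append(bytes(data))
        return len(data)

    def flush(self):
        pass


def stream_zip(sink, files, chunk_size=1 << 16):
    """Write each (path, readable) pair in `files` to a zip on `sink`,
    copying chunk by chunk so no file is held in memory or staged on disk."""
    with zipfile.ZipFile(sink, mode="w", compression=zipfile.ZIP_STORED) as zf:
        for path, src in files:
            with zf.open(path, mode="w") as dst:
                while True:
                    chunk = src.read(chunk_size)
                    if not chunk:
                        break
                    dst.write(chunk)


sink = ChunkSink()
stream_zip(sink, [("a.txt", io.BytesIO(b"hello")),
                  ("sub/b.txt", io.BytesIO(b"world"))])
```

Because the sink has no seek or tell, each entry's size and CRC are written after its data. Only the central directory at the end needs the full file list.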
If any of the file paths requested do not exist in the collection, return an error.
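One way to check for missing paths before any zip data is sent, as a rough sketch. The names MissingPathsError and check_paths_exist are hypothetical. The sketch assumes the collection's contents are available as a set of file paths, which means empty directories would not be recognized. The ticket does not specify an HTTP status for this error, so none is chosen here.

```python
class MissingPathsError(LookupError):
    """Raised when one or more requested paths are not in the collection."""


def check_paths_exist(requested, collection_files):
    """Raise MissingPathsError unless every requested path is a file in
    `collection_files` or a directory containing at least one such file."""
    files = {f.strip("/") for f in collection_files}
    missing = []
    for p in requested:
        p = p.strip("/")
        is_file = p in files
        is_dir = any(f.startswith(p + "/") for f in files)
        if not (is_file or is_dir):
            missing.append(p)
    if missing:
        raise MissingPathsError("not found in collection: " + ", ".join(missing))
```

Running this check before the first byte of the response is written lets the server still return an error status, which is no longer possible once streaming has started.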
Check with customer
We probably do not need to support Range requests; this will be confirmed with customer.
We probably don't need to compress the files, but need to check.
Consider including the ".arvados#collection" file in the zip with the collection metadata.
Answers from customer (Feb 3)
- We probably do not need to support Range requests; this will be confirmed with customer.
Answer: no. The intended use case is people downloading with a browser, and browsers typically don't implement resumable downloads, which would be the main reason to implement Range requests.
- We probably don't need to compress the files, but need to check.
Convinced them that compression involves tradeoffs and isn't necessary for the initial implementation; enabling compression could be a future improvement.
- Consider including the ".arvados#collection" file in the zip with the collection metadata.
Agreed that the usefulness of including metadata about the Arvados collection outweighs the risk of end user confusion.
I noted that it probably should not be called exactly ".arvados#collection", since that is a special file that some tools use to detect directories backed by arv-mount. We should think about what to call it instead -- maybe "collection.json"?