Feature #12216

closed

[keep-web] machine-readable file listings

Added by Tom Clegg over 6 years ago. Updated over 6 years ago.

Status: Resolved
Priority: Normal
Assigned To:
Category: Keep
Target version:
Story points: 2.0

Description

Currently, keep-web serves human-readable directory listings using an HTML template but does not offer machine-readable listings.

Machine-readable listings will permit clients to browse data stored in Keep without having to parse collections' manifest_text. Without them, supporting collection browsing in Java programs, for example, would require porting the manifest-parsing code to Java.

This should be considered a step toward full WebDAV support in keep-web: if possible, the listing API should be compatible with WebDAV clients. Presumably, the easiest path is to implement a webdav.FileSystem backed by Keep, and use a webdav.Handler to serve PROPFIND requests.
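
To illustrate what a WebDAV-compatible listing buys a client, here is a minimal sketch of parsing a PROPFIND multistatus response with only the Go standard library, with no manifest parsing involved. The sample XML is hand-written for illustration (not captured from keep-web), and the `/data/` paths, the `Entry` type, and `parseListing` are hypothetical names.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"strconv"
)

// sampleResponse is a hand-written multistatus body for illustration only.
// It was not captured from keep-web, and the /data/ paths are placeholders.
const sampleResponse = `<?xml version="1.0" encoding="UTF-8"?>
<D:multistatus xmlns:D="DAV:">
  <D:response>
    <D:href>/data/</D:href>
    <D:propstat>
      <D:prop><D:resourcetype><D:collection/></D:resourcetype></D:prop>
      <D:status>HTTP/1.1 200 OK</D:status>
    </D:propstat>
  </D:response>
  <D:response>
    <D:href>/data/foo.txt</D:href>
    <D:propstat>
      <D:prop>
        <D:resourcetype/>
        <D:getcontentlength>3</D:getcontentlength>
      </D:prop>
      <D:status>HTTP/1.1 200 OK</D:status>
    </D:propstat>
  </D:response>
</D:multistatus>`

// multistatus mirrors only the parts of a PROPFIND response used here.
// The tags omit the namespace, so encoding/xml matches on local names.
type multistatus struct {
	Responses []struct {
		Href       string   `xml:"href"`
		Collection xml.Name `xml:"propstat>prop>resourcetype>collection"`
		Length     string   `xml:"propstat>prop>getcontentlength"`
	} `xml:"response"`
}

// Entry is one file or directory in a listing (hypothetical type).
type Entry struct {
	Path  string
	IsDir bool
	Size  int64
}

// parseListing converts a multistatus body into a flat list of entries.
func parseListing(body string) ([]Entry, error) {
	var ms multistatus
	if err := xml.Unmarshal([]byte(body), &ms); err != nil {
		return nil, err
	}
	entries := make([]Entry, 0, len(ms.Responses))
	for _, r := range ms.Responses {
		// A <collection/> child of resourcetype marks a directory.
		e := Entry{Path: r.Href, IsDir: r.Collection.Local == "collection"}
		if r.Length != "" {
			size, err := strconv.ParseInt(r.Length, 10, 64)
			if err != nil {
				return nil, err
			}
			e.Size = size
		}
		entries = append(entries, e)
	}
	return entries, nil
}

func main() {
	entries, err := parseListing(sampleResponse)
	if err != nil {
		panic(err)
	}
	for _, e := range entries {
		fmt.Printf("%s dir=%v size=%d\n", e.Path, e.IsDir, e.Size)
	}
}
```

A real client would also check each propstat's status and handle multiple propstat elements per response; the sketch skips both for brevity.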

Subtasks 1 (0 open, 1 closed)

Task #12443: Review 12216-webdav-list (Resolved, Tom Clegg, 10/11/2017)

Related issues

Related to Arvados - Feature #12090: Collections/data access API (Resolved, 08/08/2017)
Related to Arvados - Idea #11876: [R SDK] Create a Bioconductor/R SDK (Closed, Fuad Muhic, 06/20/2017)
Actions #1

Updated by Tom Clegg over 6 years ago

  • Subject changed from [keep-web] send file listings as JSON if requested by client to [keep-web] machine-readable file listings
Actions #2

Updated by Tom Morris over 6 years ago

  • Target version set to Arvados Future Sprints
  • Story points set to 2.0
Actions #3

Updated by Peter Amstutz over 6 years ago

We should also consider providing an S3-compatible API.

Actions #4

Updated by Tom Morris over 6 years ago

  • Target version changed from Arvados Future Sprints to 2017-10-25 Sprint
Actions #5

Updated by Tom Clegg over 6 years ago

  • Assigned To set to Tom Clegg
Actions #6

Updated by Tom Clegg over 6 years ago

  • Status changed from New to In Progress
Actions #7

Updated by Peter Amstutz over 6 years ago

Does this include browsing projects? (Probably not, but for the desktop filesystem mount use case, it probably should.) Ideally it would provide the same FS view as arv-mount.

Actions #8

Updated by Tom Clegg over 6 years ago

12216-webdav-list @ a23fa06e9849f2ab76fa271624e22a245c2abc47
  • test case using cadaver client (run-tests.sh now needs "apt install cadaver")
  • manual testing with mount.davfs works (it prompts for user&pass; user can be anything, pass is api token)
Shortcomings/TODO:
  • I gave up trying to make cadaver do http authentication -- I'm guessing the compile-time option to support .netrc is not enabled in the debian package, and it seems to ignore u:p in http://u:p@host:port/
  • "find /mnt" is slow on a collection with many directories. It does one http request per directory instead of using depth>1, and keep-web uses lots of CPU. I think it's doing lots of un-optimized manifest wrangling. My plan is to fix this by caching the http.FileSystem instead of just the collection.
  • There's no new functionality like browsing available collections. You still need to specify the collection ID in the URL in one of the various supported ways. The only difference is that now a webdav client can get the directory listings that used to be available only to a human or html-scraper.
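
As a rough sketch of the request side described above, a client could build a PROPFIND request with HTTP basic auth, passing an arbitrary username and the API token as the password, and choose how deep the listing goes with the standard WebDAV `Depth` header (`0`, `1`, or `infinity` per RFC 4918). The URL, token value, and the `newListRequest` helper are hypothetical, and whether keep-web honors `Depth: infinity` is not confirmed here.

```go
package main

import (
	"fmt"
	"net/http"
)

// newListRequest builds (but does not send) a WebDAV PROPFIND request
// for one directory. dirURL and apiToken are caller-supplied placeholders.
func newListRequest(dirURL, apiToken, depth string) (*http.Request, error) {
	req, err := http.NewRequest("PROPFIND", dirURL, nil)
	if err != nil {
		return nil, err
	}
	// The username is arbitrary; the API token goes in the password field,
	// matching the mount.davfs behavior noted in the comment above.
	req.SetBasicAuth("anyuser", apiToken)
	// "1" lists a directory's immediate children; "infinity" asks for the
	// whole subtree in one request, which a server may refuse.
	req.Header.Set("Depth", depth)
	return req, nil
}

func main() {
	req, err := newListRequest("http://keep-web.example/data/", "EXAMPLE_TOKEN", "1")
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.String(), "Depth:", req.Header.Get("Depth"))
}
```

Sending the request with `http.DefaultClient.Do(req)` would return a multistatus XML body on success.
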
Actions #9

Updated by Lucas Di Pentima over 6 years ago

As far as I can see, this looks good.

I've encountered the cached listing behavior I mentioned on the chat, where a listing gets cached and changes are not reflected. If this is client dependent, maybe it would be safe to force listing cache invalidation to avoid hard-to-debug issues with webdav clients?

Actions #10

Updated by Tom Clegg over 6 years ago

Lucas Di Pentima wrote:

I've encountered the cached listing behavior I mentioned on the chat, where a listing gets cached and changes are not reflected. If this is client dependent, maybe it would be safe to force listing cache invalidation to avoid hard-to-debug issues with webdav clients?

I'm guessing you're seeing something like this:
  1. Get directory listing from keep-web → receive version 1
  2. Update collection using REST API → current version is 2
  3. Get directory listing from keep-web → receive version 1, but expect version 2

(Is this a more general issue, or is there also something I'm missing that makes cached webdav directory listings more confusing than cached file content?)

A couple of ideas
  • listen for cache invalidation events, either from arvados-ws or more directly from postgresql
  • option for a separate TTL config for the uuid→pdh cache (could be set to zero, or something else shorter than the pdh->manifest cache TTL)
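
The second idea can be sketched as two caches with independent TTLs: a short-lived uuid→pdh cache, so collection updates show up quickly, and a longer-lived pdh→manifest cache, since content addressed by pdh does not change. This is only an illustration of the approach, not keep-web's cache code; `ttlCache`, `fakeClock`, and the TTL values are assumptions.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// cacheEntry is a cached value plus the time it stops being valid.
type cacheEntry struct {
	value   string
	expires time.Time
}

// ttlCache is a small string-to-string cache whose entries expire after ttl.
// A ttl of zero effectively disables caching.
type ttlCache struct {
	mu      sync.Mutex
	ttl     time.Duration
	now     func() time.Time // injectable clock, for testing
	entries map[string]cacheEntry
}

func newTTLCache(ttl time.Duration, now func() time.Time) *ttlCache {
	return &ttlCache{ttl: ttl, now: now, entries: map[string]cacheEntry{}}
}

// get returns the cached value, or false if it is missing or expired.
func (c *ttlCache) get(key string) (string, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	e, ok := c.entries[key]
	if !ok || !c.now().Before(e.expires) {
		delete(c.entries, key)
		return "", false
	}
	return e.value, true
}

// put stores a value that expires ttl after the current time.
func (c *ttlCache) put(key, value string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.entries[key] = cacheEntry{value: value, expires: c.now().Add(c.ttl)}
}

// fakeClock is a manually advanced clock used to demonstrate expiry.
type fakeClock struct{ t time.Time }

func (f *fakeClock) Now() time.Time          { return f.t }
func (f *fakeClock) Advance(d time.Duration) { f.t = f.t.Add(d) }

func main() {
	clock := &fakeClock{}
	// Illustrative TTLs only: short for uuid→pdh, longer for pdh→manifest.
	uuidToPDH := newTTLCache(5*time.Second, clock.Now)
	pdhToManifest := newTTLCache(5*time.Minute, clock.Now)

	uuidToPDH.put("example-uuid", "example-pdh-v1")
	pdhToManifest.put("example-pdh-v1", "example manifest text")

	clock.Advance(10 * time.Second)
	_, uuidHit := uuidToPDH.get("example-uuid")
	_, manifestHit := pdhToManifest.get("example-pdh-v1")
	fmt.Println("uuid→pdh hit:", uuidHit, "pdh→manifest hit:", manifestHit)
}
```

After the short TTL lapses, the uuid is looked up again and may resolve to a new pdh, while manifests for pdhs already seen stay cached.
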
Actions #11

Updated by Tom Clegg over 6 years ago

Some follow-up fixes: 12216-webdav-list @ 337de2e3dfeacc5054cb644513be61f5d35585ae
  • allow Authorization header in cross-origin requests (see commit message 337de2e3d)
  • fix crash on some dir-listing reqs with no trailing slash
  • huge performance improvement in 991d7d796 (webdav does a lot more file-opening than I thought -- before this, we were parsing the whole manifest multiple times for each file returned in a dir listing!)
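
For readers unfamiliar with the general technique, deferring expensive work until a file's content is actually read might look like the sketch below. It is not the code from 991d7d796; `lazyFile` and its `load` function are hypothetical stand-ins for whatever expensive step (such as manifest parsing) the real change avoids repeating.

```go
package main

import (
	"fmt"
	"sync"
)

// lazyFile defers an expensive load until its content is first requested,
// so metadata-only operations (like directory listings) stay cheap.
type lazyFile struct {
	name string
	load func() (string, error) // hypothetical expensive step

	once sync.Once
	data string
	err  error
}

// Name is cheap and never triggers the load.
func (f *lazyFile) Name() string { return f.name }

// Content runs load at most once and returns the remembered result.
func (f *lazyFile) Content() (string, error) {
	f.once.Do(func() { f.data, f.err = f.load() })
	return f.data, f.err
}

func main() {
	loads := 0
	f := &lazyFile{
		name: "foo.txt",
		load: func() (string, error) {
			loads++
			return "foo", nil
		},
	}
	fmt.Println("listed:", f.Name(), "loads so far:", loads)
	if _, err := f.Content(); err != nil {
		panic(err)
	}
	if _, err := f.Content(); err != nil {
		panic(err)
	}
	fmt.Println("loads after two reads:", loads)
}
```
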
Actions #12

Updated by Lucas Di Pentima over 6 years ago

Latest updates lgtm, lazy file opening is a cool idea!
Regarding cache invalidation, I was seeing something like what you describe: while connected with the cadaver client, I asked for a listing, uploaded something with arv-put, and asked for a listing again, which returned the same output. In my opinion, a uuid->pdh TTL config would be enough, and simpler to implement than an event handler.

Actions #13

Updated by Tom Clegg over 6 years ago

12216-webdav-list @ ec0c244be178aed7af0cf990a256dda557034b68
  • merged master
  • separate TTL for uuid->pdh cache (default 5 seconds)
Actions #14

Updated by Lucas Di Pentima over 6 years ago

Updates at ec0c244be178aed7af0cf990a256dda557034b68 LGTM.
Local keep-web tests didn't complain so I suppose we're not testing those TTLs.
Are cache parameters configurable via keep-web.yml? They're part of the config struct, but I don't know if they get picked up from the file. If they're configurable, maybe we should document the difference between the two TTLs somewhere.

Actions #15

Updated by Anonymous over 6 years ago

  • Status changed from In Progress to Resolved

Applied in changeset arvados commit 1b5e5a3ef2c174358693f83849f05ed8276be657.

Actions #16

Updated by Tom Clegg over 6 years ago

There's an experiment using browser-side JS to get directory listings:

spike-wb-browse-collection @ 20dbebcdd863589f47bce138418cfcacd5f32b2e
