Feature #12216
closed
[keep-web] machine-readable file listings
Added by Tom Clegg over 6 years ago.
Updated over 6 years ago.
Description
Currently, keep-web serves human-readable directory listings using an HTML template but does not offer machine-readable listings.
Machine-readable listings will permit clients to browse data stored in Keep without having to parse collections' manifest_text. For example, to facilitate collection-browsing for Java programs, we would need to port the manifest-parsing code to Java.
This should be considered a step toward full WebDAV support in keep-web: if possible, the listing API should be compatible with WebDAV clients. Presumably, the easiest path is to implement a webdav.FileSystem backed by Keep, and use a webdav.Handler to serve PROPFIND requests.
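To illustrate what a machine-readable listing buys a client, here is a minimal sketch of the client side using only the Go standard library: it builds a PROPFIND request for one directory level and parses a WebDAV multistatus body into entries. The names entry, multistatus, newPropfind and parseListing are hypothetical, and the XML shape follows generic WebDAV responses rather than captured keep-web output.

```go
package main

import (
	"encoding/xml"
	"net/http"
)

// entry is one item from a directory listing.
type entry struct {
	Path  string
	IsDir bool
	Size  int64
}

// multistatus maps the parts of a 207 Multi-Status body used here.
// Tags without a namespace match elements by local name, so the
// prefix used for the DAV: namespace in the response does not matter.
type multistatus struct {
	Responses []struct {
		Href       string    `xml:"href"`
		Collection *struct{} `xml:"propstat>prop>resourcetype>collection"`
		Size       int64     `xml:"propstat>prop>getcontentlength"`
	} `xml:"response"`
}

// newPropfind builds a listing request for a single directory level.
func newPropfind(url string) (*http.Request, error) {
	req, err := http.NewRequest("PROPFIND", url, nil)
	if err != nil {
		return nil, err
	}
	// Depth: 1 asks for the directory itself plus its immediate children.
	req.Header.Set("Depth", "1")
	return req, nil
}

// parseListing converts a multistatus XML body into entries.
func parseListing(body string) ([]entry, error) {
	var ms multistatus
	if err := xml.Unmarshal([]byte(body), &ms); err != nil {
		return nil, err
	}
	entries := make([]entry, 0, len(ms.Responses))
	for _, r := range ms.Responses {
		// A <collection/> element inside <resourcetype> marks a directory.
		entries = append(entries, entry{Path: r.Href, IsDir: r.Collection != nil, Size: r.Size})
	}
	return entries, nil
}
```

A client would send the request with an http.Client, read the response body, and pass it to parseListing. No manifest parsing is involved on the client side.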
- Subject changed from [keep-web] send file listings as JSON if requested by client to [keep-web] machine-readable file listings
- Target version set to Arvados Future Sprints
- Story points set to 2.0
We should also consider providing an S3-compatible API.
- Target version changed from Arvados Future Sprints to 2017-10-25 Sprint
- Assigned To set to Tom Clegg
- Status changed from New to In Progress
Does this include browsing projects? (Probably not, but for the desktop filesystem mount use case it probably should.) Ideally it would provide the same FS view as arv-mount.
12216-webdav-list @ a23fa06e9849f2ab76fa271624e22a245c2abc47
- test case using cadaver client (run-tests.sh now needs "apt install cadaver")
- manual testing with mount.davfs works (it prompts for user&pass; user can be anything, pass is api token)
Shortcomings/TODO:
- I gave up trying to make cadaver do HTTP authentication -- I'm guessing the compile-time option to support .netrc is not enabled in the Debian package, and it seems to ignore u:p in http://u:p@host:port/
- "find /mnt" is slow on a collection with many directories. It makes one HTTP request per directory instead of using depth>1, and keep-web uses lots of CPU, probably because it is doing a lot of unoptimized manifest wrangling. My plan is to fix this by caching the http.FileSystem instead of just the collection.
- There's no new functionality like browsing available collections. You still need to specify the collection ID in the URL in one of the various supported ways. The only difference is that now a webdav client can get the directory listings that used to be available only to a human or html-scraper.
As far as I can see, this looks good.
I've encountered the cached listing behavior I mentioned on the chat, where a listing gets cached and changes are not reflected. If this is client dependent, maybe it would be safe to force listing cache invalidation to avoid hard to debug issues with webdav clients?
Lucas Di Pentima wrote:
I've encountered the cached listing behavior I mentioned on the chat, where a listing gets cached and changes are not reflected. If this is client dependent, maybe it would be safe to force listing cache invalidation to avoid hard to debug issues with webdav clients?
I'm guessing you're seeing something like this:
- Get directory listing from keep-web → receive version 1
- Update collection using REST API → current version is 2
- Get directory listing from keep-web → receive version 1, but expect version 2
(Is this a more general issue, or is there also something I'm missing that makes cached webdav directory listings more confusing than cached file content?)
A couple of ideas:
- listen for cache invalidation events, either from arvados-ws or more directly from postgresql
- option for a separate TTL config for the uuid→pdh cache (could be set to zero, or something else shorter than the pdh→manifest cache TTL)
Some follow-up fixes: 12216-webdav-list @ 337de2e3dfeacc5054cb644513be61f5d35585ae
- allow Authorization header in cross-origin requests (see commit message 337de2e3d)
- fix crash on some dir-listing reqs with no trailing slash
- huge performance improvement in 991d7d796 (webdav does a lot more file-opening than I thought -- before this, we were parsing the whole manifest multiple times for each file returned in a dir listing!)
Latest updates lgtm, lazy file opening is a cool idea!
Regarding cache invalidation, I was seeing something like what you describe: while connected with the cadaver client, I requested a listing, uploaded something with arv-put, and requested a listing again, which returned the same output. In my opinion, a uuid→pdh TTL config would be enough, and simpler to implement than an event handler.
Updates at ec0c244be178aed7af0cf990a256dda557034b68 LGTM.
Local keep-web tests didn't complain so I suppose we're not testing those TTLs.
Are the cache parameters configurable via keep-web.yml? They're part of the config struct, but I don't know whether they get picked up from the file. If they're configurable, maybe we should document the difference between the two somewhere.
- Status changed from In Progress to Resolved
Applied in changeset arvados|commit:1b5e5a3ef2c174358693f83849f05ed8276be657.