Story #14382

Review Java SDK PR

Added by Tom Morris about 2 months ago. Updated 2 days ago.

Status:
In Progress
Priority:
High
Assigned To:
Category:
-
Target version:
Start date:
Due date:
% Done:

0%

Estimated time:
Story points:
-

Description

Please do a design and code review of the PR for the new Java SDK in the arvados-sdk-java repo

History

#1 Updated by Tom Morris about 1 month ago

  • Tracker changed from Task to Story
  • Target version changed from 2018-10-31 sprint to 2018-11-14 Sprint

#2 Updated by Tom Morris about 1 month ago

  • Target version changed from 2018-11-14 Sprint to 2018-11-28 Sprint

#3 Updated by Tom Clegg 29 days ago

Comments from a quick and incomplete look:

keepweb-host and keepweb-port should not be in the client config. The library needs to get these from the discovery document at runtime.

Should default to configuring with the usual set of environment vars at runtime: ARVADOS_API_HOST, ARVADOS_API_HOST_INSECURE, ARVADOS_API_TOKEN (HOST can be either host:port, or just host, implying :443)

Should change "Authorization: OAuth2 %s" to "Authorization: Bearer %s"

The Group class appears to represent a ContainerRequest.

Are the models auto-generated?

Could CollectionList, GroupList, etc. be implemented as a generic -- ResourceList<Collection>, etc.?

In ArvadosFacade
  • group_class of a subproject is "project", not "sub-project"
  • showGroups...() should either list all groups regardless of class, or should be called showProjects...()
  • when looking up collections in a specific project, filters should be "owner_uuid","=",uuid -- not "owner_uuid","like",uuid
  • comment says "...contain passed String in their name" but that's not true -- passing "a" will not match "abc" -- should clarify that the passed string is used as a "like" pattern, not a substring

It looks like "download file from Keep" uses the webdav service, but "upload" uses the lower level logic.

In principle the lower-level Keep client logic can achieve better performance than webdav, but not the way it's implemented here -- if I'm reading correctly, it makes two copies of the source data in the local filesystem before even starting to send anything to Keep. I expect this would work well for lots of small files, but would not work at all for large files.

In order to be a viable alternative to webdav, the uploader...
  • must be able to upload a 64G file without consuming 64G of temp space.
  • must be pipelined, at least to the block level (don't read the entire input file before starting to send the first 64M chunk).
  • (if it must use temp files) must use a tempdir strategy that works with multiple processes and users (e.g., mkstemp in a private tempdir or a sticky shared tempdir), must use umask 077 when creating temp files, and must clean up its temp files in both success & error cases.
  • should use the defaultCollectionReplication value found in the discovery document at runtime as the default # copies, if not overridden by the application.

There are lots of things to get right (and maintain) in a low-level keep client. It might be better to stick with webdav here. Even the lots-of-small-files case can be addressed by doing concurrent uploads, if we add some support on the webdav side to prevent concurrent updates from clobbering one another.

Commit message and build.gradle claim this is 2.0.0. Surely it's 0.x.y? In any case it should be no higher than the Arvados release it was built with/for, so at some point we can start including it in Arvados releases.

#4 Updated by Tom Clegg 28 days ago

We'll need an integration test suite:
  • Provide a recipe for installing all prerequisites on a stock debian 9 system (docker run -it debian:9)
  • Provide a command line that runs all tests and exits zero IFF all tests pass
  • Tests use ARVADOS_API_HOST env var, which will point to a testing server (typically looks like "0.0.0.0:12345"), and obey ARVADOS_API_HOST_INSECURE env var
  • Tests use ARVADOS_API_TOKEN env var as an admin token

#5 Updated by Tom Clegg 18 days ago

  • Status changed from New to In Progress

#6 Updated by Tom Clegg 16 days ago

  • Assigned To deleted (Tom Clegg)
  • Target version changed from 2018-11-28 Sprint to 2018-12-12 Sprint

#7 Updated by Ward Vandewege 16 days ago

  • Target version changed from 2018-12-12 Sprint to 2018-11-28 Sprint

#8 Updated by Tom Clegg 16 days ago

  • Target version changed from 2018-11-28 Sprint to 2018-12-12 Sprint

#9 Updated by Tom Morris 16 days ago

  • Assigned To set to Tom Clegg

#10 Updated by Tom Morris 16 days ago

  • Assigned To changed from Tom Clegg to Tom Morris

#11 Updated by Tom Morris 2 days ago

  • Target version changed from 2018-12-12 Sprint to 2018-12-21 Sprint

Also available in: Atom PDF