Review Java SDK PR
Please do a design and code review of the PR for the new Java SDK in the arvados-sdk-java repo
Comments from a quick and incomplete look:
keepweb-host and keepweb-port should not be in the client config. The library needs to get these from the discovery document at runtime.
Should default to configuring with the usual set of environment vars at runtime: ARVADOS_API_HOST, ARVADOS_API_HOST_INSECURE, ARVADOS_API_TOKEN (HOST can be either host:port, or just host, implying :443)
Should change "Authorization: OAuth2 %s" to "Authorization: Bearer %s"
The Group class appears to represent a ContainerRequest.
Are the models auto-generated?
Could CollectionList, GroupList, etc. be implemented as a generic -- ResourceList<Collection>, etc.?In ArvadosFacade
- group_class of a subproject is "project", not "sub-project"
- showGroups...() should either list all groups regardless of class, or should be called showProjects...()
- when looking up collections in a specific project, filters should be "owner_uuid","=",uuid -- not "owner_uuid","like",uuid
- comment says "...contain passed String in their name" but that's not true -- passing "a" will not match "abc" -- should clarify that the passed string is used as a "like" pattern, not a substring
It looks like "download file from Keep" uses the webdav service, but "upload" uses the lower level logic.
In principle the lower-level Keep client logic can achieve better performance than webdav, but not the way it's implemented here -- if I'm reading correctly, it makes two copies of the source data in the local filesystem before even starting to send anything to Keep. I expect this would work well for lots of small files, but would not work at all for large files.In order to be a viable alternative to webdav, the uploader...
- must be able to upload a 64G file without consuming 64G of temp space.
- must be pipelined, at least to the block level (don't read the entire input file before starting to send the first 64M chunk).
- (if it must use temp files) must use a tempdir strategy that works with multiple processes and users (e.g., mkstemp in a private tempdir or a sticky shared tempdir), must use umask 077 when creating temp files, and must clean up its temp files in both success & error cases.
- should use the defaultCollectionReplication value found in the discovery document at runtime as the default # copies, if not overridden by the application.
There are lots of things to get right (and maintain) in a low-level keep client. It might be better to stick with webdav here. Even the lots-of-small-files case can be addressed by doing concurrent uploads, if we add some support on the webdav side to prevent concurrent updates from clobbering one another.
Commit message and build.gradle claim this is 2.0.0. Surely it's 0.x.y? In any case it should be no higher than the Arvados release it was built with/for, so at some point we can start including it in Arvados releases.
- Provide a recipe for installing all prerequisites on a stock debian 9 system (
docker run -it debian:9)
- Provide a command line that runs all tests and exits zero IFF all tests pass
- Tests use ARVADOS_API_HOST env var, which will point to a testing server (typically looks like "0.0.0.0:12345"), and obey ARVADOS_API_HOST_INSECURE env var
- Tests use ARVADOS_API_TOKEN env var as an admin token