Project

General

Profile

Actions

Idea #14382

closed

Review Java SDK PR

Added by Tom Morris over 5 years ago. Updated about 5 years ago.

Status:
Resolved
Priority:
High
Assigned To:
Category:
-
Target version:
Start date:
Due date:
Story points:
-
Release relationship:
Auto

Description

Please do a design and code review of the PR for the new Java SDK in the arvados-sdk-java repo

Actions #1

Updated by Tom Morris over 5 years ago

  • Tracker changed from Task to Idea
  • Target version changed from 2018-10-31 sprint to 2018-11-14 Sprint
Actions #2

Updated by Tom Morris over 5 years ago

  • Target version changed from 2018-11-14 Sprint to 2018-11-28 Sprint
Actions #3

Updated by Tom Clegg over 5 years ago

Comments from a quick and incomplete look:

keepweb-host and keepweb-port should not be in the client config. The library needs to get these from the discovery document at runtime.

Should default to configuring with the usual set of environment vars at runtime: ARVADOS_API_HOST, ARVADOS_API_HOST_INSECURE, ARVADOS_API_TOKEN (HOST can be either host:port, or just host, implying :443)

Should change "Authorization: OAuth2 %s" to "Authorization: Bearer %s"

The Group class appears to represent a ContainerRequest.

Are the models auto-generated?

Could CollectionList, GroupList, etc. be implemented as a generic -- ResourceList<Collection>, etc.?

In ArvadosFacade
  • group_class of a subproject is "project", not "sub-project"
  • showGroups...() should either list all groups regardless of class, or should be called showProjects...()
  • when looking up collections in a specific project, filters should be "owner_uuid","=",uuid -- not "owner_uuid","like",uuid
  • comment says "...contain passed String in their name" but that's not true -- passing "a" will not match "abc" -- should clarify that the passed string is used as a "like" pattern, not a substring

It looks like "download file from Keep" uses the webdav service, but "upload" uses the lower level logic.

In principle the lower-level Keep client logic can achieve better performance than webdav, but not the way it's implemented here -- if I'm reading correctly, it makes two copies of the source data in the local filesystem before even starting to send anything to Keep. I expect this would work well for lots of small files, but would not work at all for large files.

In order to be a viable alternative to webdav, the uploader...
  • must be able to upload a 64G file without consuming 64G of temp space.
  • must be pipelined, at least to the block level (don't read the entire input file before starting to send the first 64M chunk).
  • (if it must use temp files) must use a tempdir strategy that works with multiple processes and users (e.g., mkstemp in a private tempdir or a sticky shared tempdir), must use umask 077 when creating temp files, and must clean up its temp files in both success & error cases.
  • should use the defaultCollectionReplication value found in the discovery document at runtime as the default # copies, if not overridden by the application.

There are lots of things to get right (and maintain) in a low-level keep client. It might be better to stick with webdav here. Even the lots-of-small-files case can be addressed by doing concurrent uploads, if we add some support on the webdav side to prevent concurrent updates from clobbering one another.

Commit message and build.gradle claim this is 2.0.0. Surely it's 0.x.y? In any case it should be no higher than the Arvados release it was built with/for, so at some point we can start including it in Arvados releases.

Actions #4

Updated by Tom Clegg over 5 years ago

We'll need an integration test suite:
  • Provide a recipe for installing all prerequisites on a stock debian 9 system (docker run -it debian:9)
  • Provide a command line that runs all tests and exits zero IFF all tests pass
  • Tests use ARVADOS_API_HOST env var, which will point to a testing server (typically looks like "0.0.0.0:12345"), and obey ARVADOS_API_HOST_INSECURE env var
  • Tests use ARVADOS_API_TOKEN env var as an admin token
Actions #5

Updated by Tom Clegg over 5 years ago

  • Status changed from New to In Progress
Actions #6

Updated by Tom Clegg over 5 years ago

  • Assigned To deleted (Tom Clegg)
  • Target version changed from 2018-11-28 Sprint to 2018-12-12 Sprint
Actions #7

Updated by Ward Vandewege over 5 years ago

  • Target version changed from 2018-12-12 Sprint to 2018-11-28 Sprint
Actions #8

Updated by Tom Clegg over 5 years ago

  • Target version changed from 2018-11-28 Sprint to 2018-12-12 Sprint
Actions #9

Updated by Tom Morris over 5 years ago

  • Assigned To set to Tom Clegg
Actions #10

Updated by Tom Morris over 5 years ago

  • Assigned To changed from Tom Clegg to Tom Morris
Actions #11

Updated by Tom Morris over 5 years ago

  • Target version changed from 2018-12-12 Sprint to 2018-12-21 Sprint
Actions #12

Updated by Tom Morris over 5 years ago

  • Target version changed from 2018-12-21 Sprint to 2019-01-16 Sprint
Actions #13

Updated by Tom Morris over 5 years ago

  • Target version changed from 2019-01-16 Sprint to 2019-01-30 Sprint
Actions #14

Updated by Tom Morris about 5 years ago

  • Target version changed from 2019-01-30 Sprint to 2019-02-13 Sprint
Actions #15

Updated by Tom Morris about 5 years ago

  • Target version changed from 2019-02-13 Sprint to 2019-02-27 Sprint
Actions #16

Updated by Tom Morris about 5 years ago

  • Target version changed from 2019-02-27 Sprint to 2019-03-13 Sprint
Actions #17

Updated by Peter Amstutz about 5 years ago

  • Assigned To changed from Tom Morris to Peter Amstutz
Actions #18

Updated by Tom Morris about 5 years ago

  • Release set to 15
Actions #19

Updated by Peter Amstutz about 5 years ago

  • Status changed from In Progress to Resolved
Actions

Also available in: Atom PDF