Idea #11876

Updated by Tom Morris over 6 years ago

As an R programmer I'd like to be able to query the Arvados APIs directly from R using a package which integrates well with, and is published with, the rest of the Bioconductor packages. The SDK should ideally allow me to do everything a Python programmer can do using the Python SDK, but as a first step it should allow me to find collections and files in Keep by filtering on metadata, load the files into R, process them, and then write the results back to a collection. As an optional second stage, it'd be useful to be able to submit CWL jobs and monitor their progress.

 The SDK should work on Windows, OS X, and Linux, which implies that depending on arv-mount for file reading and writing is not an acceptable option.

 There are two relevant packages, SevenBridges' "sevenbridges" and Illumina's "BaseSpaceR", which could be used for a) design ideas and b) starting points for implementation (they are both Apache licensed).

 http://bioconductor.org/packages/release/bioc/html/BaseSpaceR.html 
 https://developer.basespace.illumina.com/docs/content/documentation/sdk-samples/r-sdk-overview 
 http://bioconductor.org/packages/release/bioc/html/sevenbridges.html 
 https://github.com/sbg/sevenbridges-r 

 A potential supporting component might be googleAuthR http://code.markedmondson.me/googleAuthR/ which could be used in a similar way to googleComputeEngineR https://cloudyr.github.io/googleComputeEngineR/ and other packages which are layered on it. 

 ----------- 

 Design sketch - create new client library for API server functions, but use HTTP or arv-mount access to get data rather than implementing Keep support. googleAuthR can be used for API generation and response parsing, but needs to be reworked to not assume Google authentication or endpoints. Instead of the OAuth2 dance, it needs to be able to use an API token. 
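
 As a language-neutral illustration of the token-based access described above, here is a minimal sketch in Python (an R client would do the equivalent with its own HTTP library). It only builds a list request and does not send it. The host name, token, and filter values are placeholders, the helper name build_list_request is hypothetical, and the /arvados/v1/ path, the JSON-encoded filters parameter, and the Bearer header reflect my understanding of the Arvados REST API and should be checked against the API documentation.

```python
import json
from urllib.parse import urlencode, urlsplit, parse_qs
from urllib.request import Request


def build_list_request(host, token, resource, filters, limit=10):
    """Build (but do not send) a list request for one resource type.

    Assumptions: resources live under /arvados/v1/, filters are passed as a
    JSON-encoded list of [attribute, operator, operand] triples, and the API
    token is sent in an Authorization header instead of an OAuth2 flow.
    """
    query = urlencode({"filters": json.dumps(filters), "limit": limit})
    url = f"https://{host}/arvados/v1/{resource}?{query}"
    return Request(url, headers={"Authorization": f"Bearer {token}"}, method="GET")


# Placeholder host, token, and metadata filter (all hypothetical values).
req = build_list_request(
    "arvados.example.com",
    "example-token",
    "collections",
    [["name", "like", "%.fastq%"]],
)

# Decode the query string again to show what the server would receive.
decoded = parse_qs(urlsplit(req.full_url).query)
sent_filters = json.loads(decoded["filters"][0])

print(req.get_method(), req.full_url)
print(req.get_header("Authorization"))
print(sent_filters)
```

 Keeping request construction separate from sending makes the authentication and filter encoding easy to unit test on Windows, OS X, and Linux without a running cluster.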

 The full Arvados API currently consists of 24 object types (plus 24 list types for those objects) and 223 methods. There are create, delete, destroy, get, list, show, and update methods for each object type and then another 26 methods which are used once or twice each. 
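
 Counts like these could be produced mechanically from a machine-readable API description rather than by hand. The sketch below is in Python and assumes a Google-style discovery document with a "resources" map whose entries each have a "methods" map. The SAMPLE_DISCOVERY dictionary is a hypothetical, heavily trimmed stand-in: its assignment of methods to resources is illustrative only and is not the real Arvados API.

```python
from collections import Counter

# Method names the ticket treats as the standard per-object set.
STANDARD = {"create", "delete", "destroy", "get", "list", "show", "update"}

# Hypothetical, trimmed stand-in for a discovery document (illustrative only).
SAMPLE_DISCOVERY = {
    "resources": {
        "collections": {
            "methods": {
                "get": {}, "list": {}, "create": {}, "update": {}, "delete": {},
                "provenance": {}, "used_by": {}, "trash": {}, "untrash": {},
            }
        },
        "container_requests": {
            "methods": {"get": {}, "list": {}, "create": {}, "update": {}, "delete": {}}
        },
        "containers": {
            "methods": {"get": {}, "list": {}, "lock": {}, "unlock": {}, "current": {}}
        },
    }
}


def summarize_methods(discovery):
    """Return (number of standard methods, Counter of miscellaneous method names)."""
    standard_total = 0
    misc = Counter()
    for resource in discovery["resources"].values():
        for name in resource["methods"]:
            if name in STANDARD:
                standard_total += 1
            else:
                misc[name] += 1
    return standard_total, misc


standard_total, misc = summarize_methods(SAMPLE_DISCOVERY)
print(f"{standard_total} standard methods")
for name in sorted(misc):
    # Same "count name" layout as the miscellaneous method list in this ticket.
    print(f"{misc[name]:5d} {name}")
```

 Run against a full API description, the same loop would give both the per-type standard methods and the list of one-off methods that need hand-written wrappers.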

 Test comparisons: Java - 20 tests, Golang - 4K lines in 213 tests, Python - 8K lines in 472 tests

 Object Types to be supported in initial version: 

 ApiClient  
 ApiClientAuthorization  
 AuthorizedKey  
 Collection  
 Container  
 ContainerRequest  
 Group (user groups & projects) 
 Link  
 Log ? 
 User ? 
 Workflow  

 Object Types not needed initially: 

 ApiClient  
 ApiClientAuthorization  
 AuthorizedKey  
 Human  
 Job  
 JobTask  
 KeepDisk  
 KeepService  
 Link  
 Log  
 Node  
 PipelineInstance  
 PipelineTemplate  
 Repository  
 Specimen  
 Trait  
 User  
 UserAgreement  
 VirtualMachine  


  

 Miscellaneous Arvados API methods: 

    1 accessible  
    1 activate  
    1 auth  
    2 cancel  
    1 contents  
    1 create_system_auth  
    3 current  
    1 get_all_logins  
    1 get_all_permissions  
    1 get_permissions  
    2 lock  
    1 logins  
    1 new  
    2 ping  
    1 provenance  
    1 queue  
    1 queue_size  
    1 setup   
    1 sign  
    1 signatures  
    1 system  
    1 trash  
    1 unlock  
    1 unsetup  
    1 untrash  
    1 used_by  
