Story #12706

[SDK] R SDK support for Collections

Added by Peter Amstutz over 2 years ago. Updated over 2 years ago.

Status: Resolved
Priority: Normal
Assigned To:
Category: -
Target version:
Start date: 01/17/2018
Due date:
% Done: 100%
Estimated time: (Total: 0.00 h)
Story points: -

Description

As a first step, the R SDK should allow me to find collections and files in Keep by filtering on metadata, load the files into R, process them, and then write the results back to a collection.

For this, we will provide a high level API. The low-level Arvados API access can be provided either by googleAuthR (as described in #11876) or by hand. If not using googleAuthR, the low-level API should not be accessible by the user, so that we can replace it with an auto-generated API later.

High level requirements:

  • User can get a specific collection by UUID or portable_data_hash (PDH).
  • User can get a list of collections, with standard Arvados filters.
  • User can create a new, empty collection in a specific project (project is owner_uuid)
  • Collection object supports these operations (using WebDAV unless otherwise noted)
    • Update collection name (via Arvados API)
    • Open a file or directory that already exists and get a File or Directory object
    • Read the listing of a Directory
    • Get size of a file
    • Read the contents of a File. API should support reading a portion of the file at a certain offset and length
    • Put some text or bytes to file (replaces entire file)
    • Create a new File object under a certain path
    • Delete a File under a certain path
    • Move/rename a file or directory from one path to another within the same collection
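
As a rough illustration of how the WebDAV-backed operations above could map onto HTTP requests, here is a minimal sketch. It is written in Python rather than R, following the pseudocode style used elsewhere in this ticket, and it only builds request descriptions without sending anything. The function names, the example host, and the `/c=<collection id>/` URL layout are assumptions made for illustration, not the SDK's actual API. The `Range`, `Depth` and `Destination` headers are standard HTTP/WebDAV.

```python
from urllib.parse import quote


def webdav_url(base_url, collection_id, path=""):
    # Assumed URL layout: <keep-web base>/c=<collection UUID or PDH>/<path>
    return f"{base_url.rstrip('/')}/c={collection_id}/{quote(path.lstrip('/'))}"


def read_request(base_url, collection_id, path, offset=0, length=None):
    """GET a file, optionally only `length` bytes starting at `offset`."""
    if offset < 0 or (length is not None and length <= 0):
        raise ValueError("offset must be >= 0 and length must be > 0")
    headers = {}
    if length is not None:
        # HTTP byte ranges are inclusive on both ends.
        headers["Range"] = f"bytes={offset}-{offset + length - 1}"
    elif offset > 0:
        headers["Range"] = f"bytes={offset}-"
    return {"method": "GET",
            "url": webdav_url(base_url, collection_id, path),
            "headers": headers}


def list_request(base_url, collection_id, path=""):
    """PROPFIND with Depth: 1 lists a directory's immediate children."""
    return {"method": "PROPFIND",
            "url": webdav_url(base_url, collection_id, path),
            "headers": {"Depth": "1"}}


def put_request(base_url, collection_id, path, data):
    """PUT replaces the entire file with `data` (bytes)."""
    return {"method": "PUT",
            "url": webdav_url(base_url, collection_id, path),
            "headers": {"Content-Length": str(len(data))},
            "body": data}


def delete_request(base_url, collection_id, path):
    """DELETE removes the file at `path`."""
    return {"method": "DELETE",
            "url": webdav_url(base_url, collection_id, path),
            "headers": {}}


def move_request(base_url, collection_id, src, dest):
    """MOVE renames within one collection; the target goes in Destination."""
    return {"method": "MOVE",
            "url": webdav_url(base_url, collection_id, src),
            "headers": {"Destination": webdav_url(base_url, collection_id, dest)}}
```

For example, `read_request(base, coll, "foo/bar.txt", offset=10, length=5)` carries the header `Range: bytes=10-14`. Updating the collection name is not covered here because that goes through the Arvados API rather than WebDAV.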

If such a thing exists, implement the R equivalent of "file-like objects" so that open Collection File objects can be used as input to R functions.
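
To make "file-like object" concrete, here is a minimal sketch in Python, where the concept is built into the standard library. The class name and the `put` callback are hypothetical; the callback stands in for whatever uploads bytes to a path in the collection. The closest analogue in R is probably its connections interface, though whether custom connections fit this use is an open question.

```python
import io


class CollectionFileWriter(io.BytesIO):
    """File-like object that buffers writes and uploads once on close().

    `put` is a hypothetical callable (path, data) -> None that replaces
    the whole file at `path`, e.g. with a single WebDAV PUT.
    """

    def __init__(self, path, put):
        super().__init__()
        self._path = path
        self._put = put

    def close(self):
        # Upload only on the first close(); later calls are no-ops.
        if not self.closed:
            self._put(self._path, self.getvalue())
        super().close()


if __name__ == "__main__":
    uploads = {}

    def fake_put(path, data):
        # Stand-in for a real upload; just records what would be sent.
        uploads[path] = data

    with CollectionFileWriter("foo/bar.txt", fake_put) as f:
        f.write(b"baz")
    print(uploads)
```

Because the class derives from `io.BytesIO`, anything that accepts a binary file object can write to it, and no in-memory model of the collection's directory tree is needed.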

Writable WebDAV support is in progress and should be available soon. Start by working on Arvados API access and reading from WebDAV.


Subtasks

Task #12751: Review (Resolved, Peter Amstutz)


Related issues

Related to Arvados - Story #11876: [R SDK] Create a Bioconductor/R SDK (Closed, 06/20/2017)

History

#1 Updated by Peter Amstutz over 2 years ago

  • Description updated (diff)
  • Assigned To set to Fuad Muhic

#2 Updated by Peter Amstutz over 2 years ago

  • Related to Story #11876: [R SDK] Create a Bioconductor/R SDK added

#3 Updated by Tom Clegg over 2 years ago

re "file-like objects": I take this to mean we want something like

f = collection.open("foo/bar.txt")
f.write("baz")
f.close()

...but we do not (yet) need to optimize away the webdav round-trips by having an in-memory representation of a collection's directory structure. Is that correct?

#4 Updated by Peter Amstutz over 2 years ago

Tom Clegg wrote:

re "file-like objects": I take this to mean we want something like

[...]

...but we do not (yet) need to optimize away the webdav round-trips by having an in-memory representation of a collection's directory structure. Is that correct?

Yes. But we should probably look at what the R standard library for working with files looks like to see what the expectations are.

#5 Updated by Peter Amstutz over 2 years ago

WebDAV support also requires discovering the address of the keep-web server, see https://dev.arvados.org/issues/11876#note-18
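
A minimal sketch of that discovery step, in Python for illustration. It assumes the cluster's discovery document is served at `/discovery/v1/apis/arvados/v1/rest` and advertises the keep-web address under a `keepWebServiceUrl` key. Both are assumptions to verify against the linked note, and the function names are hypothetical.

```python
import json
from urllib.request import urlopen

# Assumed location of the Arvados discovery document on the API host.
DISCOVERY_PATH = "/discovery/v1/apis/arvados/v1/rest"


def fetch_discovery_doc(api_host):
    """Download and parse the discovery document (performs a network call)."""
    with urlopen(f"https://{api_host}{DISCOVERY_PATH}") as resp:
        return json.load(resp)


def keep_web_url(discovery_doc):
    """Return the keep-web base URL from a parsed discovery document.

    Assumes the address is published under the "keepWebServiceUrl" key.
    """
    url = discovery_doc.get("keepWebServiceUrl")
    if not url:
        raise LookupError("discovery document has no keepWebServiceUrl")
    return url.rstrip("/")
```

Keeping the parsing separate from the download means the lookup can be tested with a plain dictionary, without contacting a cluster.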

#6 Updated by Tom Morris over 2 years ago

  • Target version changed from 2017-12-06 Sprint to 2017-12-20 Sprint

#7 Updated by Tom Morris over 2 years ago

  • Target version changed from 2017-12-20 Sprint to 2018-01-17 Sprint

#8 Updated by Peter Amstutz over 2 years ago

  • Status changed from New to Resolved
