Tom Clegg, 04/10/2014 02:03 PM


Storing and Organizing Data

Rough demo outline

  1. Automatic ingest from a POSIX directory to Keep
    • Access to an existing staging area (e.g., a remote NFS share) is arranged ahead of time as an admin/setup task
    • Optional(?) User can manage staging areas hosted inside Arvados
    • Someone ("3rd-party") uploads some files to the staging area via SFTP or a similar mechanism
    • 3rd-party makes an API call to {something - ingestor app? directly to arvados api endpoint?}. This might be a short bash script culminating in a curl command. In the API call, the 3rd-party provides a label (e.g., a sample ID), a list of files with their checksums, and an arbitrary "properties" hash containing whatever the 3rd-party wants.
    • Ingestor daemon reads the data from the staging area and writes it into Keep; it creates one collection per API call made by the uploader
    • In Workbench the imported Datasets appear as Collections in the designated project
    • After data has been copied into Keep, ingestor deletes the files from the staging area (this had better be configurable!).
      ...
  2. My data gets into the right project as specified by the uploader (API call)
    • How is the staging-area ↔ project mapping specified, and how/where is it encoded/stored?
      ...
  3. Subscribe to notifications (by email and/or Workbench dashboard): when files start/finish uploading; when files are shared with customer; when files are downloaded by third party
    • For now, use existing Logs table + automatic logging of create/update/delete operations
      ...
  4. Move/copy collections between projects (Project RX1234, or Customer X’s files), tag them in destination project with the appropriate string (e.g., sample ID) -- defaulting to existing tag used in source project (e.g., provided at time of upload).
    • UI for presenting Groups as Projects/Folders: create, view, rename, share, delete
    • UI for copying/moving objects between folders
    • How to avoid confusion about "is this one object in two places, or are there two objects?" Note GDocs has a bit of both, "My Drive" / "Shared with me" vs. regular folders
      ...
  5. “Anyone with this secret link can view/download” mode. Enable, disable, change magic link. Use cases: browser + “wget -r”.
    • Perhaps the secret in the secret link is an ApiClientAuthorization token, belonging to the person creating the link, scoped to a single project/collection
    • How do we implement "Anonymous user, not logged in"?
      ...
  6. See log/overview of who has accessed your shared data (incl. “anonymous user” if using secret-link-to-share); when shared/unshared; when each upload started/finished -- for a single project, and across all projects
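
As an illustration of the ingest request in step 1, the sketch below builds the kind of body a 3rd-party might send: a label, a list of files with checksums, and a free-form "properties" hash. Everything here is an assumption made for illustration: the function names, the field names ("label", "files", "properties", "path", "size", "md5"), and the choice of MD5 as the checksum are hypothetical, and the outline leaves the actual endpoint and request format undecided.

```python
import hashlib
import json
from pathlib import Path


def md5_hex(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_ingest_request(staging_dir, label, properties=None):
    """Build a hypothetical ingest request body for all files under staging_dir.

    The field names are illustrative only; the real request format is
    still an open question in the outline.
    """
    root = Path(staging_dir)
    files = [
        {
            "path": p.relative_to(root).as_posix(),  # path relative to staging area
            "size": p.stat().st_size,
            "md5": md5_hex(p),  # checksum algorithm is an assumption
        }
        for p in sorted(root.rglob("*"))
        if p.is_file()
    ]
    return {"label": label, "files": files, "properties": properties or {}}


if __name__ == "__main__":
    # Print the body that a short script could hand to curl or an HTTP client.
    body = build_ingest_request(".", label="sample-0001", properties={"source": "demo"})
    print(json.dumps(body, indent=2))
```

Under this sketch, one request body corresponds to one collection, matching the "one collection per API call" rule; the ingestor could verify each checksum after copying and before deleting anything from the staging area.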

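The secret-link behaviour in step 5 (enable, disable, change the magic link, scoped to a single project/collection) can be modelled roughly as follows. This is an in-memory sketch only, not the Arvados API: the ShareLink class, its fields, and create_share_link are hypothetical names, and it does not settle whether the secret should be an ApiClientAuthorization token.

```python
import secrets
from dataclasses import dataclass


@dataclass
class ShareLink:
    """Hypothetical record for an 'anyone with this link' share."""
    owner_uuid: str       # person who created the link
    target_uuid: str      # the single project/collection the link is scoped to
    token: str            # the secret embedded in the link
    enabled: bool = True

    def rotate(self):
        """Change the magic link: the old token stops working immediately."""
        self.token = secrets.token_urlsafe(32)

    def allows(self, presented_token, requested_uuid):
        """Return True only if the link is enabled, in scope, and the token matches."""
        return (
            self.enabled
            and requested_uuid == self.target_uuid
            # Constant-time comparison on bytes avoids leaking the token via timing.
            and secrets.compare_digest(presented_token.encode(), self.token.encode())
        )


def create_share_link(owner_uuid, target_uuid):
    """Create an enabled link with a fresh random token."""
    return ShareLink(owner_uuid, target_uuid, secrets.token_urlsafe(32))
```

Disabling the link is just setting enabled to False; because the token is checked against one target only, a leaked link would not expose the owner's other projects under this model.
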
Updated by Tom Clegg almost 10 years ago · 22 revisions