Storing and Organizing Data » History » Version 31

Tom Clegg, 05/07/2014 02:19 PM

h1. Storing and Organizing Data

Rough demo outline

# Automatic ingest from a POSIX directory to Keep
#* Ingestor's access to staging area (could be remote NFS or sshfs mount) is arranged ahead of time
#* 3rd-party's access to staging area is arranged ahead of time
#* Ingestor runs in a screen session. Command line parameters provide project (group/folder) ID and a tag that indicates "this is for *me* to ingest".
#* Someone ("3rd-party") uploads some files to the staging area via SFTP or whatever
#* 3rd-party does an API call to "ingest-notify app". This might be a short bash script culminating in a curl command. In the API call, the 3rd-party provides a label (e.g., a sample ID), a list of files, their checksums, and an arbitrary "properties" hash containing whatever the 3rd-party wants.
#* Ingest-notify app generates a "data in staging area is ready to ingest" event via API server.
#* Ingestor waits for a "data in staging area is ready to ingest" notification via API server.
#* Ingestor reads the data from the staging area and writes it into Keep (creates one collection per API call made by 3rd-party).
#* Ingestor (or arv-put on behalf of ingestor?) makes API calls while working, to indicate progress (bytes done/todo). @arvados.v1.logs.create(object_uuid=uuid_of_upload_object)@
#* In Workbench the imported Datasets appear as Collections in the designated project
#* After data has been copied into Keep, ingestor deletes the files from the staging area (if @--delete-after@ flag given).
...
# My data gets into the right project as specified by the uploader (API call)
#* How is the staging-area ↔ project mapping specified, and how/where is it encoded/stored?
...
# Subscribe to notifications (by email and/or Workbench dashboard): when files start/finish uploading; when files are shared with customer; when files are downloaded by third party
#* For now, use existing Logs table + automatic logging of create/update/delete operations + "progress" event from arv-put (see above)
#* "Show project" page shows recent activity: one progress bar for each unfinished upload, one entry for each start/finish event.
#* Dashboard page shows recent activity from all of my projects.
...
# Move/copy collections between projects (Project RX1234, or Customer X’s files), tag them in destination project with the appropriate string (e.g., sample ID) -- defaulting to existing tag used in source project (e.g., provided at time of upload).
#* UI for presenting Groups as Projects/Folders: create, view, rename, share, delete
#* UI for copying/moving objects between folders
#* How to avoid confusion about "is this one object in two places, or are there two objects?" Note GDocs has a bit of both, "My Drive" / "Shared with me" vs. regular folders
...
# Share project with other users/groups
...
# “Anyone with this secret link can view/download” mode. Enable, disable, change magic link. Use cases: browser + “wget -r”.
#* Perhaps the secret in the secret link is an ApiClientAuthorization token, belonging to the person creating the link, scoped to a single project/collection
#* How do we implement "Anonymous user, not logged in"?
...
# See log/overview of who has accessed your shared data (incl. “anonymous user” if using secret-link-to-share); when shared/unshared; when each upload started/finished -- for a single project, and across all projects
...
# Pilot alternate Workbench group/dashboard view
...
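The ingest-notify call and the progress events in item 1 of the outline could look roughly like the sketch below. It is only an illustration of the outline as drafted, not an agreed interface: the function names, the payload field names (@label@, @files@, @path@, @size@, @md5@), the choice of MD5 as the checksum, and the @"progress"@ event type with its @bytes_done@ / @bytes_total@ properties are all assumptions. Only @object_uuid@ and the "properties" hash come from the outline itself.

```python
import hashlib
import json
import os
import tempfile


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_ingest_notification(label, paths, properties, staging_root):
    """Build the body a 3rd-party might send to the ingest-notify app.

    All field names here are hypothetical; the outline only says the call
    carries a label, a list of files, checksums, and a "properties" hash.
    """
    files = []
    for path in paths:
        files.append({
            # Path relative to the staging area, so the ingestor can
            # resolve it against its own mount point.
            "path": os.path.relpath(path, staging_root),
            "size": os.path.getsize(path),
            "md5": file_md5(path),  # assumed checksum algorithm
        })
    return {"label": label, "files": files, "properties": properties}


def build_progress_log(object_uuid, bytes_done, bytes_total):
    """Build a body for a progress log entry (bytes done/todo).

    The event type and property names are assumptions for illustration.
    """
    return {
        "object_uuid": object_uuid,
        "event_type": "progress",
        "properties": {"bytes_done": bytes_done, "bytes_total": bytes_total},
    }
```

A 3rd-party script would serialize the first body with @json.dumps@ and send it with curl or any HTTP client. The ingestor would pass the second body to whatever call ends up creating log entries while it copies data into Keep.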

h2. Retrospective notes

* Went well - still some merge-race at the end
* Lots of branches going in
* Not a lot of merge conflicts
* Big spec change (rejecting "ingestor" story in favor of future "remote arv-put")
* Some in-sprint deployment dependency stuff (crunch+docker, websockets)
* Please tag commits with story numbers. Currently "refs #1234" works well because Redmine understands it.