Version 1 - History - Arvados Summit Fall 2013 Breakout 1 - Arvados

Jonathan Sheffi

h1. Arvados Summit Fall 2013 Breakout 1

h2. User stories (Jonathan & Ward facilitating)

* As an admin, if I change my DB structure, I want Arvados to help me update the config.
* As an admin, I want to see the mapping of another dataset to my own.
* When I run a job, I want to be able to designate its results as Draft or Final/Real.
* As a consumer of genomic data, I want to visualize my data.
* As a commercial leader of a clinical lab, I want to be able to trace quote to cash for diagnostic tests.
* I want to be able to know where any file is.
* As a patient or participant, I want to be able to export my data to another study.
* As someone who works with data, I want the genotypic and phenotypic data I use to conform to a standard ontology.
* As a clinician, I want to quantify the uncertainty of the data & analysis underlying my report, so that the patient and I understand the clinical decision more fully.
* As a clinician, I want to run the same experiment on multiple data sets.
* As a lab director and oncologist, I want processing from raw exome reads to called variants to take 15 minutes.
* As a data miner, I want to be able to query *all* public data without downloading it.
* As a researcher, I want to be able to set up a standard pipeline for a particular type of data set.
* As an informatician, I want all my data to conform to a standard format so that I can analyze across multiple data sets.
* As a clinician, I want to collect & track inbound case data, such as referral letters, ICD-9 diagnosis codes, case summaries, consents, medical reports, and insurance pre-verifications.
* As an informatician, I want to be able to track & manage ICD-9/10 data.
* As a lab director or clinician, I want to share a report with a clinician at another institution.
* As a clinician, if I discover a mutation, I want to share that with an analytical tool or aggregator of data (e.g. GeneInsight).
* As a user, I want to associate ‘keepalive’ metadata with my intermediate data.
* As Arvados, I record profiling information on which the expiration of intermediate data can be based.
* As an informatician, I can easily manipulate VCF files in parallel (as easily as with GNU parallel).
* As a compliance officer, I have structured insight into the consents for my data.
* As a researcher, I want to be able to collaborate on big datasets without having to copy them.
* As an informatician, I want to associate metadata with (a section of) my pipelines.
* As a new user, I can browse pipelines for metadata and see how ‘popular’ datasets and pipelines are [‘social features’].

h2. Technical discussion (Tom facilitating)

* Test for functionality
* Documentation
** What can Keep do?
** High-level functional description
** How would one replace an existing storage system with Keep?
** How to migrate?
** How to MapReduce?
** Examples
* Databases as input to job
* Permissions
* Audit trail
* Prioritizing jobs - squeaky wheel
* Monitoring - activity & status
* Checkpointing
* Self-starter kit
