API HistoricalForcasting data for CR » History » Version 2

Nico César, 06/18/2020 08:44 AM

h1. API Historical/Forecasting data for CR

Goal: create a pipeline forecaster and a visualization for historical data. This should expose APIs that can be used in the ContainerRequest visualization and could also be used to provide extra information about the currently running CR.

Glossary:

* Checkpoint: a generic name that currently corresponds to a step name followed by a "family". The family clusters all executions with similar characteristics, including scattered steps that follow the pattern name_2, name_3, ..., name_229, across the runs that the token has access to.

* Family: a step name can be a common name like "gatk" or "haplotypecaller". In practice, two executions with the same step name can produce two different populations (in terms of checkpoints), depending on the parameters of the CommandLineTool; the family tells them apart.

* Datapoint: a concrete piece of data that can be plotted as historical data. Currently we bundle the container request and its associated container together to give a unified view of the times involved. Datapoints should not be confused with forecast data, since the two can be used separately.
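
The checkpoint naming described in the glossary can be illustrated with a minimal Python sketch. It assumes that scattered steps are marked by a trailing underscore and number, and that the checkpoint name joins the step name and the family with "@", as in the "createsglf@family22" example in the API output. The function names are hypothetical and are not part of any existing service.

```python
import re

# Assumption: scattered steps carry a trailing "_<number>" suffix (name_2, name_3, ...).
_SCATTER_SUFFIX = re.compile(r"_\d+$")


def base_step_name(step_name: str) -> str:
    """Strip the scatter suffix so that createsglf_57 maps to createsglf."""
    return _SCATTER_SUFFIX.sub("", step_name)


def checkpoint_name(step_name: str, family: str) -> str:
    """Join the base step name and the family, e.g. createsglf@family22."""
    return f"{base_step_name(step_name)}@{family}"


print(checkpoint_name("createsglf_57", "family22"))
print(checkpoint_name("merge-tilelib", "family22"))
```

Under this rule, a step whose real name ends in an underscore followed by digits would be merged with its neighbours, so a real implementation would probably need a more reliable scatter marker.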

h2. API

GET /container-request/aaaaa-xvhdp-123456789abc/checkpoints

Output:

<pre>
{
  "checkpoints": [
    {
      "name": "merge-tilelib@family22",
      "dependencies": [
        "createsglf"
      ],
      "time_average": 8254.534873,
      "time_count": 1,
      "time_min": 8254.534873,
      "time_min_comment": "duration:merge-tilelib#su92l-dz642-cc7799yfwi5jmd9",
      "time_max": 8254.534873,
      "time_max_comment": "duration:merge-tilelib#su92l-dz642-cc7799yfwi5jmd9"
    },
    {
      "name": "createsglf@family22",
      "dependencies": [],
      "time_average": 4741.290203,
      "time_count": 58,
      "time_min": 82.138309,
      "time_min_comment": "duration:createsglf_57#su92l-dz642-3u3g4bq1yh4pqje",
      "time_max": 5818.898387,
      "time_max_comment": "duration:createsglf_8#su92l-dz642-8d094xhqciin5m2"
    },
    ...
  ],
  "time_average": <average time for the CR family>,
</pre>
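
As a rough illustration of how the per-checkpoint timing fields relate to each other, the following Python sketch aggregates a list of durations into the same field names. The function name and the input shape, a list of (label, seconds) pairs, are assumptions made for this example; the actual aggregation code is not described here.

```python
from statistics import fmean


def summarize(durations):
    """Build the per-checkpoint timing fields from (label, seconds) pairs.

    Labels follow the "<step>#<container uuid>" shape seen in the
    time_min_comment and time_max_comment fields.
    """
    if not durations:
        raise ValueError("a checkpoint needs at least one duration")
    min_label, min_seconds = min(durations, key=lambda d: d[1])
    max_label, max_seconds = max(durations, key=lambda d: d[1])
    return {
        "time_average": fmean([seconds for _, seconds in durations]),
        "time_count": len(durations),
        "time_min": min_seconds,
        "time_min_comment": f"duration:{min_label}",
        "time_max": max_seconds,
        "time_max_comment": f"duration:{max_label}",
    }


# The single merge-tilelib run from the output above.
stats = summarize([("merge-tilelib#su92l-dz642-cc7799yfwi5jmd9", 8254.534873)])
print(stats["time_count"], stats["time_average"])
```

With a single sample the average, minimum and maximum coincide, which matches the merge-tilelib entry in the output.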

GET /container-request/aaaaa-xvhdp-123456789abc/datapoints

Output:

<pre>
[
  {
    "step_name": "createsglf",
    "start_1": "2020-01-15 19:49:34.213 +0000",
    "end_1": "2020-01-15 21:19:39.001 +0000",
    "start_2": "2020-01-15 19:54:44.864 +0000",
    "end_2": "2020-01-15 21:19:39.001 +0000",
    "reuse": false,
    "status": "completed",
    "legend": "<p>createsglf</p><p>Container Request: <a href=\"https://workbench.su92l.arvadosapi.com/container_requests/su92l-xvhdp-zfc3ffxk3slmkzv\">su92l-xvhdp-zfc3ffxk3slmkzv</a></p><p>Container duration: 1h24m54.137122s\n</p>"
  },
  {
    "step_name": "createsglf_2",
    "start_1": "2020-01-15 19:49:34.288 +0000",
    "end_1": "2020-01-15 21:29:11.399 +0000",
    "start_2": "2020-01-15 19:54:51.275 +0000",
    "end_2": "2020-01-15 21:29:11.399 +0000",
    "reuse": false,
    "status": "completed",
    "legend": "<p>createsglf_2</p><p>Container Request: <a href=\"https://workbench.su92l.arvadosapi.com/container_requests/su92l-xvhdp-py99va9hnvuxzp5\">su92l-xvhdp-py99va9hnvuxzp5</a></p><p>Container duration: 1h34m20.123849s\n</p>"
  },
  ....
</pre>
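
A client that plots datapoints will likely need the timestamps as durations. The Python sketch below parses the timestamp layout shown in the output and derives two intervals. In the sample data, end_2 minus start_2 matches the "Container duration" in the legend, so the second pair appears to describe the container. Treating start_1 as the start of the container request is an assumption, and the function names are hypothetical.

```python
from datetime import datetime

# Timestamp layout used in the datapoints output, e.g. "2020-01-15 19:49:34.213 +0000".
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S.%f %z"


def parse_ts(value: str) -> datetime:
    """Parse one timestamp string into a timezone-aware datetime."""
    return datetime.strptime(value, TIMESTAMP_FORMAT)


def timings(datapoint: dict) -> dict:
    """Return the wait before the container started and the container run time, in seconds."""
    start_1 = parse_ts(datapoint["start_1"])
    start_2 = parse_ts(datapoint["start_2"])
    end_2 = parse_ts(datapoint["end_2"])
    return {
        # Assumption: start_1 is when the container request began.
        "wait_seconds": (start_2 - start_1).total_seconds(),
        "container_seconds": (end_2 - start_2).total_seconds(),
    }


example = {
    "step_name": "createsglf",
    "start_1": "2020-01-15 19:49:34.213 +0000",
    "end_1": "2020-01-15 21:19:39.001 +0000",
    "start_2": "2020-01-15 19:54:44.864 +0000",
    "end_2": "2020-01-15 21:19:39.001 +0000",
}
print(timings(example))
```

For the first datapoint this gives a container time of about 5094.137 seconds, consistent with the 1h24m54.137122s reported in its legend.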

GET /container-request/aaaaa-xvhdp-123456789abc/workflow-dot

Output:

<pre>
digraph cwlgraph {
rankdir=LR;
graph [compound=true];

subgraph cluster_0 {
label="#createcgf-wf.cwl";
node [style=filled];
shape=box
style="filled";
color="#dddddd";
"#createcgf-wf.cwl" [ label = "#createcgf-wf.cwl", style = invis ];
....
</pre>

h2. Frontend

The DOT file can be rendered with https://domparfitt.com/graphviz-react/ and we have already tested it with some big files.

h2. Schema and queries on the PostgreSQL DB

TODO: Outline the transformation from the current local LevelDB cache to a per-user caching table.
TODO: List the queries to INSERT and SELECT the data for a particular checkpoint.

h2. Permissions

One concern is permissions. We will behave like everything else in Arvados: if the token does not have access to a CR, the response is a 404. This also applies to "summarized data" such as the historical times and prices of the CRs.