Nico César, 06/18/2020 08:44 AM


API Historical/Forecasting data for CR

Goal: create a pipeline forecaster and a visualization for historical data. It should expose APIs that can be used in the ContainerRequest visualization and
that could also be used to provide extra information about the currently running CR.

Glossary:

  • Checkpoint: a generic name that currently corresponds to a step name followed by a "family". The family exists to cluster all executions with similar characteristics (including scattered steps that follow the pattern name_2, name_3, ..., name_229) across the runs that the token has access to.
  • Family: a qualifier that separates executions sharing the same step name. A common name like "gatk" or "haplotypecaller" can be used as a step name; in practice, 2 executions with that name would create 2 different populations (in terms of checkpoints) depending on the parameters of the CommandLineTool.
  • Datapoint: a concrete piece of data that can be plotted as historical data. Currently we bind together the container request and the associated container to have a unified view of the times involved. This should not be confused with forecast data, since the two can be used separately.
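
As a rough illustration of the checkpoint naming described above, here is a minimal Python sketch. It assumes the "@" separator seen in the sample output (e.g. "merge-tilelib@family22") and that scattered steps carry a numeric "_N" suffix; the helper names are hypothetical and not part of any existing API.

```python
import re
from typing import NamedTuple, Optional

# Assumption: scattered steps end in "_<number>", e.g. "createsglf_57".
_SCATTER_SUFFIX = re.compile(r"_(\d+)$")


class Checkpoint(NamedTuple):
    step: str
    family: Optional[str]


def base_step_name(step_name: str) -> str:
    """Strip a trailing scatter index so all scattered runs share one name."""
    return _SCATTER_SUFFIX.sub("", step_name)


def parse_checkpoint(name: str) -> Checkpoint:
    """Split "step@family"; the "@" separator is assumed from the sample output."""
    step, sep, family = name.partition("@")
    return Checkpoint(base_step_name(step), family if sep else None)


def checkpoint_name(step_name: str, family: str) -> str:
    """Build the checkpoint name that a (possibly scattered) step belongs to."""
    return f"{base_step_name(step_name)}@{family}"
```

Note that a step whose real name ends in "_<number>" would be collapsed as well, so an actual implementation would need a more reliable way to detect scattered steps.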

API

GET /container-request/aaaaa-xvhdp-123456789abc/checkpoints

Output:

{
  "checkpoints": [
    {
      "name": "merge-tilelib@family22",
      "dependencies": [
        "createsglf" 
      ],
      "time_average": 8254.534873,
      "time_count": 1,
      "time_min": 8254.534873,
      "time_min_comment": "duration:merge-tilelib#su92l-dz642-cc7799yfwi5jmd9",
      "time_max": 8254.534873,
      "time_max_comment": "duration:merge-tilelib#su92l-dz642-cc7799yfwi5jmd9" 
    },
    {
      "name": "createsglf@family22",
      "dependencies": [],
      "time_average": 4741.290203,
      "time_count": 58,
      "time_min": 82.138309,
      "time_min_comment": "duration:createsglf_57#su92l-dz642-3u3g4bq1yh4pqje",
      "time_max": 5818.898387,
      "time_max_comment": "duration:createsglf_8#su92l-dz642-8d094xhqciin5m2" 
    },
    ...
  ],
  "time_average": <average time for the CR family>,
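
One way a client could turn this response into a forecast is to walk the dependency graph and accumulate the average durations. The sketch below is only an illustration under several assumptions: durations appear to be in seconds, entries in "dependencies" are matched against the part of "name" before the "@", and a checkpoint starts as soon as all of its dependencies finish. The function name is hypothetical.

```python
from typing import Dict, List, Tuple


def forecast_finish_times(checkpoints: List[dict]) -> Dict[str, float]:
    """Expected finish time per step (seconds from workflow start)."""
    # Assumption: dependencies use the bare step name, without "@family".
    by_step = {c["name"].partition("@")[0]: c for c in checkpoints}
    finish: Dict[str, float] = {}

    def visit(step: str, stack: Tuple[str, ...] = ()) -> float:
        if step in finish:
            return finish[step]
        if step in stack:
            raise ValueError(f"dependency cycle at {step!r}")
        cp = by_step[step]  # KeyError if a dependency is missing from the list
        start = max(
            (visit(dep, stack + (step,)) for dep in cp["dependencies"]),
            default=0.0,
        )
        finish[step] = start + cp["time_average"]
        return finish[step]

    for step in by_step:
        visit(step)
    return finish


sample = [
    {"name": "merge-tilelib@family22", "dependencies": ["createsglf"],
     "time_average": 8254.534873},
    {"name": "createsglf@family22", "dependencies": [],
     "time_average": 4741.290203},
]
estimate = forecast_finish_times(sample)
total_seconds = max(estimate.values())
```

For scattered steps such as createsglf (time_count 58), using time_max instead of time_average would give a more conservative estimate, since the step only completes when its slowest execution does.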

GET /container-request/aaaaa-xvhdp-123456789abc/datapoints

Output:

[
  {
    "step_name": "createsglf",
    "start_1": "2020-01-15 19:49:34.213 +0000",
    "end_1": "2020-01-15 21:19:39.001 +0000",
    "start_2": "2020-01-15 19:54:44.864 +0000",
    "end_2": "2020-01-15 21:19:39.001 +0000",
    "reuse": false,
    "status": "completed",
    "legend": "<p>createsglf</p><p>Container Request: <a href=\"https://workbench.su92l.arvadosapi.com/container_requests/su92l-xvhdp-zfc3ffxk3slmkzv\">su92l-xvhdp-zfc3ffxk3slmkzv</a></p><p>Container duration: 1h24m54.137122s\n</p>" 
  },
  {
    "step_name": "createsglf_2",
    "start_1": "2020-01-15 19:49:34.288 +0000",
    "end_1": "2020-01-15 21:29:11.399 +0000",
    "start_2": "2020-01-15 19:54:51.275 +0000",
    "end_2": "2020-01-15 21:29:11.399 +0000",
    "reuse": false,
    "status": "completed",
    "legend": "<p>createsglf_2</p><p>Container Request: <a href=\"https://workbench.su92l.arvadosapi.com/container_requests/su92l-xvhdp-py99va9hnvuxzp5\">su92l-xvhdp-py99va9hnvuxzp5</a></p><p>Container duration: 1h34m20.123849s\n</p>" 
  },
....
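
The timestamps in a datapoint can be turned into durations for plotting. The sketch below assumes that start_1/end_1 belong to the container request and start_2/end_2 to the container; this is inferred from the sample (end_2 minus start_2 matches the "Container duration" in the legend) and is not stated explicitly. The function names are hypothetical.

```python
from datetime import datetime, timedelta

# Matches timestamps such as "2020-01-15 19:49:34.213 +0000".
TS_FORMAT = "%Y-%m-%d %H:%M:%S.%f %z"


def parse_ts(value: str) -> datetime:
    return datetime.strptime(value, TS_FORMAT)


def summarize_datapoint(dp: dict) -> dict:
    """Split a datapoint into waiting time and container run time."""
    request_start = parse_ts(dp["start_1"])    # assumed: container request
    container_start = parse_ts(dp["start_2"])  # assumed: container
    container_end = parse_ts(dp["end_2"])
    return {
        "step_name": dp["step_name"],
        "wait": container_start - request_start,
        "run": container_end - container_start,
    }


sample = {
    "step_name": "createsglf",
    "start_1": "2020-01-15 19:49:34.213 +0000",
    "end_1": "2020-01-15 21:19:39.001 +0000",
    "start_2": "2020-01-15 19:54:44.864 +0000",
    "end_2": "2020-01-15 21:19:39.001 +0000",
}
summary = summarize_datapoint(sample)
```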

GET /container-request/aaaaa-xvhdp-123456789abc/workflow-dot

Output:

digraph cwlgraph {
rankdir=LR;
graph [compound=true];

subgraph cluster_0 {
label="#createcgf-wf.cwl";
node [style=filled];
shape=box
style="filled";
color="#dddddd";
"#createcgf-wf.cwl" [ label = "#createcgf-wf.cwl", style = invis ];
....

Frontend

The DOT file can be rendered with https://domparfitt.com/graphviz-react/; we have already tested it with some big files.

Schema and queries on the postgres DB

TODO: Outline the transformation from the current local leveldb cache to some per-user caching table.
TODO: List the queries to INSERT and SELECT the data for a particular checkpoint.

Permissions

One concern is permissions. We'll behave similarly to everything else in Arvados: if the token doesn't have access to a CR, the response is a 404. This also applies to "summarized data", such as the historical times and prices of the CRs.
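
The rule above could be sketched as follows. This is only an illustration: `can_read` and `load_checkpoints` are hypothetical callables standing in for whatever permission check and storage lookup the service ends up using, and the error body is a placeholder.

```python
from typing import Callable, Dict, List, Tuple


def checkpoints_response(
    cr_uuid: str,
    can_read: Callable[[str], bool],          # hypothetical permission check
    load_checkpoints: Callable[[str], dict],  # hypothetical storage lookup
) -> Tuple[int, dict]:
    """Return (status, body); unreadable CRs look the same as missing ones."""
    if not can_read(cr_uuid):
        return 404, {"errors": ["not found"]}
    return 200, load_checkpoints(cr_uuid)


def readable_durations(
    durations_by_cr: Dict[str, float],
    can_read: Callable[[str], bool],
) -> List[float]:
    """Keep only durations from CRs the token can read before summarizing."""
    return [d for uuid, d in durations_by_cr.items() if can_read(uuid)]
```

Filtering before aggregation means averages, minimums and maximums never reflect runs that the token cannot see.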

Updated by Nico César almost 4 years ago · 2 revisions