A picture is worth a thousand words

Arvados sprint updates for November: real-time job CPU and I/O stats, improved provenance graphs, and better pipeline composition and management tools.
Added by Tim Pierce over 6 years ago

Real-time job CPU and I/O graphs
We've just wrapped up another engineering sprint at Curoverse -- 35 bugs fixed and over a dozen major new features -- and are pretty excited to show you some of the new things you can do with Arvados.

Some of our most exciting new user interface features include:

  • Real-time CPU and I/O graphs for running jobs. Now, while you're watching the status of a running job, you can also see a graph of the job's CPU and I/O activity update in real time.
  • The new arv-run command provides a convenient shell-like syntax for composing and launching Arvados pipelines. Creating a new pipeline to run the same command in parallel on hundreds or thousands of input files is as simple as:
    arv-run grep -H -n ATTGGAGGAAAGATGAGTGAC -- *.fastq

    A full provenance graph for a completed pipeline
  • A file-like I/O interface for collections in the Python SDK. Within a pipeline, opening and reading files in a collection now uses a pattern that will feel very natural and familiar to Python programmers:
    import arvados
    c = arvados.CollectionReader(collection_id)
    with c.open(input_path) as infile:
      for line in infile:
        pass  # handle each line here
  • Infinite scroll for the pipeline view page.
  • As-you-type filename filtering in the collection view.
  • Improved formatting for the provenance graph.
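
Because a file opened from a collection behaves like an ordinary Python file object, code written against plain files should carry over largely unchanged. The following is a minimal sketch of that idea, not SDK code: it uses io.StringIO as a stand-in for the object returned by c.open(), the helper name count_matching_lines is hypothetical, and the sample reads are made up.

```python
import io


def count_matching_lines(infile, needle):
    """Count lines containing `needle` in any iterable file-like object."""
    count = 0
    for line in infile:
        if needle in line:
            count += 1
    return count


# Stand-in for `c.open(input_path)`; the sample reads below are invented.
sample = io.StringIO(
    "GGCATTGGAGGAAAGATGAGTGACTT\n"
    "TTTTTTTTTTTTTTTTTTTTTTTTTT\n"
    "ATTGGAGGAAAGATGAGTGAC\n"
)
with sample as infile:
    matches = count_matching_lines(infile, "ATTGGAGGAAAGATGAGTGAC")
print(matches)
```

The helper only iterates over its argument, so it makes no assumption about where the lines come from.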

Under the hood, we've added loads of internal improvements to make the Arvados site administrator's job easier:

Provenance graph (detail)
  • Rendezvous hashing for Keep ensures that Keep clients and proxies store blocks evenly across Keep servers, and permits adding new Keep servers to an existing cluster without substantially degrading performance.
  • Better timeout handling in the Python SDK reduces latency for Keep requests.
  • Consistent formatting of Keep server logs makes automated analysis easier.
  • Many improvements to our new Node Manager:
    • Google Compute Engine support to give you more options for running compute nodes in the cloud.
    • Administrators can specify a minimum number of compute nodes to keep alive at all times.
    • New nodes can be brought up automatically when all existing ones are busy.
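
To give a feel for the rendezvous hashing mentioned in the list above, here is a minimal sketch of the general technique (also known as highest-random-weight hashing). It is not Keep's actual implementation: the weight function (MD5 of the block hash concatenated with the server name), the server names, and the block identifiers are all assumptions chosen for illustration.

```python
import hashlib


def rendezvous_order(block_hash, servers):
    """Return servers sorted from most to least preferred for this block."""
    def weight(server):
        # Assumed weight function: hash the block and server names together.
        return hashlib.md5((block_hash + server).encode("utf-8")).hexdigest()
    return sorted(servers, key=weight, reverse=True)


def pick_server(block_hash, servers):
    """Pick the highest-weight server for this block."""
    return rendezvous_order(block_hash, servers)[0]


# Hypothetical server names and block identifiers, for illustration only.
servers = ["keep0", "keep1", "keep2", "keep3"]
blocks = [hashlib.md5(str(i).encode("utf-8")).hexdigest() for i in range(2000)]

before = {b: pick_server(b, servers) for b in blocks}
after = {b: pick_server(b, servers + ["keep4"]) for b in blocks}

# Blocks whose preferred server changed once keep4 joined the cluster.
moved = [b for b in blocks if before[b] != after[b]]
print(f"{len(moved)} of {len(blocks)} blocks moved")
```

Each client computes the same ordering independently, so no shared lookup table is needed. When a server is added, the only blocks that change location are those for which the new server now ranks first; every other block stays where it was.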

We've already launched our next engineering sprint and are quickly adding new features to Arvados, and we're looking forward to your feedback! As always, feel free to check out Arvados from GitHub if you want to try it out, or get in touch with us by email or on IRC!

provenance_graph_full.png (175 KB): A full provenance graph for a completed pipeline. Tim Pierce, 11/21/2014 07:43 PM
realtime_job_graphs_2014-11-19.png (319 KB): Real-time job CPU and I/O graphs. Tim Pierce, 11/21/2014 07:43 PM
provenance_graph_detail.png (53.5 KB): Provenance graph (detail). Tim Pierce, 11/21/2014 07:43 PM