2022-10-05: Latest blog post: Scientific Workflow and Data Management with the Arvados Platform

2023-12-12: Arvados 2.7.1 released!

Arvados

Unified Data and Workflow Management

Arvados is a modern open source platform for managing and processing large biomedical data.

By combining robust data and workflow management capabilities in a single platform, Arvados can organize and analyze petabytes of data and run reproducible and versioned computational workflows. Arvados supports the entire data life cycle, from acquisition to analysis, secure sharing, auditing, and reuse.

Data Management

Manage large biomedical data sets for everything from genomics to imaging. Version and track large data sets, including both raw data and metadata. Keep data accessible, avoid unnecessary data duplication, and ensure data integrity.

Workflow Engine

Run, manage and monitor scalable, portable, production-ready workflows. Use the Arvados orchestration system with Docker and Singularity to reliably run containerized workflows at scale.

Reproducibility

Track, record, and reproduce complex workflows on large datasets. Reliably re-run and verify previous workflows and retrieve workflow inputs and outputs.

Scale

Perform data analysis at scale, on-premises or in the cloud. Manage petabytes of data and scale on demand to run workflows that use thousands of cores of compute simultaneously.

Flexible Deployment

Run Arvados anywhere: in the cloud on AWS, Azure, and GCP, on premises, or on hybrid clusters. Work on sequestered data and avoid expensive data transfers using multi-cluster federation.

Secure Data and Sharing

Collaborate on projects by selectively and securely sharing data and workflows. Comply with data protection regulations.

With Arvados, bioinformaticians perform computationally intensive data analysis and machine learning, developers create biomedical applications, and IT administrators manage large compute and storage resources.

A researcher in front of a monitor, with a DNA symbol in the background

Researchers

Use Arvados to manage large datasets and scale your analysis workflows.

  • Automatically create detailed records of every computation: inputs, outputs, container images, workflow versions, parameters, and compute requirements.
  • Organize all your data, workflows, computational runs, and results into projects.
  • Harness thousands of CPUs and GPUs for data analysis, and track and manage petabytes of data.
  • Tap the knowledge of the global scientific community and enable portability with Common Workflow Language standards.
  • Safely share your datasets and workflows with colleagues and collaborators.
A developer standing in front of a monitor, with a thought bubble above their head, containing a gear and branches

Developers

Use Arvados to productionize and scale workflows, create biomedical applications, and build secure services that integrate with existing infrastructure.

  • Work with petabytes of data with great performance, fault tolerance, versioning, and automatic data integrity checking.
  • Productionize your workflows using containerization, metadata, and dynamic scaling.
  • Track every version of every workflow run and avoid unnecessary re-running of workflow tasks.
  • Integrate Arvados with your systems, automate it, and build on it through command-line tools, a REST API, and SDKs for Python, Java, Go, Ruby, and R.
  • Avoid being locked into a proprietary system or stuck in a black box by using 100% open source software.
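
As a rough illustration of the REST integration mentioned in the list above, the following Python sketch builds (but does not send) a request that lists collections. It assumes the cluster address and token are supplied through the `ARVADOS_API_HOST` and `ARVADOS_API_TOKEN` environment variables; the fallback host `arvados.example.com` and the helper name `build_list_request` are hypothetical and exist only for this example. Consult the Arvados API documentation for the authoritative endpoints and parameters.

```python
import os
import urllib.parse
import urllib.request


def build_list_request(resource, limit=5, host=None, token=None):
    """Build (but do not send) a GET request that lists Arvados objects.

    Hypothetical helper for illustration; not part of any Arvados SDK.
    """
    # Fall back to environment variables; the default host is a placeholder.
    host = host or os.environ.get("ARVADOS_API_HOST", "arvados.example.com")
    token = token or os.environ.get("ARVADOS_API_TOKEN", "")
    query = urllib.parse.urlencode({"limit": limit})
    url = f"https://{host}/arvados/v1/{resource}?{query}"
    # The API token is passed as a bearer token in the Authorization header.
    return urllib.request.Request(
        url, headers={"Authorization": f"Bearer {token}"}, method="GET"
    )


if __name__ == "__main__":
    req = build_list_request("collections", limit=3)
    print(req.get_method(), req.full_url)
    # To actually send it, pass req to urllib.request.urlopen() and parse
    # the JSON response body.
```

In practice the Python SDK wraps these calls, so a hand-built request like this is mainly useful for understanding what the SDKs and command-line tools do underneath.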
An administrator operating an admin panel on a computer.

System Administrators

Use Arvados to run efficient, secure, and scalable workflows, provide flexible data management, meet end-user requirements, and manage costs.

  • Run in your own cloud or on-premises infrastructure, backed by both community and commercial support.
  • Ensure compliance with security and regulatory standards with robust data access permissions and support for single sign-on.
  • Lower costs with automatic data deduplication, starting/stopping compute nodes on demand, and using preemptible (spot) instances.
  • Track the history and performance of every workflow run on your system and trace detailed usage data back to each individual user.
  • Use 100% open source software developed in public, backed by strong commercial support, so you will never be locked into a proprietary system.
An executive giving a presentation in front of a monitor, with slides of a person and a pie graph

Executives

Achieve your goals with a proven, highly scalable analysis platform while maintaining security and controlling costs.

  • Get a proven, off-the-shelf solution for your big data and scientific analysis problems.
  • Boost your team with a platform that maximizes productivity by streamlining the process of productionizing, deploying, and scaling data analysis workflows.
  • Ensure security and regulatory compliance by controlling and auditing access to data.
  • Lower costs with automatic data deduplication, starting/stopping compute nodes on demand, and using preemptible (spot) instances.
  • Unlock the potential of large, high-value datasets by establishing a data commons or data lake to make scientific data available across your organization.

Pharmaceutical companies, biotech startups, and research institutions are using Arvados today for clinical sequencing, drug development, and diagnostic testing.

A magnifying glass connected to a cylindrical pill on the right, and a neural network on the left.

Drug Discovery

Accelerate new drug identification and validation

Pharmaceutical companies are using Arvados to harness big data for therapeutic uses and drug discovery. Arvados helps:

  • Develop reproducible, scalable and efficient processes for big data analysis.
  • Run sophisticated analytical and machine learning workflows using tools such as AlphaFold and TensorFlow.
  • Efficiently track, organize and tag datasets using metadata.
Image processing app containing a cell in its center

Biomedical Imaging

Integrate management, analysis, and storage of biomedical data

The increasing volume and variety of biomedical image data brings new challenges for data management, analysis, and storage. Arvados helps:

  • Aggregate and manage petabytes of high-resolution imaging data.
  • Organize and process data through efficient automated workflows, reducing errors.
  • Create a multifunctional platform that integrates tools to analyze image data and serves interactive viewers while maintaining access control, metadata, and data organization.
A laptop surrounded by a DNA helix, a neural network, a partitioned circle, and a cell sample.

Biomedical Data Commons

Remove silos and support new biomedical discoveries

Arvados helps create biomedical data commons that:

  • Aggregate and search across sequencing, omics, and imaging data hosted on different Arvados clusters.
  • Run scalable, interoperable workflows to analyze and harmonize petabytes of raw data.
  • Apply the FAIR guiding principles, including the ability to identify data and metadata using persistent digital IDs.
A microscope with a DNA helix in the background

NGS Sequencing and Clinical Testing

Run production, scalable, clinical workflows with large datasets

Arvados helps clinical labs, biotech start-ups and research institutions:

  • Automate an end-to-end workflow for high throughput sequencing from data collection to analysis.
  • Run production workflows for sequencing and clinical research applications, including whole genomes, exomes, and targeted panels.
  • Implement CAP- and CLIA-compliant bioinformatics solutions for clinical sequencing and diagnostic testing services.

100% Open Source

All parts of Arvados are 100% free and open source software that is developed in public, backed by both strong commercial support and an active community. Join the Arvados community channel for live discussion, the Arvados forum for community support, or one of our regular video chats. Check out the development wiki to get started as a contributor.

Try It Out

Take a look at the public Arvados playground, a free-to-use installation of Arvados for evaluation and trial use.

Install

Learn how to install Arvados from the installation page in the documentation. Arvados supports AWS, GCP and Azure cloud platforms as well as on-premises installs.

Get Help

Both community and enterprise support are available for Arvados. Curii Corporation (info@curii.com) provides managed installations as well as commercial support.