May/June development review: Performance across the board

Added by Brett Smith almost 2 years ago

The Arvados development team just concluded another sprint, building on much of our work from the last one. The work you share through Arvados is more discoverable than ever thanks to Workbench's new public project listing. This page provides an overview of all the public projects available through a cluster. It's available from any Workbench page, even for folks who aren't logged in, so it's easy for anyone to find and browse the listings. You can see it in action by checking out the public projects on our open beta.

We also continued to improve the performance of Arvados collections. We now have a broad test suite to report how different collection operations perform in the Arvados API server and Workbench. Using this data, we made a few performance optimizations to the API server's collections handling. In the end, we reduced API response times by 35% for most requests. You'll feel the difference whether you're working with data sets through our Python SDK, FUSE driver, or Workbench.
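To make a figure like "35% faster" concrete, here is a minimal sketch of how a before-and-after timing comparison can be computed. It is not the Arvados test suite: the helper names `median_seconds` and `percent_reduction` are hypothetical, and the timings in the example are made up for illustration rather than taken from our measurements.

```python
import statistics
import time


def median_seconds(operation, repeats=5):
    """Return the median wall-clock time of calling operation() `repeats` times."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        operation()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def percent_reduction(before, after):
    """Return the percentage by which `after` is smaller than `before`."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before * 100.0


# Illustrative numbers only: a request that took 2.0 s and now takes 1.3 s.
example_reduction = percent_reduction(2.0, 1.3)
```

In practice you would pass `median_seconds` a function that performs one collection operation against your own cluster, run it against the old and new API server, and feed the two medians to `percent_reduction`. Using the median rather than the mean keeps a single slow outlier from skewing the result.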

The performance improvements don't end with collections. We put our public beta cloud through some scalability tests by running eight GATK variant caller pipelines in parallel, using GATK Queue to distribute work across multiple compute nodes. As a result, we made configuration changes to help the cluster perform more consistently, improved logging to help track down issues with jobs at this scale, and fixed a few corner-case bugs in Crunch's job dispatch code. Ultimately, we demonstrated that the pipeline's run time stayed flat even with this much parallelization, a testament to Arvados' design for scale.

You don't have to take our word for it. If you want to see Arvados' scalability for yourself, sign up for the open beta and run some pipelines. If you run into questions, don't hesitate to get in touch with us by IRC or e-mail.