January development review: Gotta go fast!

Added by Brett Smith about 2 years ago

Arvados can already save researchers a lot of time just by keeping track of their data and analysis work. You never have to waste time backtracking to find which input data yielded an interesting result, or rerunning some preprocessing step.

But we want Arvados to help you get your work done faster even when your project is brand new. This sprint, we improved the performance of several major components so Arvados can answer your questions quicker than ever.

First up, we audited the API server to make sure that it always uses optimized JSON libraries. Ruby's built-in json module is really handy, but many third-party modules are noticeably faster. In our tests, the Oj module serialized large objects in about half the time. When performance is a critical concern, it's worth switching over, and since the API server's inputs and outputs are all JSON, we were definitely in that boat. This change alone addressed several performance-related issues in our bug tracker.
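If you'd like to try a comparison like this yourself, here's a minimal sketch in Ruby. It isn't the API server's code: the payload shape and the helper names (sample_payload, serialize, compare_serializers) are made up for illustration, and it assumes the Oj gem is installed, falling back to the built-in json module if it isn't. Timings will vary with your data and hardware.

```ruby
require 'json'
require 'benchmark'

# Oj is a third-party gem (gem install oj). If it is missing, we carry on
# with the standard library only.
begin
  require 'oj'
rescue LoadError
  # Oj not installed; serialize() below falls back to JSON.generate.
end

def oj_available?
  defined?(Oj) ? true : false
end

# Hypothetical payload standing in for a large API response.
def sample_payload(count = 5_000)
  items = Array.new(count) do |i|
    {
      'uuid' => format('item-%06d', i),
      'name' => "Object #{i}",
      'tags' => %w[a b c],
      'size' => i * 1024
    }
  end
  { 'kind' => 'example#list', 'items' => items }
end

# Serialize with Oj when available, otherwise with the built-in module.
# Oj's :compat mode aims to match the output of the json module.
def serialize(obj)
  oj_available? ? Oj.dump(obj, mode: :compat) : JSON.generate(obj)
end

# Time each available serializer over several rounds; returns seconds.
def compare_serializers(obj, rounds = 20)
  timings = {}
  timings['json'] = Benchmark.realtime { rounds.times { JSON.generate(obj) } }
  if oj_available?
    timings['oj'] = Benchmark.realtime { rounds.times { Oj.dump(obj, mode: :compat) } }
  end
  timings
end

if __FILE__ == $PROGRAM_NAME
  compare_serializers(sample_payload).each do |name, seconds|
    puts format('%-5s %.3fs', name, seconds)
  end
end
```

Wrapping the choice of library in a single serialize method keeps the rest of the code unaware of which one is in use, so you can swap libraries in one place.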

We also gave our search functionality a shot in the arm by building smarter data indexes for Arvados objects. Now when you're searching for data in Arvados, we can both find a match in an object and build the results list much faster. You'll see the difference whether you're asking the API server for a filtered list, or using Workbench's search functionality.

Speaking of Workbench, we also improved the performance of our collection manifest parsing code. This will be most noticeable when you browse large collections.

Of course, raw performance isn't the only way to help you work faster. The Workbench interface also got some new features to save you time. On your account page, you can now share your code repositories with other users, just like projects. You could always do this through API requests, but it's a lot more convenient to let Workbench fill in the blanks. Now it's easier than ever to give other people the information they need to review and reproduce your results.

Last but not least is error handling. Nobody likes errors, but they're a fact of life. Several Arvados command-line tools gained improved error reporting to help you spend less time tracking down problems and fix them sooner. If you make a mistake with arv edit, you'll get detailed information about the problem in your editor, so you can jump straight to the source and patch it up. Meanwhile, when the arv keep tools have trouble sending your request to Keep services, you'll get detailed information about each server's response.

This sprint's changes may not make for the prettiest screenshots, but every user can feel them, and we're excited for people to try them out. If you haven't tried our open beta lately, sign up and give it a spin. Or if you'd like to learn more about development, drop us a line by IRC or e-mail.