Anonymous, 04/12/2013 04:37 PM

h1. Cloud Operating Systems and Virtualization

Arvados is designed to run on a cloud OS, a new category of system that is sometimes also called a cloud management platform. The cloud OS layer provides a number of key services that Arvados uses:

* *Virtualization* - Allows administrators to create and provision virtual machines from a pool of hardware resources.
* *Networking* - Connectivity among compute nodes, and between compute and storage resources.
* *Self-service Provisioning* - Allows end users to provision resources for themselves.
* *Administration* - Provides tools for monitoring, managing, and administering clusters. 
* *Block Storage* - On some cloud OSes, such as Amazon Web Services, Arvados uses ext4 filesystems on Elastic Block Store (EBS) volumes, rather than physical disks, as the backing store for the Keep storage system.
* *User Management* - Management of user accounts and permissions.

h2. Public, Private, and Hybrid Clouds

When people hear the word "cloud," they usually think of public clouds such as Amazon Web Services (AWS) and Google Cloud Platform. However, there is an emerging trend, especially in the biomedical industry, toward private clouds. A private cloud uses an architecture similar to that of a public cloud, but it runs on systems that an organization owns, in a data center that it controls. In this sense, cloud computing is better thought of as an architecture than as simply a hosted service.

A number of features characterize cloud architectures:

* Horizontally scaling hardware, usually with uniform nodes that combine compute and storage.
* Commodity hardware that significantly lowers costs for large-scale computing projects.
* Virtualization, which makes it possible to dynamically allocate computing resources and to isolate applications and users. 
* Distributed computing technologies that scale horizontally across equipment and are elastic, such as block storage, object storage, distributed file systems, and MapReduce.
* Low-latency access to storage, which is achieved by putting storage and compute on the same node and by distributing computations to cores near the storage. Not all cloud architectures achieve this, but it is ideal for I/O-intensive tasks like alignment and variant calling.

A cloud architecture allows IT leaders to take clusters of commodity computing equipment and use them in a wide variety of ways.

For most informatics teams, the cloud architecture is a departure from the high-performance computing (HPC) model that they know best. It differs from the traditional combination of network-attached storage (NAS) systems, storage area networks (SANs), and compute clusters with job queueing systems such as Sun Grid Engine.

A cloud architecture provides a variety of advantages for working with informatics data:

* Lower total cost of ownership
* Significantly more flexible use of computing resources
* Self-service provisioning of compute and storage resources 
* Easier overall system administration 
* Faster and more efficient scaling

A private cloud can be used in conjunction with other computing resources. For example, it can be paired with a public cloud for burstable compute capacity and archival storage. It could also be integrated with existing NAS storage, using the NAS as a slower storage tier, or using other storage systems for archiving. Hybrid approaches work well with the Keep content-addressing architecture: no matter where data are stored, client software automatically verifies that it has received the correct file by checking a cryptographic digest. A private cloud could also leverage a traditional HPC compute cluster for some jobs. (Currently this is not a high-priority use case for Arvados; development will depend on community demand.)

h2. DIY Cloud OS

At the Personal Genome Project (PGP), where we are currently running two Arvados clusters, we have implemented our own cloud OS with a combination of Xen, Ubuntu, and a number of other open source components. (We will be documenting this configuration more thoroughly for organizations that want to replicate it.)

h2. Cloud OS for Private Clouds

A number of cloud OS solutions have emerged for private clouds. The leading proprietary solution is VMware's vCloud Suite, but there is also a lot of momentum behind open source efforts, led by OpenStack, Eucalyptus, and CloudStack.

We plan to integrate OpenStack as a cloud OS to make it easier to run Arvados in a private cloud environment. Some of the core OpenStack components, such as the Swift object store and the Nova compute service, are not needed by Arvados: these subsystems are replaced by Arvados services that are optimized for biomedical data. However, Arvados will take advantage of the security, virtualization, and overall cluster management features provided by OpenStack.

h2. Public Cloud OS

Public clouds like Amazon Web Services and Google Cloud Platform are currently much more widely adopted than private cloud OS solutions. These services offer similar capabilities under different names. Although public clouds are typically less appropriate for working with biomedical big data, they offer the important benefits of fast provisioning and dynamic scaling with no up-front commitment.

One of the key goals of Arvados is to provide a smooth transition of data and applications between heterogeneous public and private clouds, so informaticians can easily and safely make use of cloud platforms suitable for the type of work being done.
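
Moving data safely between clouds relies on the digest check mentioned in the discussion of hybrid clouds. The following is a minimal sketch of that idea, not the Keep client itself. The function names, the use of SHA-256, and the modeling of each storage tier as a plain dictionary are all assumptions made for illustration.

```python
import hashlib


def content_address(data: bytes) -> str:
    """Return a content address for a block of data.

    Assumption: SHA-256 is used here only for illustration; the digest
    algorithm and address format used by Keep may differ.
    """
    return hashlib.sha256(data).hexdigest()


def verify(data: bytes, address: str) -> bool:
    """Check that the data received matches the address it was requested by."""
    return content_address(data) == address


def fetch_verified(address, backends):
    """Return (backend_name, data) for the first copy whose digest matches.

    `backends` is a hypothetical ordered list of (name, mapping) pairs,
    for example a private cloud first and a public cloud second. Copies
    that are missing or fail verification are skipped.
    """
    for name, store in backends:
        data = store.get(address)
        if data is not None and verify(data, address):
            return name, data
    raise LookupError("no backend returned a valid copy of " + address)
```

Because the address is derived from the content, the client does not need to trust whichever tier served the data: a corrupted or substituted copy simply fails the check, and the client moves on to the next location.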