Anonymous, 04/06/2013 01:52 PM

h1. History

In 2006, researchers at "Dr. George Church's Lab":http://arep.med.harvard.edu/ at Harvard Medical School began work on the "Personal Genome Project":http://www.personalgenomes.org/ (PGP). The PGP was started to collect whole genome sequencing, environmental, and trait data from individuals who openly consented to have their data shared on the internet under an IRB-approved study. The vision was to collect 100,000 genomes at the Harvard project and to help dozens of other projects launch around the world. From the beginning, the team envisioned data stored in data centers around the world that would need to be federated and shared.

"Alexander Wait Zaranek PhD (Sasha)":http://openwetware.org/wiki/User:Alexander_Wait_Zaranek became Director of Informatics for the project and began developing an informatics platform that could accomplish the goals of the PGP, drawing on the best thinking from Google and other organizations that work with petabyte- and exabyte-scale data sets distributed across data centers.

Sasha worked with Tom Clegg and Ward Vanderwege to design the system and presented a paper describing the approach at the 2008 USENIX Annual Technical Conference: "Free Factories: Unified Infrastructure for Data Intensive Web Services":http://www.ncbi.nlm.nih.gov/pubmed/20514356

Sasha, Tom, Ward, and other engineers then built the system envisioned in the paper. Free Factories currently runs two clusters at Harvard Medical School that power the PGP. Together, these clusters provide storage and computational resources for 300 TB of data.