We enter life as a composite of some 200 cell types orchestrated into a single metazoan organism; within days, these are joined by trillions of bacterial, archaeal, and eukaryotic microbes resident on every surface of our bodies. While decades of microbial ecology and metazoan cellular biology have detailed many aspects of these entities, we have only recently begun to bridge models of unicellular organisms in monoculture with complex multicellular systems at the molecular level. The goal of this research is thus to develop computational methodology to model the molecular behavior of multicellular systems, particularly microbial communities and their interactions with metazoan tissues, by taking advantage of large experimental data repositories. The project focuses on characterizing the biological roles of gene products in such systems and on translating data from controlled experimental contexts so that they apply to multi-cell-type and multi-species systems. This will require the development of data mining algorithms capable of efficiently leveraging thousands of experimental datasets from diverse organisms to model key aspects of multicellular biology: multi-species communities, cell types and lineages, and their structure and distribution within a community or tissue. For this purpose, the project will develop satellite models for machine learning in which core properties and parameters are modified on an as-needed basis. Predictions from these models will be experimentally validated by characterizing the organisms in, and the functional activity of, the oral and gut microbiota, as well as individual under-characterized microbes and microbial interactions. Open-source and online implementations of the developed tools will be available through the laboratory web site at http://huttenhower.sph.harvard.edu.

This project will provide a general framework for genomic data mining in multicellular systems made up of multiple species or cell types, with a simple interface for summarizing thousands of genome-scale datasets. The educational component will include an expansion of the Program in Quantitative Genomics, which includes a newly developed computational biology curriculum; outreach through the Stanford South Africa Biomedical Informatics program; and ongoing collaborations with the Harvard University LS/HHMI and International Society for Computational Biology high school outreach programs. Together, these will establish solid foundations in training and in computational methodology for understanding multicellular systems and interactions by mining large biological data collections.

Agency: National Science Foundation (NSF)
Institute: Division of Biological Infrastructure (DBI)
Application #: 1053486
Program Officer: Jennifer Weller
Project Start:
Project End:
Budget Start: 2011-04-01
Budget End: 2016-03-31
Support Year:
Fiscal Year: 2010
Total Cost: $853,592
Indirect Cost:
Name: Harvard University
Department:
Type:
DUNS #:
City: Cambridge
State: MA
Country: United States
Zip Code: 02138