Metagenomic sequencing projects generate thousands to millions of uncharacterized microbial genes, which are almost completely ignored across all fields of research. Addressing this problem will fundamentally transform how scientists studying microbial communities or new microbial isolates interpret that genetic material and its function. This potentially high payoff is balanced by high risk, because microbial community information has not previously been mined to address this issue. Without more extensive preliminary data or one or more years of prior validation, the project must apply previously untried approaches to prioritize and characterize the targeted microbial genes. Lastly, although the downstream methods for gene function prediction will be adapted from eukaryotic model systems, doing so requires both application in a completely new area (culture-independent prokaryotes) and the intersection of multiple disciplines (computational gene function prediction, data integration, and network mining with microbial community studies and microbiology).

Current technologies generate novel nucleotide sequence information at a rate that greatly outpaces our capability to functionally characterize those sequences. Between one third and, more typically, over three quarters of the proteins in newly sequenced prokaryotic genomes and communities cannot be functionally characterized. The growth of metagenomic sequencing has produced millions of recently identified, completely uncharacterized microbial genes, creating a significant need for efficient computational systems for gene prioritization and characterization. This project will first leverage metagenomic sequences in a novel effort to prioritize uncharacterized genes for further study, breaking from current approaches that target genes from well-studied gene families. Second, integrative, network-based approaches will be used to accelerate and automate the assignment of putative functions to high-priority gene targets for subsequent validation. Both new approaches will be implemented as freely available, documented software and distributed to the broader research community along with pilot datasets. A postdoctoral fellow, a graduate student, and undergraduate students will receive cutting-edge training in integrative experimental and computational approaches during the two-year project.

Budget Start: 2014-09-15
Budget End: 2016-08-31
Fiscal Year: 2014
Total Cost: $256,392
Name: Harvard University
City: Cambridge
State: MA
Country: United States
Zip Code: 02138