This project addresses the important challenge of developing sophisticated and novel machine learning techniques for complex real-world problems. New technologies allow us to determine the genomes of organisms co-existing within ecosystems ranging from the ocean and soil to the human body. Several researchers have embarked on studying the pathogenic role played by the microbiome, defined as the collection of microbial organisms within the human body, with respect to human health and disease conditions.

The research activities in this CAREER project will develop approaches for identifying taxonomy, function and metabolic potential from collective genome samples. A key contribution will be the development of multi-task learning approaches that combine information across the multiple hierarchical databases associated with these annotation problems. The PI will investigate the best ways to capture the underlying hierarchical structure prevalent within different annotation databases. The rationale underlying the proposed research is that a wealth of complementary information exists across several manually curated biological databases. Associating the microbiome with phenotype requires integrating various high-throughput omic data sources (genomic, metabolic, proteomic) that may not be uniformly available across all samples. The PI will develop data fusion classifiers within the multi-task learning paradigm to integrate heterogeneous, incomplete data sources for predicting phenotypes. This project will lead to the following key contributions: (i) improved metagenome annotation models through the integration of multiple prediction tasks and their associated databases; (ii) incorporation of hierarchical information within regularized multi-task learning; (iii) integration of diverse and incomplete information sources; and (iv) scalable algorithms that use hash-based feature representations and improve learning rates.
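
As one illustration of contribution (iv), a hash-based feature representation for sequence reads might look like the minimal sketch below. The function name, the use of k-mers as features, the MD5 hash, and the default values of `k` and `n_buckets` are all assumptions made for illustration; the abstract does not specify any of them.

```python
import hashlib


def hashed_kmer_features(sequence, k=4, n_buckets=16):
    """Map a DNA sequence to a fixed-length count vector via feature hashing.

    Hypothetical helper: each k-mer is hashed into one of `n_buckets`
    positions, so the vector length stays fixed no matter how many
    distinct k-mers exist. Distinct k-mers may collide in one bucket.
    """
    vector = [0] * n_buckets
    seq = sequence.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        # Skip k-mers containing ambiguous bases such as N.
        if set(kmer) - set("ACGT"):
            continue
        # Use a stable hash (Python's built-in hash() is salted per process).
        digest = hashlib.md5(kmer.encode("ascii")).digest()
        bucket = int.from_bytes(digest[:8], "big") % n_buckets
        vector[bucket] += 1
    return vector


if __name__ == "__main__":
    # Example read; the sequence is made up for demonstration.
    print(hashed_kmer_features("ACGTACGTAC"))
```

The fixed vector length is what makes this kind of representation attractive at scale: memory use depends on `n_buckets` rather than on the number of distinct k-mers, at the cost of occasional collisions.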

This project is interdisciplinary and spans the fields of machine learning, bioinformatics, metagenomics, microbiology and environmental ecology. It will foster synergy between teaching and research by providing an environment in which all students can develop intellectually and professionally. The project integrates the research with an education plan focused on mentoring high school, undergraduate and graduate students, curriculum development and laboratory visits. Planned activities include training interdisciplinary researchers, integrating microbiome analysis projects into classes, enhancing the curriculum and implementing new learning strategies. Open source software and tools developed as part of this project will enhance scientific understanding and discovery among a broad and diverse group of researchers.

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Type: Standard Grant (Standard)
Application #: 1252318
Program Officer: Sylvia Spengler
Project Start:
Project End:
Budget Start: 2013-03-01
Budget End: 2019-09-30
Support Year:
Fiscal Year: 2012
Total Cost: $550,000
Indirect Cost:
Name: George Mason University
Department:
Type:
DUNS #:
City: Fairfax
State: VA
Country: United States
Zip Code: 22030