Discovering hidden groups across tuberculosis patient and pathogen genotype data

Bennett, Kristin; Yener, Bulent

Abstract

The principal objective of this project is to develop methods that combine pathogen genotyping and patient epidemiology data for use in the control, understanding, and tracking of infectious diseases. This work focuses on modeling large international collections of patient epidemiology and strain data for the Mycobacterium tuberculosis complex (MTC), the causative agent of tuberculosis (TB), because of the urgent global need and the unique data availability due to the National TB genotyping program. Specifically, the project addresses the following problem: given the MTC DNA fingerprinting and TB patient data being accumulated nationally and internationally, identify hidden groups capturing MTC genetic families and TB epidemiology using machine learning, and use these hidden groups to address problems in the control, understanding, prevention, and treatment of tuberculosis at city, state, national, and international levels. To address this objective, we identify four aims.
The first aim is to gather and merge large databases of MTC patient-isolate genotypes and associated patient information from New York City, New York State, the United States, and the rest of the world.
The second aim is to identify MTC strain families based on multiple genotype methods using graphical models constrained to reflect background knowledge.
The third aim is to identify hidden host-pathogen groups within TB patient demographics and MTC genotypes using a combination of probabilistic graphical models and deterministic multi-way tensor analysis methods designed to capture the temporal dynamics of TB.
The fourth aim is to answer public health questions posed by TB experts by transforming the questions into quantifiable metrics applied to the hidden groups. The hidden group models and metrics will be embedded in analysis methods and then evaluated by TB experts. The proposed models and analysis methods will capture and share knowledge embedded in large TB patient and MTC genotyping databases without necessarily sharing the actual data.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Library of Medicine (NLM)
Type: Research Project (R01)
Project #: 5R01LM009731-03
Application #: 7805478
Study Section: Biomedical Library and Informatics Review Committee (BLR)
Program Officer: Sim, Hua-Chuan

Project Start: 2008-04-15
Project End: 2012-04-14
Budget Start: 2010-04-15
Budget End: 2011-04-14
Support Year: 3
Fiscal Year: 2010
Total Cost: $339,537

Institution

Name: Rensselaer Polytechnic Institute
Department: Biostatistics & Other Math Sci
Type: Schools of Arts and Sciences
DUNS #: 002430742

City: Troy
State: NY
Country: United States
Zip Code: 12180

Related projects


Funder, Year, Type | Title | Investigators | Institution | Total Cost
NIH 2011 R01 LM | Discovering hidden groups across tuberculosis patient and pathogen genotype data | Bennett, Kristin P.; Yener, Bulent | Rensselaer Polytechnic Institute | $325,956
NIH 2010 R01 LM | Discovering hidden groups across tuberculosis patient and pathogen genotype data | Bennett, Kristin P.; Yener, Bulent | Rensselaer Polytechnic Institute | $339,537
NIH 2009 R01 LM | Discovering hidden groups across tuberculosis patient and pathogen genotype data | Bennett, Kristin P.; Yener, Bulent | Rensselaer Polytechnic Institute | $342,967
NIH 2009 R01 LM | Discovering hidden groups across tuberculosis patient and pathogen genotype data | Bennett, Kristin P.; Yener, Bulent | Rensselaer Polytechnic Institute | $170,861
NIH 2009 R01 LM | Discovering hidden groups across tuberculosis patient and pathogen genotype data | Bennett, Kristin P.; Yener, Bulent | Rensselaer Polytechnic Institute | $170,789
NIH 2008 R01 LM | Discovering hidden groups across tuberculosis patient and pathogen genotype data | Bennett, Kristin P.; Yener, Bulent | Rensselaer Polytechnic Institute | $342,967

Publications

Aminian, Minoo; Couvin, David; Shabbeer, Amina et al. (2014) Predicting Mycobacterium tuberculosis complex clades using knowledge-based Bayesian networks. Biomed Res Int 2014:398484

Zaretzki, Jed; Bergeron, Charles; Huang, Tao-wei et al. (2013) RS-WebPredictor: a server for predicting CYP-mediated sites of metabolism on drug-like molecules. Bioinformatics 29:497-8

Ozcaglar, Cagri; Shabbeer, Amina; Kurepina, Natalia et al. (2012) Inferred spoligoforest topology unravels spatially bimodal distribution of mutations in the DR region. IEEE Trans Nanobioscience 11:191-202

Shabbeer, Amina; Cowan, Lauren S; Ozcaglar, Cagri et al. (2012) TB-Lineage: an online tool for classification and analysis of strains of Mycobacterium tuberculosis complex. Infect Genet Evol 12:789-97

Ozcaglar, Cagri; Shabbeer, Amina; Vandenberg, Scott L et al. (2012) Epidemiological models of Mycobacterium tuberculosis complex infections. Math Biosci 236:77-96

Zaretzki, Jed; Rydberg, Patrik; Bergeron, Charles et al. (2012) RS-Predictor models augmented with SMARTCyp reactivities: robust metabolic regioselectivity predictions for nine CYP isozymes. J Chem Inf Model 52:1637-59

Bergeron, Charles; Moore, Gregory; Zaretzki, Jed et al. (2012) Fast bundle algorithm for multiple-instance learning. IEEE Trans Pattern Anal Mach Intell 34:1068-79

Shabbeer, Amina; Ozcaglar, Cagri; Yener, Bülent et al. (2012) Web tools for molecular epidemiology of tuberculosis. Infect Genet Evol 12:767-81

Macías Parra, Mercedes; Kumate Rodríguez, Jesús; Arredondo García, José Luís et al. (2011) Mycobacterium tuberculosis Complex Genotype Diversity and Drug Resistance Profiles in a Pediatric Population in Mexico. Tuberc Res Treat 2011:239042

Ozcaglar, Cagri; Shabbeer, Amina; Kurepina, Natalia et al. (2011) Data-driven insights into deletions of Mycobacterium tuberculosis complex chromosomal DR region using spoligoforests. Proceedings (IEEE Int Conf Bioinformatics Biomed) :75-82

Showing the most recent 10 out of 17 publications
