The principal objective of this project is to develop methods that combine pathogen genotyping and patient epidemiology data for use in the control, understanding, and tracking of infectious diseases. This work focuses on modeling large international collections of patient epidemiology and strain data for the Mycobacterium tuberculosis complex (MTC), the causative agent of tuberculosis (TB), because of the urgent global need and the unique data availability provided by the National TB genotyping program. Specifically, the project addresses the following problem: given the MTC DNA fingerprinting and TB patient data being accumulated nationally and internationally, identify hidden groups that capture MTC genetic families and TB epidemiology using machine learning, and use these hidden groups to address problems in the control, understanding, prevention, and treatment of tuberculosis at city, state, national, and international levels. To address this objective, we identify four aims.
The first aim is to gather and merge large databases of MTC patient-isolate genotypes and associated patient information from New York City, New York State, the United States, and the rest of the world.
The second aim is to identify MTC strain families based on multiple genotype methods using graphical models constrained to reflect background knowledge.
The third aim is to identify hidden host-pathogen groups within TB patient demographics and MTC genotypes using a combination of probabilistic graphical models and deterministic multi-way tensor analysis methods designed to capture the temporal dynamics of TB.
The fourth aim is to answer public health questions posed by TB experts by transforming the questions into quantifiable metrics applied to the hidden groups. The hidden group models and metrics will be embedded in analysis methods and then evaluated by TB experts. The proposed models and analysis methods will capture and share knowledge embedded in large TB patient and MTC genotyping databases without necessarily sharing the actual data.

Agency
National Institutes of Health (NIH)
Institute
National Library of Medicine (NLM)
Type
Research Project (R01)
Project #
5R01LM009731-02
Application #
7612766
Study Section
Biomedical Library and Informatics Review Committee (BLR)
Program Officer
Sim, Hua-Chuan
Project Start
2008-04-15
Project End
2012-04-14
Budget Start
2009-04-15
Budget End
2010-04-14
Support Year
2
Fiscal Year
2009
Total Cost
$342,967
Indirect Cost
Name
Rensselaer Polytechnic Institute
Department
Biostatistics & Other Math Sci
Type
Schools of Arts and Sciences
DUNS #
002430742
City
Troy
State
NY
Country
United States
Zip Code
12180
Aminian, Minoo; Couvin, David; Shabbeer, Amina et al. (2014) Predicting Mycobacterium tuberculosis complex clades using knowledge-based Bayesian networks. Biomed Res Int 2014:398484
Zaretzki, Jed; Bergeron, Charles; Huang, Tao-wei et al. (2013) RS-WebPredictor: a server for predicting CYP-mediated sites of metabolism on drug-like molecules. Bioinformatics 29:497-8
Ozcaglar, Cagri; Shabbeer, Amina; Kurepina, Natalia et al. (2012) Inferred spoligoforest topology unravels spatially bimodal distribution of mutations in the DR region. IEEE Trans Nanobioscience 11:191-202
Shabbeer, Amina; Cowan, Lauren S; Ozcaglar, Cagri et al. (2012) TB-Lineage: an online tool for classification and analysis of strains of Mycobacterium tuberculosis complex. Infect Genet Evol 12:789-97
Ozcaglar, Cagri; Shabbeer, Amina; Vandenberg, Scott L et al. (2012) Epidemiological models of Mycobacterium tuberculosis complex infections. Math Biosci 236:77-96
Zaretzki, Jed; Rydberg, Patrik; Bergeron, Charles et al. (2012) RS-Predictor models augmented with SMARTCyp reactivities: robust metabolic regioselectivity predictions for nine CYP isozymes. J Chem Inf Model 52:1637-59
Bergeron, Charles; Moore, Gregory; Zaretzki, Jed et al. (2012) Fast bundle algorithm for multiple-instance learning. IEEE Trans Pattern Anal Mach Intell 34:1068-79
Shabbeer, Amina; Ozcaglar, Cagri; Yener, Bülent et al. (2012) Web tools for molecular epidemiology of tuberculosis. Infect Genet Evol 12:767-81
Zaretzki, Jed; Bergeron, Charles; Rydberg, Patrik et al. (2011) RS-predictor: a new tool for predicting sites of cytochrome P450-mediated metabolism applied to CYP 3A4. J Chem Inf Model 51:1667-89
Abadia, Edgar; Zhang, Jian; Ritacco, Viviana et al. (2011) The use of microbead-based spoligotyping for Mycobacterium tuberculosis complex to evaluate the quality of the conventional method: providing guidelines for Quality Assurance when working on membranes. BMC Infect Dis 11:110

Showing the most recent 10 out of 17 publications