The principal objective of this project is to develop methods that combine pathogen genotyping and patient epidemiology data for use in the control, understanding, and tracking of infectious diseases. This work focuses on modeling large international collections of patient epidemiology and strain data for the Mycobacterium tuberculosis complex (MTC), the causative agent of tuberculosis (TB), because of the urgent global need and the unique data availability provided by the National TB genotyping program. Specifically, the project addresses the following problem: given the MTC DNA fingerprinting and TB patient data being accumulated nationally and internationally, use machine learning to identify hidden groups that capture MTC genetic families and TB epidemiology, and use these hidden groups to address problems in the control, understanding, prevention, and treatment of tuberculosis at the city, state, national, and international levels. To meet this objective, we pursue four aims.
The first aim is to gather and merge large databases of MTC patient-isolate genotypes, together with associated patient information, from New York City, New York State, the United States, and the rest of the world.
The second aim is to identify MTC strain families based on multiple genotyping methods using graphical models constrained to reflect background knowledge.
The third aim is to identify hidden host-pathogen groups within TB patient demographics and MTC genotypes using a combination of probabilistic graphical models and deterministic multi-way tensor analysis methods designed to capture the temporal dynamics of TB.
The fourth aim is to answer public health questions posed by TB experts by transforming the questions into quantifiable metrics applied to the hidden groups. The hidden group models and metrics will be embedded in analysis methods and then evaluated by TB experts. The proposed models and analysis methods will capture and share knowledge embedded in large TB patient and MTC genotyping databases without necessarily sharing the actual data.

National Institutes of Health (NIH)
National Library of Medicine (NLM)
Research Project (R01)
Study Section: Biomedical Library and Informatics Review Committee (BLR)
Program Officer: Sim, Hua-Chuan
Rensselaer Polytechnic Institute
Biostatistics & Other Math Sci
Schools of Arts and Sciences
United States
Aminian, Minoo; Couvin, David; Shabbeer, Amina et al. (2014) Predicting Mycobacterium tuberculosis complex clades using knowledge-based Bayesian networks. Biomed Res Int 2014:398484
Zaretzki, Jed; Bergeron, Charles; Huang, Tao-wei et al. (2013) RS-WebPredictor: a server for predicting CYP-mediated sites of metabolism on drug-like molecules. Bioinformatics 29:497-8
Ozcaglar, Cagri; Shabbeer, Amina; Kurepina, Natalia et al. (2012) Inferred spoligoforest topology unravels spatially bimodal distribution of mutations in the DR region. IEEE Trans Nanobioscience 11:191-202
Shabbeer, Amina; Cowan, Lauren S; Ozcaglar, Cagri et al. (2012) TB-Lineage: an online tool for classification and analysis of strains of Mycobacterium tuberculosis complex. Infect Genet Evol 12:789-97
Ozcaglar, Cagri; Shabbeer, Amina; Vandenberg, Scott L et al. (2012) Epidemiological models of Mycobacterium tuberculosis complex infections. Math Biosci 236:77-96
Zaretzki, Jed; Rydberg, Patrik; Bergeron, Charles et al. (2012) RS-Predictor models augmented with SMARTCyp reactivities: robust metabolic regioselectivity predictions for nine CYP isozymes. J Chem Inf Model 52:1637-59
Bergeron, Charles; Moore, Gregory; Zaretzki, Jed et al. (2012) Fast bundle algorithm for multiple-instance learning. IEEE Trans Pattern Anal Mach Intell 34:1068-79
Shabbeer, Amina; Ozcaglar, Cagri; Yener, Bülent et al. (2012) Web tools for molecular epidemiology of tuberculosis. Infect Genet Evol 12:767-81
Ozcaglar, Cagri; Shabbeer, Amina; Vandenberg, Scott et al. (2011) Sublineage structure analysis of Mycobacterium tuberculosis complex strains using multiple-biomarker tensors. BMC Genomics 12 Suppl 2:S1
Macías Parra, Mercedes; Kumate Rodríguez, Jesús; Arredondo García, José Luís et al. (2011) Mycobacterium tuberculosis Complex Genotype Diversity and Drug Resistance Profiles in a Pediatric Population in Mexico. Tuberc Res Treat 2011:239042

Showing the most recent 10 out of 17 publications