? ? The principal objective of this project is to develop methods that combine pathogen genotyping and patient epidemiology data that can be used in the control, understanding, and tracking of infectious diseases. This work focuses on the modeling of large international collections of patient epidemiology and strain data for the Mycobacterium tuberculosis complex (MTC), the causative agent of tuberculosis disease (TB), because of the urgent global need and the unique data availability due to the National TB genotyping program. Specifically, the project addresses the following problem: given MTC DNA fingerprinting and TB patient data being accumulated nationally and internationally, identify hidden groups capturing MTC genetic families and TB epidemiology using machine learning, and use these hidden groups to address problems in the control, understanding, prevention, and treatment of tuberculosis at city, state, national, and international levels. To address this objective, we identify several aims.
The first aim i s to gather and merge large databases of MTC patient-isolate genotypes as well as associated patient information from the New York City, New York State, United States, and the rest of the world.
The second aim i s to identify MTC strain families based on multiple genotype methods using graphical models constrained to reflect background knowledge.
The third aim i s to identify hidden host-pathogen groups within TB patient demographics and MTC genotypes using a combination of probabilistic graphical models and deterministic multi-way tensor analysis methods designed to capture the temporal dynamics of TB.
The fourth aim answers public health questions posed by TB experts by transforming the questions into quantifiable metrics applied to the hidden groups. The hidden group models and metrics will be embedded in analysis methods, and then evaluated by TB experts. The proposed models and analysis methods will capture and share knowledge embedded in large TB patient and MTC genotyping databases without necessarily sharing the actual data. ? ? ?
Showing the most recent 10 out of 17 publications