This subproject is one of many research subprojects utilizing the resources provided by a Center grant funded by NIH/NCRR. Primary support for the subproject and the subproject's principal investigator may have been provided by other sources, including other NIH sources. The Total Cost listed for the subproject likely represents the estimated amount of Center infrastructure utilized by the subproject, not direct funding provided by the NCRR grant to the subproject or subproject staff. This project proposes to develop tools and systems to aid in the detection of infectious diseases. Infectious diseases harm or kill more people worldwide than any other single cause. The germs that cause infectious diseases are found everywhere: in soil, water, and air. Due to globalization, these germs have the potential to reach and affect many people. Surveillance and early detection are therefore key to curtailing the further spread of a disease in the population. Hospitals and the Centers for Disease Control and Prevention (CDC) collect large amounts of data to track infectious disease incidents, including the demographics of the infected population, the treatment plan, and the genetic information of the germ. Analysis of this information can be valuable in understanding the demographic aspects of the spread of the disease and how the germ responds to various treatment plans.
This research aims to develop efficient methods to analyze the enormous amounts of data related to infectious diseases and to find useful, actionable information. Tools will be developed to identify temporal information, such as trends in disease incidence and progression and the susceptible population, as well as spatial information, such as the geographical areas where the incidence is greatest and the geographical progression of the disease. We plan to focus on one infectious disease and collect data related to that disease from public as well as private sources. We plan to collect research articles, patient medical records, and surveillance data from organizations such as the CDC. The patient medical records will be scrubbed to remove any personal information of the patients. Although the data is collected from heterogeneous sources, all of it will be related to some aspect of the chosen infectious disease. In order to capture all of the relations in the data, we plan to study methods that can exploit heterogeneous data with multiple types of relations. For example, a patient medical record contains multiple types of information -- the age of the patient, the date and place of the medical report, the organism involved in the infectious disease, the particulars of the treatment plan, etc. A patient medical record may be related to other data for different reasons. The raw data collected from the various data sources is preprocessed to create a relational database with multiple relations. The data from these relations is then extracted in a format suitable for three types of analysis -- temporal, spatial and spatio-temporal. For example, to identify the time periods during which the incidence of the disease was highest, the number of reports for each recorded time period is extracted from the database and analyzed.
Similarly, to identify the geographical region with the highest incidence of the disease, each geographical region and the number of patients treated for the disease in that region are extracted from the database. To identify whether there are any genetic differences between the organisms that cause the disease, the genetic sequences are extracted from the database and compared.
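The three extraction steps described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `report` table, its column names, the sample rows, and the helper names `counts_by` and `hamming` are hypothetical and do not come from the project's actual database. The sequence comparison uses a simple Hamming distance as a stand-in, since the abstract does not say which comparison method is used; sequences of unequal length would need an alignment step instead.

```python
import sqlite3

# Hypothetical schema and sample rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE report ("
    "id INTEGER PRIMARY KEY, report_month TEXT, region TEXT, sequence TEXT)"
)
conn.executemany(
    "INSERT INTO report (report_month, region, sequence) VALUES (?, ?, ?)",
    [
        ("2009-01", "North", "ACGTAC"),
        ("2009-01", "South", "ACGTAC"),
        ("2009-02", "North", "ACGAAC"),
        ("2009-02", "North", "ACGTAC"),
        ("2009-02", "South", "ACCTAG"),
    ],
)


def counts_by(column):
    """Return (value, number of reports) pairs, highest count first."""
    # Column names cannot be bound as SQL parameters, so check them
    # against a fixed allow-list before building the query.
    if column not in {"report_month", "region"}:
        raise ValueError(f"unsupported column: {column}")
    query = (
        f"SELECT {column}, COUNT(*) AS n FROM report "
        f"GROUP BY {column} ORDER BY n DESC, {column} ASC"
    )
    return conn.execute(query).fetchall()


def hamming(a, b):
    """Count positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


temporal = counts_by("report_month")  # reports per recorded time period
spatial = counts_by("region")         # reports per geographical region

# Compare every stored sequence against the first one as a reference.
sequences = [row[0] for row in conn.execute("SELECT sequence FROM report ORDER BY id")]
differences = [hamming(sequences[0], s) for s in sequences]
```

In this sketch the first entry of `temporal` is the time period with the most reports and the first entry of `spatial` is the region with the most reports; a spatio-temporal analysis would group by both columns at once.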