Treatment of HIV infection is challenging because of both rapidly evolving therapeutic strategies and the complex social problems affecting patients. Tools are needed that extract information from databases and help resolve some of the uncertainties involved in selecting therapy strategies. Unfortunately, existing databases are marred by imperfections ranging from missing information to data entry errors and subjective evaluations. In machine learning, data imperfections have received inadequate attention, which is surprising in view of the abundance of mathematical frameworks developed over decades of research in the field of uncertainty processing. In the proposed work, a team of three researchers will address research tasks that will enhance the physician's decision-making capabilities by gleaning actionable knowledge from relevant databases. These tasks include: detection of frequently co-occurring diseases, which requires association mining in time-varying domains with ambiguities and uncertainties; prediction of the success of specific treatments, with special attention to induction from sparse and unreliable data; prediction of patient non-compliance, with a focus on ambiguous attributes; and statistical and medical validation of the induced knowledge.
The proposed research will contribute to computer science along the following three lines. 1) Modification of existing techniques for association mining so that they can work with ambiguities and quantify the uncertainty of the results. Techniques that reflect the time-varying nature of the induced knowledge will also be developed. 2) Development of a novel clustering algorithm (based on collaborative filtering) capable of ignoring the descriptions of the training examples, and modification of existing collaborative-filtering techniques so that they can handle data imperfections. 3) Development of machine-learning techniques for classifier induction from ambiguously described examples. All three contributions can be used in knowledge discovery from imperfect databases. In the medical domain, the induced knowledge will provide new hypotheses as well as new treatment strategies.
This research project involves a collaboration of professionals from different disciplines. The medical students involved will learn to appreciate how modern computer science techniques can enhance medical practice, while the engineering students will learn about the complications encountered in medical applications of computer science. Outreach activities will be developed to encourage the participation of high school and community college students. The University of Miami is a Hispanic Serving Institution; the proposed research will involve under-represented student groups in engineering research. Research results will be broadly disseminated via publications and presentations, incorporation of theoretical and experimental work into courses, and contributions to relevant Internet sites.