Discovering the critical factors that affect human health is central to biomedical research and can provide valuable health guidance to the general public. As the capability to capture and store medical data expands, the need for computational tools that facilitate such discoveries continues to grow. This project aims to build an efficient risk factor and association discovery system that extracts significant, valid, non-redundant, and previously unknown associations among attributes in medical datasets.
The specific aims are:
Specific Aim 1: Creating an efficient association discovery algorithm that integrates the strengths of both knowledge-based and objective association mining techniques. In the preliminary study we designed a novel knowledge-based association analysis algorithm that can detect and discard the majority of invalid or already known associations. To further improve analysis quality and weed out redundant associations, we propose to fully investigate our knowledge-based algorithm and integrate it with MAFIA, an objective redundancy-reducing technique. The resulting association discovery algorithm will present only non-trivial, valid, non-redundant, and previously unknown associations, which are the ones most likely to lead to new discoveries.
Specific Aim 2: Investigating a novel semantic network-based knowledge model. The systematic application of user and medical domain knowledge in association analysis began only recently. Chen et al. manually built a semantic network covering 38 attributes of the Heartfelt adolescent health survey and successfully identified 8 associations relating to adolescent health and development. To make semantic network construction more efficient, the proposed project will develop an automatic semantic network building component that extracts knowledge directly from the Unified Medical Language System, a large medical knowledge base developed by NIH. To validate the model and evaluate its scalability, a large semantic network based on large real-world medical datasets will be built automatically and fully tested.
Specific Aim 3: Conducting a rigorous evaluation through a thorough analysis of large real-world datasets. Discovered associations will be validated with biostatistical models and reviewed by experienced medical researchers. Because one advantage of our association discovery technique is its ability to find "hidden" associations that a user has never suspected, new health-associated factors may be discovered, which would advance public health research.
The National Institute of General Medical Sciences (NIGMS) is committed to investing in discovery by using a variety of vehicles to support basic research. This project aims to efficiently discover critical factors associated with human health, which contributes directly to enhancing basic biomedical research and supports the creation of research resources, including software and hardware tools (Goals 1 and 2 in the NIGMS Strategic Plan). Our highly experienced, interdisciplinary research team consists of 1 computer scientist, 1 computer technician/programmer, 1 biostatistician, and 2 medical technicians, so the project will naturally advance multidisciplinary and interdisciplinary inquiry (Goal 2). The University of Houston-Downtown is designated by the U.S. Department of Education as both a Hispanic-Serving Institution (HSI) and a Minority Institution (MI), and this project will directly involve underrepresented minority students in biomedical research. More students will benefit from class projects and new courseware spawned by this project (Goal 3 in the NIGMS Strategic Plan).
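The filtering idea described in Specific Aim 1 can be illustrated with a minimal Python sketch. It is not the project's algorithm and not an implementation of MAFIA: it enumerates frequent itemsets by brute force, keeps only the maximal ones (the condensed kind of output that maximal frequent itemset mining is meant to produce), and then discards itemsets whose attribute pairs are all listed in a small, hypothetical set of already known associations. The function names, the support threshold, and the toy records are assumptions made purely for illustration; none of them come from the Heartfelt survey or from any real dataset.

```python
from itertools import combinations


def frequent_itemsets(records, min_support):
    """Brute-force search for itemsets (size >= 2) meeting min_support."""
    n = len(records)
    items = sorted(set().union(*records))
    frequent = {}
    for size in range(2, len(items) + 1):
        found = False
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            support = sum(candidate <= r for r in records) / n
            if support >= min_support:
                frequent[candidate] = support
                found = True
        if not found:
            break  # no frequent itemset of this size, so none larger exists
    return frequent


def maximal_only(frequent):
    """Keep itemsets that have no frequent proper superset (redundancy removal)."""
    return {s: v for s, v in frequent.items()
            if not any(s < other for other in frequent)}


def drop_known(itemsets, known_pairs):
    """Discard itemsets whose attribute pairs are all already known."""
    def is_known(s):
        return all(frozenset(p) in known_pairs
                   for p in combinations(sorted(s), 2))
    return {s: v for s, v in itemsets.items() if not is_known(s)}


# Invented toy records: each set lists the attributes present for one subject.
records = [
    {"smoking", "low_activity", "high_bp"},
    {"smoking", "low_activity", "high_bp"},
    {"smoking", "high_bp"},
    {"low_activity", "poor_sleep"},
    {"low_activity", "poor_sleep"},
    {"smoking", "low_activity", "high_bp", "poor_sleep"},
]

# Hypothetical knowledge base: pairs of attributes already known to be linked.
known = {frozenset({"low_activity", "poor_sleep"})}

novel = drop_known(maximal_only(frequent_itemsets(records, 0.5)), known)
for itemset, support in novel.items():
    print(sorted(itemset), support)
```

On these toy records, the three frequent pairs drawn from smoking, low_activity, and high_bp are absorbed into the single maximal itemset that contains all three, and the low_activity/poor_sleep itemset is dropped because the hypothetical knowledge base already lists it. A real system would replace the brute-force enumeration with a scalable miner and the hand-written pair set with a semantic network.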

Public Health Relevance

This project aims to design and rigorously test an efficient medical association discovery algorithm that extracts non-trivial, validated, non-redundant, and previously unknown health-related associations (risk factors) from large real-world medical datasets. These associations will provide valuable reasoning and modeling mechanisms that are critically important to the foundation of medical and health research. Health-related associations can also provide a basis for clinical decision making, public health policy, and other important public health fields, including health guidance that can directly promote and improve the health of individuals, families, communities, and populations.

National Institutes of Health (NIH)
National Institute of General Medical Sciences (NIGMS)
Pilot Research Project (SC2)
Study Section: Special Emphasis Panel (ZGM1-TWD-8 (SC))
Program Officer: Brazhnik, Paul
University of Houston-Downtown
Biostatistics & Other Math Sci
Schools of Arts and Sciences
United States