The long-term goal of this research proposal is to develop variable selection procedures that effectively incorporate both the study design and the structure of the data. From a biomedical perspective, such procedures will allow more accurate identification of biological features, such as genetic markers or imaging measures, that distinguish among disease groups. In turn, improved identification of important disease biomarkers will contribute to deeper insights into the nature and etiology of many diseases and disorders. Matched case-control designs are used in a wide range of biomedical applications because they control for important potential confounders that can distort the true relationship between features and diagnostic group membership. In studies that use this design, a key interest is identifying the features that are important in discriminating cases from controls. To ensure high efficiency and statistical power in identifying these features, the analysis must take the matched design into account. However, few variable selection methods account for matching, particularly in high-dimensional settings. Bayesian approaches to variable selection are well suited to high-dimensional biological data: they yield tractable models that can incorporate the biological structure of the data through the choice of prior distributions. The proposed methodology is a novel variable selection approach that accounts for matching in case-control studies by formulating conditional logistic regression models in a Bayesian framework.
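To make the central modeling idea concrete, the following is a minimal Python sketch, not the proposal's actual method. It assumes 1:M matched sets in which the first feature vector of each set is the case. Under conditional logistic regression, each matched set contributes x_case·β − log Σ_j exp(x_j·β), so the set-specific intercepts that absorb the matching factors cancel and only β remains. The prior in `log_posterior` (independent normal "slab" priors on included coefficients, exact zeros for excluded ones, and Bernoulli inclusion indicators) is an illustrative spike-and-slab assumption; the function names and the parameters `prior_sd` and `incl_prob` are hypothetical.

```python
import math
from typing import List, Sequence


def conditional_loglik(beta: Sequence[float],
                       matched_sets: List[List[Sequence[float]]]) -> float:
    """Conditional log-likelihood for 1:M matched case-control data.

    Assumption of this sketch: in each matched set, the first feature
    vector belongs to the case and the remaining vectors to its matched
    controls. Set-specific intercepts cancel, so only beta is needed.
    """
    total = 0.0
    for members in matched_sets:
        # Linear predictor x_j . beta for every member of the matched set
        scores = [sum(b * x for b, x in zip(beta, xv)) for xv in members]
        # Numerically stable log-sum-exp over the whole matched set
        m = max(scores)
        log_denom = m + math.log(sum(math.exp(s - m) for s in scores))
        total += scores[0] - log_denom
    return total


def log_posterior(beta: Sequence[float], gamma: Sequence[int],
                  matched_sets: List[List[Sequence[float]]],
                  prior_sd: float = 1.0, incl_prob: float = 0.1) -> float:
    """Unnormalized log posterior under a hypothetical spike-and-slab prior.

    beta_j ~ N(0, prior_sd^2) if gamma_j = 1; beta_j = 0 if gamma_j = 0;
    gamma_j ~ Bernoulli(incl_prob), independently across features.
    """
    if any(g == 0 and b != 0.0 for b, g in zip(beta, gamma)):
        return -math.inf  # excluded features must have zero coefficients
    lp = conditional_loglik(beta, matched_sets)
    for b, g in zip(beta, gamma):
        if g:
            # Normal log density of the slab plus log prior inclusion probability
            lp += -0.5 * (b / prior_sd) ** 2 - math.log(prior_sd * math.sqrt(2 * math.pi))
            lp += math.log(incl_prob)
        else:
            lp += math.log(1 - incl_prob)
    return lp
```

For example, for a single 1:1 pair whose case has feature value 1 and whose control has 0, `conditional_loglik([1.0], [[[1.0], [0.0]]])` equals 1 − log(1 + e), roughly −0.313. A sampler over (β, γ), such as Gibbs or Metropolis-Hastings, could target this log posterior; that choice is left open in this sketch.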
This methodology will be developed to handle a wide range of settings directly relevant to biomedical applications, including high-dimensional data, interactions among features, complex data structures, different matched case-control designs, and ordering among disease groups or disorders. The proposed variable selection approach will be investigated in numerous simulation studies employing several types of matching; in a brain imaging study of matched samples of stroke patients, aimed at finding brain regions predictive of hospital-acquired pneumonia; and in a matched case-control study aimed at finding blood plasma biomarkers for cardiovascular events. Its performance in matched case-control studies will also be compared with that of other variable selection techniques.

Public Health Relevance

In biomedical applications, matched case-control studies are frequently used to identify biomarkers that characterize many diseases and disorders of current public health concern. To identify these biomarkers more accurately, it is necessary to account for both the matched design and the biological structure of the data. This research proposal will develop a new variable selection methodology that efficiently incorporates both matching and data structure in its analytic approach.

National Institutes of Health (NIH)
National Institute of Neurological Disorders and Stroke (NINDS)
Postdoctoral Individual National Research Service Award (F32)
Study Section: Special Emphasis Panel (ZRG1-F16-L (20))
Program Officer: Gilbert, Peter R
Harvard University
Biostatistics & Other Math Sci
Schools of Public Health
United States
Asafu-Adjei, Josephine K; Betensky, Rebecca A (2015) A Pairwise Naïve Bayes Approach to Bayesian Classification. Intern J Pattern Recognit Artif Intell 29: