Research Project III: New Statistical Procedures for High-Dimensional Complex Drug Abuse Data

Intensive longitudinal data (ILD) gathered in ecological momentary assessment studies and genetic data are increasingly available in drug abuse and HIV research. Investigators often wish to conduct exploratory analyses of these rich data in order to address sophisticated and nuanced research questions such as "Among a huge number of genetic markers, which ones are associated with vulnerability to the influence of environmental factors (such as time-varying social contexts) on withdrawal symptoms (such as momentary craving and negative affect) during an attempt to quit smoking?" and "Which genetic markers interact with environmental factors to predict time to an event (such as an episode of risky sex)?" However, in ILD, genetic data, and integrations of the two, the subject sample size may be in the hundreds whereas there may be tens or even hundreds of thousands of variables. This has a crippling effect on exploratory data analyses because nearly all multivariate procedures break down when the number of variables exceeds the sample size. One option is first to use variable screening methods to reduce the enormous number of potential predictor variables to a subset that is likely to be useful, and then to apply multivariate methods to that subset. Principled and effective approaches for variable screening are beginning to emerge in the statistics literature; however, these approaches do not currently extend to high-dimensional complex data (HDCD), such as genetic data with ILD outcomes or with time-to-event outcomes, because of the presence of time-varying effects and strong within-subject correlation. In the proposed project, we will develop new variable screening methods for HDCD to enable researchers to focus their efforts on a reduced subset of predictors that have potential impact on an outcome.
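The two-step strategy described above (screen first, then apply a multivariate method) can be illustrated with a minimal sketch. It assumes the simplest possible setting: a single continuous outcome measured once per subject, with predictors ranked by absolute marginal correlation. This is a generic screening idea from the statistics literature, not the procedures proposed in this project, which target ILD and time-to-event outcomes. The function name `marginal_screen`, the simulated data, and the choice to keep 50 predictors are all assumptions made for illustration.

```python
import numpy as np


def marginal_screen(X, y, keep):
    """Return the column indices of the `keep` predictors with the largest
    absolute marginal correlation with y (hypothetical helper)."""
    Xc = X - X.mean(axis=0)          # center each predictor
    yc = y - y.mean()                # center the outcome
    norms = np.linalg.norm(Xc, axis=0)
    norms[norms == 0] = np.inf       # constant columns get correlation 0
    corr = (Xc.T @ yc) / (norms * np.linalg.norm(yc))
    order = np.argsort(-np.abs(corr))
    return np.sort(order[:keep])


# Simulated data (an assumption): 200 subjects, 5000 predictors,
# only three of which truly affect the outcome.
rng = np.random.default_rng(0)
n, p = 200, 5000
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[[3, 10, 42]] = [3.0, -3.0, 3.0]
y = X @ beta + rng.standard_normal(n)

# Step 1: screening reduces 5000 candidate predictors to 50.
selected = marginal_screen(X, y, keep=50)

# Step 2: an ordinary multivariate fit is now feasible because 50 < n.
X_reduced = X[:, selected] - X[:, selected].mean(axis=0)
coef, *_ = np.linalg.lstsq(X_reduced, y - y.mean(), rcond=None)
```

With all 5000 predictors the least-squares problem in step 2 would be underdetermined, since there are far more columns than subjects; after screening it has a unique solution. A marginal ranking of this kind ignores time-varying effects and within-subject correlation, which is the gap the abstract identifies.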
The Specific Aims of this project are as follows.
(1) To develop variable screening procedures for HDCD with intensively measured outcomes. These procedures will be useful in addressing scientific questions such as "Which genetic, individual, and social factors predict the evolving relation between negative affect and craving during an attempt to quit smoking?"
(2) To develop variable screening procedures for HDCD with time-to-event outcomes. These procedures will be useful in addressing questions such as "Which genes are most strongly associated with the time to relapse?" and "Are there interactions between genes and time-varying social contexts on the time to relapse?"
(3) To apply the innovative methods developed in Aims 1 and 2 to discover new knowledge in drug abuse and HIV risk. We will apply the new methods in empirical analyses of four existing HDCD sets to test new scientific hypotheses related to drug abuse and HIV. This work will be conducted in collaboration with the scientists who collected the data and will be published in the drug abuse/HIV literatures.
(4) To disseminate the innovative methods to drug abuse/HIV scientists worldwide. In collaboration with the Dissemination, Software, and Technology Core, we will develop user-friendly software to enable scientists to implement our new methods, and we will distribute the software and tutorials on the Center's website.