This research develops new approaches to the analysis of data from large cohort studies, whether epidemiologic studies or clinical trials, with many qualitatively different variables observed over several time points and with many possibly correlated outcomes of interest.
The aim is to develop methods for discovering patterns of higher-order variable interactions that suggest unusually high risk for some outcomes and that are not readily found by more traditional methods. The proposed methods attempt to merge so-called data mining approaches with more traditional biostatistical methods, and to generate hypotheses that can then be examined further by classical parametric statistical methods or by modern multivariate semi-parametric model-building methods, such as Smoothing Spline Analysis of Variance (SS-ANOVA), which have been developed under this research program and elsewhere. An additional goal is to incorporate family structure information for a subset of study participants, in parallel with the search for high-order interactions among variables, to uncover patterns that may be related to family structure, and to examine the tradeoff between family-related and other information in predicting or estimating the probability of various outcomes. Data from the Wisconsin Epidemiological Study of Diabetic Retinopathy and the Beaver Dam Eye Study will be used to examine the models under study for their reasonableness and for their ability to answer questions meaningful to the study scientists. The results will have broad applicability to other large epidemiological studies as well as to clinical trials.

Public Health Relevance

Epidemiological and clinical studies are responsible for much of the dramatic improvement in public health and longevity over the last fifty years or so. Better understanding of the effects of lifestyle factors, treatment opportunities, and genetic factors has come about as a result of both straightforward and sophisticated analysis of the data gleaned from these studies. With extensive data collection and complex data structures, as well as improved computational and software resources, there are opportunities to further develop and extend modern data analysis methods to better capture complex relations between variables that affect outcomes of important personal and public health interest. This project proposes to exploit these opportunities.

Agency
National Institutes of Health (NIH)
Institute
National Eye Institute (NEI)
Type
Research Project (R01)
Project #
5R01EY009946-15
Application #
7344695
Study Section
Biostatistical Methods and Research Design Study Section (BMRD)
Program Officer
Everett, Donald F
Project Start
1992-12-01
Project End
2009-12-31
Budget Start
2008-01-01
Budget End
2008-12-31
Support Year
15
Fiscal Year
2008
Total Cost
$280,750
Indirect Cost
Name
University of Wisconsin Madison
Department
Biostatistics & Other Math Sci
Type
Schools of Medicine
DUNS #
161202122
City
Madison
State
WI
Country
United States
Zip Code
53715
Kong, Jing; Klein, Barbara E K; Klein, Ronald et al. (2015) Backward multiple imputation estimation of the conditional lifetime expectancy function with application to censored human longevity data. Proc Natl Acad Sci U S A 112:12069-74
Kong, Jing; Wang, Sijian; Wahba, Grace (2015) Using distance covariance for improved variable selection with application to learning genetic risk models. Stat Med 34:1708-20
Geng, Zhigeng; Wang, Sijian; Yu, Menggang et al. (2015) Group variable selection via convex log-exp-sum penalty with application to a breast cancer survivor study. Biometrics 71:53-62
Kong, Jing; Klein, Barbara E K; Klein, Ronald et al. (2012) Using distance correlation and SS-ANOVA to assess associations of familial relationships, lifestyle factors, diseases, and mortality. Proc Natl Acad Sci U S A 109:20352-7
Shi, Weiliang; Wahba, Grace; Irizarry, Rafael A et al. (2012) The partitioned LASSO-patternsearch algorithm with application to gene expression data. BMC Bioinformatics 13:98
Wahba, Grace (2010) Encoding Dissimilarity Data for Statistical Model Building. J Stat Plan Inference 140:3580-3596
Bravo, Héctor Corrada; Lee, Kristine E; Klein, Barbara E K et al. (2009) Examining the relative influence of familial, genetic, and environmental covariate information in flexible risk models. Proc Natl Acad Sci U S A 106:8128-33
Bravo, Héctor Corrada; Wright, Stephen; Eng, Kevin H et al. (2009) Estimating Tree-Structured Covariance Matrices via Mixed-Integer Programming. J Mach Learn Res 5:41-48
Shi, Weiliang; Wahba, Grace; Wright, Stephen et al. (2008) LASSO-Patternsearch algorithm with application to ophthalmology and genomic data. Stat Interface 1:137-153
Lu, Fan; Keles, Sunduz; Wright, Stephen J et al. (2005) Framework for kernel regularization with application to protein clustering. Proc Natl Acad Sci U S A 102:12332-7

Showing the most recent 10 out of 14 publications