The investigator has two objectives to accomplish in this research project. The first objective is to develop theory of statistical inference for a new class of semi-parametric odds ratio models that include both the generalized linear model and the Cox regression model as special cases. The second objective is to apply this class of models to solve a number of theoretical problems that are of importance in applications. These applications include (1) addressing issues in parameter identifiability, estimation, and inference in biased sampling designs in studying the association of a disease with gene and environment factors, (2) introducing a new approach for testing goodness of fit of generalized linear models, and (3) developing a new flexible semi-parametric procedure for multivariate density estimation and survival analysis with complex dependence structures.
Flexible and easily interpretable stochastic models are very useful in extracting information from data with complex structures. Such data are often collected in studies exploring the causes of diseases and other socio-economic problems. The investigator proposes a new class of models for the statistical analysis of such data. Results from this research will have broad applications in a variety of scientific fields. When applied to epidemiological studies, results from this research will be able to facilitate the design of more powerful statistical tests for detecting genetic and environmental causes of disease. When applied to sociological studies, results from this research will be able to facilitate the understanding of underlying causes of complex socio-economic problems so that better solutions to the problems can be found.
The goal of the project was to develop theory and methods for the generalized semi-parametric odds ratio models, which are widely applicable in statistical analysis. In comparison to other models, the developed approach offers great flexibility in modeling different types of multivariate data, and the model parameters have an interpretation that is highly desirable to scientific researchers. We have developed a complete theory for statistical inference on these models that can accommodate various biased sampling designs commonly used in epidemiological studies of disease risks and in the evaluation of treatment efficacy. Algorithms for implementing the developed theory have been proposed, and software for carrying out data analysis using the proposed methods has been developed. Our developments provide more flexible and powerful tools for solving data analytical problems. Our developed methods have been directly applied to genome-wide association studies of prostate cancer and of cardiovascular diseases. A number of papers have been published as a result of this research project. Specifically, Chen (2011) clarified parameter identifiability in many different parametric and/or semi-parametric models under biased sampling designs using the unified odds ratio modeling framework. The results have been applied to the analysis of a genome-wide association study of high density lipoprotein under an extreme-value sampling design (Chen and Li, 2011). The proposed approach is more powerful than existing approaches. Chen and Chen (2011) studied gene-environment interaction, using the framework to clarify when and how an efficiency gain can be achieved by assuming gene-environment independence.
Chen, Reilly, and Li (2013) developed two new methods, one for the case-control design and one for the matched case-control design, that incorporate gene-environment models into the analysis to gain efficiency when the gene-environment independence assumption may be violated. Chen, Kittles, and Zhang (2013) proposed several ways of analyzing secondary traits and compared them with existing methods. Chen, Rader, and Li (2014) developed a general theory for making inference on model parameters based on the semi-parametric likelihood with a correctly specified or misspecified odds ratio model. Overall, the research outcomes will have an impact on the way case-control, matched case-control, and extreme-value sampling designs are analyzed. The results improve the efficiency of genetic data analysis and help discover important clues for forming new hypotheses in the design of future genomic studies. The research results from this project provide novel solutions to statistical problems that have broad applications in different scientific fields. As a result, flexible and easily interpretable models are now available for statistical exploration of high-dimensional data. Such data are abundant in biomedical research on disease diagnosis, prevention, and treatment. The research results have enhanced our ability to extract useful information from research data, which in turn can lead to better disease prevention, diagnosis, and treatment, and to better social justice and economic development. The solution provided for biased sampling problems will have a direct impact on the way genetic association studies are designed and analyzed. The statistical software developed in the research project allows researchers to use the developed methods directly for their data analysis needs. Educationally, graduate students in Biostatistics were trained in this project. Two PhD students and a number of Master's students were directed by the P.I. during the project period.
The research results have been partially incorporated into the courses taught by the P.I.
References:
Chen, H. Y., Rader, D. E., Li, M. (2014). Likelihood inferences on semi-parametric odds ratio model. Journal of the American Statistical Association, accepted and published online.
Chen, H. Y., Reilly, M. P., Li, M. (2013). Semi-parametric odds ratio model for case-control and matched case-control designs. Statistics in Medicine, 32, 3126-3142.
Chen, H. Y., Kittles, R., Zhang, W. (2013). Bias correction to secondary trait analysis with case-control design. Statistics in Medicine, 32, 1494-1508.
Chen, H. Y. (2011). A unified framework for studying parameter identifiability and estimation in biased sampling design. Biometrika, 98, 163-175.
Chen, H. Y., Chen, J. (2011). On information coded in gene-environment independence in case-control design. American Journal of Epidemiology, 174, 736-743.
Chen, H. Y., Li, M. (2011). Improving power for detecting genetic association in extreme value sampling design. Genetic Epidemiology, 35, 823-830.