A detailed simulation study was conducted to evaluate alternative strategies for building polygenic risk prediction models from genome-wide association studies. The study made a number of important observations regarding optimal strategies for SNP selection, estimation of their coefficients in the model, and handling of correlation among SNPs in linkage disequilibrium. Methods were investigated for testing the adequacy of models at the extremes of disease risk, where data are often sparse but departures from the model are more likely and carry important clinical implications. Application of the method to a large case-control study of breast cancer indicates that common SNPs act additively on the risk of the disease on the logistic scale.

Methods have been developed for testing the statistical significance of observed patterns of mutations in affected pedigrees, based on a novel method for resampling independent chromosomes within pedigrees. Application of the method to exome sequencing data provided statistical support for the recent finding of a major gene for melanoma.

Methods have been developed for testing genetic association of diseases with multiple SNPs within a region, taking into account the possibility of complex interactions. An innovative computational approach was developed for rapid evaluation of tree-structured models for gene-gene interactions, combined with permutation-based resampling methods for testing the statistical significance of associations.

A method has been developed that uses a random-effects, or frailty, model to assess heterogeneity in the risk of cervical cancer due to unobserved risk factors. The model was fitted to data from a cohort study of almost a million women to obtain insight into the risk distribution of women enrolled in a large HMO.

A new method was developed for weighting logistic analyses of epidemiologic studies that involve augmentation sampling with re-stratification and population expansion. The performance of the method was evaluated using a simulation study and a study of human papillomavirus serology.

A permutation-based resampling method was developed for using metabolomic data to test the hypothesis that the effect of an exposure (e.g., smoking) on the risk of a disease (e.g., lung cancer) is mediated through intermediate biomarkers.

A study was conducted to investigate the theoretical properties of the adaptive LASSO algorithm, which is widely used for model fitting in the presence of high-dimensional covariates, with respect to its ability to control the false discovery rate in the underlying variable selection procedure.
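
Permutation-based resampling appears in several of the methods summarized above. As a minimal sketch of the general idea only, and not of the specific procedures developed in this work, the following Python example estimates a p-value for a case-control difference in a single biomarker by repeatedly shuffling the disease labels. The function name `permutation_p_value`, the data values, and the number of permutations are all hypothetical choices made for illustration.

```python
import random


def permutation_p_value(values, labels, n_perm=2000, seed=0):
    """Two-sided permutation p-value for a difference in group means.

    values: numeric measurements (e.g. a hypothetical biomarker level)
    labels: 0/1 group indicators (e.g. control/case), same length as values
    """
    if len(values) != len(labels):
        raise ValueError("values and labels must have the same length")
    if 0 not in labels or 1 not in labels:
        raise ValueError("both groups must be present")

    def mean_diff(lab):
        # Difference in mean between the group labelled 1 and the group labelled 0.
        g1 = [v for v, l in zip(values, lab) if l == 1]
        g0 = [v for v, l in zip(values, lab) if l == 0]
        return sum(g1) / len(g1) - sum(g0) / len(g0)

    observed = abs(mean_diff(labels))
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        # Shuffling labels breaks any link between group and measurement
        # while keeping the group sizes fixed.
        rng.shuffle(shuffled)
        if abs(mean_diff(shuffled)) >= observed:
            hits += 1
    # The add-one correction keeps the estimated p-value strictly positive.
    return (hits + 1) / (n_perm + 1)


# Made-up illustrative data: 8 cases followed by 8 controls.
cases = [5.1, 6.3, 5.8, 6.9, 6.1, 5.7, 6.4, 6.0]
controls = [4.2, 4.9, 5.0, 4.4, 4.7, 5.2, 4.1, 4.8]
values = cases + controls
labels = [1] * len(cases) + [0] * len(controls)

p = permutation_p_value(values, labels)
print(f"estimated permutation p-value: {p:.4f}")
```

The same pattern extends to other test statistics: only `mean_diff` would need to change, provided the quantity being shuffled is exchangeable under the null hypothesis being tested.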