A major project of this section is developing new statistical genetics methodology, prompted by the needs of our applied studies, and testing and comparing novel and existing statistical methods. In collaboration with Dr. Betty Doan, we have continued developing propensity scores as a way to include covariate effects in linkage analyses. The method appears promising: it is generally more powerful than including the covariates directly in the model, and it does not strongly inflate Type I error rates. We are currently using computer simulation studies to examine factors that affect its performance, and we are applying it to Dr. Bailey-Wilson's lung cancer data.
We have completed work on establishing a p-value threshold for genome-wide association studies (GWAS), based on the number of independent SNPs and blocks in the HapMap database and in the Affymetrix and Illumina GWAS panels. Because denser panels contain many SNPs in linkage disequilibrium with one another, the number of independent tests is smaller than the number of SNPs genotyped, so a Bonferroni correction based on the total SNP count is not accurate (it is overly conservative). Instead, we used HapMap data and the linkage disequilibrium structure of the genome to estimate the true number of independent SNPs across the genome. This work was published this year (1); it gives researchers guidelines for appropriate significance thresholds in several ethnic groups, along with algorithms for recomputing these thresholds for other ethnic groups and as newer versions of the HapMap are released.
We also explored the utility of various machine learning methods in GWAS, particularly with respect to power and the detection of gene-gene and gene-environment interactions. Using GWAS genotype data from the Framingham Heart Study data repository combined with computer-simulated trait data, we showed that these methods may be able to detect interaction effects in suitably powered studies. A paper presenting these results is in press.
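The machine learning comparison pairs real GWAS genotypes with computer-simulated traits, so the true interaction effects are known in advance. The minimal sketch below illustrates only the trait-simulation step and is not the model used in the study. Assumptions: genotypes are simulated as independent SNPs in Hardy-Weinberg equilibrium as a stand-in for the Framingham data, the trait is quantitative with additive (0/1/2) coding, and the function names `simulate_genotypes` and `simulate_trait`, the exposure variable, and all effect sizes are hypothetical.

```python
import numpy as np


def simulate_genotypes(n, mafs, rng):
    """Stand-in for real GWAS genotypes (assumption): independent SNPs in
    Hardy-Weinberg equilibrium, coded as minor-allele counts 0/1/2."""
    mafs = np.asarray(mafs, dtype=float)
    return rng.binomial(2, mafs, size=(n, mafs.size))


def simulate_trait(G, env, main, gxg, gxe, noise_sd, rng):
    """Quantitative trait with additive main effects, one SNP-by-SNP (G x G)
    interaction and one SNP-by-environment (G x E) interaction.

    main: {snp_index: beta} additive main effects
    gxg:  (snp_a, snp_b, beta) for the product term G[:, a] * G[:, b]
    gxe:  (snp, beta) for the product term G[:, snp] * env
    """
    y = np.zeros(G.shape[0])
    for j, beta in main.items():
        y += beta * G[:, j]
    a, b, beta_gg = gxg
    y += beta_gg * G[:, a] * G[:, b]
    snp, beta_ge = gxe
    y += beta_ge * G[:, snp] * env
    # Residual (non-genetic) variation
    return y + rng.normal(0.0, noise_sd, size=G.shape[0])


# Usage with hypothetical effect sizes: SNPs 1 and 2 interact,
# and SNP 3 interacts with a binary exposure.
rng = np.random.default_rng(42)
n = 2000
G = simulate_genotypes(n, [0.30, 0.20, 0.40, 0.10, 0.25], rng)
env = rng.binomial(1, 0.5, size=n)  # hypothetical binary exposure
y = simulate_trait(G, env, main={0: 0.2}, gxg=(1, 2, 0.5),
                   gxe=(3, 0.8), noise_sd=1.0, rng=rng)
```

Because the interaction effects are planted, a method's power can be estimated as the fraction of simulated replicates in which it flags the interacting SNPs, and its false-positive rate from the SNPs with no simulated effect.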
We continue to pursue machine learning methods in genomic studies; many of these projects are ongoing.
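For the genome-wide significance threshold work described above, the core quantity is the effective number of independent tests, M_eff, which replaces the raw SNP count in a Bonferroni correction (threshold = alpha / M_eff). The exact procedure of the published work (1) is not reproduced here. As a stand-in, the minimal sketch below uses the eigenvalue-based estimator of Li and Ji, M_eff = sum over i of f(|lambda_i|) with f(x) = I(x >= 1) + (x - floor(x)), where lambda_i are the eigenvalues of the SNP correlation (LD) matrix. The function names are hypothetical, and summing over LD blocks assumes that SNPs in different blocks are independent, which is also what keeps the calculation tractable genome-wide.

```python
import numpy as np


def effective_tests(corr):
    """Li-Ji estimate of the effective number of independent tests (M_eff)
    from a SNP-by-SNP correlation (LD) matrix."""
    eigvals = np.linalg.eigvalsh(corr)  # symmetric matrix -> real eigenvalues
    # Round off floating-point noise: f(x) is discontinuous at integers, so an
    # eigenvalue of 3 returned as 2.9999999999999996 would otherwise count as ~2.
    lam = np.round(np.abs(eigvals), 10)
    # f(x) = I(x >= 1) + (x - floor(x)), summed over all eigenvalues
    return float(np.sum((lam >= 1) + (lam - np.floor(lam))))


def genome_wide_threshold(block_corrs, alpha=0.05):
    """Bonferroni-style threshold alpha / M_eff, with M_eff summed over LD
    blocks. Assumption: SNPs in different blocks are independent, so only
    the within-block correlation matrices are needed."""
    m_eff = sum(effective_tests(c) for c in block_corrs)
    return m_eff, alpha / m_eff


# Usage with two toy LD blocks (illustrative matrices, not HapMap data)
independent = np.eye(3)       # 3 SNPs with no LD between them
perfect_ld = np.ones((4, 4))  # 4 SNPs in complete LD (r = 1)
m_eff, threshold = genome_wide_threshold([independent, perfect_ld])
print(m_eff, threshold)       # → 4.0 0.0125
```

Seven SNPs are genotyped here, but they carry only four independent tests, so the threshold is 0.05/4 rather than 0.05/7. For real data, each block's matrix could be computed from a genotype matrix with `np.corrcoef(genotypes, rowvar=False)` after dropping monomorphic SNPs, whose correlations are undefined.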

Support Year: 11
Fiscal Year: 2009
Total Cost: $172,609
Name: National Human Genome Research Institute
Chiu, Chi-Yang; Jung, Jeesun; Wang, Yifan et al. (2017) A comparison study of multivariate fixed models and Gene Association with Multiple Traits (GAMuT) for next-generation sequencing. Genet Epidemiol 41:18-34
Larson, Nicholas B; McDonnell, Shannon; Cannon Albright, Lisa et al. (2017) gsSKAT: Rapid gene set analysis and multiple testing correction for rare-variant association studies using weighted linear kernels. Genet Epidemiol 41:297-308
Ioannidis, Nilah M; Rothstein, Joseph H; Pejaver, Vikas et al. (2016) REVEL: An Ensemble Method for Predicting the Pathogenicity of Rare Missense Variants. Am J Hum Genet 99:877-885
Holzinger, Emily R; Szymczak, Silke; Malley, James et al. (2016) Comparison of parametric and machine methods for variable selection in simulated Genetic Analysis Workshop 19 data. BMC Proc 10:147-152
König, Inke R; Auerbach, Jonathan; Gola, Damian et al. (2016) Machine learning and data mining in complex genomic data--a review on the lessons learned in Genetic Analysis Workshop 19. BMC Genet 17 Suppl 2:1
Szymczak, Silke; Holzinger, Emily; Dasgupta, Abhijit et al. (2016) r2VIM: A new variable selection method for random forests in genome-wide association studies. BioData Min 9:7
Fan, Ruzong; Chiu, Chi-Yang; Jung, Jeesun et al. (2016) A Comparison Study of Fixed and Mixed Effect Models for Gene Level Association Studies of Complex Traits. Genet Epidemiol 40:702-721
Ritchie, Marylyn D; Holzinger, Emily R; Li, Ruowang et al. (2015) Methods of integrating data to uncover genotype-phenotype interactions. Nat Rev Genet 16:85-97
Holzinger, Emily Rose; Szymczak, Silke; Dasgupta, Abhijit et al. (2015) Variable selection method for the identification of epistatic models. Pac Symp Biocomput :195-206
Wang, Yifan; Liu, Aiyi; Mills, James L et al. (2015) Pleiotropy analysis of quantitative traits at gene level by multivariate functional linear models. Genet Epidemiol 39:259-75

Showing the most recent 10 out of 30 publications