A major project of this section is the development of new statistical genetics methodology, prompted by the needs of our applied studies, together with the testing and comparison of novel and existing statistical methods. The project to develop propensity scores as a method for including covariate effects in linkage analyses has continued in conjunction with Dr. Betty Doan. This method appears promising: it is generally more powerful than including the covariates directly in the model, and it does not have strongly inflated Type I error rates. We are currently applying it to Dr. Bailey-Wilson's lung cancer data.
We previously developed sex-specific single-nucleotide polymorphism (SNP) quality control filters for use with high-density SNP array data, to help control false positive rates due to poor data quality in genome-wide association studies (GWAS). A paper presenting this method was published this year (2).
We continue to explore the utility of various machine learning methods in GWAS, particularly with respect to power and the detection of gene-gene and gene-environment interactions. We previously combined GWAS genotype data from the Framingham Heart Study data repository with computer-simulated trait data, which allowed us to show that these methods may be able to detect interaction effects in suitably powered studies. A paper presenting these results was published this year (1). We are continuing to pursue the use of machine learning methods in genomics studies and are currently evaluating the power of several of these methods in whole-exome sequence data from the 1000 Genomes Project, using computer-simulated phenotypes, as part of Genetic Analysis Workshop 17 (GAW17).
We are also using the GAW17 simulated whole-exome sequence (WES) data to develop novel tools for the analysis and interpretation of WES data. These include strategies for combining linkage and sequence results, various schemes for collapsing rare variants in genes and gene networks to improve the power of sequence analysis, and methods for integrating sequence analyses with existing genomics databases. Development of these analysis methods and tools is ongoing, driven by our own WES and targeted sequence data from multiple studies of complex traits.
Haplotype phase imputation is useful for most haplotype-based association tests and for missing genotype imputation. However, phase imputation in extended pedigrees is not trivial. For large haplotype blocks, the number of possible diplotype configurations grows exponentially with the number of markers in the block. There are several haplotype phase estimation or reconstruction algorithms, and a few recent methods handle extended pedigree data. However, rare haplotypes are often left out of the reconstruction. In application, this can result in misclassification of imputed genotypes. These erroneous genotype assignments are considered particular to individuals and are usually ignored.
Based on population haplotype frequencies, we developed an efficient algorithm to impute phases (and therefore genotypes) for haplotype blocks of up to 8 SNPs, in trios as well as in extended pedigrees. To avoid errors in individual genotype assignments, we consider all possible haplotypes, even those with extremely rare frequencies. We are evaluating our phase imputation method using simulated data with masked genotypes within pedigrees. We are examining the proportion of imputed genotypes that are misclassified when rare haplotypes are left out, and we are comparing our method with other methods: PHASE, HAPLORE, PedPhase, and PhyloPed.
We will present this method and its results at the upcoming 2010 meetings of the American Society of Human Genetics and the International Genetic Epidemiology Society. This study is ongoing.
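To make the combinatorics of phase imputation concrete, the following is a minimal illustrative sketch and not the algorithm developed in this project. It enumerates every haplotype pair (diplotype) compatible with one unphased genotype and weights each pair by population haplotype frequencies. The function names, the 0/1/2 genotype coding, the Hardy-Weinberg weighting, and the example frequencies are all assumptions made for illustration; the sketch treats a single individual in isolation and ignores the pedigree constraints that the method described above must handle.

```python
from itertools import product


def compatible_diplotypes(genotype):
    """Enumerate unordered haplotype pairs consistent with an unphased genotype.

    genotype: sequence of minor-allele counts per SNP (0, 1, or 2).
    Returns a list of (hap_a, hap_b) tuples of 0/1 alleles.
    With k heterozygous SNPs (k >= 1) there are 2**(k - 1) pairs.
    """
    het_sites = [i for i, g in enumerate(genotype) if g == 1]
    # Homozygous sites are fixed: count 0 -> allele 0, count 2 -> allele 1.
    # Heterozygous sites get a placeholder 0 that is overwritten below.
    base = [g // 2 for g in genotype]
    if not het_sites:
        hap = tuple(base)
        return [(hap, hap)]

    first, rest = het_sites[0], het_sites[1:]
    pairs = []
    for alleles in product((0, 1), repeat=len(rest)):
        hap_a, hap_b = list(base), list(base)
        # Pin the first heterozygous site so each unordered pair appears once.
        hap_a[first], hap_b[first] = 0, 1
        for site, allele in zip(rest, alleles):
            hap_a[site], hap_b[site] = allele, 1 - allele
        pairs.append((tuple(hap_a), tuple(hap_b)))
    return pairs


def phase_probabilities(genotype, hap_freq):
    """Weight each compatible diplotype by haplotype frequencies.

    Assumes Hardy-Weinberg proportions (an illustrative assumption).
    hap_freq maps haplotype tuples to population frequencies; haplotypes
    missing from the map are treated as having frequency zero.
    Returns {} when no compatible diplotype has positive weight.
    """
    weights = {}
    for hap_a, hap_b in compatible_diplotypes(genotype):
        w = hap_freq.get(hap_a, 0.0) * hap_freq.get(hap_b, 0.0)
        if hap_a != hap_b:
            w *= 2.0  # two orderings of a heterozygous pair
        weights[(hap_a, hap_b)] = w
    total = sum(weights.values())
    if total == 0.0:
        return {}
    return {pair: w / total for pair, w in weights.items()}


# Hypothetical two-SNP frequencies, chosen only for illustration.
all_haplotypes = {(0, 0): 0.60, (1, 1): 0.30, (0, 1): 0.09, (1, 0): 0.01}
# The same table with the two rarest haplotypes left out.
common_only = {(0, 0): 0.60, (1, 1): 0.30}

print(len(compatible_diplotypes([1] * 8)))          # 8 heterozygous SNPs
print(phase_probabilities((1, 0), all_haplotypes))  # explained by a rare haplotype
print(phase_probabilities((1, 0), common_only))     # no compatible pair remains
```

In this toy setting, a block of 8 heterozygous SNPs already has 128 candidate diplotypes per individual, and a genotype that can only be explained by a rare haplotype has no valid phase once that haplotype is excluded from the frequency table, which is one way leaving out rare haplotypes can lead to misclassified imputed genotypes.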

Support Year: 12
Fiscal Year: 2010
Total Cost: $171,181
Name: National Human Genome Research Institute
