Human genetics research has accelerated over the last decade owing to our evolving understanding of the human genome. With the recent completion of the International HapMap Project, the development of large-scale genotyping technology, and the rapid decline in genotyping costs, an immense amount of genotype data has been generated, which in turn raises many challenging new problems for the analysis and interpretation of the data. This application proposes developing new statistical methodologies to address a wide range of statistical issues in current candidate gene and genome-wide association (GWA) studies. Specifically, the proposal will address the following problems.
(1) Recent high-resolution genome mapping indicates that copy number variations (CNVs) are ubiquitous and common in the general population, and may play a major role in phenotypic variation. In Aim 1, we will develop a Bayesian hidden Markov model-based algorithm for high-resolution CNV detection using whole-genome SNP genotyping data. Our algorithm can incorporate both unrelated individuals and family data.
(2) Given the high density of genetic markers in large-scale candidate gene and GWA studies, it is reasonable to expect that multilocus genotypes offer more information on genetic association than single-marker analysis. In Aim 2, we will develop a powerful multimarker test for gene-based association analysis and extend the method to the analysis of gene-gene interactions. The virtue of our method lies in its ability to borrow strength from nearby markers while reducing the degrees of freedom.
(3) In many disease gene-mapping studies, individuals are ascertained from a recently admixed population. In Aim 3, we will develop novel association tests for genetic studies that use recently admixed populations. By considering ancestry level and genotypes together, our method offers higher resolution and power than traditional admixture mapping methods.
(4) Appropriate adjustment for multiple dependent tests has long been a problem in genetic studies, especially for studies with limited sample size and without replication datasets. In Aim 4, we propose new methods to estimate the effective number of tests, which reflects the amount of independent information contained in the data.
(5) In Aim 5, we will develop, test, distribute, and support freely available implementations of the methods proposed in this application.
The methods will be evaluated through analytical approaches, computer simulations, and applications to multiple real datasets.
The recent development of large-scale genotyping technologies has led to the generation of an immense amount of genotype data, which raises many challenging new problems for the analysis and interpretation of the data. This application proposes developing new statistical methodologies that address a set of unresolved issues.

Agency
National Institutes of Health (NIH)
Institute
National Human Genome Research Institute (NHGRI)
Type
Research Project (R01)
Project #
5R01HG004517-03
Application #
7903895
Study Section
Genomics, Computational Biology and Technology Study Section (GCAT)
Program Officer
Brooks, Lisa
Project Start
2008-09-19
Project End
2013-07-31
Budget Start
2010-08-01
Budget End
2011-07-31
Support Year
3
Fiscal Year
2010
Total Cost
$384,328
Indirect Cost
Name
Vanderbilt University Medical Center
Department
Biostatistics & Other Math Sci
Type
Schools of Medicine
DUNS #
004413456
City
Nashville
State
TN
Country
United States
Zip Code
37212
Jia, Cheng; Hu, Yu; Liu, Yichuan et al. (2015) Mapping Splicing Quantitative Trait Loci in RNA-Seq. Cancer Inform 14:45-53
Jia, Cheng; Guan, Weihua; Yang, Amy et al. (2015) MetaDiff: differential isoform expression analysis using random-effects meta-regression. BMC Bioinformatics 16:208
Wang, Xuexia; Zhang, Shuanglin; Li, Yun et al. (2015) A powerful approach to test an optimally weighted combination of rare variants in admixed populations. Genet Epidemiol 39:294-305
Liu, Yichuan; Morley, Michael; Brandimarto, Jeffrey et al. (2015) RNA-Seq identifies novel myocardial gene expression signatures of heart failure. Genomics 105:83-9
Xu, Zheng; Duan, Qing; Yan, Song et al. (2015) DISSCO: direct imputation of summary statistics allowing covariates. Bioinformatics 31:2434-42
Cheng, K F; Lee, J Y; Zheng, W et al. (2014) A powerful association test of multiple genetic variants using a random-effects model. Stat Med 33:1816-27
Guan, Weihua; Li, Chun (2014) Design of DNA pooling to allow incorporation of covariates in rare variants analysis. PLoS One 9:e114523
Liu, Yichuan; Ferguson, Jane F; Xue, Chenyi et al. (2014) Tissue-specific RNA-Seq in human evoked inflammation identifies blood and adipose LincRNA signatures of cardiometabolic diseases. Arterioscler Thromb Vasc Biol 34:902-12
Hu, Yu; Liu, Yichuan; Mao, Xianyun et al. (2014) PennSeq: accurate isoform-specific gene expression quantification in RNA-Seq by modeling non-uniform read distribution. Nucleic Acids Res 42:e20
Jia, Cheng; Hu, Yu; Liu, Yichuan et al. (2014) Mapping Splicing Quantitative Trait Loci in RNA-Seq. Cancer Inform 13:35-43

Showing the most recent 10 out of 36 publications