Human genetics research has accelerated in the last decade owing to our evolving understanding of the human genome. With the recent completion of the International HapMap Project, the development of large-scale genotyping technology, and the rapid decline in genotyping costs, an immense amount of genotype data has been generated, which in turn raises many challenging new problems for the analysis and interpretation of the data. This application proposes developing new statistical methodologies that address a wide range of statistical issues in current candidate gene and genome-wide association (GWA) studies. Specifically, the proposal will address the following problems. (1) Recent high-resolution genome mapping indicates that copy number variations (CNVs) are common in the general population and may play a major role in phenotypic variation.
In Aim 1, we will develop an algorithm based on a Bayesian hidden Markov model for high-resolution CNV detection using whole-genome SNP genotyping data. Our algorithm can incorporate both unrelated individuals and family data. (2) Given the high density of genetic markers in large-scale candidate gene and GWA studies, it is reasonable to expect that multilocus analysis offers more information on genetic association than single-marker analysis.
In Aim 2, we will develop a powerful multimarker test for gene-based association analysis and extend the method to the analysis of gene-gene interactions. The advantage of our method is that it borrows strength from nearby markers while reducing the degrees of freedom. (3) In many disease gene-mapping studies, individuals are ascertained from a recently admixed population.
In Aim 3, we will develop novel association tests for genetic studies of recently admixed populations. By considering ancestry level and genotypes together, our method offers higher resolution and power than traditional admixture mapping methods. (4) Appropriate adjustment for multiple dependent tests has long been a problem in genetic studies, especially for studies with limited sample size and without replication datasets.
In Aim 4, we propose new methods to estimate the effective number of tests, which reflects the amount of independent information contained in the data. (5) In Aim 5, we will develop, test, distribute, and support freely available implementations of the methods proposed in this application. The methods will be evaluated through analytical approaches, computer simulations, and applications to multiple real datasets. Recent development of large-scale genotyping technologies has led to the generation of an immense amount of genotype data, which raises many challenging new problems for the analysis and interpretation of the data. This application proposes developing new statistical methodologies that address a set of unresolved issues.
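The idea of an effective number of tests in Aim 4 can be illustrated with an existing eigenvalue-based estimator. The sketch below is not the method proposed in this application. It assumes the published Li and Ji (2005) estimator, computed from the eigenvalues of the marker correlation matrix, followed by a Sidak-style threshold adjustment. The function names `li_ji_meff` and `sidak_threshold` and the simulated genotypes are hypothetical and serve only as an illustration.

```python
import numpy as np


def li_ji_meff(corr):
    """Effective number of tests from a marker correlation matrix.

    Assumed estimator (Li and Ji 2005): each eigenvalue x contributes
    I(x >= 1) + (x - floor(x)), where x is the absolute eigenvalue.
    """
    corr = np.asarray(corr, dtype=float)
    # eigvalsh is appropriate because a correlation matrix is symmetric.
    # Rounding guards against floating-point noise near integer values
    # (an implementation choice of this sketch, not part of the estimator).
    eigvals = np.round(np.abs(np.linalg.eigvalsh(corr)), 10)
    contributions = (eigvals >= 1).astype(float) + (eigvals - np.floor(eigvals))
    return float(contributions.sum())


def sidak_threshold(alpha, meff):
    """Per-test significance level for a family-wise level alpha over meff tests."""
    return 1.0 - (1.0 - alpha) ** (1.0 / meff)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Simulated genotypes coded 0/1/2 for 200 individuals at 3 markers;
    # duplicating the columns mimics 3 pairs of markers in perfect LD.
    base = rng.binomial(2, 0.3, size=(200, 3))
    genotypes = np.hstack([base, base])
    corr = np.corrcoef(genotypes, rowvar=False)

    meff = li_ji_meff(corr)
    print("markers:", genotypes.shape[1])
    print("effective number of tests:", meff)
    print("adjusted per-test threshold:", sidak_threshold(0.05, meff))
```

With fully independent markers the estimate equals the number of markers, and with perfectly correlated markers it collapses to one, so the adjusted threshold is less conservative than a Bonferroni correction over all markers whenever the markers are correlated.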