This application responds to RFA-MH-08-040, which calls for methods for designing sequence-based validation studies and for analyzing sequence-based associations with complex phenotypes such as diabetes, cancer, and coronary heart disease. The RFA is timely and matches our current research interest: how to validate initial leads from a genome-wide association study (GWAS) using resequencing technology. Our first short-term goal is to develop novel statistical designs that let researchers validate GWAS discoveries cost-effectively with resequencing technologies. Once such sequence data are obtained, our second short-term goal is to develop statistical methods for assessing features of DNA sequence data and their correlations with complex phenotypes. Our long-term goal is to develop novel statistical approaches for correlating whole-genome sequences with complex disease phenotypes. The proposal has three Specific Aims:

1) Developing an efficient design for sequence-based validation. We describe a two-stage design: treating the GWAS as the first stage, we sample a subset of individuals into the second stage based on both phenotype and genetic markers. The second-stage samples are therefore biased and require special consideration in both design and analysis.

2) Developing statistical methods for validating genetic associations with unphased sequence data. Unphased sequence data are routinely obtained at this time. To validate disease association with full sequences, it is important to infer phased sequence data (i.e., long extended haplotypes) and their distributions, and then to correlate them with the disease phenotype. These analyses must also account for the biased sampling that arises from the two-stage design.

3) Developing statistical methods for validating genetic associations with fully phased diploid sequences. Technologies now exist that can produce fully phased sequence data, such as the fosmid-directed sequencing technology used by Dr. Geraghty (Co-Investigator on this project). Fully phased sequence data allow us to study many other aspects of genetic variation and to assess their associations with disease phenotypes. Some of these statistical techniques, once fully developed, may also apply to the assessment of whole-genome associations with complex diseases.

Designs and Methods for Sequence-Based Validation Analyses: Project Narrative

This proposal addresses an important statistical issue that we are beginning to face: how to validate initial leads from a genome-wide association study using resequencing technology. As one of the research groups funded by the NIH to carry out genome-wide association studies, we have been working on cost-effective study designs and valid analytic methodologies. The developments identified in this proposal would help accelerate the translation of bench-side discoveries into bedside practice, and thereby benefit public health.
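The biased second-stage sampling described in Aim 1 can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the design or analysis proposed by the investigators: the carrier frequency, disease risks, stratum sampling probabilities, and all function names are hypothetical, and inverse-probability weighting is used here simply as one standard way to correct for outcome- and marker-dependent sampling.

```python
import random

# Hypothetical stage-2 selection probabilities by (carrier, case) stratum.
# Carriers and cases are deliberately oversampled for resequencing.
SAMPLING_PROB = {
    (True, True): 1.0,
    (True, False): 0.5,
    (False, True): 0.5,
    (False, False): 0.1,
}


def simulate_stage1(n, seed=0):
    """Simulate a stage-1 (GWAS) cohort as (carrier, case) pairs."""
    rng = random.Random(seed)
    cohort = []
    for _ in range(n):
        carrier = rng.random() < 0.2        # assumed marker carrier frequency
        risk = 0.30 if carrier else 0.10    # assumed disease risk by carrier status
        case = rng.random() < risk
        cohort.append((carrier, case))
    return cohort


def select_stage2(cohort, seed=1):
    """Select the stage-2 subsample; attach inverse-probability weights."""
    rng = random.Random(seed)
    sample = []
    for carrier, case in cohort:
        p = SAMPLING_PROB[(carrier, case)]
        if rng.random() < p:
            sample.append((carrier, case, 1.0 / p))
    return sample


def carrier_frequency(sample, weighted):
    """Estimate carrier frequency, with or without sampling weights."""
    num = sum((w if weighted else 1.0) for carrier, _, w in sample if carrier)
    den = sum((w if weighted else 1.0) for _, _, w in sample)
    return num / den


cohort = simulate_stage1(20000)
stage2 = select_stage2(cohort)
cohort_freq = sum(1 for carrier, _ in cohort if carrier) / len(cohort)
naive_freq = carrier_frequency(stage2, weighted=False)
ipw_freq = carrier_frequency(stage2, weighted=True)
print(f"cohort={cohort_freq:.3f} naive={naive_freq:.3f} weighted={ipw_freq:.3f}")
```

In this toy setting the unweighted stage-2 estimate overstates the carrier frequency because carriers and cases are oversampled, while the weighted estimate stays close to the full-cohort value. The same bias would affect association estimates if the sampling scheme were ignored.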

Agency
National Institutes of Health (NIH)
Institute
National Institute of Mental Health (NIMH)
Type
Research Project (R01)
Project #
5R01MH084621-02
Application #
7691831
Study Section
Special Emphasis Panel (ZMH1-ERB-C (06))
Program Officer
Yao, Yin Y
Project Start
2008-09-25
Project End
2011-06-30
Budget Start
2009-07-01
Budget End
2010-06-30
Support Year
2
Fiscal Year
2009
Total Cost
$440,000
Indirect Cost
Name
Fred Hutchinson Cancer Research Center
Department
Type
DUNS #
078200995
City
Seattle
State
WA
Country
United States
Zip Code
98109
Zhao, Lue Ping; Fan, Wenhong; Goodman, Gary et al. (2015) Deciphering Genome Environment Wide Interactions Using Exposed Subjects Only. Genet Epidemiol 39:334-46
Zhao, Lue Ping; Huang, Xin (2013) Recursive organizer (ROR): an analytic framework for sequence-based association analysis. Hum Genet 132:745-59
Zhang, Xinyi Cindy; Xu, Chang; Mitchell, Ryan M et al. (2013) Tumor evolution and intratumor heterogeneity of an oropharyngeal squamous cell carcinoma revealed by whole-genome sequencing. Neoplasia 15:1371-8
Zhang, Xinyi Cindy; Zhang, Bo; Li, Shuying Sue et al. (2012) Sequencing genes in silico using single nucleotide polymorphisms. BMC Genet 13:6
Li, Shuying S; Wang, Hongwei; Smith, Anajane et al. (2011) Predicting multiallelic genes using unphased and flanking single nucleotide polymorphisms. Genet Epidemiol 35:85-92
Zhang, Xinyi Cindy; Li, Shuying Sue; Wang, Hongwei et al. (2011) Empirical evaluations of analytical issues arising from predicting HLA alleles using multiple SNPs. BMC Genet 12:39