The primary goal of this proposal is to help Dr. Seunggeun (Shawn) Lee become an independent researcher with expertise in sequencing association studies of complex lung, heart, and blood diseases such as ARDS/ALI, OSA, and type 2 diabetes. Dr. Lee is currently a postdoctoral research fellow in the Department of Biostatistics at the Harvard School of Public Health, where he has begun developing statistical methods for sequencing association studies. Complex diseases not only incur a tragic human cost but also impose a substantial financial burden on society. Each year, ARDS/ALI alone is responsible for 75,000 deaths and over $20 billion in medical costs in the US. The rapid development of next-generation sequencing technology will improve our understanding of complex disease etiology by identifying rare genetic variants that confer disease susceptibility. However, statistical development lags behind the development of sequencing technology, and advanced statistical methods are needed to fill this gap. Specifically, the applicant proposes to develop statistical methods to: 1) adjust for genotyping errors caused by cost-effective sequencing designs such as low-coverage sequencing; and 2) test for association between multivariate correlated phenotypes and rare genetic variants. The developed methods will be applied to ongoing sequencing association studies of ARDS/ALI, OSA, and type 2 diabetes. During the mentored period, the applicant will learn modern statistical methods such as the kernel machine, measurement error models, and generalized estimating equations; establish the fundamental statistical framework of the proposed research under the guidance of Dr. Xihong Lin (mentor); and expand his knowledge of complex diseases under the guidance of Dr. David Christiani (co-mentor).
In addition, the applicant will broaden his background in heart and lung disorders, modern parallel computing, and next-generation sequencing through rigorous coursework and participation in workshops and seminars. Building upon the skills acquired in the mentored period, Dr. Lee will extend the established statistical framework to diverse models and apply them in ARDS, OSA, and type 2 diabetes studies to identify rare genetic variants that confer disease susceptibility. After the completion of this award, the applicant will have developed into an independent and productive researcher with expertise in the application of sequencing technologies to genetic epidemiology research.
Complex diseases such as ARDS/ALI, OSA, and type 2 diabetes are major public health concerns. The proposed research will develop advanced statistical methods to analyze next-generation sequencing data and apply them to real sequencing association studies to identify rare genetic variants that confer disease susceptibility. The results from this study will improve our understanding of complex disease etiology.
Ma, Clement; Boehnke, Michael; Lee, Seunggeun et al. (2015) Evaluating the Calibration and Power of Three Gene-Based Association Tests of Rare Variants for the X Chromosome. Genet Epidemiol 39:499-508
Barnett, Ian J; Lee, Seunggeun; Lin, Xihong (2013) Detecting rare variant effects using extreme phenotype sampling in sequencing association studies. Genet Epidemiol 37:142-51
Wu, Michael C; Maity, Arnab; Lee, Seunggeun et al. (2013) Kernel machine SNP-set testing under multiple candidate kernels. Genet Epidemiol 37:267-75
Ionita-Laza, Iuliana; Lee, Seunggeun; Makarov, Vladimir et al. (2013) Family-based association tests for sequence data, and comparisons with population-based association tests. Eur J Hum Genet 21:1158-62
Ionita-Laza, Iuliana; Lee, Seunggeun; Makarov, Vlad et al. (2013) Sequence kernel association tests for the combined effect of rare and common variants. Am J Hum Genet 92:841-53
Lee, Seunggeun; Teslovich, Tanya M; Boehnke, Michael et al. (2013) General framework for meta-analysis of rare variants in sequencing association studies. Am J Hum Genet 93:42-53
Lin, Xinyi; Lee, Seunggeun; Christiani, David C et al. (2013) Test for interactions between a genetic marker set and environment in generalized linear models. Biostatistics 14:667-81