Statistical Methods for Analysis of Next Generation Sequencing Data in Gen

Lin, Xihong; Dominici, Francesca

Abstract

This proposal aims to develop advanced statistical methods for analyzing large next generation sequencing data in genetic cancer epidemiological studies. The genomic era offers an unprecedented promise of understanding multifactorial diseases, such as cancer, and of identifying specific targets that can be used to develop patient-tailored therapies. Although hundreds of genome-wide association studies in the last few years have identified over a thousand common genetic variants associated with many complex diseases, these variants explain only a small fraction of the heritability of these diseases. Recent advances in next generation sequencing technologies provide an exciting new opportunity for discovering genes and biomarkers associated with diseases or traits, studying gene-environment interactions, predicting disease risk, and advancing personalized medicine. However, large sequencing data, especially rare variants, present fundamental statistical and computational challenges in data analysis and result interpretation. A shortage of appropriate and powerful statistical methods for the analysis of next generation sequencing data has become a bottleneck for effectively using these rich resources to rapidly develop novel molecular cancer prevention and treatment strategies. The purpose of this proposal is to respond to this need. The proposed methods are motivated by and applied to the Harvard Lung Cancer and Breast Cancer exome and targeted sequencing association studies, in which the investigators play a major leadership role.
The specific aims are: (1) to develop a unified, powerful, and robust statistical framework to test the association between rare variants and diseases and traits in sequencing association studies; (2) to develop penalized likelihood-based methods for risk prediction in population-based sequencing studies; (3) to use the causal inference framework for mediation analysis to estimate and test for the direct effects of rare genetic variants and their indirect effects mediated through environmental risk factors on disease risk in sequencing studies, and to account for measurement error in exposures; and (4) to develop efficient, user-friendly, open-access statistical software. This project integrates closely with Projects 1 and 2 with a common theme of analysis of large and complex observational study data, and takes advantage of the expertise of Projects 1 and 2 in causal inference on mediation analysis and modeling environmental exposures in studying the interplay of genes and environment. It also relies heavily on the Statistical Computing Core, and on the organizational infrastructure, team-building strategies, workshops, and visitor program provided through the Administrative Core.

Public Health Relevance

This project aims to develop statistical methods to advance cancer prevention and intervention strategies by using next generation sequencing data to identify genetic variants associated with cancer, to build genetic risk prediction models for cancer, and to study the direct and indirect effects of genetic variants in the interplay of genes and environment in cancer risk and progression.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Cancer Institute (NCI)
Type: Research Program Projects (P01)
Project #: 5P01CA134294-07
Application #: 8754133
Study Section: Special Emphasis Panel (ZCA1-RPRB-2)

Project Start
Project End
Budget Start: 2014-07-01
Budget End: 2015-06-30
Support Year: 7
Fiscal Year: 2014
Total Cost: $140,688
Indirect Cost: $52,239

Institution

Name: Harvard University
Department
Type
DUNS #: 149617367

City: Boston
State: MA
Country: United States
Zip Code: 02115

Publications

Bobb, Jennifer F; Claus Henn, Birgit; Valeri, Linda et al. (2018) Statistical software for analyzing the health effects of multiple concurrent exposures via Bayesian kernel machine regression. Environ Health 17:67

Chen, Han; Cade, Brian E; Gleason, Kevin J et al. (2018) Multiethnic Meta-Analysis Identifies RAI1 as a Possible Obstructive Sleep Apnea-related Quantitative Trait Locus in Men. Am J Respir Cell Mol Biol 58:391-401

Pierce, Brandon L; Kraft, Peter; Zhang, Chenan (2018) Mendelian randomization studies of cancer risk: a literature review. Curr Epidemiol Rep 5:184-196

Barfield, Richard; Feng, Helian; Gusev, Alexander et al. (2018) Transcriptome-wide association studies accounting for colocalization using Egger regression. Genet Epidemiol 42:418-433

Liu, Zhonghua; Lin, Xihong (2018) Multiple phenotype association tests using summary statistics in genome-wide association studies. Biometrics 74:165-175

Emilsson, Louise; García-Albéniz, Xabier; Logan, Roger W et al. (2018) Examining Bias in Studies of Statin Treatment and Survival in Patients With Cancer. JAMA Oncol 4:63-70

Sun, Ryan; Carroll, Raymond J; Christiani, David C et al. (2018) Testing for gene-environment interaction under exposure misspecification. Biometrics 74:653-662

Antonelli, Joseph; Cefalu, Matthew; Palmer, Nathan et al. (2018) Doubly robust matching estimators for high dimensional confounding adjustment. Biometrics :

Wilson, Ander; Zigler, Corwin M; Patel, Chirag J et al. (2018) Model-averaged confounder adjustment for estimating multivariate exposure effects with linear regression. Biometrics 74:1034-1044

Sofer, Tamar; Schifano, Elizabeth D; Christiani, David C et al. (2017) Weighted pseudolikelihood for SNP set analysis with multiple secondary outcomes in case-control genetic association studies. Biometrics 73:1210-1220

Showing the most recent 10 out of 192 publications
