With advances in next-generation sequencing technologies, sequencing studies have become increasingly common in substance dependence (SD) research. These studies generate massive amounts of sequencing data and allow researchers to comprehensively investigate the role of a deep catalog of genetic variants in SD. Although the ongoing sequencing studies hold great promise for uncovering novel variants that contribute to SD, the high-dimensional data, low-frequency variants, complex SD etiology, and heterogeneous SD phenotypes create tremendous analytic and computational challenges. Developing robust and powerful methods and computationally efficient software will address the challenges in SD sequencing data analysis and enhance our ability to identify new SD-related variants. The goals of this application are to develop new methods and software for designing and analyzing population-based and family-based sequencing data with single or multiple phenotypes, and to use them in collaborative research to investigate genetic variants and gene-gene/gene-environment (G-G/G-E) interactions associated with SD. Based on preliminary simulation results, our central hypothesis is that the proposed methods are more computationally efficient than existing methods and are more robust and powerful across various types of phenotypes. The planned specific aims are to: 1) develop a new non-parametric method for the design and analysis of sequencing data with one or multiple SD phenotypes; 2) develop a Joint-U method for high-dimensional G-G/G-E interaction analysis with SD sequencing data; 3) develop a family-similarity-U method for family-based SD sequencing data analysis, accounting for population stratification and rare variants enriched in families; and 4) facilitate the use of the new methods through software development and collaboration.
The proposed research will be initiated by an early-stage new investigator (NIDA K01 awardee), who has assembled a team of scientists with expertise in statistical genetics, bioinformatics/software development, SD epidemiology, behavioral genetics, and clinical psychiatry. The successful completion of this project will address several important statistical and computational gaps in ongoing sequencing studies, and advance the methodology and software development for SD sequencing data analysis. The application of the new methods and software to large-scale SD sequencing datasets also holds promise for the discovery of new SD-associated variants and G-G/G-E interactions, which will ultimately lead to a better understanding of SD etiology, with resulting potential benefits for SD prevention and treatment.

Public Health Relevance

The proposed research by a new, early-stage investigator will develop computationally efficient and powerful statistical tools for large-scale sequencing data, and will use these tools to investigate genetic variants and gene-gene/gene-environment interactions associated with substance dependence. The success of the project will address computational and analytical challenges associated with massive sequencing data, and will provide a new statistical framework for high-dimensional data analysis. The application of these tools to multiple SD sequencing datasets through collaborative research also holds promise for the discovery of new SD-associated variants, which will ultimately lead to a better understanding of SD etiology.

National Institutes of Health (NIH)
National Institute on Drug Abuse (NIDA)
Research Project (R01)
Study Section: Behavioral Genetics and Epidemiology Study Section (BGES)
Program Officer: Lossie, Amy C
University of Florida
Biostatistics & Other Math Sci
Schools of Medicine
United States
Shen, Xiaoxi; Lu, Qing (2018) Joint analysis of genetic and epigenetic data using a conditional autoregressive model. BMC Genet 19:71
Li, Ming; He, Zihuai; Tong, Xiaoran et al. (2018) Detecting Rare Mutations with Heterogeneous Effects Using a Family-Based Genetic Random Field Method. Genetics 210:463-476
Shen, Ya-Nan; Yu, Ming-Xing; Gao, Qian et al. (2017) External validation of non-invasive prediction models for identifying ultrasonography-diagnosed fatty liver disease in a Chinese population. Medicine (Baltimore) 96:e7610
Jadhav, S; Koul, H L; Lu, Q (2017) Dependent generalized functional linear models. Biometrika 104:987-994
Jadhav, Sneha; Tong, Xiaoran; Lu, Qing (2017) A functional U-statistic method for association analysis of sequencing data. Genet Epidemiol 41:636-643
Wen, Yalu; Burt, Alexandra; Lu, Qing (2017) Risk Prediction Modeling on Family-Based Sequencing Data Using a Random Field Method. Genetics 207:63-73
Wei, Changshuai; Lu, Qing (2017) A generalized association test based on U statistics. Bioinformatics 33:1963-1971
He, Zihuai; Zhang, Min; Zhan, Xiaowei et al. (2014) Modeling and testing for joint association using a genetic random field model. Biometrics 70:471-479