Despite great progress in molecular genetic methods, considerably less progress has been made in the refinement of phenotypes for substance dependence (SD) and other psychiatric disorders. SD, as defined by the Diagnostic and Statistical Manual of Mental Disorders (DSM), is clinically and etiologically heterogeneous. The DSM-defined traits are not optimal for gene-finding efforts, which has substantially limited our understanding of the genetic etiology of SD. Thus, the differentiation of homogeneous subtypes of drug use, related behaviors, and co-occurring phenotypes could improve the identification of genetic variation that underlies the risk for SD and other complex traits. Existing methods are not adequate to tackle this task. The most sophisticated subtyping methods available perform unsupervised cluster analysis or latent class analysis of a disorder's clinical features. Without theoretical guidance, blind cluster or latent class analysis can lead to subtypes of little utility in genetic analysis. In this project, we will develop novel statistical methods to subtype SD traits quantitatively. Using data from >11,000 identically assessed subjects aggregated from family-based and case-control genetic studies (including genome-wide association studies (GWAS)) of cocaine, opioid, and alcohol dependence, we will identify clinical subtypes that are optimized with respect to heritability. All subjects underwent thorough phenotyping using a poly-diagnostic instrument that includes 3000 items, yielding reliable demographic, medical, substance use, and substance-related measures, and DSM diagnoses of all major substance use and psychiatric disorders. A majority of the subjects were also genotyped for GWAS. Our preliminary results support the hypothesis that careful subtyping of substance use and related behaviors enhances the detection of genetic variants that contribute to the risk of addiction-related phenotypes but go undetected with a standard diagnostic approach.
The primary aims of the proposed research are to develop: (1) bioinformatics methods to derive quantitative traits that are highly heritable in terms of traditional narrow-sense heritability and recently defined SNP-based heritability; (2) integrative methods to jointly analyze phenotypic features and genetic markers to identify subtypes that are homogeneous phenotypically and genetically; and (3) genetic association approaches that are more efficient for subtype analysis. The derived subtypes and their association findings will be validated using multiple independent samples. An important secondary aim of the project is to develop and disseminate validated methods and software for public use through the PI's website. In summary, the objectives of the project are significant in their potential to enhance the discovery of genetic variants that contribute to the risk of SD using novel methods validated by the interdisciplinary research team. These methods, once applied to understanding the etiology of SD, may be suitable for extension to other complex phenotypes.
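
As a rough illustration of what aim (1) could look like in practice, the sketch below finds the linear combination of phenotype features with the largest ratio of additive genetic variance to total phenotypic variance. It is not the project's actual method: the function name `most_heritable_component` and the toy covariance matrices `G` (genetic) and `P` (phenotypic) are assumptions made for this example, and in a real analysis both matrices would have to be estimated from pedigree or SNP data.

```python
import numpy as np

def most_heritable_component(G, P):
    """Return (w, h2) maximizing w'Gw / w'Pw.

    G: symmetric additive genetic covariance matrix (hypothetical input).
    P: symmetric positive-definite phenotypic covariance matrix.
    """
    # Whiten by the Cholesky factor of P (P = L L^T), which turns the
    # ratio into an ordinary symmetric eigenproblem on M = L^-1 G L^-T.
    L = np.linalg.cholesky(P)
    L_inv = np.linalg.inv(L)
    M = L_inv @ G @ L_inv.T
    M = (M + M.T) / 2.0  # remove round-off asymmetry before eigh

    eigvals, eigvecs = np.linalg.eigh(M)  # eigenvalues in ascending order
    v = eigvecs[:, -1]                    # eigenvector of the largest one

    # Map back to the original feature space and normalize the weights.
    w = L_inv.T @ v
    w = w / np.linalg.norm(w)
    h2 = float(w @ G @ w / (w @ P @ w))
    return w, h2

# Toy example (made-up numbers): three features whose individual
# heritabilities are 0.5, 0.3 and 0.05.
G = np.array([[0.5, 0.2, 0.0],
              [0.2, 0.3, 0.0],
              [0.0, 0.0, 0.05]])
E = np.diag([0.5, 0.7, 0.95])  # environmental (residual) covariance
P = G + E                      # phenotypic covariance

w, h2 = most_heritable_component(G, P)
print("weights:", w)
print("heritability of the derived trait:", h2)
```

Because each single feature is itself a candidate linear combination, the derived trait is at least as heritable as the most heritable input feature under this simplified variance-ratio definition.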

Public Health Relevance

This project will develop novel statistical and quantitative tools and techniques to refine the phenotypes of substance dependence and other complex disorders to enhance genetic analysis, an important area of genetics research that is underdeveloped. The proposed novel approaches are expected to advance our understanding of genetic contributions to the heterogeneity in disease phenotypes.

National Institutes of Health (NIH)
National Institute on Drug Abuse (NIDA)
Research Project (R01)
Study Section
Behavioral Genetics and Epidemiology Study Section (BGES)
Program Officer
Wu, Da-Yu
University of Connecticut
Engineering (All Types)
Schools of Engineering
United States
Lu, Jin; Sun, Jiangwen; Wang, Xinyu et al. (2018) Inferring phenotypes from substance use via collaborative matrix completion. BMC Syst Biol 12:104
Lu, Jin; Sun, Jiangwen; Wang, Xinyu et al. (2017) Collaborative Phenotype Inference from Comorbid Substance Use Disorders and Genotypes. Proceedings (IEEE Int Conf Bioinformatics Biomed) 2017:392-397
Wang, Xin; Bi, Jinbo (2017) Bi-convex Optimization to Learn Classifiers from Multiple Biomedical Annotations. IEEE/ACM Trans Comput Biol Bioinform 14:564-575
Shang, Chao; Palmer, Aaron; Sun, Jiangwen et al. (2017) VIGAN: Missing View Imputation with Generative Adversarial Networks. Proc IEEE Int Conf Big Data 2017:766-775
Johannesen, Jason K; Bi, Jinbo; Jiang, Ruhua et al. (2016) Machine learning identification of EEG features predicting working memory performance in schizophrenia and healthy adults. Neuropsychiatr Electrophysiol 2:3
Lu, Jin; Liang, Guannan; Sun, Jiangwen et al. (2016) A Sparse Interactive Model for Matrix Completion with Side Information. Adv Neural Inf Process Syst 29:4071-4079
Sun, Jiangwen; Jiang, Zongliang; Tian, Xiuchun et al. (2016) A cross-species bi-clustering approach to identifying conserved co-regulated genes. Bioinformatics 32:i137-i146
Wang, Xin; Bi, Jinbo; Yu, Shipeng et al. (2016) Multiplicative Multitask Feature Learning. J Mach Learn Res 17:
Sun, Jiangwen; Kranzler, Henry R; Bi, Jinbo (2015) An Effective Method to Identify Heritable Components from Multivariate Phenotypes. PLoS One 10:e0144418
Sun, Jiangwen; Kranzler, Henry R; Bi, Jinbo (2015) Refining multivariate disease phenotypes for high chip heritability. BMC Med Genomics 8 Suppl 3:S3