Identifying Risk Factors and Interactions for Type 1 Diabetes in Large Studies

Qian, Xiaoning

Abstract

The etiology of many complex diseases, including Type-1 Diabetes (T1D), cannot be explained by genetic causes alone. Various factors, genetic as well as environmental, influence the progress of diseases. The critical issue in deriving the full benefit from biological, clinical, and longitudinal cohort studies for complex diseases is the appropriate analysis of the available large volumes of data, including these large-scale measurements and knowledge accrued from past research. Data mining approaches, especially feature selection from the massive number of measurements, become critical for identifying reproducible and accurate risk factors that characterize pathogenic processes or pharmacologic responses to a therapeutic intervention for complex diseases including T1D. At the same time, data collection takes a significant amount of time and resources. Identifying risk factors and their interactions will significantly expedite the research at a low cost. The primary objective of the proposed application is to develop a general network-based mathematical framework and efficient algorithms for identifying risk factors and their interactions as prognostic features that are highly informative about disease development. We will apply the developed algorithms to analyze the existing large-scale studies maintained at the Pediatric Epidemiology Center (PEC) at the University of South Florida (USF), including The Environmental Determinants of Diabetes in the Young (TEDDY) and the Diabetes Prevention Trial-Type 1 (DPT-1) studies. The identification of risk factors and their interactions provides deep insights into disease causality and mechanism. The proposed project has three specific aims: (1) An innovative data-driven analysis framework for risk factor identification will be presented, and a general network-based mathematical model to identify risk factors and their interactions for disease development will be developed. (2) Fast and effective risk factor identification algorithms will be developed, which can be used to identify accurate synergistic factors. (3) The developed algorithms will be applied to the large-scale studies, including TEDDY and DPT-1, to identify both genetic and environmental risk factors and their interactions with high predictive values for T1D development. We will also evaluate the performance of our algorithms in comparison with traditional analyses for predicting the development and onset of T1D. Upon successful completion of this project, we expect that the developed algorithms will become a useful tool for biomedical data analysis with significant impacts on patient-oriented research for understanding the etiology, incidence, prevalence, natural history, and pathophysiology of T1D and other complex diseases. The proposed application will lay the foundation and provide the direction for exploratory research on the re-use and analysis of existing data sets and the development of novel hypotheses and experimental designs.

Public Health Relevance

This project aims to develop novel feature selection methods that can be practically employed in analyzing the existing large-scale studies, including The Environmental Determinants of Diabetes in the Young (TEDDY) and the Diabetes Prevention Trial-Type 1 (DPT-1) studies, to aid in the identification and analysis of genetic as well as environmental risks, including demographic, dietary, immunologic, and metabolic markers, and their interactions for predicting progression to Type-1 Diabetes (T1D). Successful completion of the project will result in efficient feature selection algorithms for identifying risk factors and understanding their interactions for the development of T1D, which will enable a paradigm shift from traditional hypothesis-driven analysis to data-driven analysis to generate working hypotheses to elucidate the etiology, incidence, prevalence, and pathophysiology of T1D and to design better screening strategies for early disease prediction and prevention. Although focusing on the existing TEDDY and DPT-1 studies in this project, the proposed feature selection methods are general and suitable for the analysis of large-scale studies for the identification of risk factors and their interactions for other complex diseases, which will serve as a solid foundation for our future data-driven projects to combine genomic, environmental, and clinical measurements from multiple studies maintained at the Pediatric Epidemiology Center (PEC) at the University of South Florida (USF), including, for example, DPT-1, TEDDY, TrialNET, TRIGR (Trial to Reduce Insulin-dependent diabetes mellitus in the Genetically at Risk), and RDCRN (Rare Diseases Clinical Research Network).

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK)
Type: Exploratory/Developmental Grants (R21)
Project #: 5R21DK092845-02
Application #: 8300133
Study Section: Biostatistical Methods and Research Design Study Section (BMRD)
Program Officer: Sechi, Salvatore

Project Start: 2011-07-12
Project End: 2013-07-31
Budget Start: 2012-06-01
Budget End: 2013-07-31
Support Year: 2
Fiscal Year: 2012
Total Cost: $100,322
Indirect Cost: $24,338

Institution

Name: University of South Florida
Department: Biostatistics & Other Math Sci
Type: Schools of Engineering
DUNS #: 069687242

City: Tampa
State: FL
Country: United States
Zip Code: 33612

Related projects


NIH 2012 R21 DK	Identifying Risk Factors and Interactions for Type 1 Diabetes in Large Studies Qian, Xiaoning / University of South Florida	$100,322
NIH 2012 R21 DK	Identifying Risk Factors and Interactions for Type 1 Diabetes in Large Studies Qian, Xiaoning / Texas Engineering Experiment Station	$101,290
NIH 2011 R21 DK	Identifying Risk Factors and Interactions for Type 1 Diabetes in Large Studies Qian, Xiaoning / University of South Florida	$173,613

Publications

Xu, Easton Li; Qian, Xiaoning; Yu, Qilian et al. (2018) Feature selection with interactions in logistic regression models using multivariate synergies for a GWAS application. BMC Genomics 19:170

Adl, Amin Ahmadi; Lee, Hye-Seung; Qian, Xiaoning (2017) Detecting Pairwise Interactive Effects of Continuous Random Variables for Biomarker Identification with Small Sample Size. IEEE/ACM Trans Comput Biol Bioinform 14:1265-1275

Wang, Yijie; Qian, Xiaoning (2016) Stochastic block coordinate Frank-Wolfe algorithm for large-scale biological network alignment. EURASIP J Bioinform Syst Biol 2016:9

Qian, Xiaoning; Dougherty, Edward R (2016) Bayesian Regression with Network Prior: Optimal Bayesian Filtering Perspective. IEEE Trans Signal Process 64:6243-6253

Zamani Dadaneh, Siamak; Qian, Xiaoning (2016) Bayesian module identification from multiple noisy networks. EURASIP J Bioinform Syst Biol 2016:5

Lu, Meng; Huang, Jianhua Z; Qian, Xiaoning (2016) Sparse Exponential Family Principal Component Analysis. Pattern Recognit 60:681-691

Lu, Meng; Lee, Hye-Seung; Hadley, David et al. (2014) Supervised categorical principal component analysis for genome-wide association analyses. BMC Genomics 15 Suppl 1:S10

Wang, Yijie; Qian, Xiaoning (2014) Joint clustering of protein interaction networks through Markov random walk. BMC Syst Biol 8 Suppl 1:S9

Lu, Meng; Lee, Hye-Seung; Hadley, David et al. (2014) Logistic Principal Component Analysis for Rare Variants in Gene-Environment Interaction Analysis. IEEE/ACM Trans Comput Biol Bioinform 11:1020-8

Sajjadi, Seyed Javad; Qian, Xiaoning; Zeng, Bo et al. (2014) Network-Based Methods to Identify Highly Discriminating Subsets of Biomarkers. IEEE/ACM Trans Comput Biol Bioinform 11:1029-37

Showing the most recent 10 out of 13 publications
