Prediction and Network Construction Using High-throughput Data

Yeung-Rhee, Ka Yee

Abstract

Biomarker identification is becoming an important use for high-throughput technologies like microarrays and mass spectrometry. Data from these technologies, especially microarray data, are used extensively to classify tissue types (including various tumor types) and to predict patient survival time, time to relapse, and other clinically relevant temporal quantities. High-throughput data measure the activity levels of thousands of potential predictors (genes in the case of gene expression data, and peptides in the case of mass spectrometry or protein microarray data). Analyzing these data poses difficult statistical problems because the number of features measured is far larger than the number of tissue samples typically available. Moreover, many different sets of predictors produce similar prediction accuracies. Here, we propose to incorporate biological knowledge into a supervised framework to identify biologically meaningful predictors for classification and survival analysis. Towards this end, we will develop Bayesian Model Averaging (BMA) methods to produce simple, reliable, robust, and interpretable predictions. BMA also provides a probabilistic multivariate feature selection method. As part of this effort, we will extend the recently developed latent position cluster model for social networks to infer biological networks and identify network modules. Network properties (e.g., modules and degree of connectivity) carry biological meaning. Hence, we will integrate network properties in a supervised framework to identify biologically meaningful predictors. We will extend the BMA methods to determine predictive network modules and pre-defined gene categories (e.g., GO categories, KEGG pathways).
This proposal has two main computational thrusts: (1) the development of BMA methods for multi-class classification and survival analysis (Aim 1); and (2) the development of a latent position cluster model for inferring biological networks and identifying network modules (Aim 3). These two computational thrusts are unified in Aim 2, in which we use network modules and properties in the supervised BMA framework.
In Aim 4, we will generate expression perturbation data to evaluate our network construction methods. Finally, we will make the software and data generated publicly available. The methods developed in this proposal are generally applicable to many high-throughput data types. However, since we will generate expression perturbation data to validate and refine the constructed expression networks, we will focus on applying our developed methods to gene expression data.
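
As a rough illustration of how BMA can act as a probabilistic feature selection method, the sketch below averages over all subsets of a small set of predictors in an ordinary linear regression. It is not the method developed in this project: the function name `bma_inclusion_probabilities`, the use of linear regression, the exhaustive enumeration of subsets, and the BIC approximation to posterior model probabilities (with equal prior weight on every model) are all assumptions chosen to keep the example short.

```python
import itertools

import numpy as np


def bma_inclusion_probabilities(X, y):
    """Illustrative BMA over all predictor subsets of a linear model.

    Posterior model probabilities are approximated by exp(-BIC / 2),
    normalised over the models, assuming equal prior model weights.
    Returns (inclusion_probabilities, models, weights).
    """
    n, p = X.shape
    models, bics = [], []
    for k in range(p + 1):
        for subset in itertools.combinations(range(p), k):
            # Design matrix: intercept plus the selected predictors.
            design = np.column_stack([np.ones(n)] + [X[:, j] for j in subset])
            beta, *_ = np.linalg.lstsq(design, y, rcond=None)
            rss = float(np.sum((y - design @ beta) ** 2))
            # BIC for a Gaussian linear model, up to an additive constant.
            bic = n * np.log(rss / n) + design.shape[1] * np.log(n)
            models.append(subset)
            bics.append(bic)

    bics = np.array(bics)
    # Subtract the minimum BIC before exponentiating for numerical stability.
    weights = np.exp(-0.5 * (bics - bics.min()))
    weights /= weights.sum()

    # Inclusion probability of predictor j: total weight of models containing j.
    inclusion = np.zeros(p)
    for subset, w in zip(models, weights):
        for j in subset:
            inclusion[j] += w
    return inclusion, models, weights
```

Exhaustive enumeration is only feasible for a handful of predictors. With thousands of genes, as in the data described above, the model space would have to be searched or sampled rather than enumerated.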

Public Health Relevance

Biomarker identification is becoming an important use for high-throughput technologies like microarrays. This proposal aims to identify biologically meaningful predictive biomarkers for classifying tissue types (including various tumor types) and for predicting patient survival time, time to relapse, and other clinically relevant temporal quantities. This project could lead to inexpensive, accurate, and robust diagnostic tests that improve diagnoses and prognoses for patients with cancer or other diseases.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Institute of General Medical Sciences (NIGMS)
Type: Research Project (R01)
Project #: 5R01GM084163-02
Application #: 7681282
Study Section: Biodata Management and Analysis Study Section (BDMA)
Program Officer: Remington, Karin A

Project Start: 2008-09-01
Project End: 2013-06-30
Budget Start: 2009-07-01
Budget End: 2010-06-30
Support Year: 2
Fiscal Year: 2009
Total Cost: $453,958
Indirect Cost

Institution

Name: University of Washington
Department: Microbiology/Immun/Virology
Type: Schools of Medicine
DUNS #: 605799469

City: Seattle
State: WA
Country: United States
Zip Code: 98195

Related projects


NIH 2012 R01 GM	Prediction and Network Construction Using High-throughput Data Yeung-Rhee, Ka Yee / University of Washington	$457,112
NIH 2012 R01 GM	Prediction and Network Construction Using High-throughput Data Yeung-Rhee, Ka Yee / University of Washington	$136,976
NIH 2011 R01 GM	Prediction and Network Construction Using High-throughput Data Yeung-Rhee, Ka Yee / University of Washington	$457,966
NIH 2010 R01 GM	Prediction and Network Construction Using High-throughput Data Yeung-Rhee, Ka Yee / University of Washington	$462,742
NIH 2009 R01 GM	Prediction and Network Construction Using High-throughput Data Yeung-Rhee, Ka Yee / University of Washington	$453,958
NIH 2009 R01 GM	Prediction and Network Construction Using High-throughput Data Yeung-Rhee, Ka Yee / University of Washington	$299,936
NIH 2008 R01 GM	Prediction and Network Construction Using High-throughput Data Yeung-Rhee, Ka Yee / University of Washington	$369,349

Publications

Fraley, Chris; Percival, Daniel (2015) Model-Averaged ℓ1 Regularization using Markov Chain Monte Carlo Model Composition. J Stat Comput Simul 85:1090-1101

Fronczuk, Maciej; Raftery, Adrian E; Yeung, Ka Yee (2015) CyNetworkBMA: a Cytoscape app for inferring gene regulatory networks. Source Code Biol Med 10:11

Young, William Chad; Raftery, Adrian E; Yeung, Ka Yee (2014) Fast Bayesian inference for gene regulatory networks using ScanBMA. BMC Syst Biol 8:47

Lenkoski, Alex; Eicher, Theo S; Raftery, Adrian E (2014) Two-Stage Bayesian Model Averaging in Endogenous Variable Models. Econom Rev 33:

McCormick, Tyler H; Raftery, Adrian E; Madigan, David et al. (2012) Dynamic logistic regression and dynamic model averaging for binary classification. Biometrics 68:23-30

Yeung, K Y; Gooley, T A; Zhang, A et al. (2012) Predicting relapse prior to transplantation in chronic myeloid leukemia by integrating expert knowledge and expression data. Bioinformatics 28:823-30

Raftery, Adrian E; Niu, Xiaoyue; Hoff, Peter D et al. (2012) Fast Inference for the Latent Space Network Model Using a Case-Control Approximate Likelihood. J Comput Graph Stat 21:901-919

Lo, Kenneth; Raftery, Adrian E; Dombek, Kenneth M et al. (2012) Integrating external biological knowledge in the construction of regulatory networks from time-series expression data. BMC Syst Biol 6:101

Yeung, Ka Yee; Dombek, Kenneth M; Lo, Kenneth et al. (2011) Construction of regulatory networks using expression time-series data of a genotyped population. Proc Natl Acad Sci U S A 108:19436-41

Steele, Russell J; Wang, Naisyin; Raftery, Adrian E (2010) Inference from Multiple Imputation for Missing Data Using Mixtures of Normals. Stat Methodol 7:351-364

Showing the most recent 10 out of 13 publications
