Bayesian Approaches to Model Selection for Survival Data

Ibrahim, Joseph

Abstract

In this proposal, we develop Bayesian methodology for high-dimensional genomic data. The overarching theme is the development of several novel statistical methods for motif discovery in genomic sequence data.

Chromatin immunoprecipitation microarray (ChIP-chip) data allow the direct identification of transcription factor binding sites that are active in particular biological states. Jointly modeling array intensities and DNA sequence will lead to more accurate estimation of binding sites. We develop these joint models to account for multiple motifs and varied relationships between binding sites and array intensities. We also propose a novel joint model framework for direct estimation of a motif using gene expression and the DNA sequence, which bypasses computationally expensive motif selection procedures.

Chromatin structure, in the form of the positioning of nucleosomes in DNA, has long been known to play a major role in protein-DNA binding; however, a quantitative assessment of this role has not been available until very recently. Taking advantage of the increasing availability of accurate experimental data assessing chromatin features, we propose a novel Bayesian statistical model framework for improving motif detection through integration of nucleosome positioning and genomic sequence data.

Alternative splicing of mRNA greatly expands the functional repertoire of many genes in the mammalian genome by including or excluding the exons that make up the genetic coding sequence. Standard gene expression arrays fail to capture the variability of the exon composition of mRNA species and instead give a crude measure of overall gene expression. We propose a method that detects over-representation of specific splice junctions in different biological states while adjusting for overall gene expression.

The advent of high-throughput genomic technologies has ushered in a new data-driven era, making it possible to measure biological activity on a genome-wide scale. Chromatin immunoprecipitation (ChIP), histone modification, and FAIRE, for example, are procedures that have benefited from this technology, allowing one to determine the relative enrichment of their isolated fragments genome-wide. The recent development of next-generation sequencing (NGS) platforms offers greater dynamic range, resolution, and genomic coverage than microarrays in measuring relative enrichment of DNA fragments. We develop classes of statistical mixture models based on the zero-inflated negative binomial distribution to model such count data, and we develop an R software package called Zero-Inflated Negative Binomial Algorithm (ZINBA) to carry out peak calling for a given dataset.
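As a rough illustration of the zero-inflated negative binomial distribution named in the abstract, the following minimal Python sketch evaluates its probability mass function. The abstract does not give a parameterization, so the mean/dispersion form used here, and the names `pi_zero`, `mu`, and `size`, are assumptions made for illustration; this is not the ZINBA package's implementation, which is written in R.

```python
import math


def nb_log_pmf(k, mu, size):
    """Log-pmf of a negative binomial with mean mu > 0 and dispersion size > 0.

    Assumed parameterization: variance = mu + mu**2 / size.
    """
    return (
        math.lgamma(k + size) - math.lgamma(size) - math.lgamma(k + 1)
        + size * math.log(size / (size + mu))
        + k * math.log(mu / (size + mu))
    )


def zinb_pmf(k, pi_zero, mu, size):
    """P(K = k) for a zero-inflated negative binomial.

    With probability pi_zero the count is a structural zero; otherwise it is
    drawn from the negative binomial above. All names are hypothetical.
    """
    nb = math.exp(nb_log_pmf(k, mu, size))
    if k == 0:
        return pi_zero + (1.0 - pi_zero) * nb
    return (1.0 - pi_zero) * nb


if __name__ == "__main__":
    # Probability of an empty window versus a window with 10 reads.
    print(zinb_pmf(0, pi_zero=0.3, mu=5.0, size=2.0))
    print(zinb_pmf(10, pi_zero=0.3, mu=5.0, size=2.0))
```

The extra point mass at zero is what lets such a model absorb the many windows with no reads that a plain negative binomial would underpredict.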

Public Health Relevance

We develop Bayesian methodology for high-dimensional genomic data. The overarching theme in this proposal is the development of several novel statistical methods for motif discovery in genomic sequence data. The proposed methodology has major applications in chronic diseases such as cancer, AIDS, and cardiovascular disease, and in environmental health. We will develop new statistical methods for ChIP-chip data, integration of chromatin structure into motif discovery, joint modeling of gene expression and sequence data, alternative mRNA splicing, and analysis of next-generation sequencing (NGS) data.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Institute of General Medical Sciences (NIGMS)
Type: Research Project (R01)
Project #: 5R01GM070335-16
Application #: 8730668
Study Section: Biostatistical Methods and Research Design Study Section (BMRD)
Program Officer: Brazhnik, Paul

Project Start: 1996-03-01
Project End: 2015-08-31
Budget Start: 2014-09-01
Budget End: 2015-08-31
Support Year: 16
Fiscal Year: 2014

Institution

Name: University of North Carolina Chapel Hill
Department: Biostatistics & Other Math Sci
Type: Schools of Public Health

City: Chapel Hill
State: NC
Country: United States
Zip Code: 27599

Publications

Ankerst, Donna P; Goros, Martin; Tomlins, Scott A et al. (2018) Incorporation of Urinary Prostate Cancer Antigen 3 and TMPRSS2:ERG into Prostate Cancer Prevention Trial Risk Calculator. Eur Urol Focus :

Ibrahim, Joseph G; Kim, Sungduk; Chen, Ming-Hui et al. (2018) Bayesian multivariate skew meta-regression models for individual patient data. Stat Methods Med Res :962280218801147

Liu, Yanyan; Xiong, Sican; Sun, Wei et al. (2018) Joint Analysis of Strain and Parent-of-Origin Effects for Recombinant Inbred Intercrosses Generated from Multiparent Populations with the Collaborative Cross as an Example. G3 (Bethesda) 8:599-605

Chen, Kun; Mishra, Neha; Smyth, Joan et al. (2018) A Tailored Multivariate Mixture Model for Detecting Proteins of Concordant Change Among Virulent Strains of Clostridium Perfringens. J Am Stat Assoc 113:546-559

Sun, Wei; Bunn, Paul; Jin, Chong et al. (2018) The association between copy number aberration, DNA methylation and gene expression in tumor samples. Nucleic Acids Res 46:3009-3018

He, Qianchuan; Liu, Yang; Sun, Wei (2018) Statistical analysis of non-coding RNA data. Cancer Lett 417:161-167

Wu, Jing; de Castro, Mário; Schifano, Elizabeth D et al. (2018) Assessing covariate effects using Jeffreys-type prior in the Cox model in the presence of a monotone partial likelihood. J Stat Theory Pract 12:23-41

Wang, Chun; Chen, Ming-Hui; Wu, Jing et al. (2018) Online updating method with new variables for big data streams. Can J Stat 46:123-146

Gelfond, Jonathan; Goros, Martin; Hernandez, Brian et al. (2018) A System for an Accountable Data Analysis Process in R. R J 10:6-21

Li, Wenqing; Chen, Ming-Hui; Wang, Xiaojing et al. (2018) Bayesian Design of Non-Inferiority Clinical Trials via the Bayes Factor. Stat Biosci 10:439-459

Showing the most recent 10 out of 136 publications
