High-throughput screening (HTS) has become increasingly popular in studies that aim to reveal the interactions of particular biochemical processes in biology. In typical HTS experiments, such as gene expression microarrays and genome-wide RNAi screening, the first stage of the analysis is often to identify genes with certain characteristics (e.g., genes that are differentially expressed). To gain further insight into the underlying biology, the next stage is to conduct over-representation analysis (ORA), which investigates whether gene sets associated with particular biological functions are statistically over-represented in the identified group of genes. ORA is based on the postulate that genes involved in the same biological process are coordinately expressed. However, traditional ORA, which is usually based on the hypergeometric P-value, analyzes individual gene sets separately and does not take into account the interrelationships among gene sets that share similar biological functions. In this project, we incorporate gene ontology and pathway knowledge into ORA. The proposed method can borrow information across related gene sets to strengthen the detection of over-representation signals, and it can provide more reliable and meaningful biological insights than traditional ORA. We consider two typical dependence structures among biological functions: the hierarchical gene ontology structure and the interconnected pathway structure. Each dependence structure is incorporated into a Bayesian ORA model via a hierarchical prior. The Bayesian model provides a flexible framework that allows easy extensions to achieve various goals, such as accommodating either a binary indicator or a continuous test statistic of differential gene expression, evaluating the reliability of different types of evidence supporting gene ontology annotations, and incorporating a different dependence structure.
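For reference, the traditional ORA P-value for a single gene set is the upper tail of a hypergeometric distribution: with N genes in the universe, K of them annotated to the gene set, n genes identified in the first stage, and k identified genes falling in the set, P = sum over i from k to min(K, n) of C(K, i) C(N - K, n - i) / C(N, n). The sketch below is a minimal illustration of that standard calculation, not the project's Bayesian model; the function name and the toy counts are hypothetical. Note that each gene set is scored on its own, which is exactly the limitation the proposed hierarchical prior is meant to address.

```python
from math import comb


def hypergeometric_ora_pvalue(N: int, K: int, n: int, k: int) -> float:
    """Upper-tail hypergeometric P-value P(X >= k) for one gene set.

    N: number of genes in the universe (e.g. all genes on the array)
    K: number of universe genes annotated to the gene set
    n: number of genes identified in the first stage (e.g. differentially expressed)
    k: number of identified genes that are also in the gene set
    """
    if not (0 <= K <= N and 0 <= n <= N and k >= 0):
        raise ValueError("counts must satisfy 0 <= K, n <= N and k >= 0")
    total = comb(N, n)
    # math.comb returns 0 when the lower index exceeds the upper one,
    # so impossible overlaps contribute nothing to the sum.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / total


if __name__ == "__main__":
    # Hypothetical toy counts for illustration only, not data from this project.
    universe, identified = 1000, 50
    gene_sets = {"set_A": (40, 8), "set_B": (100, 5)}  # name -> (set size, overlap)
    for name, (size, overlap) in gene_sets.items():
        # Each set is tested independently, ignoring relations among sets.
        print(name, hypergeometric_ora_pvalue(universe, size, identified, overlap))
```

Because every gene set gets its own independent P-value, closely related gene ontology terms or connected pathways cannot share evidence in this calculation.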
In addition, we propose an integrated approach that conducts microarray analysis and ORA simultaneously. Information from the gene ontology (or pathway) structure is used not only in ORA but also in the microarray analysis, which reduces randomness in the microarray analysis and further improves the ORA results.

Public Health Relevance

High-throughput screening (HTS), such as gene expression microarrays and genome-wide RNAi screening, has become an increasingly indispensable tool in biological and medical research. However, the ever-expanding knowledge of the functional characteristics of genes (such as gene ontology and pathways) has not been fully utilized in HTS data analysis. The goal of this project is to incorporate gene ontology and pathway knowledge into HTS data analysis to provide more reliable and meaningful biological insights for the interpretation of HTS results.

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Academic Research Enhancement Awards (AREA) (R15)
Project #: 1R15HG006365-01
Application #: 8180414
Study Section: Biostatistical Methods and Research Design Study Section (BMRD)
Program Officer: Bonazzi, Vivien
Project Start: 2011-08-17
Project End: 2014-07-31
Budget Start: 2011-08-17
Budget End: 2014-07-31
Support Year: 1
Fiscal Year: 2011
Total Cost: $277,097
Indirect Cost:
Name: Southern Methodist University
Department: Biostatistics & Other Math Sci
Type: Schools of Arts and Sciences
DUNS #: 001981133
City: Dallas
State: TX
Country: United States
Zip Code: 75205