High-throughput screening (HTS) has become increasingly popular in studies that aim to reveal the interaction of particular biochemical processes in biology. In typical HTS experiments, such as gene expression microarray and genome-wide RNAi screening, the first stage of the analysis is often to identify genes with certain characteristics (e.g., genes that are differentially expressed). To gain more insight into the underlying biology, the next stage is to conduct over-representation analysis (ORA), which investigates whether gene sets associated with particular biological functions are statistically over-represented in the identified group of genes. ORA is based on the postulate that genes involved in the same biological process are coordinately expressed. However, traditional ORA, which is usually based on the hypergeometric P-value, analyzes each gene set separately and does not take into account the interrelationships among gene sets that share similar biological functions. In this project, we incorporate gene ontology and pathway knowledge into ORA. The proposed method can borrow information across related gene sets to strengthen the detection of over-representation signals, and it is capable of providing more reliable and meaningful biological insights than traditional ORA. We consider two typical dependence structures among biological functions: the hierarchical gene ontology structure and the interconnected pathway structure. Each dependence structure is incorporated into a Bayesian ORA model via a hierarchical prior. The Bayesian model provides a flexible framework that allows easy extensions to achieve various goals, such as accommodating either a binary indicator or a continuous test statistic of differential gene expression, evaluating the reliability of the different types of evidence supporting gene ontology annotations, and incorporating a different dependence structure.
In addition, we propose an integrated approach that conducts microarray analysis and ORA simultaneously. Information from the gene ontology (or pathway) structure is used not only in ORA but also in the microarray analysis, which reduces the randomness in the microarray analysis and further improves the ORA results.
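
For reference, the traditional per-gene-set ORA baseline mentioned above can be sketched as an upper-tail hypergeometric P-value. This is a minimal illustration of that standard test only, not the proposed Bayesian model; the function name, argument names, and the numbers in the usage lines are hypothetical and chosen for illustration.

```python
from math import comb


def ora_pvalue(n_universe, n_in_set, n_selected, n_overlap):
    """Upper-tail hypergeometric P-value, P(X >= n_overlap).

    n_universe: total number of genes assayed (hypothetical name)
    n_in_set:   genes annotated to the gene set under test
    n_selected: genes identified in the first stage (e.g., differentially expressed)
    n_overlap:  identified genes that also belong to the gene set
    """
    # The overlap cannot exceed either the gene-set size or the number selected.
    upper = min(n_in_set, n_selected)
    # Number of ways to choose the selected genes from the whole universe.
    total = comb(n_universe, n_selected)
    # Count the selections whose overlap with the gene set is at least n_overlap.
    tail = sum(
        comb(n_in_set, i) * comb(n_universe - n_in_set, n_selected - i)
        for i in range(n_overlap, upper + 1)
    )
    return tail / total


# Toy usage with made-up counts: 20,000 genes assayed, 100 in the gene set,
# 500 identified as differentially expressed, 10 of which are in the set.
p = ora_pvalue(20000, 100, 500, 10)
print(f"P-value for over-representation: {p:.3g}")
```

Because each gene set is tested in isolation here, related gene sets (for example, a parent and child term in the gene ontology) contribute nothing to one another's evidence, which is the limitation the hierarchical prior is meant to address.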

Public Health Relevance

High-throughput screening (HTS), such as gene expression microarray and genome-wide RNAi screening, has become an increasingly indispensable tool in biological and medical research. However, the ever-expanding knowledge of the functional characteristics of genes (such as gene ontology and pathways) has not been fully exploited in HTS data analysis. The goal of this project is to incorporate gene ontology and pathway knowledge into HTS data analysis so that HTS results can be interpreted with more reliable and meaningful biological insight.

National Institutes of Health (NIH)
National Human Genome Research Institute (NHGRI)
Academic Research Enhancement Awards (AREA) (R15)
Study Section
Biostatistical Methods and Research Design Study Section (BMRD)
Program Officer
Bonazzi, Vivien
Southern Methodist University
Biostatistics & Other Math Sci
Schools of Arts and Sciences
United States