The increasing use of genome-wide gene expression profiling has generated a wealth of valuable data that offers cost-effective opportunities to investigate additional research questions that were not part of the originally intended purpose. Our goal is to develop systematic approaches, on a firm statistical footing, for secondary analysis of existing microarray expression databases. We will emphasize consistency between the biological background and the statistical modeling in these developments. Such consistency is critical for enhancing the biological efficiency of the developed analysis tools. The retina is a relatively simple and well-characterized part of the central nervous system. To date, over 200 genes that cause retinal diseases have been identified. We will apply the developed methods to retinal microarray expression databases to identify novel genes and gene-gene relationships (pathways) that govern the normal and pathological processes of the retina. We will also explore the possibility of predicting eye disease through public database search and comparison. We propose the following specific studies: 1) to develop novel analytical/statistical methods for detecting the genes involved in a biological pathway. We plan to design a statistical strategy that incorporates partial correlation as a core component in this application; 2) to take the first step toward turning microarray repositories into a disease diagnosis database. We plan to develop a Bayesian probabilistic method to infer the disease condition of a query microarray data set based on its similarity to well-characterized data sets in the database; 3) to experimentally validate a subset of the in silico predictions. We will verify the expression of newly identified genes from the first study using standard methods such as real-time PCR and Western blot; 4) to expand our existing software package, Gene Expression Analyzer (GEA) (http://cell.rutgers.edu/gea/), to include the newly developed methods. The source code will be made public. The outcome of the project will significantly facilitate the reuse of the vast amount of public data sets to answer additional research questions, reduce the need to generate new data, and improve our understanding of cellular functions and networks under a variety of perturbations.

Public Health Relevance

The proposed study will develop systematic approaches, on a firm statistical footing, for secondary analysis of public microarray databases. The developed methods will be applied in particular to retinal expression data to gain additional insight into the mechanisms that govern the normal and pathological processes of the eye. The results may translate into effective diagnostic, therapeutic, and preventive strategies for retinal diseases.

Agency: National Institutes of Health (NIH)
Institute: National Eye Institute (NEI)
Type: Exploratory/Developmental Grants (R21)
Project #: 5R21EY019094-02
Application #: 7802826
Study Section: Special Emphasis Panel (ZEY1-VSN (01))
Program Officer: Chin, Hemin R
Project Start: 2009-05-01
Project End: 2012-04-30
Budget Start: 2010-05-01
Budget End: 2012-04-30
Support Year: 2
Fiscal Year: 2010
Total Cost: $190,226
Indirect Cost:

Name: University of California Berkeley
Department: Biostatistics & Other Math Sci
Type: Schools of Arts and Sciences
DUNS #: 124726725
City: Berkeley
State: CA
Country: United States
Zip Code: 94704

Chapman, Matthew R; Balakrishnan, Karthik R; Li, Ju et al. (2013) Sorting single satellite cells from individual myofibers reveals heterogeneity in cell-surface markers and myogenic capacity. Integr Biol (Camb) 5:692-702
Gao, Qinghui; Ho, Christine; Jia, Yingmin et al. (2012) Biclustering of linear patterns in gene expression data. J Comput Biol 19:619-31
Kim, Kyungpil; Jiang, Keni; Teng, Siew Leng et al. (2012) Using biologically interrelated experiments to identify pathway genes in Arabidopsis. Bioinformatics 28:815-22
Li, Jingyi Jessica; Jiang, Ci-Ren; Brown, James B et al. (2011) Sparse linear modeling of next-generation mRNA sequencing (RNA-Seq) data for isoform discovery and abundance estimation. Proc Natl Acad Sci U S A 108:19867-72
Huang, Haiyan; Liu, Chun-Chi; Zhou, Xianghong Jasmine (2010) Bayesian approach to transforming public gene expression repositories into disease diagnosis databases. Proc Natl Acad Sci U S A 107:6823-8