As a step toward understanding the complex differences between normal and cancer cells, much research has been devoted to analyzing genes that are differentially expressed in particular cells. Although recent technological advances have made it possible to conduct serial and/or simultaneous analysis of the expression patterns of thousands of genes, no comprehensive study has reported how many genes are expressed differentially or whether most differences are cell line-specific. The long-term goal of this research is to develop intelligent data mapping and visual explanation technologies that improve information exploration and interpretation of high-throughput gene expression profiles for molecular analysis of cancer. Preliminary evidence from mRNA profiles of breast/prostate cancer cells suggests that transcriptome patterns are rich in information about the mechanisms that underlie cancer development. Accordingly, in the R21 research, multidisciplinary knowledge of molecular biology and computational intelligence is applied to (1) design cost-effective molecular experiments to establish gene transcriptome distributions across cell lines, (2) pilot test the existence of transcriptome clusters in the molecular species space that correlate with cell phenotypes, and (3) identify key biomarkers that differentiate cell lines with the highest predictive values. Because new knowledge can be acquired only by exploring all of the interesting aspects of complex transcriptome data in high-dimensional space, this R33 application proposes a statistically principled hierarchical visual exploration technique to reveal and interpret the intrinsic but hidden characteristics of transcriptome clusters, which should better define the nature of cancer biology and therapeutic targets.
A novel integration of information theory and computer graphics will permit (1) automatic identification and modeling of biomarker clusters, (2) a probabilistic component analysis that forms hierarchical visualization spaces, allowing the complete data set to be analyzed at the top level and the best-separated sub-clusters to be analyzed at deeper levels, and (3) an interactive intelligent interface for task- and hypothesis-driven data mining and decision making. The innovation of the research lies in combining (1) a hybrid stepwise nonlinear discriminant analysis for biomarker identification with (2) a hierarchical visual exploration of multi-foci, high-dimensional transcriptome distributions to interpret the complex relationships between molecular events and cell phenotypes.

Agency: National Institutes of Health (NIH)
Institute: National Cancer Institute (NCI)
Type: Exploratory/Developmental Grants (R21)
Project #: 5R21CA083231-02
Application #: 6174109
Study Section: Special Emphasis Panel (ZCA1-SRRB-C (M3))
Program Officer: Gallahan, Daniel L
Project Start: 1999-07-01
Project End: 2001-09-29
Budget Start: 2000-08-18
Budget End: 2001-09-29
Support Year: 2
Fiscal Year: 2000
Total Cost: $147,648
Indirect Cost:
Name: Catholic University of America
Department: Engineering (All Types)
Type: Schools of Engineering
DUNS #:
City: Washington
State: DC
Country: United States
Zip Code: 20064
Wang, Yue; Lu, Jianping; Lee, Richard et al. (2002) Iterative normalization of cDNA microarray data. IEEE Trans Inf Technol Biomed 6:29-37