This project will develop an integrated desktop application to combine data from expression array, RNA transcript array, proteomics, SNP array (for polymorphism an analysis, as well as LOH and copy number determination), methylation array, histone modification array, promoter array, and microRNA array and metabolomics technologies. Current approaches to analysis of individual `omic' technologies suffer from problems of fragmentation, that present an incomplete view of the workings of the cell. However, effective integration into a single analytic platform is non-trivial. There is a need for a consistent approach, infrastructure, and interface between array types, to maximize ease of use, while recognizing and accommodating the specific computational and statistical requirements, and biological context, of each array. A central challenge is the need to create and work with lists of genomic regions of interest (GROIs) for each sample: we propose three novel approaches to aid in identification of GROIs. These lists must then be integrated with rectangular (sample by feature) data arrays to facilitate statistical analysis. Integration between array types occurs at the computational level, through a unified software package, statistically, through tools that seek statistical relationships between features from different arrays, biologically, through use of annotations (particularly gene ontology, protein- protein and protein-DNA interactions, and pathway membership) that document functional relationships between features, and through genomic interactions that suggest relationships between features that map to the same regions of the genome. The end product will support analysis of each platform separately, with a comprehensive suite of data management, statistical and heuristic analytic tools and the means to place findings of interest into a meaningful biological context through cross-reference to extensive biobases. Beyond that, a range of methods - statistical, biological and genomic - will be available to explore interactions and associations between platforms. PDF created with PDF Factory trial version www.pdffactory.com.

Public Health Relevance

While the large-scale array technologies have provided an unprecedented capability to model cellular processes, both in normal functioning and disease states, this capability is utterly dependent on the availability of complex data management, computational, statistical and informatic software tools. The utility of the next generation of arrays - which focus on critical regulation and control functions of the cell - will be stymied by an initial lack of suitable bioinformatic tools. This proposal initiates an accelerated development of an integrated software package intended to empower biologists in the application and analysis of these powerful new technologies, with broadly reaching impact at all levels of biological and clinical research, and across every discipline. ? ? ?

Agency
National Institute of Health (NIH)
Institute
National Human Genome Research Institute (NHGRI)
Type
Small Business Innovation Research Grants (SBIR) - Phase I (R43)
Project #
1R43HG004677-01A1
Application #
7538527
Study Section
Special Emphasis Panel (ZRG1-BST-E (10))
Program Officer
Bonazzi, Vivien
Project Start
2008-08-04
Project End
2010-07-31
Budget Start
2008-08-04
Budget End
2009-07-31
Support Year
1
Fiscal Year
2008
Total Cost
$157,474
Indirect Cost
Name
Epicenter Software
Department
Type
DUNS #
198000077
City
Pasadena
State
CA
Country
United States
Zip Code
91106