This grant proposes the development of an extensible, scalable automated data analysis pipeline for functional genomics data. Functional genomics, including microarrays and proteomics, is evolving quickly, with data sets increasing rapidly in size and new analysis methodologies appearing monthly. Because there are no de facto standards for addressing typical experimental questions, applying multiple analyses is desirable, but this is rarely done because of the effort required. Furthermore, the analysis of functional genomics data is generally a multi-step process, with many possible methods in use at each step (e.g., image analysis, data normalization, statistical analysis, data mining), so the effort grows combinatorially when multiple analyses are used. The functional genomics data pipeline proposed in this application will automatically perform multiple analyses, will be easily extensible with new functions and data types, will supply adequate computational power through a distributed computing environment, and will integrate automated annotation to allow analyses to be guided by biological knowledge. The system will utilize Enterprise JavaBeans to provide a robust server architecture, JavaServer Pages for dynamic generation of web interfaces, and object-oriented design patterns to optimize the software architecture. The system will be extensible during operation through use of the Strategy design pattern coupled to the Java reflection mechanism. Functional genomics data sets will be encapsulated within data objects that include links to the NCI caBIO objects to utilize the NCI Center for Bioinformatics data resources. In addition, annotations will be retrievable from web sites and through the Distributed Annotation System. Documentation and testing will proceed in parallel with development, and end users will be involved during design and deployment to tune the user interface.
The final system will provide dramatic improvements in researchers' abilities to fully explore their growing data sets and to interpret their experimental results in light of the larger biological knowledge bases. It will be fully supported and released to the community as open source.
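
The abstract's mechanism for extending the system during operation (the Strategy design pattern coupled to Java reflection) can be illustrated with a minimal sketch. Everything named here is hypothetical and does not come from the project: the `NormalizationStrategy` interface, the `MeanCenter` and `Log2Transform` classes, and the `load` helper are assumptions chosen only to show how a new analysis method could be selected by class name at run time without changing the calling code.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only; all names are hypothetical, not the project's API.
class StrategyReflectionSketch {

    /** Strategy interface: one interchangeable analysis step. */
    public interface NormalizationStrategy {
        double[] normalize(double[] values);
    }

    /** Example strategy: subtract the mean from every value. */
    public static class MeanCenter implements NormalizationStrategy {
        public MeanCenter() {}

        @Override
        public double[] normalize(double[] values) {
            double sum = 0.0;
            for (double v : values) {
                sum += v;
            }
            double mean = values.length == 0 ? 0.0 : sum / values.length;
            double[] out = new double[values.length];
            for (int i = 0; i < values.length; i++) {
                out[i] = values[i] - mean;
            }
            return out;
        }
    }

    /** Example strategy: base-2 logarithm of every value. */
    public static class Log2Transform implements NormalizationStrategy {
        public Log2Transform() {}

        @Override
        public double[] normalize(double[] values) {
            double[] out = new double[values.length];
            for (int i = 0; i < values.length; i++) {
                out[i] = Math.log(values[i]) / Math.log(2.0);
            }
            return out;
        }
    }

    /**
     * Reflection step: instantiate a strategy from its class name.
     * The caller needs only the name, so a class added to the classpath
     * later can be used without recompiling this code.
     */
    public static NormalizationStrategy load(String className)
            throws ReflectiveOperationException {
        Class<?> cls = Class.forName(className);
        return cls.asSubclass(NormalizationStrategy.class)
                  .getDeclaredConstructor()
                  .newInstance();
    }

    public static void main(String[] args) throws Exception {
        // In a real deployment these names might come from a configuration
        // file or a user request; here they are hard-coded for illustration.
        List<String> names = List.of(
                MeanCenter.class.getName(),
                Log2Transform.class.getName());
        double[] data = {1.0, 2.0, 4.0, 8.0};

        // Apply every configured strategy to the same data set.
        for (String name : names) {
            NormalizationStrategy strategy = load(name);
            System.out.println(name + " -> "
                    + Arrays.toString(strategy.normalize(data)));
        }
    }
}
```

Because the loop depends only on the interface, running several alternative methods over one data set, as the abstract proposes, reduces to supplying more class names.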

Agency: National Institutes of Health (NIH)
Institute: National Library of Medicine (NLM)
Type: Exploratory/Developmental Grants (R21)
Project #: 5R21LM008309-02
Application #: 7008125
Study Section: Biomedical Library and Informatics Review Committee (BLR)
Program Officer: Ye, Jane
Project Start: 2005-02-01
Project End: 2008-01-31
Budget Start: 2006-02-01
Budget End: 2008-01-31
Support Year: 2
Fiscal Year: 2006
Total Cost: $230,215
Indirect Cost:
Name: Fox Chase Cancer Center
Department:
Type:
DUNS #: 073724262
City: Philadelphia
State: PA
Country: United States
Zip Code: 19111
Ochs, Michael F; Casagrande, John T (2008) Information systems for cancer research. Cancer Invest 26:1060-7
Kossenkov, Andrew V; Peterson, Aidan J; Ochs, Michael F (2007) Determining transcription factor activity from microarray data using Bayesian Markov chain Monte Carlo sampling. Stud Health Technol Inform 129:1250-4
Wang, Guoli; Kossenkov, Andrew V; Ochs, Michael F (2006) LS-NMF: a modified non-negative matrix factorization algorithm utilizing uncertainty estimates. BMC Bioinformatics 7:175
Bidaut, Ghislain; Suhre, Karsten; Claverie, Jean-Michel et al. (2006) Determination of strongly overlapping signaling activity from microarray data. BMC Bioinformatics 7:99