We aim to develop statistical and computational methods for integrative analysis of genomic data sets. While analytical methods for interpreting a single data set have been studied extensively, practical and statistically sound tools for combined analysis of multiple data sets are not readily available. This has resulted in serious under-utilization of the large amount of data stored in public databases and in missed opportunities for insights that can be gleaned from features common to multiple data sets. The three aims of this proposal address important challenges in this area. The first aim is to develop a statistically rigorous and efficient algorithm and a database system that can be used to search for data sets with similar molecular signatures across multiple platforms and organisms. Meta-analysis tools will also be implemented to identify common patterns across the data sets identified.
The second aim is to develop statistical methods and tools to compare and contrast multiple data sets at the level of pathways to facilitate better understanding of the biological processes hidden in the data. Methodological challenges include resolving the complex patterns of overlaps and hierarchical relationships in pathway ontologies and implementing visualization tools.
The third aim is to incorporate other types of data from public repositories with gene expression data to better understand gene regulation. In particular, a computational framework for integrating copy number variation data with gene expression data will be studied. In all aims, particular attention will be paid to proper estimation of statistical significance and power for the results obtained and to the development of user-friendly tools. With respect to public health, the proposed work will help physicians and scientists analyze their genomic data in combination with data that others have already generated. This will reduce the time, effort, and funds wasted on producing new data when similar data sets are already available. The tools developed will also allow investigators to see unexpected connections among diseases at the molecular level and thus contribute to the development of treatments.