We aim to develop statistical and computational methods for integrative analysis of genomic data sets. While analytical methods for interpretation of a single data set have been extensively studied, practical and statistically sound tools for combined analysis of multiple data sets are not readily available. This has resulted in a serious under-utilization of the large amount of data stored in public databases and in missed opportunities for insights that can be gleaned from common features of multiple data sets. The three aims of this proposal address important challenges in this area. The first aim is to develop a statistically rigorous and efficient algorithm and a database system that can be used to search for data sets with similar molecular signatures across multiple platforms and organisms. Meta-analysis tools will also be implemented to identify common patterns across the data sets identified.
The second aim is to develop statistical methods and tools to compare and contrast multiple data sets at the level of pathways to facilitate better understanding of the biological processes hidden in the data. Methodological challenges include resolving the complex patterns of overlaps and hierarchical relationships in pathway ontologies and implementing visualization tools.
The third aim is to incorporate other types of data from public repositories with gene expression data to better understand gene regulation. In particular, a computational framework for integrating copy number variation data with gene expression data will be studied. In all aims, particular attention will be paid to proper estimation of statistical significance and power for the results obtained and to the development of user-friendly tools. With respect to public health, the proposed work will help physicians and scientists to analyze their genomic data in combination with the data that others have already generated. This will reduce the time, effort, and funds wasted on producing new data when similar data sets are already available. The tools developed will also allow investigators to see unexpected connections among diseases at the molecular level and thus contribute to the development of treatments.

National Institutes of Health (NIH)
National Institute of General Medical Sciences (NIGMS)
Research Project (R01)
Study Section
Biodata Management and Analysis Study Section (BDMA)
Program Officer
Remington, Karin A
Harvard University
Schools of Medicine
United States
Mieczkowski, Jakub; Cook, April; Bowman, Sarah K et al. (2016) MNase titration reveals differences between nucleosome occupancy and chromatin accessibility. Nat Commun 7:11485
Park, Richard W; Kim, Tae-Min; Kasif, Simon et al. (2015) Identification of rare germline copy number variations over-represented in five human cancer types. Mol Cancer 14:25
Jung, Youngsook L; Luquette, Lovelace J; Ho, Joshua W K et al. (2014) Impact of sequencing depth in ChIP-seq experiments. Nucleic Acids Res 42:e74
West, Jason A; Cook, April; Alver, Burak H et al. (2014) Nucleosomal occupancy changes locally over key regulatory regions during cell differentiation and reprogramming. Nat Commun 5:4719
Kim, Tae-Min; Xi, Ruibin; Luquette, Lovelace J et al. (2013) Functional genomic analysis of chromosomal aberrations in a compendium of 8000 cancer genomes. Genome Res 23:217-27
Kim, Tae-Min; Laird, Peter W; Park, Peter J (2013) The landscape of microsatellite instability in colorectal and endometrial cancer genomes. Cell 155:858-68
Tolstorukov, Michael Y; Sansam, Courtney G; Lu, Ping et al. (2013) Swi/Snf chromatin remodeling/tumor suppressor complex establishes nucleosome occupancy at target promoters. Proc Natl Acad Sci U S A 110:10165-70
Woo, Caroline J; Kharchenko, Peter V; Daheron, Laurence et al. (2013) Variable requirements for DNA-binding proteins at polycomb-dependent repressive regions in human HOX clusters. Mol Cell Biol 33:3274-85
Tolstorukov, Michael Y; Goldman, Joseph A; Gilbert, Cristele et al. (2012) Histone variant H2A.Bbd is associated with active transcription and mRNA processing in human cells. Mol Cell 47:596-607
Lee, Eunjung; Iskow, Rebecca; Yang, Lixing et al. (2012) Landscape of somatic retrotransposition in human cancers. Science 337:967-71

Showing the most recent 10 out of 31 publications