Gene Set Enrichment Analysis (GSEA), which we introduced in 2003, is now standard practice for analyzing genome-wide expression data. GSEA derives its power from detecting the coordinated activation or repression of sets of genes that share a common biological function, chromosomal location, or regulation and that distinguish biological phenotypes or cellular states. This knowledge-based approach is effective in elucidating underlying biological mechanisms and in generating hypotheses for further study and experimental validation. Since 2005, we have developed, distributed, and supported a freely available version of the GSEA software, along with a database of annotated gene sets, the Molecular Signatures Database (MSigDB). This popular resource has over 29,300 registered users and 3,100 citations in the literature, and MSigDB contains over 6,700 fully annotated gene sets. The goal of this renewal is to add significant value to the GSEA/MSigDB resource while maintaining the professional software quality and strong support that investigators have come to expect. We plan to extend the GSEA method implemented in the software, develop additional tools for interpreting its output, and refine and enrich MSigDB to further accelerate the pace of genomic research.
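To make "detecting coordinated activation or repression of a gene set" concrete, here is a minimal Python sketch of a weighted running-sum enrichment score of the kind GSEA is built on. Walking down a list of genes ranked by their correlation with a phenotype, the score rises at genes in the set and falls at genes outside it; the enrichment score is the largest deviation from zero. The function name `enrichment_score`, its parameters, and the toy data are hypothetical illustrations, not the GSEA software's API, and the sketch omits the permutation testing, normalization, and multiple-hypothesis correction that a real analysis requires.

```python
from typing import AbstractSet, Sequence


def enrichment_score(ranked_genes: Sequence[str],
                     correlations: Sequence[float],
                     gene_set: AbstractSet[str],
                     weight: float = 1.0) -> float:
    """Return a weighted running-sum enrichment score for one gene set.

    Hypothetical helper for illustration. `ranked_genes` is assumed to be
    sorted by decreasing correlation with the phenotype, and
    `correlations[i]` is the correlation of `ranked_genes[i]`.
    """
    if len(ranked_genes) != len(correlations):
        raise ValueError("ranked_genes and correlations must have equal length")
    hits = [gene in gene_set for gene in ranked_genes]
    n_total = len(ranked_genes)
    n_hits = sum(hits)
    if n_hits == 0 or n_hits == n_total:
        raise ValueError("gene set must overlap the ranked list without covering all of it")

    # Each gene in the set raises the running sum in proportion to |r|^weight;
    # the increments over all set members add up to 1.
    hit_norm = sum(abs(r) ** weight for r, is_hit in zip(correlations, hits) if is_hit)
    if hit_norm == 0:
        raise ValueError("genes in the set have zero weight")
    # Each gene outside the set lowers the running sum by a constant step;
    # these decrements also add up to 1.
    miss_step = 1.0 / (n_total - n_hits)

    running, best = 0.0, 0.0
    for r, is_hit in zip(correlations, hits):
        running += abs(r) ** weight / hit_norm if is_hit else -miss_step
        # Track the largest deviation from zero, keeping its sign.
        if abs(running) > abs(best):
            best = running
    return best


# Toy example: four genes ranked by correlation with a phenotype.
genes = ["A", "B", "C", "D"]
scores = [2.0, 1.0, -1.0, -2.0]
print(enrichment_score(genes, scores, {"A", "B"}))  # set at the top: positive
print(enrichment_score(genes, scores, {"C", "D"}))  # set at the bottom: negative
```

A positive score means the set's genes cluster at the top of the ranking (activation in the first phenotype), and a negative score means they cluster at the bottom. In the toy example, the set {A, B} scores 1.0 and the set {C, D} scores -1.0. Setting `weight=0` gives every set member an equal step, which reduces the statistic to an unweighted running sum.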
Our specific aims are:
Aim 1: Extend the power of the GSEA software from comparing groups of samples to assessing the activation of biological processes and pathways in individual samples.
Aim 2: Provide enhanced visualization and tools for interpreting GSEA results.
Aim 3: Extend the scope and specificity of the Molecular Signatures Database (MSigDB).
Aim 3.1: We will create a new "hallmark" MSigDB collection that captures major biological processes and is generated by consolidating and refining existing gene sets, to reduce redundancy and increase the "specificity" of enrichment results.
Aim 3.2: We will add important new collections to MSigDB capturing a wide range of biological conditions and targeted cellular perturbations.
Aim 4: Support and maintain the Molecular Signatures Database (MSigDB) and the Gene Set Enrichment Analysis (GSEA) software.
Our progress over the previous funding period; our extensive experience in developing computational methods for genomics research and delivering them as user-friendly, high-quality software; our significant user base and many citations; our large repository of gene sets; and our successful delivery of documentation and training for users make us well poised to carry out the aims of this proposal.

Public Health Relevance

Gene set-based enrichment analysis is now standard practice for interpreting genome-wide expression data and elucidating the biological mechanisms associated with disease or other cellular states. The combination of the Gene Set Enrichment Analysis software and the Molecular Signatures Database, a collection of gene sets representing biological processes, pathways, phenotypes, and cellular perturbations, makes these sophisticated knowledge-based analyses accessible to any biomedical researcher. The work in this project will significantly increase the power and value of both the analysis software and the gene set collection, helping researchers derive better hypotheses for further investigation and validation and thereby accelerating the study of important questions in biomedical research.

National Institutes of Health (NIH)
National Cancer Institute (NCI)
Research Project (R01)
Study Section: Biodata Management and Analysis Study Section (BDMA)
Program Officer: Li, Jerry
Broad Institute, Inc.
United States
Publications (10 most recent of 65):
Archer, Tenley C; Ehrenberger, Tobias; Mundt, Filip et al. (2018) Proteomics, Post-translational Modifications, and Integrative Analyses Reveal Molecular Heterogeneity within Medulloblastoma Subgroups. Cancer Cell 34:396-410.e8
Huang, Justin K; Carlin, Daniel E; Yu, Michael Ku et al. (2018) Systematic Evaluation of Molecular Networks for Discovery of Disease Genes. Cell Syst 6:484-495.e5
Milne, Roger L et al. (2017) Identification of ten variants associated with risk of estrogen-receptor-negative breast cancer. Nat Genet 49:1767-1778
Huang, Franklin W; Mosquera, Juan Miguel; Garofalo, Andrea et al. (2017) Exome Sequencing of African-American Prostate Cancer Reveals Loss-of-Function ERF Mutations. Cancer Discov 7:973-983
Silterra, Jacob; Gillette, Michael A; Lanaspa, Miguel et al. (2017) Transcriptional Categorization of the Etiology of Pneumonia Syndrome in Pediatric Patients in Malaria-Endemic Areas. J Infect Dis 215:312-320
Viswanathan, Vasanthi S; Ryan, Matthew J; Dhruv, Harshil D et al. (2017) Dependency of a therapy-resistant state of cancer cells on a lipid peroxidase pathway. Nature 547:453-457
Boulay, Gaylor; Awad, Mary E; Riggi, Nicolo et al. (2017) OTX2 Activity at Distal Regulatory Elements Shapes the Chromatin Landscape of Group 3 Medulloblastoma. Cancer Discov 7:288-301
Michailidou, Kyriaki et al. (2017) Association analysis identifies 65 new breast cancer risk loci. Nature 551:92-94
Kim, Jong Wook; Abudayyeh, Omar O; Yeerna, Huwate et al. (2017) Decomposing Oncogenic Transcriptional Signatures to Generate Maps of Divergent Cellular States. Cell Syst 5:105-118.e9
Hachigian, Lea J; Carmona, Vitor; Fenster, Robert J et al. (2017) Control of Huntington's Disease-Associated Phenotypes by the Striatum-Enriched Transcription Factor Foxp2. Cell Rep 21:2688-2695