Project Abstract
Gene Set Enrichment Analysis (GSEA), introduced in 2003, is now standard practice for analyzing genome-wide expression data. GSEA derives its power from identifying the activation or repression of sets of genes that share a common biological function, chromosomal location, or regulation and that differentiate biological phenotypes or cellular states. This knowledge-based approach is effective in elucidating underlying biological mechanisms and in generating hypotheses for further study and experimental validation. Since 2005, we have developed, distributed, and supported a freely available GSEA software application along with a database of annotated gene sets, the Molecular Signatures Database (MSigDB). This popular resource has more than 113,000 registered users and over 10,200 citations in the literature, and MSigDB now contains almost 18,000 annotated gene sets. The goal of this proposal is to continue to evolve and add value to the GSEA/MSigDB resource to best address the needs of the cancer research community, while maintaining the high level of professional quality and strong support that investigators have come to expect. We plan to increase the power and sensitivity of the GSEA method and to enrich MSigDB to further accelerate the pace of genomic research.
Our specific aims are:
Aim 1: Develop and deploy the next generation of the GSEA method and software to keep pace with the needs of the cancer research community. The new core algorithm will be based on information-theoretic approaches, guided by a collection of 100 relevant benchmarks, and informed by an Advisory Board of established cancer researchers. To facilitate the use of GSEA by researchers at all levels of computational sophistication, we will distribute the GSEA analysis tools both as an open-source code library and as a suite of user-friendly, reproducible, interactive electronic notebooks.
Aim 2: Extend the scope and specificity of MSigDB, and evolve the underlying technology. In collaboration with the community, we will add valuable new collections to MSigDB, including signatures of drug responses and genetic perturbations, sets for use with mouse models of cancer and patient-derived xenografts (PDXs), sets from pathway and network databases, and sets for use in proteomic data analysis. MSigDB will be migrated from its current XML file format to a lightweight, portable relational database that can better support its growing size, online exploration tools, and use by investigators and other software.
Aim 3: Provide training and outreach activities for the cancer research community, and maintain and support the GSEA software and MSigDB.
The success and popularity of the GSEA/MSigDB resource over the past decade; our extensive experience in developing computational methods for genomics research and delivering them as user-friendly, high-quality software; our significant user base and many citations; our large repository of gene sets; and our successful delivery of documentation and training for users make us well poised to carry out the aims of this proposal.
Gene set-based enrichment analysis is now standard practice for interpreting genome-wide expression data and elucidating the biological mechanisms associated with disease or other cellular states. The combination of the Gene Set Enrichment Analysis software and the Molecular Signatures Database, whose gene sets represent biological processes, pathways, phenotypes, and cellular perturbations, makes these sophisticated knowledge-based analyses accessible to any biomedical researcher. The work in this project will significantly increase the power and value of both the analysis software and the gene set collection, helping researchers derive better hypotheses for further investigation and validation and thereby accelerating and facilitating the study of important questions in biomedical and cancer research.