Over the past decade, the landscape of cancer research has changed with the explosion of publicly available and investigator-generated datasets and the rapidly growing number of sophisticated computational methods and tools to integrate and analyze them. The research community continues to face challenges as it seeks to harness this wealth of data and analysis tools to move the cancer research agenda forward. The entire cancer research community needs a way to easily collaborate on, document, capture, and share its work, from conception through analysis to publication. Moreover, cancer biologists may have difficulty choosing the right tools and using them correctly, which effectively puts this powerful capability out of their direct reach. The goal of this U24 proposal is to use the GenePattern computational genomics platform, which has served the cancer community since 2004, as the foundation for a new electronic notebook environment that meets these needs. Through these efforts we will support a diverse community of users at the forefront of cancer research who seek to better understand the underlying mechanisms of disease, translate improved methods for patient diagnosis and prognosis to the clinic, and identify new drug targets.
Aim 1. Develop a GenePattern electronic notebook for collaborative in silico research. Leveraging a novel blend of GenePattern, Google Drive/Docs, and the IPython platform, we will develop an environment for creating and deploying electronic notebooks to support the entirety of ongoing collaborative studies, including running analyses, presenting results, recording comments and interpretation of results, and capturing the reproducible computational workflow.
Aim 2. Create a collection of GenePattern notebooks for cancer research. We will formulate and deploy dynamic GenePattern notebooks that embody complete analysis studies based on driving cancer projects. These notebooks will guide investigators through the relevant considerations at each step of an analysis toward the choices that best support their research goals.
Aim 3. Add GenePattern modules to address cancer complexity. We will add new modules as required for the notebook collection in Aim 2, including new information-theoretic approaches to identifying biomarkers, clustering, classification, and dimension reduction.
Aim 4. Provide training and GenePattern Notebook support for the cancer research community. We will provide a high level of support for the notebook environment, develop cancer-focused training materials featuring notebooks based on driving cancer projects, and deploy a public GenePattern server on the high-performance computing infrastructure at the Pittsburgh Supercomputing Center.
GenePattern is a popular bioinformatics software environment that puts sophisticated computational methods within the reach of all biomedical researchers, allowing them to address a variety of problems at the forefront of cancer research, including patient diagnosis and prognosis, identification of new drug targets, and understanding of disease mechanisms. This project will build on GenePattern's foundation to provide GenePattern Notebook, a beginning-to-end computational electronic lab notebook environment that combines analysis and text. We will also create notebooks that capture scientist-oriented cancer analysis scenarios and tasks, and share them with cancer investigators for use in their own studies.