The CRISPR/Cas9 system is a revolutionary approach for genome editing of mammalian cells. Recent developments in CRISPR/Cas9 knockout technology, as well as dCas9 fused with effector proteins, enable high-throughput, cost-effective gene functional screens but create computational challenges. We have developed a Model-based Analysis of Genome-wide CRISPR/Cas9 Knockout (MAGeCK) method for calling genes and pathways from genome-scale CRISPR/Cas9 screens. MAGeCK demonstrates better performance than previous methods and identifies robust hits, including novel ones, from several published screens. In this proposal, we aim to develop statistical and computational methods that improve the MAGeCK algorithm to enable quality control, data analysis, and interactive visualization of CRISPR screen data. In one unified statistical model, the proposed method corrects batch effects at the gRNA level, simultaneously estimates gRNA efficiency and gene selection, identifies differential gene and pathway selection across multiple conditions, and accounts for sequencing bias and cell doubling time. Specifically, we propose to:
Aim 1. Develop robust data normalization methods for CRISPR screens.
Aim 2. Develop the statistical and computational framework to call cell- and condition-specific essential genes and pathways from multiple CRISPR screen experiments and conditions.
Aim 3. Develop methods to mitigate outlier gRNA effects and use protein interaction networks to enhance the performance of CRISPR screen gene calling.
Aim 4. Develop user-friendly software for quality control, visualization, design, and analysis of CRISPR screens.
At the conclusion of these studies, we will have developed more versatile and reliable analysis algorithms for CRISPR screens under diverse experimental settings. These methods could be applied to CRISPR knockout screens, CRISPRi/a screens, and sequencing-based si/shRNA screens, with phenotypes such as cell growth, migration, differentiation, or sorting by GFP-labeled gene expression. Our proposed methods will greatly facilitate adoption of the technology by many experimental biology groups, so they can use powerful genome-wide CRISPR screens under diverse experimental settings to answer important biological questions about gene regulation and drug response.

Public Health Relevance

Genome-wide CRISPR screening is a novel and cost-effective technique for identifying driver genes in a biological process of interest, but it creates computational challenges for experimental biologists. We propose to develop statistical and computational methods for quality control, data analysis, and visualization of CRISPR screens to overcome these challenges. These methods will enable the research community to adopt the powerful genome-wide CRISPR screen technology for hypothesis generation and biomedical discovery.

Agency
National Institutes of Health (NIH)
Institute
National Human Genome Research Institute (NHGRI)
Type
Research Project (R01)
Project #
1R01HG008927-01A1
Application #
9128287
Study Section
Special Emphasis Panel (ZRG1-GGG-E (02)M)
Program Officer
Gilchrist, Daniel A
Project Start
2016-09-09
Project End
2019-06-30
Budget Start
2016-09-09
Budget End
2017-06-30
Support Year
1
Fiscal Year
2016
Total Cost
$519,139
Indirect Cost
$168,891
Name
Dana-Farber Cancer Institute
DUNS #
076580745
City
Boston
State
MA
Country
United States
Zip Code
02215
Pan, Deng; Kobayashi, Aya; Jiang, Peng et al. (2018) A major chromatin regulator determines resistance of tumor cells to T cell-mediated killing. Science 359:770-775
Fei, Teng; Chen, Yiwen; Xiao, Tengfei et al. (2017) Genome-wide CRISPR screen identifies HNRNPL as a prostate cancer dependency regulating RNA splicing. Proc Natl Acad Sci U S A 114:E5207-E5215
Ma, Jian; Köster, Johannes; Qin, Qian et al. (2016) CRISPR-DO for genome-wide CRISPR design and optimization. Bioinformatics 32:3336-3338