The CRISPR/Cas9 system is a revolutionary approach for genome editing of mammalian cells. Recent developments in CRISPR/Cas9 knockout technology, as well as dCas9 fused with effector proteins, enable high-throughput, cost-effective gene functional screens but create computational challenges. We have developed Model-based Analysis of Genome-wide CRISPR/Cas9 Knockout (MAGeCK), a method for calling genes and pathways from genome-scale CRISPR/Cas9 screens. MAGeCK performs better than previous methods and identifies robust hits, including novel ones, from several published screens. In this proposal, we aim to develop statistical and computational methods that improve the MAGeCK algorithm and enable quality control, data analysis, and interactive visualization of CRISPR screen data. In one unified statistical model, the proposed method corrects batch effects at the gRNA level, simultaneously estimates gRNA efficiency and gene selection, identifies differential gene and pathway selection across multiple conditions, and accounts for sequencing bias and cell doubling time. Specifically, we propose to:
Aim 1. Develop robust data normalization methods for CRISPR screens.
Aim 2. Develop the statistical and computational framework to call cell- and condition-specific essential genes and pathways from multiple CRISPR screen experiments and conditions.
Aim 3. Develop methods to mitigate outlier gRNA effects and use protein interaction networks to enhance the performance of CRISPR screen gene calling.
Aim 4. Develop user-friendly software for quality control, visualization, design, and analysis of CRISPR screens.
At the conclusion of these studies, we will have developed more versatile and reliable analysis algorithms for CRISPR screens under diverse experimental settings. These methods could be applied to CRISPR knockout screens, CRISPRi/a screens, and sequencing-based si/shRNA screens, and the phenotype could be cell growth, migration, differentiation, or sorting based on GFP-labeled gene expression. Our proposed methods will greatly facilitate adoption of the technology by many experimental biology groups, so that they can use powerful genome-wide CRISPR screens under diverse experimental settings to answer important biological questions about gene regulation and drug response.
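To make the normalization problem in Aim 1 concrete, the following is a minimal sketch of median-ratio normalization of gRNA read counts across samples. It is a common baseline for count data and is shown only as an illustration; it is not the normalization method proposed here. The function name, the toy count matrix, and the choice to skip gRNAs with zero counts are assumptions made for this example.

```python
import numpy as np


def median_ratio_normalize(counts):
    """Scale each sample (column) by a median-ratio size factor.

    counts: array of shape (n_gRNAs, n_samples) holding raw read counts.
    Returns (normalized_counts, size_factors).
    Hypothetical helper for illustration only.
    """
    counts = np.asarray(counts, dtype=float)

    # Assumption: only gRNAs detected in every sample contribute to the
    # size factors, so that the logarithm is defined for every entry used.
    detected = np.all(counts > 0, axis=1)
    if not detected.any():
        raise ValueError("no gRNA has nonzero counts in every sample")

    log_counts = np.log(counts[detected])
    # Per-gRNA reference level: log of the geometric mean across samples.
    log_reference = log_counts.mean(axis=1, keepdims=True)
    # Per-sample size factor: median ratio of the sample to the reference.
    size_factors = np.exp(np.median(log_counts - log_reference, axis=0))

    return counts / size_factors, size_factors


# Toy data: sample 2 was sequenced exactly twice as deeply as sample 1.
raw = np.array([[10, 20],
                [100, 200],
                [50, 100]])
normalized, size_factors = median_ratio_normalize(raw)
```

In this toy case the two columns of `normalized` become identical, because the only difference between the samples is sequencing depth. Real screens also contain gRNAs under genuine selection, which is one reason a more robust normalization is worth developing.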

Public Health Relevance

Genome-wide CRISPR screening is a novel and cost-effective technique for identifying driver genes in a biological process of interest, but it creates computational challenges for experimental biologists. We propose to develop statistical and computational methods for quality control, data analysis, and visualization of CRISPR screens to overcome these challenges. These methods will enable the research community to adopt powerful genome-wide CRISPR screen technology for hypothesis generation and biomedical discovery.

National Institutes of Health (NIH)
National Human Genome Research Institute (NHGRI)
Research Project (R01)
Study Section
Special Emphasis Panel (ZRG1)
Program Officer
Gilchrist, Daniel A
Dana-Farber Cancer Institute
United States
Pan, Deng; Kobayashi, Aya; Jiang, Peng et al. (2018) A major chromatin regulator determines resistance of tumor cells to T cell-mediated killing. Science 359:770-775
Fei, Teng; Chen, Yiwen; Xiao, Tengfei et al. (2017) Genome-wide CRISPR screen identifies HNRNPL as a prostate cancer dependency regulating RNA splicing. Proc Natl Acad Sci U S A 114:E5207-E5215
Ma, Jian; Köster, Johannes; Qin, Qian et al. (2016) CRISPR-DO for genome-wide CRISPR design and optimization. Bioinformatics 32:3336-3338