CLIP Tool Kit (CTK): software package, user interface and tutorials for CLIP data analysis

Multiple steps of gene expression regulation rely on co- and post-transcriptional processing of RNA through its interaction with RNA-binding proteins (RBPs). UV cross-linking and immunoprecipitation (CLIP) of protein-RNA complexes, followed by high-throughput sequencing of the isolated RNA, has become a gold standard for mapping in vivo protein-RNA interactions on a genome-wide scale. However, there remains a lack of software tools that provide flexible, streamlined and comprehensive analysis of CLIP data and that incorporate the most recent technical advances in CLIP protocols and sequencing technology. Here we propose to develop the CLIP Tool Kit (CTK) to fill this gap by building on our extensive experience in this field and enhancing a prototype we have developed for our research over the years. Taking raw sequence reads coming off sequencers as input, CTK will perform a series of analyses using its three major components: 1) quality filtering and mapping of raw reads, followed by a stringent, model-based algorithm to collapse PCR duplicates to obtain unique CLIP tags; 2) an adaptive valley-seeking algorithm to define CLIP tag clusters and perform peak calling; and 3) a statistical method to determine the exact protein-RNA crosslink sites through analysis of crosslink-induced mutation sites (CIMS) and truncation sites (CITS). We will implement several novel algorithms as well as important extensions to our current software package, to make significant improvements in accuracy and efficiency and to keep up with advances in CLIP protocols and sequencing technologies. In addition, we aim to improve the usability and dissemination of the software. We will implement an interface through Galaxy so that CTK can be integrated into this widely used bioinformatics workflow management system. Multiple tutorials for typical analysis pipelines will be developed and a user group will be established. Thus, this study will provide a valuable resource for the RNA biology research community.
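
To make the first and third components more concrete, the following is a minimal Python sketch under stated assumptions; it is not CTK's implementation. It replaces the model-based duplicate collapsing described above with a naive exact-match rule (same coordinates and same random barcode), and it tallies candidate truncation sites using the common convention that the crosslinked nucleotide lies immediately upstream of a tag's 5' end. The `Tag` record, its fields, and both function names are hypothetical, and the statistical test that would decide which sites are significant is omitted.

```python
from collections import Counter
from typing import NamedTuple


class Tag(NamedTuple):
    """A mapped CLIP read (hypothetical record, not a CTK data structure)."""
    chrom: str
    strand: str   # "+" or "-"
    start: int    # 0-based leftmost genomic coordinate
    end: int      # exclusive rightmost genomic coordinate
    barcode: str  # random barcode attached before PCR (assumed available)


def collapse_exact_duplicates(tags):
    """Keep one tag per identical (chrom, strand, start, end, barcode).

    Naive stand-in for a model-based approach: it does not tolerate
    sequencing errors in the barcode.
    """
    seen = set()
    unique = []
    for tag in tags:
        if tag not in seen:
            seen.add(tag)
            unique.append(tag)
    return unique


def truncation_counts(tags):
    """Count tags whose 5' end implies each candidate crosslink position.

    Assumed convention: the crosslink site is the nucleotide immediately
    upstream of the tag's 5' end on the transcribed strand.
    """
    counts = Counter()
    for t in tags:
        # On "+", the 5' end is `start`; on "-", it is `end - 1`.
        pos = t.start - 1 if t.strand == "+" else t.end
        counts[(t.chrom, t.strand, pos)] += 1
    return counts


tags = [
    Tag("chr1", "+", 100, 130, "ACGT"),
    Tag("chr1", "+", 100, 130, "ACGT"),  # exact PCR duplicate, dropped
    Tag("chr1", "+", 100, 130, "TTGA"),  # same coordinates, new barcode, kept
    Tag("chr1", "+", 100, 125, "GGCC"),
    Tag("chr1", "-", 200, 230, "ACGT"),
]

unique_tags = collapse_exact_duplicates(tags)
site_counts = truncation_counts(unique_tags)
```

In this toy input, three unique plus-strand tags share a 5' end, so they all support the same candidate site. A real analysis would compare such counts against a background expectation before calling a site.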

Public Health Relevance

Protein-RNA interactions play a central role in co- and post-transcriptional gene expression regulation, and perturbation of such interactions has been implicated in an expanding list of genetic diseases ranging from neurological disorders to cancer. CLIP is a powerful biochemical assay to map in vivo protein-RNA interactions on a genome-wide scale. This study aims to develop a software package that provides flexible, streamlined and comprehensive CLIP data analysis, which will be a valuable resource for delineating functional protein-RNA interactions underlying normal and disease cellular contexts.

National Institutes of Health (NIH)
National Human Genome Research Institute (NHGRI)
Small Research Grants (R03)
Study Section
Biodata Management and Analysis Study Section (BDMA)
Program Officer
Gilchrist, Daniel A
Columbia University (N.Y.)
Schools of Medicine
New York
United States
Ustianenko, Dmytro; Chiu, Hua-Sheng; Treiber, Thomas et al. (2018) LIN28 Selectively Modulates a Subclass of Let-7 MicroRNAs. Mol Cell 71:271-283.e5
Jacko, Martin; Weyn-Vanhentenryck, Sebastien M; Smerdon, John W et al. (2018) Rbfox Splicing Factors Promote Neuronal Maturation and Axon Initial Segment Assembly. Neuron 97:853-868.e6
Ustianenko, Dmytro; Weyn-Vanhentenryck, Sebastien M; Zhang, Chaolin (2017) Microexons: discovery, regulation, and function. Wiley Interdiscip Rev RNA 8:
Zhang, Chaolin; Shen, Yufeng (2017) A Cell Type-Specific Expression Signature Predicts Haploinsufficient Autism-Susceptibility Genes. Hum Mutat 38:204-215
Shah, Ankeeta; Qian, Yingzhi; Weyn-Vanhentenryck, Sebastien M et al. (2017) CLIP Tool Kit (CTK): a flexible and robust pipeline to analyze CLIP sequencing data. Bioinformatics 33:566-567