CLIP Tool Kit (CTK): software package, user interface and tutorials for CLIP data analysis

Multiple steps of gene expression regulation rely on co- and post-transcriptional processing of RNA through its interaction with RNA-binding proteins (RBPs). UV cross-linking and immunoprecipitation (CLIP) of protein-RNA complexes, followed by high-throughput sequencing of the isolated RNA, has become a gold standard for mapping in vivo protein-RNA interactions on a genome-wide scale. However, there remains a lack of software tools that provide flexible, streamlined and comprehensive analysis of CLIP data and that incorporate the most recent technical advances in CLIP protocols and sequencing technology. Here we propose to develop the CLIP Tool Kit (CTK) to fill this gap by building on our extensive experience in this field and enhancing a prototype we have developed for our research over the years.

Taking raw sequence reads from the sequencer as input, CTK will perform a series of analyses using its three major components: 1) quality filtering and mapping of raw reads, followed by a stringent, model-based algorithm to collapse PCR duplicates to obtain unique CLIP tags; 2) an adaptive valley-seeking algorithm to define CLIP tag clusters and perform peak calling; and 3) a statistical method to determine the exact protein-RNA crosslink sites through analysis of crosslink-induced mutation sites (CIMS) and truncation sites (CITS). We will implement several novel algorithms as well as important extensions to our current software package, to make significant improvements in accuracy and efficiency and to keep up with advances in CLIP protocols and sequencing technologies.

In addition, we aim to improve the usability and dissemination of the software. We will implement an interface through Galaxy so that CTK can be integrated into this widely used bioinformatics workflow management system. Multiple tutorials for typical analysis pipelines will be developed and a user group will be established. Thus, this study will provide a valuable resource for the RNA biology research community.
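To make the idea of collapsing PCR duplicates into unique CLIP tags concrete, the following is a minimal illustrative sketch in Python. It is not CTK's model-based algorithm. It assumes a naive exact-match rule: reads with the same chromosome, strand, start position and random barcode are treated as copies of one tag. The `Read` record, its fields, the use of a random barcode and the function name `collapse_duplicates` are all hypothetical and chosen only for illustration.

```python
from collections import namedtuple

# Hypothetical minimal record for a mapped read; not CTK's data format.
Read = namedtuple("Read", ["chrom", "strand", "start", "end", "barcode"])


def collapse_duplicates(reads):
    """Keep one read per (chrom, strand, start, barcode) key.

    Naive exact-match collapsing (an assumption for illustration):
    reads that map to the same start position on the same strand and
    carry the same barcode are treated as PCR duplicates of one tag.
    """
    unique = {}
    for read in reads:
        key = (read.chrom, read.strand, read.start, read.barcode)
        # setdefault keeps the first read seen for each key
        unique.setdefault(key, read)
    return list(unique.values())


reads = [
    Read("chr1", "+", 100, 140, "ACGTA"),
    Read("chr1", "+", 100, 138, "ACGTA"),  # same start and barcode: duplicate
    Read("chr1", "+", 100, 140, "TTGCA"),  # same start, different barcode: kept
    Read("chr1", "-", 100, 140, "ACGTA"),  # opposite strand: kept
]
tags = collapse_duplicates(reads)
print(len(tags))  # 3 unique tags remain out of 4 reads
```

An exact-match rule like this would count a barcode containing a sequencing error as a separate tag, which is one reason a stringent, model-based approach such as the one proposed here may be preferable.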

Public Health Relevance

Protein-RNA interactions play a central role in co- and post-transcriptional regulation of gene expression, and perturbation of such interactions has been implicated in an expanding list of genetic diseases ranging from neurological disorders to cancer. CLIP is a powerful biochemical assay to map in vivo protein-RNA interactions on a genome-wide scale. This study aims to develop a software package that provides flexible, streamlined and comprehensive CLIP data analysis, which will be a valuable resource for delineating functional protein-RNA interactions in normal and disease cellular contexts.

Agency
National Institutes of Health (NIH)
Institute
National Human Genome Research Institute (NHGRI)
Type
Small Research Grants (R03)
Project #
5R03HG009528-02
Application #
9469437
Study Section
Biodata Management and Analysis Study Section (BDMA)
Program Officer
Gilchrist, Daniel A
Project Start
2017-04-10
Project End
2019-03-31
Budget Start
2018-04-01
Budget End
2019-03-31
Support Year
2
Fiscal Year
2018
Total Cost
Indirect Cost
Name
Columbia University (N.Y.)
Department
Biochemistry
Type
Schools of Medicine
DUNS #
621889815
City
New York
State
NY
Country
United States
Zip Code
10032
Jacko, Martin; Weyn-Vanhentenryck, Sebastien M; Smerdon, John W et al. (2018) Rbfox Splicing Factors Promote Neuronal Maturation and Axon Initial Segment Assembly. Neuron 97:853-868.e6
Ustianenko, Dmytro; Chiu, Hua-Sheng; Treiber, Thomas et al. (2018) LIN28 Selectively Modulates a Subclass of Let-7 MicroRNAs. Mol Cell 71:271-283.e5
Ustianenko, Dmytro; Weyn-Vanhentenryck, Sebastien M; Zhang, Chaolin (2017) Microexons: discovery, regulation, and function. Wiley Interdiscip Rev RNA 8:
Zhang, Chaolin; Shen, Yufeng (2017) A Cell Type-Specific Expression Signature Predicts Haploinsufficient Autism-Susceptibility Genes. Hum Mutat 38:204-215
Shah, Ankeeta; Qian, Yingzhi; Weyn-Vanhentenryck, Sebastien M et al. (2017) CLIP Tool Kit (CTK): a flexible and robust pipeline to analyze CLIP sequencing data. Bioinformatics 33:566-567