Prioritizing rare variants associated with cancer using non-coding annotation

Gerstein, Mark

Abstract

We will investigate potential disease-associated genetic variants in the non-coding regions of the human genome. Recent work in the ENCODE project and in population-scale RNA sequencing has contributed significantly to our knowledge of non-coding elements. Thus, given the focus on coding variation in many previous disease studies, there is much untapped potential in exploring the non-coding variation associated with disease. We plan to prioritize rare, germline non-coding variants for connection to disease, using a generalized framework that we will tune specifically to prostate cancer as a test case. Our approach will build upon our existing tool, FunSeq, which prioritizes rare somatic variants in cancer, to create eleVAR - elevating germline VARiants. FunSeq was developed to prioritize somatic variants in regions of the genome depleted of common variants in the general population, based on data from the 1000 Genomes project. eleVAR will use this general principle to analyze germline variants, and build upon it by adding several key features, including: (i) prioritizing variants leading to gain of new transcription-factor (TF) binding sites (in addition to disruption of existing sites), (ii) annotating variants in enhancers and connecting them to target genes, (iii) prioritizing variants highly connected in a variety of biological networks, (iv) annotating variants in non-coding RNAs similarly to those in TF binding sites, and (v) prioritizing variants associated with variable, allele-specific activity. Our second objective is to use eleVAR to prioritize variants in whole genome sequences from the TCGA/ICGC consortium. Our efficient implementation of eleVAR will include a module for updating parameters in response to high-throughput experimental data.
We will progressively tune and evaluate eleVAR, first using publicly available data, and then using multiple rounds of high-throughput experimental characterization of variants occurring specifically in prostate cancer. Our last objective is to functionally validate a subset of variants in detail. First, we will identify variants in 6 representative eleVAR positives and look at their frequency of occurrence in a large prostate cancer cohort using targeted re-sequencing. We will then use the CRISPR/Cas system to generate endogenous mutations, determining their effects on target gene expression, cell morphology and tumorigenicity, and on TF binding by EMSA and chromatin immunoprecipitation.
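
The general principle described above, favoring variants that fall in annotated regions depleted of common population variants, can be illustrated with a minimal sketch. This is not the FunSeq or eleVAR implementation: the `Region` class, the `rare_fraction` measure, the function names, and all counts and coordinates below are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """An annotated non-coding region with toy population variant counts."""
    name: str
    chrom: str
    start: int   # 0-based, inclusive
    end: int     # 0-based, exclusive
    rare: int    # rare population variants observed in the region (made up)
    common: int  # common population variants observed in the region (made up)

    def rare_fraction(self) -> float:
        # A high fraction of rare variants means the region is depleted of
        # common variants; used here as a simple stand-in for constraint.
        total = self.rare + self.common
        return self.rare / total if total else 0.0


def score_variant(chrom: str, pos: int, regions: list[Region]) -> float:
    """Score a variant by the most depleted region it overlaps (0.0 if none)."""
    hits = [r for r in regions if r.chrom == chrom and r.start <= pos < r.end]
    return max((r.rare_fraction() for r in hits), default=0.0)


def prioritize(variants: list[tuple[str, int]], regions: list[Region]):
    """Return (score, chrom, pos) tuples sorted from highest to lowest score."""
    scored = [(score_variant(c, p, regions), c, p) for c, p in variants]
    return sorted(scored, key=lambda t: -t[0])


# Hypothetical annotations and candidate variants.
REGIONS = [
    Region("tf_site_A", "chr1", 100, 200, rare=90, common=10),
    Region("enhancer_B", "chr1", 150, 400, rare=60, common=40),
    Region("intergenic_C", "chr2", 0, 1000, rare=50, common=50),
]
VARIANTS = [("chr2", 500), ("chr1", 300), ("chr3", 10), ("chr1", 160)]

if __name__ == "__main__":
    for score, chrom, pos in prioritize(VARIANTS, REGIONS):
        print(f"{chrom}:{pos}\t{score:.2f}")
```

In this toy setup a variant overlapping several annotations takes the score of its most depleted region, and a variant outside every annotation scores zero. A fuller scheme of the kind the abstract describes would combine further evidence, such as motif gain or loss, enhancer-target links, and network connectivity.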

Public Health Relevance

We plan to prioritize rare, germline variants associated with disease for functional impact, using prostate cancer as a test case. We will focus on variants in non-coding regions - a category of variant underrepresented in previous studies. Using a range of genomics data, we aim to prioritize variants for validation with our eleVAR pipeline.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Research Project (R01)
Project #: 1R01HG008126-01A1
Application #: 9071599
Study Section: Special Emphasis Panel (ZHG1-HGR-M (J1))
Program Officer: Pazin, Michael J

Project Start: 2016-07-01
Project End: 2019-06-30
Budget Start: 2016-07-01
Budget End: 2017-06-30
Support Year: 1
Fiscal Year: 2016
Total Cost: $836,728
Indirect Cost: $162,239

Institution

Name: Yale University
Department: Biochemistry
Type: Schools of Medicine
DUNS #: 043207562

City: New Haven
State: CT
Country: United States
Zip Code: 06520

Related projects


NIH 2018 R01 HG	Prioritizing rare variants associated with cancer using non-coding annotation Gerstein, Mark Bender / Yale University
NIH 2017 R01 HG	Prioritizing rare variants associated with cancer using non-coding annotation Gerstein, Mark Bender / Yale University
NIH 2016 R01 HG	Prioritizing rare variants associated with cancer using non-coding annotation Gerstein, Mark Bender / Yale University	$836,728

Publications

Lochovsky, Lucas; Zhang, Jing; Gerstein, Mark (2018) MOAT: efficient detection of highly mutated regions with the Mutations Overburdening Annotations Tool. Bioinformatics 34:1031-1033

McGillivray, Patrick; Ault, Russell; Pawashe, Mayur et al. (2018) A comprehensive catalog of predicted functional upstream open reading frames in humans. Nucleic Acids Res 46:3326-3338

Li, Shantao; Shuch, Brian M; Gerstein, Mark B (2017) Whole-genome analysis of papillary kidney cancer finds significant noncoding alterations. PLoS Genet 13:e1006685

Balasubramanian, Suganthi; Fu, Yao; Pawashe, Mayur et al. (2017) Using ALoFT to determine the impact of putative loss-of-function variants in protein-coding genes. Nat Commun 8:382

Chen, Jieming; Wang, Bo; Regan, Lynne et al. (2017) Intensification: A Resource for Amplifying Population-Genetic Signals with Protein Repeats. J Mol Biol 429:435-445
