The objective of this project is to facilitate the interpretation of genetic variants identified in clinical whole-genome and whole-exome sequencing studies through the development of computational methods to predict functional effects of individual variants. Genome-scale sequencing technologies are increasingly enabling studies of genetic variation in large numbers of individuals; however, interpreting the clinical significance of the hundreds of thousands of genetic variants identified in these studies remains a critical challenge. Gene expression regulation is one mechanism by which variants can result in disease or other clinically significant phenotypes. This mechanism is likely to be particularly important for disease-causing variants that do not directly affect protein structure by altering an amino acid sequence. The methods developed here will enable researchers to predict whether genetic variants are likely to have a regulatory effect on gene expression. The first stage of the project is to build computational models to predict such regulatory effects using a random forest machine learning approach. These models will be trained to recognize regulatory variation using a set of variants that have been shown to be involved in expression regulation in a recent study of gene expression across hundreds of individuals. Separate algorithms will be developed to predict two different types of regulatory effects: changes in the total amount of RNA produced from a particular gene (expression level variation) and changes in the specific form of RNA produced from a particular gene (splicing or isoform ratio variation). The second stage of the project is to evaluate the performance of these models on gene expression datasets from a separate human population and from different tissues within the human body, to explore their generalizability and to determine to what extent the characteristics of regulatory variants are conserved across tissues and populations.
The final stage of the project is to use known pathogenic variants from publicly available databases to characterize how well these models predict clinical significance. This stage will test the hypothesis that variants that regulate gene expression are more likely to be clinically significant than variants that do not regulate expression. This project will impact public health by providing useful tools to improve prediction of the clinical significance of genetic variants identified in genome-scale sequencing studies. In addition, the project will provide biological insight into the tissue-specificity and population-specificity of genomic features that characterize regulatory genetic variants.
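
As an illustration of the evaluation stages described above, the following minimal sketch computes the area under the ROC curve (AUC) for a set of variant scores. The abstract does not name an evaluation metric, so the choice of AUC is an assumption, and the function name `roc_auc` and the toy scores and labels are hypothetical. Labels of 1 stand for known positive variants (for example, pathogenic) and 0 for negatives.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) identity.

    scores: predicted scores, where higher means more likely positive.
    labels: 1 for positive variants, 0 for negative variants.
    """
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied scores starting at position i.
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average 1-based rank across the tie
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1

    n_pos = sum(label for _, label in pairs)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative label")

    rank_sum_pos = sum(r for r, (_, label) in zip(ranks, pairs) if label == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


# Hypothetical toy data: model scores for five variants and their labels.
toy_scores = [0.9, 0.4, 0.7, 0.2, 0.6]
toy_labels = [1, 0, 0, 0, 1]
toy_auc = roc_auc(toy_scores, toy_labels)
print(f"AUC on toy data: {toy_auc:.3f}")
```

An AUC of 0.5 corresponds to a score that ranks positives no better than chance, and 1.0 to a score that ranks every positive variant above every negative one. Comparing this value across tissues or populations is one possible way to express generalizability.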

Public Health Relevance

Genome-scale sequencing technologies are increasingly enabling studies of genetic variation in large numbers of patients and other individuals. However, interpreting the clinical significance of the hundreds of thousands of genetic variants identified in these studies remains a critical challenge. This project will provide useful tools to improve automated prediction of the clinical significance of such genetic variants as well as their regulatory effects on gene expression.

Agency
National Institutes of Health (NIH)
Institute
National Human Genome Research Institute (NHGRI)
Type
Postdoctoral Individual National Research Service Award (F32)
Project #
5F32HG008330-02
Application #
9039466
Study Section
Special Emphasis Panel (ZRG1)
Program Officer
Junkins, Heather
Project Start
2014-12-01
Project End
2017-11-30
Budget Start
2015-12-01
Budget End
2016-11-30
Support Year
2
Fiscal Year
2016
Total Cost
Indirect Cost
Name
Stanford University
Department
Genetics
Type
Schools of Medicine
DUNS #
009214214
City
Stanford
State
CA
Country
United States
Zip Code
94304
Ioannidis, Nilah M; Wang, Wei; Furlotte, Nicholas A et al. (2018) Gene expression imputation identifies candidate genes and susceptibility loci associated with cutaneous squamous cell carcinoma. Nat Commun 9:4264
Ioannidis, Nilah M; Davis, Joe R; DeGorter, Marianne K et al. (2017) FIRE: functional inference of genetic variants that regulate gene expression. Bioinformatics 33:3895-3901
Ioannidis, Nilah M; Rothstein, Joseph H; Pejaver, Vikas et al. (2016) REVEL: An Ensemble Method for Predicting the Pathogenicity of Rare Missense Variants. Am J Hum Genet 99:877-885