Our ability to apply high-throughput molecular profiling technologies to increasingly large cohorts and sample sets is significantly expanding our understanding of human biology and complex disease. The Genotype-Tissue Expression (GTEx) project is creating a unique resource of genetic variation and gene expression across a wide range of human tissues. Upon completion, this resource will include RNA sequence data from over 25,000 samples spanning 53 human tissues and organs, as well as whole-genome and whole-exome sequence data from 960 donors. Additional data types not yet generated will include miRNA-seq, protein levels, DNA methylation, ChIP-seq, and DNase I hypersensitive site data, among others. Enabling a wide range of users with varying needs and skills to easily access, interpret, and integrate these large data sets is becoming critical to leveraging the full utility of the data. The GTEx Portal (http://gtexportal.org/) is the most widely accessed resource for the GTEx project. It hosts all unprotected data, analysis results, and numerous visual exploration tools, and it has been enthusiastically received by the scientific community. To maximize the impact of this resource, we plan to expand the portal to host data currently in production and new data types still to be generated; to present novel and integrative analyses of existing GTEx data and of data from external sources; and to develop and share flexible tools for data analysis, visualization, and access.
Aim 1. We will host and support all open-access GTEx data and analysis results, and we will systematically re-analyze the data with new methods that reflect the state of the art in RNA-seq analysis. We will add all new data sets to the portal, including novel assays (e.g., miRNA-seq, protein, and methylation assays), derived analysis results (e.g., trans-eQTLs, splice-QTLs, GWAS enrichment analyses, and protein-QTLs), and RNA-seq data sets from external investigators.
Aim 2. We will work closely with small focus groups of tool developers, and engage our large user base, to identify and prioritize new features for displaying and integrating multiple data types. We will also collaborate with other large genomic resources (e.g., ENCODE and the UCSC and Ensembl genome browsers) to better integrate data sources and to enhance the utility and accessibility of the GTEx resource.
Aim 3. We will automate and share all analysis pipeline tools with the scientific community. To support a wide range of user access needs, we will develop an open-source API to provide comprehensive data access, and also improve visualization tools and user-driven data analyses on the portal. To maximize use of the resource, we will design and offer training videos and outreach workshops.

Public Health Relevance

The GTEx portal provides a unique, unprecedented, and widely used data resource of gene expression, genetic variation, and other molecular phenotypes across a wide and diverse range of human tissues. To maximize this resource's utility for the biomedical community, we propose to enhance the portal so that it can host all open-access GTEx data (including new data types as they are produced), all analysis pipelines (including future additions), and compatible data sets from the research community that will expand the resource. We will also provide a suite of best-practice analysis tools, along with training and support, to meet a wide range of user needs.

Agency
National Institutes of Health (NIH)
Institute
National Human Genome Research Institute (NHGRI)
Type
Biotechnology Resource Cooperative Agreements (U41)
Project #
5U41HG009494-04
Application #
9938665
Study Section
Special Emphasis Panel (ZHG1)
Program Officer
Volpi, Simona
Project Start
2017-08-15
Project End
2022-05-31
Budget Start
2020-06-01
Budget End
2021-05-31
Support Year
4
Fiscal Year
2020
Total Cost
Indirect Cost
Name
Broad Institute, Inc.
Department
Type
DUNS #
623544785
City
Cambridge
State
MA
Country
United States
Zip Code
02142