Our capability to apply high-throughput molecular profiling technologies to increasingly large cohorts and sample sets is significantly expanding our understanding of human biology and complex disease. The Genotype-Tissue Expression (GTEx) project is creating a unique resource of genetic variation and gene expression across a wide range of human tissues. Upon completion, this resource will include RNA sequence data from over 25,000 samples spanning 53 human tissues/organs, and whole genome and exome sequence data from 960 donors. Additional data types not yet generated will include miRNA-seq, protein levels, DNA methylation, ChIP-seq, and DNase I hypersensitive site data, among others. The ability of a wide range of users, with varying needs and skills, to easily access, interpret, and integrate these large data sets is becoming critical to leveraging the full utility of the data. The GTEx Portal (http://gtexportal.org/) is the most widely accessed resource for the GTEx project, hosting all unprotected data, analysis results, and numerous visual exploration tools, and has been enthusiastically received by the scientific community. To maximize the impact of this resource, we plan to expand the portal to: host data currently in production and new data types still to be generated; present novel and integrative analyses of existing data and of data from external sources; and develop and share flexible tools for data analysis, visualization, and access.
Aim 1. We will host and support all open-access GTEx data and analysis results, performing systematic re-analyses of the data with new methods to reflect the state of the art in RNA-seq analysis. We will add all new data sets to the portal, including novel assays (e.g. miRNA-seq, protein, and methylation assays), derived analysis results (e.g. trans-eQTLs, splice-QTLs, GWAS enrichment analyses, and protein-QTLs), and RNA-seq data sets from external investigators.
Aim 2. We will work closely with small focus groups of tool developers, and engage our large user base, to identify and prioritize new features for displaying and integrating multiple data types. We will also collaborate with other large genomic resources (e.g. the ENCODE, UCSC, and ENSEMBL browsers) to enable better integration of data sources and to enhance the utility and accessibility of the GTEx resource.
Aim 3. We will automate and share all analysis pipeline tools with the scientific community. To support a wide range of user access needs, we will develop an open-source API to provide comprehensive data access, and also improve visualization tools and user-driven data analyses on the portal. To maximize use of the resource, we will design and offer training videos and outreach workshops.

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Biotechnology Resource Cooperative Agreements (U41)
Project #: 5U41HG009494-02
Application #: 9544284
Study Section: Special Emphasis Panel (ZHG1)
Project Start:
Project End:
Budget Start: 2018-06-01
Budget End: 2019-05-31
Support Year: 2
Fiscal Year: 2018
Total Cost:
Indirect Cost:
Name: Broad Institute, Inc.
Department:
Type:
DUNS #: 623544785
City: Cambridge
State: MA
Country: United States
Zip Code: