This component describes our plans to enhance the interconnectivity of the UCSC Genome Browser and related databases with other computational groups and tools in the scientific community, maintain the high quality of the Genome Browser software and data, and provide a robust computing environment capable of supporting our developers and users.

We propose three primary ways in which we plan to develop, use, and extend the data exchange standards that make it easier for other bioinformaticians to both use our data and make their own data available in the Genome Browser. We plan to further develop our widely adopted track and assembly hub systems that group together genomics files in an organized fashion and label them for browser display, in particular by extending the representation of metadata (such as biosample sources and treatments) in hubs, and expanding our search capabilities. We will continue to work with ontology groups to incorporate their controlled vocabularies into relevant fields of our metadata. We will closely collaborate with the Global Alliance for Genomics and Health (GA4GH) project to ensure that their APIs are sufficiently flexible to express our data sets and to develop a JSON-based web services API to our databases.

We plan to maintain and improve the quality of our software through the continued use of good engineering practices, including the appropriate use of functional programming approaches to minimize code side effects and maximize parallel processing potential. We will continue to employ incremental, object-oriented, modular programming techniques and unit tests to maintain code quality, as well as our weekly paired-review process that ensures a thorough review of new code and helps distribute knowledge of the code base throughout our organization.
Augmenting our engineering practices, we will continue to maintain a separate quality assurance group that applies a combination of automated and manual testing to check the quality of the software and data released on our website.

The Genome Browser production and development environments are supported by several mid-range server-grade computers and a variety of storage subsystems chosen with good price/performance ratios in mind. We plan to reconfigure our system to reduce single points of failure and increase parallelism, and will reduce our need for a large compute cluster by making increased use of the cloud for large bursts of computation, such as that associated with our multiple genome alignment pipeline.

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Biotechnology Resource Cooperative Agreements (U41)
Project #: 2U41HG002371-18
Application #: 9357894
Study Section: National Human Genome Research Institute Initial Review Group (GNOM)
Project Start:
Project End:
Budget Start: 2017-07-01
Budget End: 2018-06-30
Support Year: 18
Fiscal Year: 2017
Total Cost:
Indirect Cost:

Name: University of California Santa Cruz
Department:
Type:
DUNS #: 125084723
City: Santa Cruz
State: CA
Country: United States
Zip Code: 95064

Brozovic, Matija; Dantec, Christelle; Dardaillon, Justine et al. (2018) ANISEED 2017: extending the integrated ascidian database to the exploration and evolutionary comparison of genome-scale datasets. Nucleic Acids Res 46:D718-D725
Casper, Jonathan; Zweig, Ann S; Villarreal, Chris et al. (2018) The UCSC Genome Browser database: 2018 update. Nucleic Acids Res 46:D762-D769
Canver, Matthew C; Haeussler, Maximilian; Bauer, Daniel E et al. (2018) Integrated design, execution, and analysis of arrayed and pooled CRISPR genome-editing experiments. Nat Protoc 13:946-986
GTEx Consortium (2018) Erratum: Genetic effects on gene expression across human tissues. Nature 553:530
Dyke, Stephanie O M; Linden, Mikael; Lappalainen, Ilkka et al. (2018) Registered access: authorizing data access. Eur J Hum Genet 26:1721-1731
Howard, Jonathan M; Lin, Hai; Wallace, Andrew J et al. (2018) HNRNPA1 promotes recognition of splice site decoys by U2AF2 in vivo. Genome Res 28:689-698
GTEx Consortium; Laboratory, Data Analysis & Coordinating Center (LDACC)—Analysis Working Group; Statistical Methods groups—Analysis Working Group et al. (2017) Genetic effects on gene expression across human tissues. Nature 550:204-213
Saha, Ashis; Kim, Yungil; Gewirtz, Ariel D H et al. (2017) Co-expression networks reveal the tissue-specific regulation of transcription and splicing. Genome Res 27:1843-1858
Tyner, Cath; Barber, Galt P; Casper, Jonathan et al. (2017) The UCSC Genome Browser database: 2017 update. Nucleic Acids Res 45:D626-D634
Vivian, John; Rao, Arjun Arkal; Nothaft, Frank Austin et al. (2017) Toil enables reproducible, open source, big biomedical data analyses. Nat Biotechnol 35:314-316

Showing the most recent 10 out of 41 publications