This program project will involve high-throughput experiments, including whole genome sequencing (WGS), RNA sequence profiling on cell populations (RNA-seq), perturbation RNA profiling (Perturb-seq), and CRISPR screens. These studies will generate substantial quantities of data requiring secure, reliable storage and computational analysis. Data derived from human research participants require a data security level suitable for their sensitive nature. Secure and sufficient computational and information resources are crucial for facilitating efficient, consistent, and reproducible analyses for all projects. Storing all data from the program project centrally enables data provenance tracking, data sharing, and integration. Shared computing supports consistent analysis pipelines, with a single resource responsible for maintaining current versions of databases and software. A bioinformatics specialist performing routine analyses will allow researchers in each project to focus on novel research endeavors. We plan to support the three project components and the three other cores with the following specific aims:
Aim 1. Provide informatics infrastructure to enable reproducible and secure data analyses. Core B will provide a secure and reliable informatics infrastructure to support foundational data analyses for the projects. A key component of the computing environment will be a server with sufficient compute cycles and memory for both routine analyses and novel research activities of the projects. This secure environment will include online disk storage for all raw and processed project data and analysis code. This core will implement a well-designed backup system including regular onsite and offsite backup of project data. Software packages and necessary public and licensed databases will be deployed and kept current. The entire compute environment will be secured by a modern firewall, with systems accessed via VPN. A system administrator will maintain the computing system, including backup, and will support users from the collaborating institutes.
Aim 2. Perform routine bioinformatics analyses for each Project. A bioinformatics specialist in Core B will provide basic computational analyses for genome and transcriptome data. Statistical researchers in Core B will develop innovative, robust statistical methods for Perturb-seq in Project 3, and creative pipelines for all studies. To enable consistent and reproducible analyses for all projects, this core will deploy a robust software infrastructure to manage all data, record its provenance, and track all results. The core will perform preprocessing, quality control, annotation, and other routine analysis steps for WGS, transcriptome sequencing, CRISPR screens, and characterization of the specificity and efficiency of gene correction. Core B will also help maintain a public resource of genes and regulatory regions related to T cell deficiencies. Core B will assist researchers in all projects and cores in depositing all datasets, with appropriate metadata, into the appropriate public databases and repositories.
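As an illustration of what recording provenance for an analysis step could involve, the following is a minimal sketch only, not the core's actual infrastructure. It assumes a simple JSON-style record that ties output files to their inputs through SHA-256 checksums; the function names, the record fields, and the tool name used in the usage note are all hypothetical.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def provenance_record(inputs, outputs, tool, version, params):
    """Build a JSON-serializable record linking outputs to inputs.

    The schema here is an assumption made for illustration.
    """
    return {
        "created_utc": datetime.now(timezone.utc).isoformat(),
        "tool": tool,
        "tool_version": version,
        "parameters": params,
        "inputs": {str(p): sha256_of(Path(p)) for p in inputs},
        "outputs": {str(p): sha256_of(Path(p)) for p in outputs},
    }


def verify(record) -> list:
    """Return the paths whose current checksum no longer matches the record."""
    recorded = {**record["inputs"], **record["outputs"]}
    return [p for p, digest in recorded.items() if sha256_of(Path(p)) != digest]
```

A record of this kind could be written next to each result with json.dumps(record, indent=2), and verify(record) could be rerun after a backup restore or before deposition to confirm that files are unchanged. For example, provenance_record(["reads.fastq"], ["aligned.bam"], "hypothetical-aligner", "0.0.1", {"threads": 4}) would capture one alignment step, where the file names and tool are placeholders.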

Agency
National Institutes of Health (NIH)
Institute
National Institute of Allergy and Infectious Diseases (NIAID)
Type
Research Program Projects (P01)
Project #
1P01AI138962-01A1
Application #
10024569
Study Section
Special Emphasis Panel (ZAI1)
Project Start
2020-09-08
Project End
2025-08-31
Budget Start
2020-07-01
Budget End
2021-06-30
Support Year
1
Fiscal Year
2020
Total Cost
Indirect Cost
Name
University of California San Francisco
Department
Type
DUNS #
094878337
City
San Francisco
State
CA
Country
United States
Zip Code
94118