We are developing and improving powerful statistical and genetic tools to analyze and integrate massive omics data sets jointly with information on disease risk and severity. This work will enable far better use and re-use of complex and massive omics data sets and software by a wide community of users, ranging from students, researchers, and clinical scientists to expert data scientists and statisticians. We are building modular high-performance computational resources as part of a web services framework called GeneNetwork 2 (GN2). GN2 provides efficient data uploading and access and a suite of QC and analysis code that can be used or adapted for any species. Code is written in Python, C++, and R, and is supported by a relational database (MySQL) that incorporates the largest coherent collection of expression quantitative trait locus (eQTL) data. GN2 is optimized to handle a new generation of complex genetic crosses, including heterogeneous stock, hybrid diversity panels, GWAS cohorts, and sets of recombinant inbred strains such as the BXD and Collaborative Cross. GN2 includes new code for comparative and translational analysis of eQTL data sets and network graphs. In this grant we extend GN2 in four specific ways: far more capable data entry and export APIs and workflows, QC, and simulation routines (Aim 1); new high-performance tools for the analysis of complex cross populations and for comparative and translational analysis of systems genetics data sets (Aim 2); a new plug-in application programming interface (API) architecture with backend use of GPU web service systems (Aim 3); and statistical methods for correlated high-dimensional data and predictive Bayesian modelling (Aim 4). We anticipate that this open and scalable architecture and modular code will become a core resource for both molecular biologists and data scientists, particularly those working in predictive modeling and precision medicine.
All members of our team work closely with the systems genetics community and are training the next generation of young scientists interested in scalable integrative models of disease risk and treatment.

Public Health Relevance

We are developing an open and modular web service for collaborative systems genetics and precision medicine. The statistical and network analysis tools we are creating are being used to analyze and integrate massive genomics data sets generated as part of NIH programs with data on disease risk and treatment. GeneNetwork is used by thousands of researchers, many supported by NIH. In this program we are adding powerful analytic and modeling software for the analysis of many common experimental model organisms and even human cohorts.

Agency: National Institutes of Health (NIH)
Institute: National Institute of General Medical Sciences (NIGMS)
Type: Research Project (R01)
Project #: 1R01GM123489-01A1
Application #: 9311935
Study Section: Biodata Management and Analysis Study Section (BDMA)
Program Officer: Ravichandran, Veerasamy
Project Start: 2017-04-15
Project End: 2021-03-31
Budget Start: 2017-04-15
Budget End: 2018-03-31
Support Year: 1
Fiscal Year: 2017
Total Cost: $489,764
Indirect Cost: $165,239

Name: University of Tennessee Health Science Center
Department: Physiology
Type: Schools of Medicine
DUNS #: 941884009
City: Memphis
State: TN
Country: United States
Zip Code: 38103

Ashbrook, D G; Mulligan, M K; Williams, R W (2018) Post-genomic behavioral genetics: From revolution to routine. Genes Brain Behav 17:e12441
Hook, Michael; Roy, Suheeta; Williams, Evan G et al. (2018) Genetic cartography of longevity in humans and mice: Current landscape and horizons. Biochim Biophys Acta Mol Basis Dis 1864:2718-2732
Delprato, A; Algéo, M-P; Bonheur, B et al. (2017) QTL and systems genetics analysis of mouse grooming and behavioral responses to novelty in an open field. Genes Brain Behav 16:790-799
St John, Steven J; Lu, Lu; Williams, Robert W et al. (2017) Genetic control of oromotor phenotypes: A survey of licking and ingestive behaviors in highly diverse strains of mice. Physiol Behav 177:34-43