The analysis of big genomic data requires specialized software able to cope with challenges emerging from both the high-dimensional nature of the data itself and the complexity of the underlying biological mechanisms. With NIH support we developed, tested and now maintain the Bayesian Generalized Linear Regression R-library (BGLR, available at CRAN; Pérez and de los Campos 2014), a comprehensive Bayesian statistical software package that implements a large collection of Whole-Genome Regression (WGR) procedures, including shrinkage and variable selection methods for linear models and semi-parametric regressions (RKHS). Several studies that have used BGLR to analyze large genomic data sets (with hundreds of thousands of SNPs and thousands of individuals), as well as multi-layer omic data, demonstrate the value of the software. For the renewal of our grant we propose a set of improvements and developments that will make BGLR better suited for the analysis of Big Data and will greatly expand the classes of models implemented. We will develop and implement:
(Aim 1) methods to enable BGLR to carry out computations using inputs that are stored in distributed binary files, without fully loading data into RAM; this will open great opportunities for the analysis of big omic data sets;
(Aim 2) a BGLR module to fit a diverse array of interaction models, including interactions between categorical (e.g., sex, treatment) or quantitative (e.g., BMI) risk factors with whole-genome data (e.g., SNPs, expression profiles);
(Aim 3) methods to incorporate prior information (e.g., annotation) into whole-genome regressions; and
(Aim 4) instruments for online training.
The successful achievement of our aims will provide researchers with efficient data analysis tools for whole-genome analysis of large omic data sets.

Public Health Relevance

The analysis of big genomic data is challenging and requires the development of specialized software. Our group has developed, tested and now maintains the Bayesian Generalized Linear Regression R-library (BGLR, available at CRAN). BGLR implements, in a unified framework, a large collection of Bayesian models for high-dimensional genetic data analyses, including various shrinkage and variable selection procedures, semi-parametric regression methods (RKHS) and pedigree models. For the renewal of our grant we have identified a set of improvements and developments that will make BGLR better suited for the analysis of big genomic data and will expand the types of models implemented by accommodating different types of interactions, as well as the possibility of incorporating prior information (e.g., annotation) into whole-genome regressions.

Agency
National Institutes of Health (NIH)
Institute
National Institute of General Medical Sciences (NIGMS)
Type
Research Project (R01)
Project #
2R01GM101219-04
Application #
8964392
Study Section
Biodata Management and Analysis Study Section (BDMA)
Program Officer
Brazhnik, Paul
Project Start
2012-03-01
Project End
2018-05-31
Budget Start
2015-09-01
Budget End
2016-05-31
Support Year
4
Fiscal Year
2015
Total Cost
$307,000
Indirect Cost
$107,000
Name
Michigan State University
Department
Public Health & Prev Medicine
Type
Schools of Medicine
DUNS #
193247145
City
East Lansing
State
MI
Country
United States
Zip Code
48824
Bernal Rubio, Yeni L; González-Reymúndez, Agustin; Wu, Kuan-Han H et al. (2018) Whole-Genome Multi-omic Study of Survival in Patients with Glioblastoma Multiforme. G3 (Bethesda) 8:3627-3636
Enciso-Rodriguez, Felix; Douches, David; Lopez-Cruz, Marco et al. (2018) Genomic Selection for Late Blight and Common Scab Resistance in Tetraploid Potato (Solanum tuberosum). G3 (Bethesda) 8:2471-2481
Bellot, Pau; de Los Campos, Gustavo; Pérez-Enciso, Miguel (2018) Can Deep Learning Improve Genomic Prediction of Complex Human Traits? Genetics 210:809-819
Sun, Mengying; Vazquez, Ana I; Reynolds, Richard J et al. (2018) Untangling the complex relationships between incident gout risk, serum urate, and its comorbidities. Arthritis Res Ther 20:90
de Los Campos, Gustavo; Vazquez, Ana Ines; Hsu, Stephen et al. (2018) Complex-Trait Prediction in the Era of Big Data. Trends Genet 34:746-754
Lello, Louis; Avery, Steven G; Tellier, Laurent et al. (2018) Accurate Genomic Prediction of Human Height. Genetics 210:477-497
Kim, Hwasoon; Grueneberg, Alexander; Vazquez, Ana I et al. (2017) Will Big Data Close the Missing Heritability Gap? Genetics 207:1135-1145
Pickens, C Austin; Vazquez, Ana I; Jones, A Daniel et al. (2017) Obesity, adipokines, and C-peptide are associated with distinct plasma phospholipid profiles in adult males, an untargeted lipidomic approach. Sci Rep 7:6335
Pérez-Enciso, M; de Los Campos, G; Hudson, N et al. (2017) The 'heritability' of domestication and its functional partitioning in the pig. Heredity (Edinb) 118:160-168
González-Reymúndez, Agustín; de Los Campos, Gustavo; Gutiérrez, Lucía et al. (2017) Prediction of years of life after diagnosis of breast cancer using omics and omic-by-treatment interactions. Eur J Hum Genet 25:538-544

Showing the most recent 10 out of 40 publications