The analysis of big genomic data requires specialized software able to cope with challenges arising from both the high-dimensional nature of the data and the complexity of the underlying biological mechanisms. With NIH support we developed, tested and now maintain the Bayesian Generalized Linear Regression R library (BGLR, available at CRAN; Pérez and de los Campos 2014), a comprehensive Bayesian statistical package that implements a large collection of Whole-Genome Regression (WGR) procedures, including shrinkage and variable selection methods for linear models and semi-parametric regressions (RKHS). Several studies that have used BGLR to analyze large genomic data sets (with hundreds of thousands of SNPs and thousands of individuals), as well as multi-layer omic data, demonstrate the value of the software. For the renewal of our grant we propose a set of improvements and developments that will make BGLR better suited for the analysis of Big Data and will greatly expand the classes of models implemented. We will develop and implement:
(Aim 1) methods to enable BGLR to carry out computations using inputs that are stored in distributed binary files, without fully loading data into RAM; this will open great opportunities for the analysis of big omic data sets;
(Aim 2) a BGLR module to fit a diverse array of interaction models, including interactions of categorical (e.g., sex, treatment) or quantitative (e.g., BMI) risk factors with whole-genome data (e.g., SNPs, expression profiles);
(Aim 3) methods to incorporate prior information (e.g., annotation) into whole-genome regressions; and
(Aim 4) instruments for online training.
The successful achievement of our aims will provide researchers with efficient data analysis tools for whole-genome analysis of large omic data sets.
The analysis of big genomic data is challenging and requires the development of specialized software. Our group has developed, tested and now maintains the Bayesian Generalized Linear Regression R library (BGLR, available at CRAN). BGLR implements, in a unified framework, a large collection of Bayesian models for high-dimensional genetic data analyses, including various shrinkage and variable selection procedures, semi-parametric regression methods (RKHS) and pedigree models. For the renewal of our grant we have identified a set of improvements and developments that will make BGLR better suited for the analysis of big genomic data and will expand the types of models implemented by accommodating different types of interactions and by making it possible to incorporate prior information (e.g., annotation) into whole-genome regressions.
Bernal Rubio, Yeni L; González-Reymúndez, Agustin; Wu, Kuan-Han H et al. (2018) Whole-Genome Multi-omic Study of Survival in Patients with Glioblastoma Multiforme. G3 (Bethesda) 8:3627-3636
Enciso-Rodriguez, Felix; Douches, David; Lopez-Cruz, Marco et al. (2018) Genomic Selection for Late Blight and Common Scab Resistance in Tetraploid Potato (Solanum tuberosum). G3 (Bethesda) 8:2471-2481
Bellot, Pau; de Los Campos, Gustavo; Pérez-Enciso, Miguel (2018) Can Deep Learning Improve Genomic Prediction of Complex Human Traits? Genetics 210:809-819
Sun, Mengying; Vazquez, Ana I; Reynolds, Richard J et al. (2018) Untangling the complex relationships between incident gout risk, serum urate, and its comorbidities. Arthritis Res Ther 20:90
de Los Campos, Gustavo; Vazquez, Ana Ines; Hsu, Stephen et al. (2018) Complex-Trait Prediction in the Era of Big Data. Trends Genet 34:746-754
Lello, Louis; Avery, Steven G; Tellier, Laurent et al. (2018) Accurate Genomic Prediction of Human Height. Genetics 210:477-497
Kim, Hwasoon; Grueneberg, Alexander; Vazquez, Ana I et al. (2017) Will Big Data Close the Missing Heritability Gap? Genetics 207:1135-1145
Pickens, C Austin; Vazquez, Ana I; Jones, A Daniel et al. (2017) Obesity, adipokines, and C-peptide are associated with distinct plasma phospholipid profiles in adult males, an untargeted lipidomic approach. Sci Rep 7:6335
Pérez-Enciso, M; de Los Campos, G; Hudson, N et al. (2017) The 'heritability' of domestication and its functional partitioning in the pig. Heredity (Edinb) 118:160-168
González-Reymúndez, Agustín; de Los Campos, Gustavo; Gutiérrez, Lucía et al. (2017) Prediction of years of life after diagnosis of breast cancer using omics and omic-by-treatment interactions. Eur J Hum Genet 25:538-544