Statistical Learning for Biomedical Data

Malley, James

Abstract

This project studies statistical learning machines as applied to biomedical and clinical prediction, probability assignment, regression, and ranking problems. The algorithms involved include Random Forests, support vector machines, neural networks, and variations of the boosting algorithm. These are all recently developed techniques, originally constructed by the machine learning community, that are only now starting to see applications in biomedical problems. These methods were not designed through familiar parametric statistical reasoning; instead, using the more advanced methods of nonparametric density estimation, they are known to be provably Bayes risk consistent. They are, therefore, well adapted to large data, especially whole-genome data. More recently we have found methods to calculate risk and hazard using probability machines. These methods, now called Risk Machines, are entirely model-free and are provably valid using current techniques in mathematical statistics. The researcher does not need to supply any model or parametric input. Personalized risks can be calculated for individuals relative to any possible predictor or environmental hazard, or any interaction between gene and environment. These methods solve the problems first posed in our earlier book, "Statistical Learning for Biological Data" (coauthors K. Malley and S. Pajevic; published 2011). The practical applications of these solutions, Risk Machines, will appear in our next book, "Estimation of Risk and Probability: A Machine Learning Approach" (for Wiley & Sons). Consistent, valid estimation of genetic risks can be found using Risk Machines for such problems as childhood-onset schizophrenia. Also, new predictive features can be constructed from observed features that account for known problems in statistical genetics, such as recombination hot spots and linkage disequilibrium. These synthetic features can then be evaluated separately for risk estimation for the subject. This is another example of personalized medicine provided by statistical learning machines.
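
As a rough illustration of the probability-machine idea described in the abstract, the sketch below estimates an individual's risk without specifying a parametric model, then estimates a personalized risk difference by predicting once with an exposure switched on and once with it switched off. It is not the project's implementation: k-nearest-neighbor averaging is swapped in here as a simple nonparametric learner in place of the Random Forests named above, and the function names, the synthetic data, and the choice of k are all assumptions made for the example.

```python
import random


def knn_probability(train_x, train_y, query, k):
    """Estimate P(y = 1 | x = query) as the mean 0/1 outcome among the k nearest training points."""
    def sq_dist(row):
        return sum((a - b) ** 2 for a, b in zip(row, query))

    nearest = sorted(range(len(train_x)), key=lambda i: sq_dist(train_x[i]))[:k]
    return sum(train_y[i] for i in nearest) / k


def risk_difference(train_x, train_y, subject, exposure_index, k):
    """Estimated risk with the exposure set to 1 minus estimated risk with it set to 0."""
    exposed = list(subject)
    exposed[exposure_index] = 1.0
    unexposed = list(subject)
    unexposed[exposure_index] = 0.0
    return (knn_probability(train_x, train_y, exposed, k)
            - knn_probability(train_x, train_y, unexposed, k))


# Synthetic data (hypothetical, for illustration only): a binary exposure and
# one continuous covariate in [0, 1]. The true risk is used only to simulate
# outcomes; the estimator never sees this formula.
rng = random.Random(0)
train_x, train_y = [], []
for _ in range(4000):
    exposure = float(rng.random() < 0.5)
    covariate = rng.random()
    p_true = 0.1 + 0.3 * exposure + 0.2 * covariate
    train_x.append((exposure, covariate))
    train_y.append(1 if rng.random() < p_true else 0)

# One hypothetical subject: unexposed, covariate value 0.5.
subject = (0.0, 0.5)
p_subject = knn_probability(train_x, train_y, subject, k=400)
rd = risk_difference(train_x, train_y, subject, exposure_index=0, k=400)
print(f"estimated risk for subject: {p_subject:.3f}")
print(f"estimated risk difference for exposure: {rd:.3f}")
```

In this simulation the risk difference built into the data is 0.3, so the printed estimate can be compared against that value. The same two-prediction pattern would apply to any learner that returns a probability estimate.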

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: Center for Information Technology (CIT)
Type: Investigator-Initiated Intramural Research Projects (ZIA)
Project #: 1ZIACT000271-10
Application #: 8565488
Support Year: 10
Fiscal Year: 2012
Total Cost: $44,209

Institution

Name: Center for Information Technology

Related projects


NIH 2017 ZIA CT	Statistical Learning for Biomedical Data Malley, James D. / Computer Research and Technology
NIH 2016 ZIA CT	Statistical Learning for Biomedical Data Malley, James D. / Computer Research and Technology
NIH 2015 ZIA CT	Statistical Learning for Biomedical Data Malley, James D. / Computer Research and Technology
NIH 2014 ZIA CT	Statistical Learning for Biomedical Data Malley, James D. / Computer Research and Technology
NIH 2013 ZIA CT	Statistical Learning for Biomedical Data Malley, James D. / Center for Information Technology	$85,800
NIH 2012 ZIA CT	Statistical Learning for Biomedical Data Malley, James D. / Center for Information Technology	$44,209
NIH 2010 ZIA CT	Statistical Learning for Biomedical Data Malley, James D. / Center for Information Technology	$84,850
NIH 2009 ZIA CT	Statistical Learning for Biomedical Data Malley, James D. / Center for Information Technology	$211,036

Publications

Battogtokh, Bilguunzaya; Mojirsheibani, Majid; Malley, James (2017) The optimal crowd learning machine. BioData Min 10:16

Holzinger, Emily R; Szymczak, Silke; Malley, James et al. (2016) Comparison of parametric and machine methods for variable selection in simulated Genetic Analysis Workshop 19 data. BMC Proc 10:147-152

Szymczak, Silke; Holzinger, Emily; Dasgupta, Abhijit et al. (2016) r2VIM: A new variable selection method for random forests in genome-wide association studies. BioData Min 9:7

Li, Jing; Malley, James D; Andrew, Angeline S et al. (2016) Detecting gene-gene interactions using a permutation-based random forest method. BioData Min 9:14

Salem, Ghadi H; Dennis, John U; Krynitsky, Jonathan et al. (2015) SCORHE: a novel and practical approach to video monitoring of laboratory mice housed in vivarium cage racks. Behav Res Methods 47:235-50

Holzinger, Emily Rose; Szymczak, Silke; Dasgupta, Abhijit et al. (2015) Variable selection method for the identification of epistatic models. Pac Symp Biocomput :195-206

Malley, James D; Moore, Jason H (2014) First complex, then simple. BioData Min 7:13

Malley, James D; Malley, Karen G; Moore, Jason H (2014) O brave new world that has such machines in it. BioData Min 7:26

Dasgupta, Abhijit; Szymczak, Silke; Moore, Jason H et al. (2014) Risk estimation using probability machines. BioData Min 7:2

Kruppa, Jochen; Liu, Yufeng; Biau, Gérard et al. (2014) Probability estimation with machine learning methods for dichotomous and multicategory outcome: theory. Biom J 56:534-63

Showing the most recent 10 out of 28 publications
