Malley, James D.

Center for Information Technology

This project studies statistical learning machines as applied to biomedical and clinical prediction, probability assignment, regression, and ranking problems. The algorithms involved include Random Forests, support vector machines, neural networks, and variations of the boosting algorithm. These are all recently developed techniques, originally constructed by the machine learning community, that are only now starting to see applications in biomedical problems. These methods were not designed through familiar parametric statistical reasoning; however, using the more advanced methods of nonparametric density estimation, they can be proven to be Bayes risk consistent. They are, therefore, well adapted to large data sets, especially whole-genome data. More recently we have found methods to calculate risk and hazard using probability machines. These methods, now called Risk Machines, are entirely model-free and are provably valid using current techniques in mathematical statistics. The researcher does not need to supply a model or any parametric input. Personalized risks can be calculated for individuals relative to any possible predictor or environmental hazard, or any interaction between gene and environment. These methods solve the problems first posed in our earlier book, "Statistical Learning for Biomedical Data" (coauthors K. Malley and S. Pajevic; published 2011). The practical applications of these solutions, including Risk Machines, will appear in our next book, "Estimation of Risk and Probability: A Machine Learning Approach" (in preparation). Consistent, valid estimates of genetic risk can be obtained using Risk Machines for problems such as childhood-onset schizophrenia. In addition, new predictive features can be constructed from observed features to account for known problems in statistical genetics, such as recombination hot spots and linkage disequilibrium. These synthetic features can then be evaluated separately for subject-level risk estimation. This is another example of personalized medicine provided by statistical learning machines.
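As a rough illustration of the probability-machine idea described above, the sketch below applies a nonparametric regression learner to a 0/1 outcome and reads the prediction as a conditional probability; a personalized risk difference is then the predicted probability with the exposure switched on minus the prediction with it switched off. This is only a sketch under stated assumptions: k-nearest-neighbor averaging is swapped in for Random Forests to keep the example dependency-free, the data are synthetic, and the names `knn_probability` and `risk_difference` are hypothetical, not taken from the project.

```python
import heapq
import random


def knn_probability(train, query, k=100):
    """Estimate P(y = 1 | features) as the mean 0/1 outcome of the
    k training points nearest to `query` (squared Euclidean distance)."""
    nearest = heapq.nsmallest(
        k,
        train,
        key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], query)),
    )
    return sum(outcome for _, outcome in nearest) / len(nearest)


def risk_difference(train, covariate, k=100):
    """Personalized risk difference at a given covariate value:
    P(y = 1 | exposed, x) - P(y = 1 | unexposed, x)."""
    exposed = knn_probability(train, (1.0, covariate), k)
    unexposed = knn_probability(train, (0.0, covariate), k)
    return exposed - unexposed


# Synthetic data (an assumption for illustration only): a binary exposure,
# one continuous covariate in [0, 1], and a true risk that is known here
# but would be unknown in practice.
rng = random.Random(0)
train = []
for _ in range(2000):
    exposure = float(rng.random() < 0.5)
    covariate = rng.random()
    true_risk = 0.2 + 0.3 * exposure + 0.2 * covariate
    outcome = 1 if rng.random() < true_risk else 0
    train.append(((exposure, covariate), outcome))

# Risk for one hypothetical exposed subject with covariate 0.5.
subject_risk = knn_probability(train, (1.0, 0.5))

# Risk differences averaged over a grid of covariate values; the
# simulated exposure effect is 0.3 at every covariate value.
grid = [i / 100 for i in range(5, 100, 10)]
mean_rd = sum(risk_difference(train, x) for x in grid) / len(grid)

print(f"estimated risk for the subject: {subject_risk:.3f}")
print(f"mean estimated risk difference: {mean_rd:.3f}")
```

No parametric model is specified anywhere in the sketch: the learner sees only features and 0/1 outcomes. A Random Forest run in regression mode could be substituted for the nearest-neighbor average without changing the surrounding logic.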

- Agency: National Institutes of Health (NIH)
- Institute: Center for Information Technology (CIT)
- Type: Investigator-Initiated Intramural Research Projects (ZIA)
- Project #: 1ZIACT000271-11
- Application #: 8746530
- Study Section:
- Project Start:
- Project End:
- Budget Start:
- Budget End:
- Support Year: 11
- Fiscal Year: 2013
- Total Cost: $85,800
- Indirect Cost:
