Statistical models for genetic data are often surprisingly challenging and frequently require new and advanced statistical methods. Applying probability machines to whole-genome data is a recent development; the original research on probability machines appeared in Methods of Information in Medicine (September 2011). Our methods point toward refined, personalized probability predictions that draw on a wide range of biomarkers, medical information, and whole-genome data. Using probability machines on 800,000 SNPs, detection of childhood-onset schizophrenia has error rates of 15% or less, and the set of predictive SNPs can be filtered down to fewer than a few hundred. Other studies of psychiatric conditions (ADHD, bipolar disorder) are now underway, using probability machines to produce personalized, subject-specific predictions.
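
As a rough illustration of the probability-machine idea (a nonparametric regression learner applied to a 0/1 outcome, so that its prediction can be read as an estimated probability for an individual subject), the following minimal sketch uses k-nearest-neighbor averaging on simulated genotypes. Everything here is an assumption made for illustration: the data are simulated, the risk model is hypothetical, the function names are invented, and k-nearest neighbors stands in for the learning machines used in the actual studies.

```python
import random


def simulate(n, n_snps, causal, rng):
    """Simulate genotypes (0/1/2 minor-allele counts) and a binary outcome."""
    X, y = [], []
    for _ in range(n):
        g = [rng.choice((0, 1, 2)) for _ in range(n_snps)]
        # Hypothetical risk model: each copy of the causal allele raises risk.
        p = 0.1 + 0.35 * g[causal]
        X.append(g)
        y.append(1 if rng.random() < p else 0)
    return X, y


def knn_probability(X, y, query, k):
    """Estimate P(y = 1 | query) as the mean 0/1 outcome of the k nearest subjects."""
    def sq_dist(i):
        return sum((a - b) ** 2 for a, b in zip(X[i], query))

    nearest = sorted(range(len(X)), key=sq_dist)[:k]
    return sum(y[i] for i in nearest) / k


rng = random.Random(0)  # fixed seed so the sketch is reproducible
X, y = simulate(n=800, n_snps=4, causal=0, rng=rng)

# Two hypothetical subjects who differ only at the causal SNP.
p_low = knn_probability(X, y, query=[0, 1, 1, 1], k=20)
p_high = knn_probability(X, y, query=[2, 1, 1, 1], k=20)

print("estimated risk, 0 copies of the causal allele:", p_low)
print("estimated risk, 2 copies of the causal allele:", p_high)
```

The output is a probability for each subject rather than a class label, which is the sense in which the predictions are subject-specific. With hundreds of thousands of SNPs, a learner that scales better and tolerates many noise predictors would be needed in place of this toy nearest-neighbor estimate.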

Center for Information Technology
Shah, Mona; Mamyrova, Gulnara; Targoff, Ira N et al. (2013) The clinical phenotypes of the juvenile idiopathic inflammatory myopathies. Medicine (Baltimore) 92:25-41
Malley, J D; Kruppa, J; Dasgupta, A et al. (2012) Probability machines: consistent probability estimation using nonparametric learning machines. Methods Inf Med 51:74-81
Kim, Yoonhee; Li, Qing; Cropp, Cheryl D et al. (2011) Performance of random forests and logic regression methods using mini-exome sequence data. BMC Proc 5 Suppl 9:S104
Dasgupta, Abhijit; Sun, Yan V; König, Inke R et al. (2011) Brief review of regression-based and machine learning methods in genetic epidemiology: the Genetic Analysis Workshop 17 experience. Genet Epidemiol 35 Suppl 1:S5-11
Nicodemus, Kristin K; Malley, James D (2009) Predictor correlation impacts machine learning algorithms: implications for genomic studies. Bioinformatics 25:1884-90
Strobl, Carolin; Malley, James; Tutz, Gerhard (2009) An introduction to recursive partitioning: rationale, application, and characteristics of classification and regression trees, bagging, and random forests. Psychol Methods 14:323-48
Kim, Yoonhee; Wojciechowski, Robert; Sung, Heejong et al. (2009) Evaluation of random forests performance for genome-wide association studies in the presence of interaction effects. BMC Proc 3 Suppl 7:S64