This project aims to develop a new framework for ensemble modeling of speech signals to address the long-standing challenge of robust and accurate recognition of spontaneous speech. Toward this goal, random-forest-based allophonic clustering is used to construct ensemble models of allophones by randomly sampling the variables that underlie allophonic variation; data sampling is used to enrich the diversity of the ensemble models by balancing within-set data sufficiency against between-set data diversity; and functional discriminative training is used to further optimize the efficiency and accuracy of the ensemble models. These methods are evaluated experimentally on a standard speech recognition task so that the speech research community can assess their efficacy directly. The ensemble modeling approach promises higher accuracy and lower computational cost than the current approach of integrating multiple systems, owing to the improved likelihood scores that the ensemble models contribute at the local steps of the decoding search. The approach advocated in this project opens up a new paradigm for investigating the many issues in speech acoustic modeling and offers a new way to perform ensemble modeling of structured data in general; it therefore has the potential to significantly impact speech recognition and other machine learning applications. The research findings are disseminated via journal publications, conference presentations, and a website. The methods of this project have broad applications in speech recognition and structured data classification; in particular, they are employed to improve the accuracy of a telemedicine automatic captioning system.