This research investigates optimization methods for constructing higher-order learning machines to solve very large classification problems such as those found in data mining and character recognition. In a higher-order learning machine, the input vectors are mapped nonlinearly into a higher-dimensional space, and a linear discriminant is then constructed in that space. Depending on the mapping, the resulting discriminant function may be a polynomial, neural network, radial basis function classifier, or decision tree. Vapnik's Statistical Learning Theory is used within the problem formulation to avoid over-fitting or over-parameterizing the discriminant function. Innovative optimization methods are developed to solve the resulting minimization problem. This work will have both educational and research impact. On the educational side, the optimization methods for classification problems will be used within computer-based course materials for teaching concepts of mathematical programming and as a basis for undergraduate research projects. The research impact will be the development of novel, fast, and accurate classification methods based on statistical learning theory. These methods will be applied to very large practical problems in a diverse set of fields, for example: character recognition in engineering, cancer diagnosis in medicine, database marketing in business, and prediction of mortgage prepayment in finance.
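
The central idea described above, a nonlinear map into a higher-dimensional space followed by a linear discriminant in that space, can be illustrated with a minimal sketch. It is not the optimization method proposed in this research. As a stand-in it uses regularized least squares, with the penalty `lam` acting as a crude capacity control rather than a formulation derived from Statistical Learning Theory. The function names, the degree-2 polynomial map, and the four-point XOR-style data set are all assumptions chosen for illustration.

```python
import numpy as np

def linear_features(X):
    """Original 2-D inputs plus a constant term (no nonlinear mapping)."""
    return np.column_stack([np.ones(len(X)), X])

def poly2_features(X):
    """Hypothetical nonlinear map: all monomials of degree <= 2 (a 6-D space)."""
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([np.ones(len(X)), x1, x2, x1**2, x1 * x2, x2**2])

def fit_linear_discriminant(Phi, y, lam=0.1):
    """Regularized least-squares weights; lam > 0 penalizes large weights."""
    d = Phi.shape[1]
    return np.linalg.solve(Phi.T @ Phi + lam * np.eye(d), Phi.T @ y)

def accuracy(Phi, y, w):
    """Fraction of points whose predicted sign matches the label."""
    return float(np.mean(np.sign(Phi @ w) == y))

# XOR-style labels: not linearly separable in the original 2-D input space.
X = np.array([[-1.0, -1.0], [-1.0, 1.0], [1.0, -1.0], [1.0, 1.0]])
y = np.array([-1.0, 1.0, 1.0, -1.0])

# Linear discriminant fitted directly on the inputs.
Phi_lin = linear_features(X)
acc_lin = accuracy(Phi_lin, y, fit_linear_discriminant(Phi_lin, y))

# The same linear fit, applied after the nonlinear mapping.
Phi_poly = poly2_features(X)
acc_poly = accuracy(Phi_poly, y, fit_linear_discriminant(Phi_poly, y))

print("linear accuracy:", acc_lin)
print("degree-2 polynomial accuracy:", acc_poly)
```

In this toy case the discriminant that is linear in the six-dimensional space is a quadratic polynomial in the original inputs, and the product term `x1 * x2` is what separates the two classes. A larger `lam` shrinks the weights and so limits the capacity of the discriminant.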