Large, heterogeneous feature sets are common in biological studies, for example genetic, transcriptomic, proteomic, and metabolomic data, and in electronic health records. A frequent goal of personalized medicine is to link this information to therapeutic responses. More accurate prediction can assist in selecting the most suitable therapy for each individual patient. Some of the latest machine-learning tools, such as deep learning based on convolutional neural networks, have shown great promise in image-based predictive modeling but are often unsuitable for the large non-image feature sets that arise frequently in biological settings. The project develops a novel framework termed REFINED (REpresentation of Features as Images with NEighborhood Dependencies) to represent high-dimensional vectors as compact images, a representation that increases the accuracy of machine-learning models trained on such datasets and can also handle heterogeneous feature sets. Successful implementation of the innovation will advance the goal of higher-accuracy predictive modeling from biological datasets. The developed algorithms will be made available online in a user-friendly form. The investigators are deeply involved in educating and training the next generation of students at all levels, with attention to minority and underrepresented groups.
The project involves the design of a novel regression framework that converts scalar and functional predictors into mathematically justifiable image objects that can be processed by deep-learning methods based on convolutional neural networks. Preliminary results on biological datasets show that the framework achieves higher prediction accuracy than existing methodologies while maintaining desirable properties in terms of bias. The specific project contributions are (a) an innovative design for representing high-dimensional scalar features as images with neighborhood dependencies, which enables high-accuracy predictive modeling using deep learning based on convolutional neural networks, and (b) an extension of the image-based representation to incorporate functional changes in predictors and outputs. The project also explores the theoretical underpinnings of this new predictive-modeling framework for biological scenarios. The framework can be applied to any biological prediction problem in which the predictors have scalar, functional, and/or image attributes. Successful completion of this project will result in a new, effective tool for feature representation and function-on-function regression and will provide a significant methodology for object regression.
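The idea of placing related features in neighboring pixels can be illustrated with a minimal sketch. This is not the project's REFINED algorithm; it is a simplified stand-in that assumes a correlation-based feature distance, a classical multidimensional scaling step, and a simple rank-based snap to a square grid. The function names `feature_grid_layout` and `to_images`, the grid size, and the random data are all hypothetical choices made for illustration.

```python
import numpy as np

def feature_grid_layout(X, side):
    """Assign each of side*side features a unique (row, col) pixel.

    Assumption: features that are strongly correlated should sit close
    together in the image. This is a simplified heuristic, not REFINED.
    """
    n_features = X.shape[1]
    if n_features != side * side:
        raise ValueError("number of features must equal side * side")
    # Feature-to-feature distance: 1 - |Pearson correlation|.
    corr = np.corrcoef(X, rowvar=False)
    dist = 1.0 - np.abs(corr)
    # Classical MDS: double-center the squared distances and keep the
    # two leading eigenvectors as 2-D coordinates for the features.
    J = np.eye(n_features) - np.ones((n_features, n_features)) / n_features
    B = -0.5 * J @ (dist ** 2) @ J
    eigvals, eigvecs = np.linalg.eigh(B)
    top = np.argsort(eigvals)[::-1][:2]
    coords = eigvecs[:, top] * np.sqrt(np.maximum(eigvals[top], 0.0))
    # Snap to the grid: sort by the first coordinate, cut into rows,
    # then order each row by the second coordinate.
    order = np.argsort(coords[:, 0])
    layout = np.empty((n_features, 2), dtype=int)
    for r in range(side):
        row_feats = order[r * side:(r + 1) * side]
        row_feats = row_feats[np.argsort(coords[row_feats, 1])]
        layout[row_feats, 0] = r
        layout[row_feats, 1] = np.arange(side)
    return layout

def to_images(X, layout, side):
    """Write each sample's feature values into its assigned pixels."""
    images = np.zeros((X.shape[0], side, side), dtype=X.dtype)
    images[:, layout[:, 0], layout[:, 1]] = X
    return images

# Hypothetical data: 50 samples, 16 features, mapped to 4x4 images.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 16))
layout = feature_grid_layout(X, side=4)
images = to_images(X, layout, side=4)
```

The resulting `images` array has one single-channel image per sample and could be passed to a convolutional network after adding a channel axis. The layout is computed once from the training data and reused for every sample, so each pixel always corresponds to the same feature.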
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.