This proposal aims to develop computational tools for analyzing complex pathology and radiology image data as well as genomics data. Recent technological innovations enable scientists to capture complex imaging and genomic data from different views. However, the unprecedented scale and complexity of heterogeneous data analytics pose major computational challenges. To address the key open problems in mining such comprehensive heterogeneous image and genomic data, the PI proposes to develop novel large-scale learning tools and to explore ways of integrating features from multiple data sources for clinical outcome prediction. This work will support the Precision Medicine Initiative, a national research effort unveiled by the U.S. government to enable physicians to select individualized treatments. The project will also facilitate the development of novel educational tools to enhance several current courses.

The PI proposes an integrated research and education plan built on three components: (1) big image analytics and feature extraction, in which novel sparse convolution kernels, sparse deformable models, and quantitative topology measurements are proposed to extract local and global features that fully characterize whole images; (2) large-scale feature learning, in which domain-knowledge-guided sparse feature learning models and non-convex sparse feature learning models are proposed for large-scale image marker discovery; and (3) multi-source image-omics data integration, in which sparse multi-view learning and large-scale learning with bipartite graphs are developed for big image-omics data integration. Here, "image-omics" refers to both image data (pathology or radiology images) and omics data (genomics, proteomics, or metabolomics) captured from the same patient. This project will advance research in efficient feature learning from gigapixel images and in integrating heterogeneous image-omics data for outcome prediction and knowledge discovery. The success of this project will create a new paradigm for medical image informatics and big data.
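The abstract names sparse feature learning models for image marker discovery without specifying them. As a generic illustration of the underlying idea only (a standard L1-penalized lasso solved by proximal gradient descent, ISTA, and not the PI's domain-knowledge-guided or non-convex models), the following sketch recovers a few informative features from synthetic data. The function names `soft_threshold` and `lasso_ista`, the weight `lam`, and the synthetic "patient" data are hypothetical.

```python
import numpy as np

def soft_threshold(z, t):
    """Proximal operator of t * ||.||_1: shrink each entry toward zero by t."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_ista(X, y, lam, n_iter=500):
    """Minimize (1/2n)||y - Xw||^2 + lam * ||w||_1 with ISTA (proximal gradient)."""
    n, d = X.shape
    # Lipschitz constant of the smooth term's gradient; the step size is 1/L.
    L = np.linalg.norm(X, ord=2) ** 2 / n
    w = np.zeros(d)
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y) / n                 # gradient of the squared loss
        w = soft_threshold(w - grad / L, lam / L)    # gradient step, then L1 prox
    return w

# Hypothetical data: 200 patients, 20 image features, only features 0 and 3 matter.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 20))
w_true = np.zeros(20)
w_true[[0, 3]] = [2.0, -1.5]
y = X @ w_true + 0.1 * rng.standard_normal(200)

w = lasso_ista(X, y, lam=0.1)
selected = np.flatnonzero(np.abs(w) > 1e-6)  # indices of the "discovered" markers
print(selected)
```

Swapping the L1 penalty for a group penalty (one way to encode domain knowledge) or a non-convex penalty would change only the proximal step, which is why proximal gradient methods are a convenient baseline for this family of sparse models.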

National Science Foundation (NSF)
Division of Information and Intelligent Systems (IIS)
Program Officer: Amarda Shehu
University of Texas at Arlington
United States