The research objective of this proposal is to address the computational challenges in an innovative BIGDATA application in imaging-omics-based precision medicine. Recent advances in high-throughput imaging (such as histopathology imaging) and multi-omics (such as DNA sequencing, RNA expression, and methylation) technologies have created new opportunities for using quantitative methods to explore relationships among histology, molecular events, and clinical outcomes. However, the unprecedented scale and complexity of these imaging-omics data present critical computational bottlenecks that require new concepts and enabling tools. This project builds a new computational framework that integrates novel big data mining algorithms with cloud and high-performance computing strategies to reveal complex relationships among histopathology images, multi-omics data, and phenotypic outcomes. The project is innovative and crucial not only for developing new big data mining techniques, but also for addressing emerging scientific questions in imaging-omics and many other biomedical applications. The resulting methods and tools are expected to benefit other cancer genomics research and to enable investigators working in cancer medicine to test their scientific hypotheses effectively. The project also supports the development of novel educational tools to enhance several existing courses. The University of Texas at Arlington is a minority-serving institution with a large population of Hispanic and Black American students. This project engages minority students and under-served populations in research activities, giving them greater exposure to cutting-edge scientific research.
To solve the key challenges in big imaging-omics data mining, this project pursues the following research tasks. First, it develops large-scale non-convex sparse learning models to identify outcome-relevant phenotypic traits from big histopathology image data. Second, it uses biological domain knowledge to guide the sparse learning models in uncovering the molecular bases of complex traits. Third, it designs data integration models that combine imaging-omics data from multiple sources and discover heterogeneous biomarkers. Fourth, it explores Bayesian learning models to predict longitudinal cancer outcomes. Fifth, it develops cloud computing and high-performance computing strategies to support big imaging-omics data mining, such as optimizations of various data mining workloads on heterogeneous hardware (e.g., GPUs and NUMA multicore processors), to fully unlock the potential of data center hardware. Integrating big data mining algorithms with cloud and high-performance computing for imaging-omics is innovative and holds great promise for a systems-biology approach to precision medicine.