Today's massive generation of digital data is far outpacing the development of computational methods and tools, which presents critical challenges to realizing the full transformative potential of these data. For example, recent advances in acquiring multi-modal brain imaging and genome-wide array data provide exciting new opportunities to study the influence of genetic variation on brain structure and function. However, the unprecedented scale and complexity of these data create major computational bottlenecks for their comprehensive joint analysis. This project will harness the new capabilities of large-scale data mining techniques in multi-view learning, multi-task learning, and robust classification to address critical challenges in systematically analyzing massive multi-modal genetic, imaging, and other biomarker data. Specifically, this project will: (1) develop new multi-view learning methods to detect task-relevant phenotypic biomarkers from large-scale heterogeneous imaging and other biomarker data, (2) implement new sparse multi-task regression models to reveal the genetic basis of phenotypic biomarkers at multiple levels (e.g., SNP, haplotype, gene, and/or pathway), (3) design novel robust classification methods based on structural sparsity to predict outcomes from integrated genotypic and phenotypic data, and (4) package these new methods into a data mining toolkit and release it to the public.

The intellectual merit of this project derives not only from the development of novel data mining methods but also from their application to imaging genetics studies. These methods are designed to account for the interrelated structures among multiple data modalities and offer systematic strategies for revealing structural imaging genetic associations. The proposed methods and tools are expected to impact neurological and psychological research, enabling investigators to test imaging genetics hypotheses effectively and advancing biomedical science and technology. In addition, the proposed data mining framework addresses general critical needs in large-scale data analysis and integration and will therefore impact many research areas in which high-value knowledge and complex patterns can potentially be discovered from massive, high-dimensional, heterogeneous data sets. This project will also support the development of novel educational tools to enhance several current courses at UT Arlington and IUPUI. Both universities are minority-serving institutions, and the PIs will engage minority students and under-served populations in research activities to give them greater exposure to cutting-edge scientific research.

National Science Foundation (NSF)
Division of Information and Intelligent Systems (IIS)
Standard Grant (Standard)
Program Officer: Sylvia J. Spengler
Indiana University
United States