Transfer learning provides crucial techniques for utilizing data from related studies that are conducted in different contexts or on diverse populations. It is an important topic with a wide range of applications in integrative genomics, neuroimaging, computer vision, and signal processing. This research will provide new tools to scientific researchers who routinely collect and analyze high-dimensional and complex data across different sources and platforms. The project aims to improve on conventional methods by delivering more informative and interpretable scientific findings. The transfer learning algorithms to be developed, which can reliably extract and combine knowledge from diverse data types and across different studies, will help address important issues in genomics applications. User-friendly software packages will be developed and made publicly available. Scientific researchers can use the tools to translate dispersed and heterogeneous data sources into new knowledge and medical benefits. This will help improve the understanding of the role of various genetic factors in complex diseases, and accelerate the development of new medicines and treatments in a cost-effective way.

Transfer learning for large-scale inference aims to extract and transfer the knowledge learned from related source domains to assist the simultaneous inference of thousands or even millions of parameters in the target domain. We aim to develop a general framework for understanding the benefits and caveats of transfer learning in a wide range of large-scale inference problems, including sparse estimation, false discovery rate analysis, sparse linear discriminant analysis, and high-dimensional regression. Our research addresses two key issues in transfer learning: (a) What should be transferred? (b) How can knowledge be transferred while preventing negative learning? We aim to pursue three major research goals. The first is to develop a class of computationally efficient and robust transfer learning algorithms for high-dimensional sparse inference. The general strategy is to first learn the local sparsity structure of the high-dimensional object through auxiliary data and then apply this structural knowledge to the target domain by adaptively placing differential weights or setting varied thresholds on the corresponding coordinates. The second is to formalize a decision-theoretic framework for high-dimensional transfer learning that is applicable across the sparse and non-sparse regimes. Along this direction, we aim to develop a class of kernelized nonparametric empirical Bayes methods for data-sharing shrinkage estimation and multiple testing. The third is to address the urgent needs and new challenges arising from important genomics applications using the newly developed methods.
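As a rough illustration of the general strategy described above (learn a sparsity structure from auxiliary data, then set coordinate-specific thresholds in the target domain), the following is a minimal sketch and not the project's actual algorithm. It assumes, purely for illustration, that the auxiliary data are aligned coordinate by coordinate with the target data, that the absolute auxiliary value is a usable proxy for how likely a coordinate is to carry signal, and that soft thresholding is the estimator. The function names and the `low`/`high` scaling parameters are hypothetical.

```python
import math


def adaptive_thresholds(aux, base, low=0.5, high=1.5):
    """Map auxiliary evidence to per-coordinate thresholds.

    Assumption: a larger |aux[i]| suggests coordinate i is more likely
    to be non-null, so it receives a smaller threshold. Thresholds are
    scaled linearly between high * base (weakest evidence) and
    low * base (strongest evidence).
    """
    scores = [abs(a) for a in aux]
    s_min, s_max = min(scores), max(scores)
    if s_max == s_min:
        # No structural information: fall back to a common threshold.
        return [base] * len(scores)
    span = s_max - s_min
    return [base * (high - (high - low) * (s - s_min) / span) for s in scores]


def soft_threshold(x, t):
    """Shrink x toward zero by t; values within [-t, t] become zero."""
    return math.copysign(max(abs(x) - t, 0.0), x)


def transfer_estimate(target, aux, base=None):
    """Sparse estimate of the target means using aux-informed thresholds."""
    if len(target) != len(aux):
        raise ValueError("target and aux must have the same length")
    if base is None:
        # Universal threshold for unit-variance noise (an assumption here).
        base = math.sqrt(2.0 * math.log(len(target)))
    thresholds = adaptive_thresholds(aux, base)
    return [soft_threshold(x, t) for x, t in zip(target, thresholds)]


if __name__ == "__main__":
    target = [3.0, 0.4, -2.5, 0.3]
    aux = [4.0, 0.1, 3.5, 0.2]
    print(transfer_estimate(target, aux, base=2.0))
```

In this sketch, coordinates with strong auxiliary evidence are shrunk less, while coordinates with weak evidence face a stricter threshold. When the auxiliary data carry no variation, the thresholds reduce to the common baseline, which is one simple way to avoid doing worse than ignoring the auxiliary data.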

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

National Science Foundation (NSF)
Division of Mathematical Sciences (DMS)
Standard Grant (Standard)
Program Officer: Yong Zeng
University of Pennsylvania
United States