Recent technological and scientific advances have made it possible to acquire vast amounts of data of many types. Such an abundance of information should lead to new scientific understanding and breakthroughs. However, the large-scale nature of this data introduces serious complications that overwhelm classical data analysis techniques, stalling scientific progress in many areas. Novel mathematical techniques are required to extract and analyze the information effectively. This project will use Lyme disease data (through a collaboration with LymeDisease.org) as a motivating example in the design and testing of the methods, since it is a prime example of complex large-scale data with significant impact on a fast-growing community. The results of this project will thus have swift societal impact; for example, analysis of the LymeData will not only further the understanding of the disease itself, but will also lead to more accurate and precise diagnoses and to more personalized and effective treatments for patients. In addition, this proposal will support the education of postdoctoral, graduate, and undergraduate students, and facilitate outreach efforts aimed especially at increasing the participation of under-represented populations. To this end, in addition to the activities funded by this proposal, the PIs will utilize existing programs such as the Women In Technology Sharing Online (WitsOn) program, the Women in Data Science and Mathematics Research Collaboration Workshop (WiSDM), and MAPS 4 College of Los Angeles, in all of which the PIs are already actively involved, to recruit under-represented populations and to promote the mathematical and technical sciences.

The fundamental research in this project centers on three main objectives, each addressing an important challenge that arises in large-scale data applications. The first is to design innovative data completion techniques that are practical for big data; this will involve the design and theoretical development of data completion methods that handle non-random (and non-uniform) observation patterns, use adaptive sampling schemes, and exploit additional structure hidden in the observations. Rather than using classical (computationally expensive) convex programming techniques, the project will focus on simple, extremely efficient solvers that can be run in real time during an inference task. Second, the team proposes two novel deep learning approaches for inferential tasks that (i) are extremely computationally efficient and can thus be applied to massive datasets, and (ii) achieve the accuracy benefits of modern deep learning approaches, which improve upon state-of-the-art methods. Third, the project will develop critical data fusion techniques that allow data from a wide variety of sources to be analyzed in an aggregated manner. Finally, the team proposes to combine these three data analysis tasks in a novel multi-stage feedback design in which outputs from data completion, deep learning inference, and fusion are cycled back as inputs to these mechanisms, yielding an iterative and robust inference framework. Progress on these goals will yield new mathematical frameworks in data science and provide techniques that can be applied directly to large-scale data for efficient and powerful data analysis.
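
To make the contrast between convex programming and simple iterative solvers concrete, the following is a minimal sketch of one well-known non-convex approach to data completion: alternating a fill-in step on the unobserved entries with a truncated-SVD projection onto low-rank matrices. It is an illustration only and is not claimed to be the method developed in this project. The function name `complete_low_rank`, the low-rank assumption, and the row-dependent sampling probabilities in the demo are all hypothetical choices made for this example.

```python
import numpy as np


def complete_low_rank(observed, mask, rank, n_iters=500):
    """Fill in missing entries of a matrix assumed to be (approximately) low rank.

    observed : 2-D array with arbitrary values where mask == 0
    mask     : 2-D array of 0/1 flags, 1 where an entry was actually observed
    rank     : target rank (an assumption supplied by the user)
    """
    estimate = np.zeros_like(observed, dtype=float)
    for _ in range(n_iters):
        # Keep the observed entries, fill the rest with the current estimate.
        filled = mask * observed + (1.0 - mask) * estimate
        # Project onto rank-`rank` matrices with a truncated SVD.
        u, s, vt = np.linalg.svd(filled, full_matrices=False)
        estimate = (u[:, :rank] * s[:rank]) @ vt[:rank]
    return estimate


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic rank-2 ground truth (illustrative data, not LymeData).
    truth = rng.standard_normal((30, 2)) @ rng.standard_normal((2, 20))
    # Non-uniform observation pattern: each row has its own sampling rate.
    row_prob = np.linspace(0.6, 0.95, 30)[:, None]
    mask = (rng.random(truth.shape) < row_prob).astype(float)
    recovered = complete_low_rank(truth * mask, mask, rank=2)
    rel_err = np.linalg.norm(recovered - truth) / np.linalg.norm(truth)
    print(f"relative recovery error: {rel_err:.2e}")
```

Each iteration costs one SVD and needs no general-purpose convex solver, which is the kind of trade-off the paragraph above points to. The sketch assumes the target rank is known and does nothing to adapt to the sampling pattern; handling those issues rigorously is part of what the first objective describes.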

Project Start:
Project End:
Budget Start: 2019-04-01
Budget End: 2021-08-31
Support Year:
Fiscal Year: 2019
Total Cost: $290,125
Indirect Cost:
Name: University of California Los Angeles
Department:
Type:
DUNS #:
City: Los Angeles
State: CA
Country: United States
Zip Code: 90095