Big biomedical data have become a key resource for discovery in biomedicine. However, because such data are distributed across sites and often sensitive and noisy, they are not utilized to their full potential. For example, large volumes of biomedical data are generated and stored at different sites such as hospitals, pharmacies, and biomedical labs, and numerous challenges arise when gathering, integrating, and utilizing them. More suitable and effective techniques are therefore urgently needed to integrate and analyze these data. This project aims to design a set of novel algorithmic tools for several fundamental data analytics problems commonly encountered in big biomedical data.
This project could increase the ability to gather, integrate, and effectively exploit big biomedical data, and to handle unreliable and sensitive biomedical data. Specifically, the project will focus on four data analysis problems: distributed truth discovery, distributed classification, distributed clustering, and differentially private learning. Each of these problems has already been recognized as a critical tool in data analysis and information integration. The project addresses a number of challenging issues (such as communication cost, data reliability, robustness, computational cost, and privacy preservation) and aims to achieve highly efficient solutions with quality guarantees for each of these problems. The four research aims will be evaluated based on their effectiveness, efficiency, and practicality on specific biomedical datasets. Furthermore, this research will provide educational and research opportunities to both graduate and undergraduate students, including students from underrepresented groups.
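To make the notion of differentially private learning concrete, the classic Laplace mechanism offers a minimal illustration: clip each record to a known range (which bounds how much any single individual can affect an aggregate) and add noise calibrated to that bound before releasing the result. The sketch below is purely illustrative and is not drawn from the project itself; the function name `laplace_mean`, the parameter choices, and the synthetic measurements are all assumptions.

```python
import numpy as np

def laplace_mean(values, epsilon, lower, upper, rng=None):
    """Epsilon-differentially-private mean via the Laplace mechanism.

    Each value is clipped to [lower, upper], so one record can change
    the mean by at most (upper - lower) / n -- the sensitivity. Adding
    Laplace noise with scale sensitivity / epsilon then yields an
    epsilon-differentially-private estimate of the mean.
    """
    rng = rng or np.random.default_rng()
    values = np.clip(np.asarray(values, dtype=float), lower, upper)
    sensitivity = (upper - lower) / len(values)
    noise = rng.laplace(loc=0.0, scale=sensitivity / epsilon)
    return values.mean() + noise

# Example: privately estimate the mean of synthetic lab measurements
# (hypothetical data, for illustration only).
measurements = [96.8, 98.6, 99.1, 97.4, 100.2, 98.0]
private_mean = laplace_mean(measurements, epsilon=1.0,
                            lower=90.0, upper=110.0)
```

Smaller values of `epsilon` give stronger privacy but noisier estimates; the same clip-and-perturb pattern underlies many differentially private learning algorithms, where it is applied to gradients or model parameters rather than simple aggregates.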
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.