Network data and text data arise in many areas of science and business (for example, social networks, genetic networks, news, tweets, and blogs). A problem of great interest in statistics and machine learning is how to extract information from such data. The research in this project has three components: (a) collecting and cleaning large-scale network and text data sets, (b) developing new models, methods, and theory for analyzing network and text data, and (c) analyzing a large data set on the publications of statisticians, which the PI and collaborators have collected and cleaned. The research will have an impact in social science and business, and especially in understanding the publication patterns and trends of statisticians. The project also provides training opportunities for undergraduate and graduate students.
Network data and text data are widely available and contain valuable information for answering many scientific questions. This project will focus on network analysis and text learning and will contribute to the following topics: (a) development of models for network and text data, (b) development of computationally feasible approaches to network global testing, pairwise comparison, dynamic network analysis, and text learning, and (c) development of new approaches for studying the research patterns and trends of the statistical community. The research will lead to new data sets, methods, and theory in the application areas of social networks, genetic networks, and text mining, and will have an impact in social science, business, and biology.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.