Recent advances in computing and measurement technologies have led to an explosion in the amount of data being collected in all areas of application. Many of these data sets have complex structure, in the form of text, images, video, audio, streaming data, and so on. This project focuses on one important class of problems, namely, data with network structure. Such data are common in diverse scientific and engineering areas, such as biology, computer science, electrical engineering, economics, and sociology. While there has been extensive research on networks (primarily outside the field of Statistics), much of it deals with characterizing and modeling network structure using link information only. The goal of the current research program is to exploit node features as additional information and to develop statistical methods that take into account both link and node information. The research program is expected to make significant contributions in several areas, including Statistics, Biology, Computer Science, Electrical Engineering, Physics, Psychology, and Sociology. The educational program includes substantial initiatives that will involve undergraduate and graduate students and expose them to state-of-the-art research on topics related to the project.
The research aims to develop new statistical methodologies, and associated theory, that exploit the network structure in the data; such network data are becoming increasingly common in many fields. Specifically, the investigator aims to study three different but related problems: a) link prediction for partially observed networks, which addresses the situation where the observed network is the true network corrupted by observation errors; b) community detection in networks with node features, which combines network link information with additional information on the nodes to improve community detection; and c) learning network structures, which addresses the situation where the goal is to identify the underlying network structure from the data.