Connected data is a hallmark of the Internet age. We now have an unprecedented ability to collect information i) on social relationships from websites like Facebook; ii) on connections between ideas from hyperlinked repositories such as Wikipedia; iii) on links between scientific fields from online cross-referenced citation databases; iv) on interactions between proteins in biology; and v) even on the connections in the human brain. Network computations, such as finding the most important people, ideas, papers, or the most important connections, help refine these raw collections of information into meaningful summaries. Consequently, making these computations fast and efficient will help produce scientific insights from the growing plethora of data available.

A highly successful paradigm for stating network computations is as the solution of a matrix problem. For instance, such an approach was the heart of Google's celebrated PageRank algorithm for finding the most important pages on the web. Modern connected data, however, is so large that it has eclipsed the ability of even the best algorithms from the 20th century to cope. Interesting network computations have become more complicated as well. The investigator will study a new class of algorithms to compute nonlinear functions of matrices, such as the matrix exponential. The matrix exponential has many uses; for example, it underlies many new computations designed to identify the most important relationships in neural networks. Standard techniques for the matrix exponential involve examining all of the connections at each step (and there could be hundreds or thousands of steps), only to highlight a few pieces of information. A more recent paradigm, called local computations, only utilizes the connections from a few entities (in the matrix, they only look at a few rows or columns) at a time. The goal of this research is to design new algorithms for the matrix exponential and other functions of matrices in the local computations paradigm. These new algorithms will be able to operate on the world's largest networks quickly (ideally in seconds or minutes), and help application specialists study their data in new ways. Three driving applications will be ranking and voting, link prediction, and brain networks. The investigation will also include the study of higher-order connections in networks that give rise to three or four dimensional matrices -- commonly called tensors.

All of the software developed for this research will be made available in a software package for local computations of matrix functions. The investigator will present tutorials on this software package to ensure that researchers across many disciplines can utilize the outcome of this research. To ensure that this research reaches students across many disciplines, the investigator will develop a graduate course on the use of matrix methods for network computations. Finally, given the growing importance of network data, the investigator will develop a module for high school students to show how solving systems of equations, part of the core high school curriculum, can be used to analyze information networks.

Agency
National Science Foundation (NSF)
Institute
Division of Computer and Communication Foundations (CCF)
Application #
1149756
Program Officer
Tracy Kimbrel
Project Start
Project End
Budget Start
2012-05-01
Budget End
2019-04-30
Support Year
Fiscal Year
2011
Total Cost
$568,943
Indirect Cost
Name
Purdue University
Department
Type
DUNS #
City
West Lafayette
State
IN
Country
United States
Zip Code
47907