Given a graph, such as a social/computer network, or the blogo- sphere, how will a virus (or rumor or new product) propagate in it? Will it take over, creating a pandemic? How to select the 'k' best nodes/edges for immunization, or, conversely, the best 'k' blogs for the fastest dissemination of a new idea? The an- swer to such questions is vital, for public health, for network security, for market penetration, for blog monitoring and many more applications.

In the PI's past work, they studied arbitrary graphs, and showed that propagation depends on a single number, namely, the first eigenvalue of the adjacency matrix of the network. Specifically, they studied the so-called 'epidemic threshold' for flu-like propagation (``SIS'' model = susceptible-infectious-susceptible), on un-directed, un-weighted static graphs. All earlier work fo- cused on full cliques, or homogeneous graphs, or specific cases of power-law graphs - *all* of which are special cases of the PI's eigenvalue result.

The major thrusts of the current proposal are two. The first is *theory*: For a mumps-like model (``SIR'' = susceptible - infect- ed - recovered), and for additional models, when will a virus re- sult in a pandemic? What can we say about weighted graphs? About time-evolving graphs, like an ad-hoc network of mobile phone users? The second thrust is on *algorithms*: Given a graph, a virus model (SIS, SIR, etc), and a fixed budget of 'k' nodes/edges to immunize, how can we quickly find an optimal or near-optimal solution, to best contain the virus? How can we modify the algorithm, when the network changes over time?

The TECHNICAL MERIT of the work is that it is the first to focus on *arbitrary* graphs, thus including real ones. In contrast, the vast majority of past analytical work makes unrealistic as- sumptions about the graph topology (cliques, homogeneous graphs etc.).

The BROADER IMPACT is high, as dynamics of large-scale graphs ap- pear in numerous settings: cascades on blogs; product penetration and viral marketing; rumor/information propagation; immunization policies; advertisement policies etc.

For further information see the project web page: URL: www.cs.cmu.edu/~christos/NSF-PROJECTS/Immunization/

Project Report

Will a virus (or new idea, or new product) take over a given network, or will it die out soon? Given limited resources (say, vaccines), what is the best way to allocate them, to contain the virus best? These are the questions that motivated this work. TECHNICAL MERIT. The main contribution was the 'G2' theorem, which, under mild assumptions, generalizes previous results in two settings: (a) for any network topology and (b) for (almost) any virus type (flu, with no immunity; mumps, with lifetime immunity, etc) we have a closed-form solution for the epidemic threshold, also known as the tipping point. Below this condition, the virus (product, idea) will quickly become extinct; above it, it may survive and create an epidemic. What is valuable is that the only measure that matters from the topology of the network, is the so-called 'eigenvalue' of the connectivity matrix - that alone tells us how well connected is the network. At the high level, the eigenvalue is roughly related to the average number of connections per node ('degree'). Based on this ``eigenvalue'' observation, we were able to derive near-optimal immunization policies, for a long list of settings: time-evolving graphs, mobile phone networks, patient-transfer hospital networks, and many more. The second line of work studied the shape of waves of activity over time, where we found that the growth phase seems to be exponential, while the decay phase seems to be a power-law with a surprising exponent of -1.5. This is the 'spikeM' model, which seems to fit well several settings, like 'harry potter' queries on google-analytics, memes over time ('yes we can'), as well as malware propagation over time. The third line of work studied competing viruses, where we showed that, regardless of the network topology, 'winner takes all' under mild assumptions. This was known as Gause's law, for fully connected networks, but we were the first to generalize it to arbitrary topologies. Finally, we also provide parameter-free algorithms that spot the 'culprits' of an infection/meme/rumor. This could be useful for historical analysis of epidemiology or criminology data, as well as marketing. Our work attracted a 'best paper' award in CIKM, a top data mining conference. BROADER IMPACT. We collaborated with medical doctors (Dr. Iwashyna, Dr. Don Burke). The goal was to find near-optimal strategies of resourse allocation to hospitals. Our algorithms run in minutes, instead of weeks; and they achieve 6 times fewer infections than the current strategy, when applied on real networks (US hospitals and their patient-transfer patterns).

Agency
National Science Foundation (NSF)
Institute
Division of Information and Intelligent Systems (IIS)
Type
Standard Grant (Standard)
Application #
1017415
Program Officer
Sylvia Spengler
Project Start
Project End
Budget Start
2010-09-01
Budget End
2013-08-31
Support Year
Fiscal Year
2010
Total Cost
$499,682
Indirect Cost
Name
Carnegie-Mellon University
Department
Type
DUNS #
City
Pittsburgh
State
PA
Country
United States
Zip Code
15213