In many problems arising in biology, the social sciences, and various other fields, it is often necessary to analyze populations of entities (e.g., molecules or individuals) interconnected by a network. This proposal intends to develop new statistical formalisms and computational methodologies for modeling and inferring the semantic underpinnings of network entities, and to investigate how these aspects influence the network topology and its temporal evolution during biological and sociological processes. It will also study a number of as yet unexplored topics, such as discriminative learning of network structures, recovery of temporally evolving network sequences, and related theoretical issues.

The proposed research is envisaged to help address big-picture problems such as: 1) Hidden Identity/Function Induction, e.g., what role(s) do individuals play when they interact with different peers under different conditions? 2) Structural/Organizational Forecast, e.g., do changes in molecular function lead to alterations of biological pathways, and if so, how? 3) System Robustness, e.g., how does a network adjust to perturbations caused by exogenous intrusions?

This research straddles statistical learning, social/biological sciences and data mining. The intellectual merit of the proposed work lies both in the algorithmic and theoretical novelties of the methodological developments, and in the analysis of specific social and biological networks and various other applications enabled by the proposed methods. The main novelties include: (1) new Bayesian formalisms for latent space modeling of node functions and network linkages, which capture the functional/behavioral context of network entities; (2) novel temporal extensions of the exponential random graph model for network evolution, with accompanying inference/learning algorithms; (3) algorithms for reverse-engineering temporally rewiring networks from longitudinal node attribute data; and (4) novel discriminative algorithms for learning very large networks from partial samples of the network, together with the relevant learning theory. These methods will be applied to the ENRON email network to explore behavioral patterns under various business operation conditions, and to a longitudinal molecular abundance profile measured from breast cancer cells to infer networks, and their alterations, under carcinogenic or tumor-suppressing environments. The results are expected to advance the principles and technologies for network analysis, and to enable a wide range of applications of broader interest.

The proposed research is also expected to have broad educational and societal impact. As an interdisciplinary research effort, this project will provide rich opportunities for multi-disciplinary educational and research training at both undergraduate and graduate levels. A thorough understanding of social network structures in human populations can have a significant impact on important issues such as policy making and technology adoption. Knowledge of cellular networks and their changes in response to exogenous interventions can help in reasoning about disease causes and designing therapeutic schemes. Our methodological and software deliverables can potentially facilitate such studies, improve the cost-effectiveness of network data collection, and foster future development in this area.

More details of this project can be found at www.cs.cmu.edu/~epxing/projects/network.htm

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Type: Standard Grant (Standard)
Application #: 0713379
Program Officer: Vasant G. Honavar
Budget Start: 2007-09-15
Budget End: 2011-08-31
Fiscal Year: 2007
Total Cost: $429,000
Name: Carnegie-Mellon University
City: Pittsburgh
State: PA
Country: United States
Zip Code: 15213