Discovery of pathways implicated in complex human diseases is at the forefront of biomedical research. Many scientists are specifically interested in discovering local causal pathways, which contain only the direct causes and direct effects of the phenotype or target molecule of interest. In this project we propose a new framework and methods for accurate discovery of local causal pathways that integrate high-throughput observational data with efficient experimentation strategies. At the core of this framework are computational causal discovery methods that account for the multiplicity of causal pathways consistent with the data. This multiplicity obscures the causal roles of the variables and leads to large numbers of false-negative and false-positive predictions in the output of all current causal discovery algorithms. The framework is designed specifically for biomedical researchers and takes into account their significant resource limitations and experimental workflows. For this reason, a primary objective of the framework is to minimize the use of costly wet-laboratory resources while achieving high discovery accuracy. The proposed project extends our prior work, in which we studied the multiplicity of molecular signatures and causal pathways consistent with the data and developed a family of new methods (called TIE*) that can provably and efficiently discover all signatures of the phenotype from observational data. Although TIE* methods can extract multiple signatures of a disease, determining its local causal pathway and the causal roles of the molecular variables involved requires the new methods proposed herein.
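To make the notion of a local causal pathway concrete, here is a minimal sketch on a hypothetical toy causal graph. All variable names are illustrative assumptions and are not taken from the project. The sketch assumes the causal structure is already known, which is exactly what the proposed methods must infer from data. It only shows which variables count as direct causes and direct effects of a target. It also shows how that set differs from the target's Markov blanket. The blanket additionally contains "spouses" (other direct causes of the target's direct effects), which are predictive of the target without being among its direct causes or effects. This distinction is one reason a predictive signature need not coincide with the local causal pathway.

```python
from typing import Dict, Set

# Hypothetical toy causal graph: each variable maps to its set of direct causes (parents).
# Variable names are illustrative only, not taken from the project.
DAG: Dict[str, Set[str]] = {
    "G1": set(),
    "G2": set(),
    "G3": set(),
    "Phenotype": {"G1", "G2"},  # G1 and G2 directly cause the phenotype
    "G4": {"Phenotype", "G3"},  # the phenotype and G3 directly cause G4
    "G5": {"G4"},               # G5 is only an indirect effect of the phenotype
}


def parents(dag: Dict[str, Set[str]], node: str) -> Set[str]:
    """Direct causes of `node`."""
    return set(dag[node])


def children(dag: Dict[str, Set[str]], node: str) -> Set[str]:
    """Direct effects of `node`: variables that list `node` as a parent."""
    return {v for v, pa in dag.items() if node in pa}


def local_causal_pathway(dag: Dict[str, Set[str]], target: str) -> Set[str]:
    """Direct causes and direct effects of the target (parents and children)."""
    return parents(dag, target) | children(dag, target)


def markov_blanket(dag: Dict[str, Set[str]], target: str) -> Set[str]:
    """Parents, children, and the children's other parents (spouses)."""
    spouses: Set[str] = set()
    for child in children(dag, target):
        spouses |= parents(dag, child)
    spouses.discard(target)
    return local_causal_pathway(dag, target) | spouses


print(sorted(local_causal_pathway(DAG, "Phenotype")))  # ['G1', 'G2', 'G4']
print(sorted(markov_blanket(DAG, "Phenotype")))        # ['G1', 'G2', 'G3', 'G4']
```

In this toy graph, G3 is in the Markov blanket but not in the local causal pathway, and G5 is in neither. G5 is an indirect effect of the phenotype, reached only through G4.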
We hypothesize that new methods for discovering local causal pathways from a combination of observational and experimental data can achieve higher discovery accuracy than existing observational approaches while using fewer experimental resources than existing experimental approaches. Briefly, we propose to (1) develop new accurate and experimentally efficient local causal pathway discovery methods; (2) extensively evaluate new and existing methods on both realistic in silico data and real biological data and pathways; (3) improve understanding of the assumptions of these methods and their practicality for high-throughput data; and (4) apply these methods to two ongoing front-line biomedical projects to generate and experimentally validate new insights about two diseases. The first project aims to understand the molecular mechanisms leading from locally advanced breast cancer to metastasis and lymph node involvement. The second project aims to unravel local causal pathways related to fatty liver disease.

Public Health Relevance

The proposed project will provide new causal pathway discovery methods that will help bring the biomedical research community substantially closer to its goal of understanding the molecular mechanisms that cause and control the development and progression of diseases. This research will have broad methodological and practical implications across many areas of biomedicine and will generate immediate benefits for personalized medicine and for the development of new drugs and therapies to fight human diseases effectively.

National Institutes of Health (NIH)
National Library of Medicine (NLM)
Research Project (R01)
Study Section: Biomedical Library and Informatics Review Committee (BLR)
Program Officer: Ye, Jane
New York University
Internal Medicine/Medicine
Schools of Medicine
New York
United States

Publications

Ma, Sisi; Kemmeren, Patrick; Aliferis, Constantin F et al. (2016) An Evaluation of Active Learning Causal Discovery Methods for Reverse-Engineering Local Causal Pathways of Gene Regulation. Sci Rep 6:22558
Attur, Mukundan; Krasnokutsky, Svetlana; Statnikov, Alexander et al. (2015) Low-grade inflammation in symptomatic knee osteoarthritis: prognostic value of inflammatory plasma lipids and peripheral blood leukocyte biomarkers. Arthritis Rheumatol 67:2905-15
Attur, M; Statnikov, A; Samuels, J et al. (2015) Plasma levels of interleukin-1 receptor antagonist (IL1Ra) predict radiographic progression of symptomatic knee osteoarthritis. Osteoarthritis Cartilage 23:1915-24
Ma, Sisi; Kemmeren, Patrick; Gresham, David et al. (2014) De-novo learning of genome-scale regulatory networks in S. cerevisiae. PLoS One 9:e106479
Statnikov, Alexander; Lytkin, Nikita I; Lemeire, Jan et al. (2013) Algorithms for Discovery of Multiple Markov Boundaries. J Mach Learn Res 14:499-566
Statnikov, Alexander; Alekseyenko, Alexander V; Li, Zhiguo et al. (2013) Microbiomic signatures of psoriasis: feasibility and methodology comparison. Sci Rep 3:2620
Udyavar, Akshata R; Hoeksema, Megan D; Clark, Jonathan E et al. (2013) Co-expression network analysis identifies Spleen Tyrosine Kinase (SYK) as a candidate oncogenic driver in a subset of small-cell lung cancer. BMC Syst Biol 7 Suppl 5:S1
Bai, Jane P F; Alekseyenko, Alexander V; Statnikov, Alexander et al. (2013) Strategic applications of gene expression: from drug discovery/development to bedside. AAPS J 15:427-37
Statnikov, Alexander; Henaff, Mikael; Lytkin, Nikita I et al. (2012) New methods for separating causes from effects in genomics data. BMC Genomics 13 Suppl 8:S22