The Training Component of the Center for Causal Modeling and Discovery (CCMD) of Biomedical Knowledge from Big Data will (1) train data scientists to advance CMD methods to answer biomedical questions with Big Data, (2) train biomedical investigators to plan and conduct CMD analyses of large, complex datasets, and (3) train users of the Center's software to quickly and easily apply CMD tools to Big Data problems. Participants in CCMD training activities are expected to include graduate students, postdoctoral students, young investigators, and established investigators from academia and industry, drawn from our own Center, Pittsburgh, other BD2K Centers, and beyond. The Training Component takes advantage of the unique environment at CMU and Pitt to contribute to the following BD2K goals: (1) promote the careers of new and early-stage investigators, (2) contribute to the broad, effective dissemination of the approaches, methods, software, tools, and related resources developed by the CCMD, (3) develop innovative approaches to training in the skills necessary to do research in Big Data science, and (4) share training methods and materials with other Centers as well as the broader community. We will leverage the extensive foundation we have built over nearly two decades to educate data scientists and domain scientists in the theory and application of CMD methods. We will develop unique training materials, including innovative training software, to support a wide range of training activities targeted to our main constituencies. Professional training activities will include online tutorials, a new online course, integration with core curricula, CMD workshops, and a one-week CCMD Summer School. The benefits of our professional training activities will be felt immediately in our three training programs (two of which are NIH funded) and will extend to our universities and others, as well as to other BD2K Centers.
Efforts aimed at users include training videos, online documentation, software webinars, developer office hours, and hackathons. The Training Component will increase the reach of the Center by promoting the Center's approaches and products within multiple scientific communities.

Public Health Relevance

The Center for Causal Modeling and Discovery (CCMD) of Biomedical Knowledge from Big Data will train graduate students, postdoctoral students, and young and established investigators from academia and industry to advance CMD methods that answer biomedical questions with Big Data, to conduct CMD analyses of large, complex datasets, and to efficiently apply CMD tools to Big Data problems using the Center's software.

National Institutes of Health (NIH)
National Human Genome Research Institute (NHGRI)
Specialized Center--Cooperative Agreements (U54)
Study Section: Special Emphasis Panel (ZRG1-BST-R (52))
Program Officer: Brooks, Lisa
University of Pittsburgh
United States
Raghu, Vineet K; Ramsey, Joseph D; Morris, Alison et al. (2018) Comparison of strategies for scalable causal discovery of latent variable models from mixed data. Int J Data Sci Anal 6:33-45
Huang, Biwei; Zhang, Kun; Lin, Yizhu et al. (2018) Generalized Score Functions for Causal Discovery. KDD 2018:1551-1560
Zhang, Kun; Schölkopf, Bernhard; Spirtes, Peter et al. (2018) Learning causality and causality-related learning: some recent progress. Natl Sci Rev 5:26-29
Meyer, Wynn K; Jamison, Jerrica; Richter, Rebecca et al. (2018) Ancient convergent losses of Paraoxonase 1 yield potential risks for modern marine mammals. Science 361:591-594
Naeini, Mahdi Pakdaman; Cooper, Gregory F (2018) Binary Classifier Calibration Using an Ensemble of Piecewise Linear Regression Models. Knowl Inf Syst 54:151-170
Lu, Songjian; Fan, Xiaonan; Chen, Lujia et al. (2018) A novel method of using Deep Belief Networks and genetic perturbation data to search for yeast signaling pathways. PLoS One 13:e0203871
Ponzoni, Luca; Bahar, Ivet (2018) Structural dynamics is a determinant of the functional significance of missense variants. Proc Natl Acad Sci U S A 115:4164-4169
Ding, Michael Q; Chen, Lujia; Cooper, Gregory F et al. (2018) Precision Oncology beyond Targeted Therapy: Combining Omics Data with Machine Learning Matches the Majority of Cancer Cells to Effective Therapeutics. Mol Cancer Res 16:269-278
Sedgewick, Andrew J; Buschur, Kristina; Shi, Ivy et al. (2018) Mixed Graphical Models for Integrative Causal Analysis with Application to Chronic Lung Disease Diagnosis and Prognosis. Bioinformatics :
Andrews, Bryan; Ramsey, Joseph; Cooper, Gregory F (2018) Scoring Bayesian Networks of Mixed Variables. Int J Data Sci Anal 6:3-18

Showing the most recent 10 out of 61 publications