Big Data has the potential to revolutionize cancer research and care, but extracting the information it holds on the optimal strategies for cancer control will require cutting-edge tools in data science. The optimal strategies for cancer control will be dynamic strategies that adapt clinical decisions over time to a patient's evolving clinical history. Unfortunately, conventional statistical methods cannot appropriately compare dynamic strategies, so we need methods specifically designed for this task: g-methods. G-methods have helped to shape clinical care in many areas, but they have not been systematically applied to cancer research. Further, while g-methods let us validly estimate the effect of pre-specified strategies, these may not be the optimal strategies. My overarching goal is to apply and further develop analytic methods to learn the optimal strategies for cancer control from complex longitudinal data and generate user-friendly, publicly available software to make these methods available to the cancer research community. I will apply these methods to answer key clinical questions across the prostate cancer control continuum: 1) the optimal dietary and lifestyle strategies to prevent aggressive prostate cancer, 2) the optimal screening strategy following a baseline PSA test to maximize detection of aggressive disease while minimizing detection of indolent tumors, and 3) the optimal statin therapy strategy to maximize survival among men with nonmetastatic prostate cancer. This project will leverage data from a large prospective cohort study and a novel platform of electronic health records linked with genetic data. I will first apply g-methods to estimate the effects of recommended strategies for cancer control that a randomized trial would have limited feasibility to evaluate.
I will then investigate whether novel methods that learn the optimal strategies from the data may lead to improved, targeted recommendations that get the right interventions to the right people at the right time. This innovative project will advance comparative effectiveness research for cancer care at the cutting edge of data science. I am optimally positioned to undertake this research based on my 1) expertise in cancer, epidemiology, and causal inference; 2) exceptional multidisciplinary mentoring team comprised of global leaders in their respective fields; and 3) unparalleled research environment to support my career development. Through this work, I will expand my expertise in new areas, including machine learning. The proposed research and training will help me achieve my long-term career goal to become an independent investigator and lead a transdisciplinary research program that integrates causal inference and machine learning to identify optimal strategies for cancer control. Leveraging rich, existing data, this proposal represents a significant opportunity to develop, apply, and disseminate powerful methods for big clinical data to accelerate progress in cancer research and care.

Public Health Relevance

We will answer key clinical questions about the optimal strategies for prostate cancer prevention, detection, and treatment by applying innovative analytic methods to complex longitudinal data. This research will result in the development, application, and dissemination of powerful methods for big clinical data that accelerate progress in cancer research and care.

National Institutes of Health (NIH)
National Cancer Institute (NCI)
Career Transition Award (K99)
Study Section
Subcommittee I - Transition to Independence (NCI)
Program Officer
Radaev, Sergey
Harvard University
Public Health & Prev Medicine
Schools of Public Health
United States