I am a PhD statistician who specializes in observational studies and comparative effectiveness research (CER). I have just completed the first year of a three-year fellowship in the statistics department of Stanford University. My long-term goal is to become a tenure-track professor in a health services, biostatistics or statistics department. Project 1: Make an instrumental variable technique, known as "near-far matching," more accessible to CER researchers. Near-far matching is similar to propensity score matching, but is capable of estimating unbiased treatment effects even when there is confounding caused by unobserved covariates. Project 2: Combine causal inference techniques (i.e., ways to obtain unbiased treatment effects) with cutting-edge predictive modeling techniques (e.g., fast nonparametric algorithms for discovering interesting parts of the covariate space). The proposed techniques can use observational data to identify subpopulations which have large (or very small) responses to a given treatment (a.k.a. "treatment effect heterogeneity"). Interestingly, we propose a prediction technique which is still valid when there is unobserved confounding. Project 3: Propose an empirical Bayes technique which uses patient-level information (e.g., demographic covariates) plus patient-level longitudinal experience (e.g., repeated measurements of HbA1c or pain scores) to enhance prognostic ability for chronic care patients. I have assembled an interdisciplinary team of scientists from Stanford University who are highly committed to my career development. My primary mentor is Phil Lavori PhD, the chair of the Department of Health Research and Policy. My co-mentor is Mark Cullen MD, chief of the Division of General Medical Disciplines in Stanford Medical School. In addition to my mentors, I have four advising consultants: two statisticians, Trevor Hastie and Tze Lai, and two health economists, Victor Fuchs and Jay Bhattacharya. 
I have specific projects I will work on with each of these researchers. These projects require regular meetings between me and each of my supporters. Much of my mentoring will come through an apprenticeship model of collaboration. The career development program outlined in this application contains formal mentorship, didactic coursework, and seminars structured around three areas: (1) predictive modeling, (2) longitudinal analysis and (3) health disparities and health systems. During the first year of the K99 phase I will audit three classes that specifically address these three topics. Stanford has very active statistics and CER groups; I will attend several regularly recurring workshops and seminars in order to build my experience with these research communities. Additionally, I will have biweekly meetings with Professor Cullen's Alcoa Study research group, which is led by Professor Cullen and consists of postdocs, research fellows, and junior faculty discussing research issues.

Public Health Relevance

Part of the goal of patient-centered outcomes research (PCOR) is to provide evidence-based guidance on which types of treatments produce better outcomes for patients. The gold standard for proof of treatment efficacy is a randomized clinical trial (RCT). This is primarily because well-run RCTs, through researcher-controlled randomization, tend to produce clear, dependable and easily defensible evidence for the relative efficacy of treatments. Unfortunately, in many situations running an RCT is prohibitively expensive, unethical, would take too long to produce actionable results, or could not be run because clinical equipoise has been lost through belief and consensus rather than evidence-based research. That is to say, in PCOR there is a need for extracting evidence-based guidance from sources other than RCTs. I am a PhD statistician who specializes in observational studies. The techniques I develop and deploy are good at mitigating confounding (a.k.a. selection bias) in non-RCT settings. Naive statistical tools, such as t-tests and regression, do not usually produce unbiased estimates of treatment effects. In fact, they only do so under extremely specific conditions (essentially, exclusively in RCT settings). Ignoring this fact can be dangerous. Inappropriately assuming a statistical technique produces unbiased treatment effects can lead to decisions such as treating women with hormone replacement therapy, potentially causing worse outcomes than would have happened otherwise. I worry that it has almost become a platitude, but it is surely the case that correlation does not imply causation. I have developed a technique, near-far matching, which can be thought of as an amalgamation of propensity score matching and instrumental variables. It uses the RCT framework to guide the construction of an observational study. In certain situations, near-far matching can reduce confounding from both observed and unobserved covariates. 
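The contrast between a naive comparison and an instrumental variable (IV) analysis can be illustrated with a small simulation. The sketch below is not near-far matching; it uses the simple Wald IV estimator with a binary instrument, and every quantity in it (the data-generating model, the true effect of 2.0, the confounder strength, the function names) is an assumption made up for illustration. It shows how an unobserved covariate can bias a difference in means while an instrument that affects the outcome only through treatment can recover the effect.

```python
import random


def simulate(n=100_000, true_effect=2.0, seed=0):
    """Generate (instrument, treatment, outcome) rows with an unobserved confounder.

    Hypothetical model: u raises both the chance of treatment and the outcome;
    z nudges treatment but has no direct effect on the outcome.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        u = rng.gauss(0.0, 1.0)              # unobserved confounder
        z = 1 if rng.random() < 0.5 else 0   # instrument, assigned like a coin flip
        d = 1 if z + u + rng.gauss(0.0, 1.0) > 0.5 else 0   # treatment received
        y = true_effect * d + 3.0 * u + rng.gauss(0.0, 1.0)  # outcome
        rows.append((z, d, y))
    return rows


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def naive_estimate(rows):
    """Difference in mean outcome between treated and untreated units."""
    treated = mean(y for _, d, y in rows if d == 1)
    control = mean(y for _, d, y in rows if d == 0)
    return treated - control


def wald_estimate(rows):
    """Wald IV estimator: effect of z on y divided by effect of z on d."""
    y_diff = mean(y for z, _, y in rows if z == 1) - mean(y for z, _, y in rows if z == 0)
    d_diff = mean(d for z, d, _ in rows if z == 1) - mean(d for z, d, _ in rows if z == 0)
    return y_diff / d_diff


if __name__ == "__main__":
    data = simulate()
    print("naive difference in means:", round(naive_estimate(data), 2))
    print("Wald IV estimate:         ", round(wald_estimate(data), 2))
```

Under these assumptions the naive estimate lands well above the true effect, because treated units tend to have larger values of the unobserved covariate, while the IV estimate lands close to it. The result depends entirely on the instrument being valid, which in a real observational study has to be argued rather than simulated.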
In order to establish myself as a high-quality CER statistician I propose in Aim 1 to (1) provide clearer tools for PCOR researchers to think about, implement and critique IV studies and (2) establish myself as a thought leader in observational studies and in instrumental variables in particular. The Stanford statistics department is world-renowned for several major contributions to the literature, for example: bootstrapping, sparsity, probability theory, and in particular its work on predictive modeling (a.k.a. machine learning). Predictive modeling uses observed correlation to predict outcomes. These techniques are capable of dealing with big data (millions of observations and thousands of covariates and interaction terms) and are revolutionizing many different disciplines. But predictive modeling has grown up separately from the causal inference literature. In the prediction literature, there are no notions of counterfactuals, treatment effects or confounding on unobservables. In Aim 2 I propose a project with Trevor Hastie, a seminal figure in the predictive literature, to graft part of the causal inference framework onto predictive modeling techniques. By establishing a connection between two very different parts of the statistics literature, I hope to create a valuable opportunity for the CER literature by producing powerful tools for identifying treatment effect heterogeneity in subpopulations. I currently know very little about the predictive modeling literature, but learning-by-doing will put me in an excellent position to make unique contributions to three different communities: CER, causal inference and predictive modeling. Given the increase in burden due to chronic care issues, a serious PCOR researcher needs to be able to engage with longitudinal data. Determining who will become sick, over the long run, can benefit everyone because it gives us the opportunity to intervene early, before chronic issues start to cascade out of control. 
Unfortunately, the techniques I am most familiar with are limited to assessing treatments in the acute care setting. In Aim 3 I propose introducing a dynamic empirical Bayes methodology into the health risk assessment literature. This project will (1) train me in longitudinal predictive modeling and (2) introduce into the CER literature a better statistical methodology for predicting health outcomes in chronic care settings. Through each of these aims, and through additional projects outlined in my Training Activities During Award Period section, I will learn the social, behavioral, societal and structural frameworks for understanding health disparities, health services, health systems and comparative effectiveness. Through this K99/R00 I will become a better CER/PCOR statistician and I will do so through projects which create new methodologies for PCOR researchers. I will learn a lot; and as I learn, I will help produce tools for people to use.

National Institutes of Health (NIH)
Agency for Healthcare Research and Quality (AHRQ)
Career Transition Award (K99)
Study Section
Special Emphasis Panel (ZHS1-HSR-C (01))
Program Officer
Willis, Tamara
Stanford University
Biostatistics & Other Math Sci
Schools of Arts and Sciences
United States
Prochaska, Judith J; Michalek, Anne K; Brown-Johnson, Catherine et al. (2016) Likelihood of Unemployed Smokers vs Nonsmokers Attaining Reemployment in a One-Year Observational Study. JAMA Intern Med 176:662-70
Sun, Dennis L; Harris, Naftali; Walther, Guenther et al. (2015) Peer Assessment Enhances Student Learning: The Results of a Matched Randomized Crossover Experiment in a College Statistics Class. PLoS One 10:e0143177