Diagnosis-based risk adjustment is widely used in the US and abroad for health plan payment, notably for Medicare Parts C and D, for commercial contracting and quality assessment, and in numerous state Medicaid programs. Yet the risk adjustment technology used for payments has not kept up with improved classification systems, larger patient datasets, improved estimation algorithms, or recent theoretical and clinical developments. Our work will take advantage of the richer ICD-10-CM classification system, in use since October 2015, which has over five times as many diagnosis codes as ICD-9-CM. ICD-10 codes now distinguish left from right side for thousands of conditions, distinguish among initial, subsequent, and sequela diagnoses, and incorporate hundreds of new clinical, demographic, and biometric variables. More exact models based on ICD-10 can leverage increased diagnostic coding accuracy to reduce opportunities for gaming or for discriminating against patients with conditions that are predicted to be unprofitable. Led by two of the three developers of the existing Centers for Medicare and Medicaid Services Hierarchical Condition Category (CMS-HCC) classification system, our team of physicians, public policy experts, statisticians, and economists will comprehensively improve the accuracy of risk adjustment and predictive models using larger sample sizes, clinical judgment, and state-of-the-art economic and statistical modeling. We will also expand the conventional regression methods explored to include machine learning algorithms, constrained regression, and LASSO estimation. We will calculate a new "appropriateness to include" (ATI) score that captures diagnostic vagueness, discretion, and suitability for use in risk adjustment models, and use this score to inform which variables are included in plan payment formulas. Selection incentives remain of concern in public US health plan payment formulas and may be costing Medicare over $5 billion per year (NBER 2017).
Prediction and payment models from this project can reduce overpayment and offset plan incentives to skimp on services that attract sick people. To ensure that these models and formulas are useful for enrollees of all ages, they will initially be calibrated and tested on a large commercial claims dataset covering ages 0 to 64. They will then be validated and refined for Medicare, Medicaid, and state employees using All-Payer Claims Data from five states and a second large commercial dataset. We will make development steps, statistical programs, and full details of the classification system and prediction formulas publicly available for comment, refinement, and use by health care delivery system researchers, payers, and providers.

Public Health Relevance

This project will develop new classification systems and new prediction and payment models that take advantage of the fivefold increase in diagnostic codes available with the October 2015 change from ICD-9-CM to ICD-10-CM. Using data from two national claims datasets and five state all-payer claims datasets that collectively cover over 75 million enrollees, we will identify new, underutilized ICD-10 capabilities, create new clusters of diagnoses useful for prediction, develop new algorithms for using these clusters, and estimate formulas that predict spending, utilization and diverse health care outcomes for all ages. Methods and results will be publicly described and software posted on the web for use in risk adjustment and diverse clinical, financial, policy evaluation, and quality assessment outcomes by health care delivery system researchers, payers and providers.

National Institutes of Health (NIH)
Agency for Healthcare Research and Quality (AHRQ)
Research Project (R01)
Study Section
Healthcare Systems and Values Research (HSVR)
Program Officer
Hellinger, Fred
Boston University
Schools of Arts and Sciences
United States