Background: Rising demand and health care costs make it urgent to develop new statistical methods that accurately predict high-cost VA patients and identify the important risk factors associated with high costs. The ability to prospectively predict high-cost patients is an important step toward controlling future health care costs. It is also important to identify the disease areas that contribute significantly to high health care costs, along with other risk factors that policy makers can target through future interventions. Health care cost data are characterized by a high level of skewness and heteroscedastic variances. The large number of variables collected in VA databases provides rich information but also poses great challenges for statistical analysis and computation. The administrative and electronic medical record data in VA databases often contain missing values. The statistical procedure we propose aims to take advantage of the rich VA databases for analyzing cost data. It employs and develops state-of-the-art high-dimensional semiparametric statistical procedures to handle the complexity of VA data sets.

Objectives: The project aims to develop a High Costs Prediction (HCP) system, which employs novel high-dimensional semiparametric statistical methods and algorithms to analyze large VA databases with missing values and censoring. The HCP system identifies potential high-cost patients, provides prediction intervals for future costs, and suggests a list of important risk factors for cost control. The outcomes of the project will help VA researchers and policy makers design effective interventions that target potential high-cost patients and reduce their costs without sacrificing quality of care. The project will collaborate closely with the VA Office of Analytics and Business Intelligence (OABI) to analyze cost data for patients receiving primary care within VHA. In particular, we will identify a set of modifiable risk factors (MRF) that are simultaneously important for improving care and reducing costs. Our proposed work fills an important gap in the analysis of VA health care cost data. By combining the HCP system with the existing Care Assessment Needs Scoring (CAN) system, we will make important progress toward the ultimate goal of building a data-driven decision support system.

Methods: The project will develop a novel semiparametric procedure for predicting high-cost patients. The approach we propose incorporates high-dimensional covariates and nonlinear covariate effects and addresses the challenge of censoring by death, which improves accuracy and increases the flexibility of modeling. It does not require discretizing the cost and hence fully uses the information contained in the cost data. It does not require any parametric distributional assumption. Another major contribution of this project is a set of novel variable selection procedures based on weighted semiparametric quantile regression, which can simultaneously identify significant risk factors and estimate their effects in high-dimensional data in the presence of missing values. We will build a patient-level dataset that combines all available cost data from the databases provided through the Decision Support System (DSS) National Extracts. We will link data from the Managerial Cost Accounting System (MCA, formerly Decision Support System or DSS) with three VA databases: the VA Patient Treatment File (PTF), the VA Outpatient Clinic File (OCF), and the VA Beneficiary Identification and Records Locator Subsystem death file. We will compare the newly proposed methods with existing methods using both the VA data and simulated data.
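
The weighted quantile regression mentioned under Methods rests on the quantile ("check") loss, which targets a chosen quantile of the cost distribution directly rather than the mean and so needs no distributional assumption. The sketch below is only an illustration of that loss on a toy example, not the project's procedure: the function names, the cost values, and the weights are all hypothetical, and the weights merely stand in for whatever observation weights a missing-data adjustment might supply. It has no covariates, penalty, or censoring adjustment.

```python
import numpy as np

def check_loss(residual, tau):
    """Quantile (check) loss: tau * r when r >= 0, (tau - 1) * r when r < 0."""
    residual = np.asarray(residual, dtype=float)
    return residual * (tau - (residual < 0))

def weighted_quantile(costs, weights, tau):
    """Return the observed cost that minimizes the weighted check loss.

    This is an intercept-only quantile regression: no covariates,
    no penalty, no censoring adjustment.
    """
    costs = np.asarray(costs, dtype=float)
    weights = np.asarray(weights, dtype=float)
    # Evaluate the weighted loss at every observed cost as a candidate quantile.
    losses = [np.sum(weights * check_loss(costs - c, tau)) for c in costs]
    return costs[int(np.argmin(losses))]

# Hypothetical, heavily skewed annual costs for five patients.
costs = [100.0, 200.0, 300.0, 400.0, 10000.0]
equal_weights = [1.0, 1.0, 1.0, 1.0, 1.0]

median_cost = weighted_quantile(costs, equal_weights, tau=0.5)
upper_cost = weighted_quantile(costs, equal_weights, tau=0.9)
```

With equal weights, the median estimate is unaffected by the single extreme cost, while the 0.9 quantile picks it up. This is why modeling upper quantiles suits the identification of high-cost patients in skewed data.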

Public Health Relevance

Rising demand and health care costs make it urgent to develop new statistical methods that accurately predict high-cost VA patients and identify the important risk factors associated with high costs. Our overall goal is to develop a High Costs Prediction (HCP) system that employs novel high-dimensional semiparametric statistical methods to prospectively predict high-cost patients for the next year using data from the current and previous years. We will also identify disease areas and other risk factors that contribute significantly to high health care costs. The project will help improve VA's use of predictive analytics to allocate resources more efficiently and contribute to building a data-driven decision support system.

National Institutes of Health (NIH)
Veterans Affairs (VA)
Non-HHS Research Projects (I01)
Study Section
HSR-3 Methods and Modeling for Research, Informatics, and Surveillance (HSR3)
VA Puget Sound Healthcare System
United States