We propose to develop an algorithm and user-friendly software to better identify treatments in Medicare claims data. We will validate our approach using procedures listed in the Surveillance, Epidemiology, and End Results (SEER) database as the gold standard, with the goal of improving agreement between procedures identified from Medicare claims and procedures listed in SEER. The focus of this research is observational (i.e., non-randomized) data. Well-run randomized clinical trials can provide the best level of evidence of treatment effects. However, randomized trials in the United States have suffered from poor accrual for many interventions. Although well-designed randomized clinical trials should remain the gold standard, well-designed observational studies might be the only means of drawing inferences about comparative effectiveness for some cancer interventions.

In cancer research, one of the most commonly used databases for observational research is the linked SEER-Medicare database. SEER-Medicare data have provided useful measurements of the effectiveness of a number of cancer therapies. Algorithms for identifying relevant treatment and diagnosis codes in Medicare data are often based on clinical reasoning and scientific evidence. One group of researchers, for example, developed an algorithm for identifying laparoscopic surgery among kidney cancer cases before claims codes for laparoscopic surgery were well developed. While such algorithms are useful to others pursuing similar investigations, there may still be substantial mismatch between treatment identified by the SEER cancer registry and treatment identified through Medicare claims. In this work, we propose to develop a rigorous machine learning algorithm that helps researchers better identify treatments in Medicare claims data.
Specifically, we will design a neural language modeling algorithm and implement a software system that finds vector representations of diagnosis and procedure codes. We plan to use the neural language modeling algorithm to learn vector representations from SEER-Medicare claims data in which related procedure and diagnosis codes are neighbors (i.e., lie close together in the vector space). We will investigate whether the codes we identify within neighborhoods correspond to the procedure codes used in published SEER-Medicare studies. We will then design a software assistant interface that allows an investigator to explore which codes are related to a given seed of diagnosis or procedure codes. Finally, we will investigate the sensitivity and specificity of the algorithm by comparing procedures identified using Medicare claims with procedures listed in the SEER database. We will also replicate analyses from a published SEER-Medicare paper to investigate whether estimated treatment effects differ when using our novel algorithm rather than the algorithm in the published paper.
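
As a rough illustration of the workflow described above (seed codes, neighboring codes, then sensitivity and specificity against a registry gold standard), the following minimal sketch may help. It is not the proposed neural language model: as a simple stand-in, each code is represented by the set of patients whose claims contain it, and closeness is measured by cosine similarity. All patient identifiers, code names, registry labels, function names, and the 0.7 threshold are hypothetical and chosen only for illustration.

```python
import math

# Hypothetical toy claims: patient -> set of codes on that patient's claims.
# The code strings are made-up placeholders, not real diagnosis or procedure codes.
claims = {
    "p1": {"DX_KIDNEY", "PROC_LAP_NEPH", "PROC_LAP_PORT"},
    "p2": {"DX_KIDNEY", "PROC_LAP_NEPH", "PROC_LAP_PORT"},
    "p3": {"DX_KIDNEY", "PROC_OPEN_NEPH", "PROC_TRANSFUSION"},
    "p4": {"DX_KIDNEY", "PROC_OPEN_NEPH", "PROC_TRANSFUSION"},
    "p5": {"DX_KIDNEY", "PROC_LAP_PORT"},
    "p6": {"DX_KIDNEY"},
}
# Hypothetical registry gold standard: was laparoscopic surgery recorded?
registry = {"p1": True, "p2": True, "p3": False,
            "p4": False, "p5": True, "p6": False}


def patient_sets(claims):
    """Map each code to the set of patients whose claims contain it."""
    index = {}
    for patient, codes in claims.items():
        for code in codes:
            index.setdefault(code, set()).add(patient)
    return index


def cosine(a, b):
    """Cosine similarity of two binary patient-incidence vectors given as sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def related_codes(seed, index, threshold=0.7):
    """Score non-seed codes by their best similarity to any seed code."""
    scores = {}
    for code, patients in index.items():
        if code in seed:
            continue
        best = max((cosine(patients, index[s]) for s in seed if s in index),
                   default=0.0)
        if best >= threshold:
            scores[code] = best
    return scores


def flag_patients(claims, codes):
    """Patients with at least one claim code in the given code set."""
    return {p for p, patient_codes in claims.items() if patient_codes & codes}


def sensitivity_specificity(flagged, gold):
    """Compare claims-flagged patients with the registry gold standard."""
    tp = sum(1 for p, truth in gold.items() if truth and p in flagged)
    fn = sum(1 for p, truth in gold.items() if truth and p not in flagged)
    tn = sum(1 for p, truth in gold.items() if not truth and p not in flagged)
    fp = sum(1 for p, truth in gold.items() if not truth and p in flagged)
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    spec = tn / (tn + fp) if (tn + fp) else float("nan")
    return sens, spec


index = patient_sets(claims)
seed = {"PROC_LAP_NEPH"}
expanded = seed | set(related_codes(seed, index))

seed_only = sensitivity_specificity(flag_patients(claims, seed), registry)
with_neighbors = sensitivity_specificity(flag_patients(claims, expanded), registry)
print("seed only:", seed_only)
print("seed plus neighbors:", with_neighbors)
```

In this toy data, expanding the seed with its nearest neighbor picks up a patient whose claims lack the seed code, which raises sensitivity without lowering specificity. Real claims data would not be expected to behave this cleanly, and a learned embedding would replace the set-based similarity used here.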

Public Health Relevance

In cancer research, one of the most commonly used databases for observational research is the linked Surveillance, Epidemiology, and End Results (SEER)-Medicare database. We will design a software assistant interface that allows an investigator to explore which codes are related to a given seed of diagnosis or procedure codes. This should improve the identification of procedures in Medicare claims data and make conclusions drawn from analyses of the database more reliable and consistent.

Agency
National Institutes of Health (NIH)
Institute
National Cancer Institute (NCI)
Type
Exploratory/Developmental Grants (R21)
Project #
5R21CA202130-02
Application #
9188540
Study Section
Special Emphasis Panel (ZCA1)
Program Officer
Mariotto, Angela B
Project Start
2015-12-01
Project End
2018-11-30
Budget Start
2016-12-01
Budget End
2018-11-30
Support Year
2
Fiscal Year
2017
Total Cost
Indirect Cost
Name
Research Institute of Fox Chase Cancer Center
Department
Type
DUNS #
064367329
City
Philadelphia
State
PA
Country
United States
Zip Code
19111
Bai, Tian; Chanda, Ashis Kumar; Egleston, Brian L et al. (2018) EHR phenotyping via jointly embedding medical concepts and words into a unified vector space. BMC Med Inform Decis Mak 18:123
Gilbert, Elizabeth A; Krafty, Robert T; Bleicher, Richard J et al. (2017) On the Use of Summary Comorbidity Measures for Prognosis and Survival Treatment Effect Estimation. Health Serv Outcomes Res Methodol 17:237-255
Bai, Tian; Chanda, Ashis Kumar; Egleston, Brian L et al. (2017) Joint Learning of Representations of Medical Concepts and Words from EHR Data. Proceedings (IEEE Int Conf Bioinformatics Biomed) 2017:764-769