Deep learning for representation of codes used for SEER-Medicare claims research

Egleston, Brian; Vucetic, Slobodan

Abstract

We propose developing an algorithm and user-friendly software to better identify treatments in Medicare claims data. We will validate our approach using procedures recorded in the Surveillance, Epidemiology, and End Results (SEER) database as a gold standard, with the aim of better matching procedures identified from Medicare claims to SEER-listed procedures. The focus of this research is observational (i.e., non-randomized) data. Well-run randomized clinical trials can provide the best level of evidence of treatment effects, but randomized trials in the United States have suffered from poor accrual for many interventions. Although well-designed randomized clinical trials should remain the gold standard, well-designed observational studies might be the only way to obtain inferences about comparative effectiveness for some cancer interventions. In cancer research, one of the most commonly used databases for observational research is the linked SEER-Medicare database, which has provided useful measurements of the effectiveness of a number of cancer therapies. Algorithms for identifying relevant treatment and diagnosis codes in Medicare data are often based on clinical reasoning and scientific evidence. One group of researchers, for example, developed an algorithm for identifying laparoscopic surgery among kidney cancer cases before claims codes for laparoscopic surgery were well developed. While such algorithms are useful to others pursuing similar investigations, there may still be substantial mismatch between the treatment identified by the SEER cancer registry and the treatment identified through Medicare claims. In this work, we propose developing a rigorous machine learning algorithm that can help researchers better identify treatments in Medicare claims data.
Specifically, we will design a neural language modeling algorithm and implement a software system that finds vector representations of diagnosis and procedure codes. We plan to use the neural language modeling algorithm to learn vector representations from SEER-Medicare claims data in which related procedure and diagnosis codes are neighbors (i.e., closely related). We will investigate whether the codes we identify within neighborhoods correspond to the procedure codes used in published SEER-Medicare studies. We will then design a software assistant interface that will allow an investigator to explore which codes are related to a given seed of diagnosis or procedure codes. Finally, we will investigate the sensitivity and specificity of the algorithm by comparing procedures identified using Medicare claims with procedures listed in the SEER database. We will replicate analyses from a published SEER-Medicare paper to investigate whether estimated treatment effects differ when using our novel algorithm compared with the algorithm in the published paper.
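
The two ideas in the paragraph above, finding codes that are neighbors of a seed code and scoring claims-based identification against the registry, can be illustrated with a small sketch. This is not the proposed neural language model: as a simpler stand-in, each code is represented by counts of the codes it co-occurs with, and neighbors are ranked by cosine similarity. The claims data, the code labels (such as `PX_LAP_NEPH`), and all function names are hypothetical placeholders invented for illustration.

```python
import math
from collections import Counter
from itertools import combinations

# Hypothetical toy data: each inner list holds the codes found on one
# patient's claims. Labels are made-up placeholders, not real billing codes.
CLAIMS = [
    ["DX_KIDNEY", "PX_LAP_NEPH", "PX_ANESTHESIA"],
    ["DX_KIDNEY", "PX_LAP_NEPH", "PX_LAP_KIT"],
    ["DX_KIDNEY", "PX_OPEN_NEPH", "PX_ANESTHESIA"],
    ["DX_BREAST", "PX_MASTECTOMY", "PX_ANESTHESIA"],
    ["DX_BREAST", "PX_MASTECTOMY", "PX_RECON"],
]


def cooccurrence_vectors(claims):
    """Represent each code by counts of the other codes it appears with."""
    vectors = {}
    for codes in claims:
        for a, b in combinations(sorted(set(codes)), 2):
            vectors.setdefault(a, Counter())[b] += 1
            vectors.setdefault(b, Counter())[a] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def neighbors(seed, vectors, top_n=3):
    """Return the top_n codes most similar to the seed code."""
    scored = [
        (cosine(vectors[seed], vec), code)
        for code, vec in vectors.items()
        if code != seed
    ]
    # Highest similarity first; ties broken alphabetically for stable output.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [code for _, code in scored[:top_n]]


def sensitivity_specificity(claims_flags, registry_flags):
    """Score claims-based treatment flags against registry flags (reference)."""
    pairs = list(zip(claims_flags, registry_flags))
    tp = sum(1 for c, r in pairs if c and r)
    fn = sum(1 for c, r in pairs if not c and r)
    tn = sum(1 for c, r in pairs if not c and not r)
    fp = sum(1 for c, r in pairs if c and not r)
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


if __name__ == "__main__":
    vectors = cooccurrence_vectors(CLAIMS)
    print(neighbors("PX_LAP_NEPH", vectors))
    # One flag per patient: was the procedure found in claims / in the registry?
    print(sensitivity_specificity([1, 1, 1, 0, 0, 0, 0, 1],
                                  [1, 1, 1, 1, 0, 0, 0, 0]))
```

In this toy setting the closest neighbor of the laparoscopic placeholder code is the open-surgery placeholder, because the two appear in similar contexts even though they never appear on the same claim. A learned embedding would be expected to capture the same kind of contextual similarity at much larger scale.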

Public Health Relevance

In cancer research, one of the most commonly used databases for observational research is the linked Surveillance, Epidemiology, and End Results (SEER)-Medicare database. We will design a software assistant interface that allows an investigator to explore which codes are related to a given seed of diagnosis or procedure codes. This should improve the identification of procedures in Medicare claims data and make conclusions drawn from analyses of the database more reliable and consistent.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Cancer Institute (NCI)
Type: Exploratory/Developmental Grants (R21)
Project #: 1R21CA202130-01
Application #: 9023921
Study Section: Special Emphasis Panel (ZCA1)
Program Officer: Mariotto, Angela B

Project Start: 2015-12-01
Project End: 2017-11-30
Budget Start: 2015-12-01
Budget End: 2016-11-30
Support Year: 1
Fiscal Year: 2016

Institution

Name: Research Institute of Fox Chase Cancer Center
DUNS #: 064367329

City: Philadelphia
State: PA
Country: United States
Zip Code: 19111

Related projects


NIH 2017 R21 CA	Deep learning for representation of codes used for SEER-Medicare claims research Egleston, Brian L.; Vucetic, Slobodan / Research Institute of Fox Chase Cancer Center
NIH 2016 R21 CA	Deep learning for representation of codes used for SEER-Medicare claims research Egleston, Brian L.; Vucetic, Slobodan / Research Institute of Fox Chase Cancer Center

Publications

Bai, Tian; Chanda, Ashis Kumar; Egleston, Brian L et al. (2018) EHR phenotyping via jointly embedding medical concepts and words into a unified vector space. BMC Med Inform Decis Mak 18:123

Gilbert, Elizabeth A; Krafty, Robert T; Bleicher, Richard J et al. (2017) On the Use of Summary Comorbidity Measures for Prognosis and Survival Treatment Effect Estimation. Health Serv Outcomes Res Methodol 17:237-255

Bai, Tian; Chanda, Ashis Kumar; Egleston, Brian L et al. (2017) Joint Learning of Representations of Medical Concepts and Words from EHR Data. Proceedings (IEEE Int Conf Bioinformatics Biomed) 2017:764-769
