Approximately 3 million young adults aged 18-44 years currently have diabetes in the United States, and this number is projected to increase to ~5.8 million by 2060. Differentiating diabetes types is crucial because the etiology, treatments, and outcomes of diabetes differ substantially by type. Type 1 diabetes (T1D) accounts for ~17% and type 2 diabetes (T2D) for ~75% of all diabetes cases in US young adults, and this distribution continues to evolve. However, there is no large-scale surveillance system to monitor the prevalence and incidence of T1D and T2D in US young adults. The widespread use and increasing functionality of electronic health record (EHR) systems substantially increase the quantity, breadth, and timeliness of data available for surveillance and reduce costs compared with population-based registries and surveys. EHR algorithms have shown great potential in identifying diabetes cases. This study will analyze both structured EHR data (e.g., diagnosis codes, medications, and laboratory results) and unstructured clinical notes. We will apply expert knowledge, machine learning, and natural language processing to develop the best-performing algorithms for identifying prevalent and incident T1D and T2D cases. The primary objective of this study is to establish an EHR-based surveillance system for monitoring the burden of T1D and T2D in US young adults. We will collaborate with 3 EHR research networks from the National Patient-Centered Clinical Research Network (PCORnet), covering ~6 million racially, ethnically, and socioeconomically diverse young adults from 4 states (IL, LA, NY, and TX) in 3 Census regions. The patient populations in this study are roughly representative of the source populations in the catchment areas.
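To make the idea of an EHR case-identification algorithm concrete, the following is a minimal illustrative sketch of a rule-based classifier built on the structured data elements named above (diagnosis codes and medications). It is not the study's algorithm. The function name, the 0.5 code-ratio cutoff, and the medication rule are all assumptions chosen for illustration; the only external facts used are that ICD-10-CM category E10 denotes type 1 diabetes and E11 denotes type 2 diabetes.

```python
from collections import Counter


def classify_diabetes_type(dx_codes, on_insulin, on_non_insulin_agent,
                           t1_ratio_cutoff=0.5):
    """Toy rule: return 'T1D', 'T2D', or 'unclassified'.

    dx_codes: iterable of ICD-10-CM code strings from a patient's encounters.
    on_insulin / on_non_insulin_agent: booleans derived from medication data.
    t1_ratio_cutoff: hypothetical threshold, not a validated value.
    """
    counts = Counter()
    for code in dx_codes:
        prefix = code.upper()[:3]
        if prefix == "E10":      # ICD-10-CM: type 1 diabetes mellitus
            counts["t1"] += 1
        elif prefix == "E11":    # ICD-10-CM: type 2 diabetes mellitus
            counts["t2"] += 1

    total = counts["t1"] + counts["t2"]
    if total == 0:
        return "unclassified"    # no type-specific diabetes codes at all

    t1_ratio = counts["t1"] / total
    # Assumed rule: mostly T1D codes, on insulin, and no non-insulin agent.
    if t1_ratio > t1_ratio_cutoff and on_insulin and not on_non_insulin_agent:
        return "T1D"
    # Assumed rule: T2D codes make up at least half of type-specific codes.
    if t1_ratio <= t1_ratio_cutoff:
        return "T2D"
    return "unclassified"        # conflicting signals; candidate for chart review
```

In a design like this, patients with conflicting signals are left unclassified, which is where clinical notes, machine learning models, or chart review would be expected to add information.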
The specific aims of this study are 1) to estimate the prevalence of T1D and T2D in US young adults by age, sex, race/ethnicity, and geographic region in 2019; 2) to estimate the incidence of T1D and T2D in US young adults by age, sex, race/ethnicity, and geographic region in 2019; 3) to estimate 10-year trends in the prevalence and incidence of T1D and T2D in US young adults by age, sex, race/ethnicity, and geographic region, 2014-2023; and 4) to compare the prevalence and incidence of diabetes by type, as well as temporal trends, in US young adults with those in young adults from other countries and regions. This study is innovative because it will be able to detect a false-negative rate as low as 0.2%, leverage EHRs for surveillance (more efficient and cost-effective than registries and surveys), use advanced analytic approaches (e.g., machine learning and natural language processing), estimate a denominator using patient zip codes, build flexibility into the surveillance methods according to local availability of clinical notes, and use a 2-stage sampling approach to improve chart review efficiency. This study will advance our understanding of the age, sex, racial/ethnic, and geographic differences in the burden of T1D and T2D in US young adults. The resulting surveillance data will inform planning for healthcare needs, help prioritize the allocation of healthcare resources, and help reduce health disparities by identifying and prioritizing subpopulations for prevention of diabetes and related comorbidities.
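The stratified estimates in aims 1 and 2 reduce to simple ratios once cases and a denominator have been counted for each stratum. The sketch below shows that arithmetic under stated assumptions: prevalence is taken as existing cases divided by the denominator population, and incidence as a simplified one-year cumulative incidence among people free of diabetes at the start of the year. The function names and dictionary keys are hypothetical, and the study may use different definitions (for example, person-time denominators).

```python
def prevalence_per_1000(cases, population):
    """Prevalence: existing cases divided by the denominator population."""
    if population <= 0:
        raise ValueError("population must be positive")
    return 1000 * cases / population


def incidence_per_1000(new_cases, population, prevalent_at_start):
    """Simplified one-year cumulative incidence among those at risk.

    People who already had diabetes at the start of the year are removed
    from the denominator because they cannot become new cases.
    """
    at_risk = population - prevalent_at_start
    if at_risk <= 0:
        raise ValueError("no population at risk")
    return 1000 * new_cases / at_risk


def summarize(strata):
    """Map each stratum label to (prevalence, incidence), both per 1,000.

    strata: dict of label -> dict with the hypothetical keys 'cases',
    'population', 'new_cases', and 'prevalent_at_start'.
    """
    return {
        label: (
            prevalence_per_1000(s["cases"], s["population"]),
            incidence_per_1000(s["new_cases"], s["population"],
                               s["prevalent_at_start"]),
        )
        for label, s in strata.items()
    }
```

For example, with made-up counts, `prevalence_per_1000(50, 10000)` gives 5.0 cases per 1,000 people. Repeating the same calculation for each calendar year would yield the series needed for the trend analysis in aim 3.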
The burden of diabetes in US young adults has increased considerably in recent decades. However, there is no large-scale surveillance system for monitoring the burden of diabetes by type in this population. This study will build an efficient and cost-effective multisite surveillance system using electronic health records to estimate the prevalence, incidence, and temporal trends of type 1 and type 2 diabetes in US young adults according to age, sex, race/ethnicity, and geographic region.