Transgender and gender nonconforming (TGNC) people face a disproportionate burden of adverse health outcomes. Although there is a growing body of literature on the unique health issues of TGNC populations, they remain severely underserved because existing data on TGNC health are scarce. Under-reporting is common because of social and economic marginalization, stigma, and discrimination: TGNC individuals are often unwilling to self-identify and reluctant to participate in traditional surveys, which makes population-based estimates difficult to obtain. Further, past TGNC research has focused primarily on mental health, substance use and abuse, and sexually transmitted infections and diseases. Limited data are available on age-related chronic conditions such as cancer, the second leading cause of death in the United States. Nonetheless, cancer is one of the top research priorities among the TGNC population. With the aging TGNC population growing rapidly, there is an urgent need to characterize the cancer burden among these individuals and to understand how cancer affects them differently from non-TGNC individuals. Meanwhile, the rapid adoption of electronic health record (EHR) systems has made longitudinal clinical data available for research. EHRs contain not only important structured data, such as demographics, diagnoses, procedures, and medications, but also unstructured clinical narratives such as physicians' notes. More than 80 percent of clinical information is documented in clinical narratives, which contain more detailed patient information, including gender identity and cancer risk factors.
Motivated by these observations and building upon our previous studies on 1) the adequacy of TGNC gender identity terms, 2) clinical natural language processing methods for information extraction, and 3) EHR-based cohort studies, we propose to conduct a population-based cohort analysis to examine the cancer burden and risk factors among TGNC people using a unique data source from a large network of EHRs: OneFlorida, one of the 13 PCORI-funded clinical data research networks (CDRNs) contributing to PCORnet. Using both structured and unstructured OneFlorida data, we will first develop computable phenotypes to identify TGNC individuals and subsequently evaluate their cancer risk. Our research is significant because: 1) no population-based cohort studies on cancer risk have been conducted among the TGNC population, and our results will support the development of tailored, evidence-based cancer screening programs for TGNC people; 2) our research will create a cohort of TGNC people who can be not only tracked longitudinally in EHRs but also recruited for future clinical studies; and 3) working with a PCORnet CDRN makes our analysis framework generalizable to PCORnet as a whole. Overall, the proposed research will advance our knowledge of cancer among the aging TGNC population.
Our project will fill an important gap in our knowledge of cancer burden and risk factors in transgender and gender nonconforming (TGNC) people, a sexual and gender minority (SGM) group. A computable phenotype that can accurately identify TGNC cohorts in large networks of electronic health records (EHRs) will enable us to monitor TGNC health longitudinally, which is important for aging-related diseases such as cancer. Building on this work, our future efforts can focus on developing informatics tools to support the long-term surveillance and health monitoring of SGMs using large-scale national networks of EHRs.