Sharable, innovative, and scalable methods for abstracting relevant characteristic patient phenotypes from electronic health records (EHRs) and for systematically understanding disease relationships are critical for achieving precise disease diagnosis and personalized disease prevention and treatment. As of May 28, 2020, there are 5,716,271 confirmed 2019 Novel Coronavirus (COVID-19) cases worldwide, including 1,699,933 cases in the United States, and 356,124 deaths across more than 200 countries, areas, and territories, including 100,442 deaths in the United States, with the numbers continuing to climb. The pandemic has had a profound economic, social, and public health impact. Columbia University Irving Medical Center (CUIMC) has been fighting the virus on the front line in the epicenter of New York City and has treated more than 4,100 SARS-CoV-2-positive patients. We aim to address the urgent COVID-19 public health need by developing sharable phenotyping methods to identify and characterize COVID-19 cases using our EHR data and multiple data standards, including the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) and the Human Phenotype Ontology (HPO), and to generate novel knowledge about COVID-19, such as its risk factors, disease subtypes, and temporal clinical courses.
Our specific aims for this supplement are as follows. Extension to the original Aim 1: Develop and validate scalable and sharable approaches for abstracting characteristic phenotypes of COVID-19 from both structured and unstructured EHR data and for standardizing the concept representations of these EHR phenotypes using widely adopted data standards, including the OMOP CDM, HPO, SNOMED-CT, UMLS, and RxNorm. Extension to the original Aim 3: Develop and validate methods for temporal phenotyping of COVID-19 and methods for identifying disease subtypes with varying clinical outcomes among heterogeneous populations, using deep characteristic EHR phenotypes of COVID-19. We will disseminate the resulting methods and knowledge to the broad scientific community and the nation. We will also leverage this supplement to create research and training opportunities for postdocs and graduate students in biomedical informatics, data science, and computer science, advancing interdisciplinary collaboration in data science and biomedical informatics to combat COVID-19 and other health problems.