The Coronavirus Disease 2019 (COVID-19) pandemic has caught the world off guard, reshaping ways of life, the economy, and healthcare delivery. Data in electronic health records (EHRs) should be widely available to study COVID-19 but have not yet been effectively shared across clinical sites, with public health agencies, or with policy makers. There are several large national and international projects to build informatics infrastructure to analyze the EHR data of patients with COVID-19. However, aggregating data from multiple EHRs only works if you can trust the final results. This means being able to go back to each site and talk to the people who know the data best, to understand the local clinical guidelines, coding practices, data quality problems, and other factors that affect the data. In March 2020, we launched an international effort called the Consortium for Clinical Characterization of COVID-19 by EHR (4CE). It brings together more than 100 informatics experts, statisticians, and ICU doctors from around the world. The novel aspect of 4CE is that we recognize the complexities of EHR data and the need to directly involve the local data experts, not only in the data collection, but also in the development of research questions and the data analyses. We try to move fast, believing that early intelligence is worth more than complete intelligence later. To do this, we avoid roadblocks that typically slow down informatics projects, such as building or installing new software, or the regulatory hurdles involved in sharing patient-level data. Instead, we ask participating sites to run analyses locally, using simple existing tools, like SQL, R, and Python scripts, and share only aggregate counts and statistics centrally with the rest of the 4CE consortium. We review and validate the data as a group, identify and fix data quality problems, and ask sites to repeat the analyses until everything is right. 
Through multiple cycles of data verification, we iteratively clean up the data and gain confidence that the findings we are seeing are real. Because we can do this quickly, we go from research question to results in just a few weeks. This proposed project will "productize" the 4CE approach, through three Specific Aims: (1) Transition 4CE to "Phase 2", where sites will begin more complex local analyses. We will develop Phase 2 analysis scripts; update our data upload, validation, and visualization websites; and test the Phase 2 scripts at three sites before expanding to the rest of the consortium. (2) Demonstrate and evaluate 4CE through two use cases. We will refine and validate an algorithm for identifying COVID-19 patients with "severe" disease and use 4CE to characterize central nervous system complications in COVID-19. (3) Develop a plan for integrating with complementary efforts and long-term sustainability. As part of this, we will create a guide that shows sites how to use 4CE data extracts and quality checks to support other COVID-19 informatics projects, including the generation of OMOP files.
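
The local-analysis pattern described in the abstract, where each site reduces patient-level rows to aggregate counts, the consortium checks those aggregates, and only then pools them, can be illustrated with a minimal Python sketch. This is not the 4CE code: the record fields (`age_group`, `severe`), the function names, and the single validation rule are all hypothetical and chosen only to show the flow of data.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class PatientRecord:
    """Hypothetical patient-level row. It never leaves the site."""
    patient_id: str
    age_group: str
    severe: bool


def local_aggregate(site_id, records):
    """Run at a site: reduce patient-level rows to counts per age group."""
    total = Counter()
    severe = Counter()
    for record in records:
        total[record.age_group] += 1
        if record.severe:
            severe[record.age_group] += 1
    # Only this summary (no patient identifiers) is shared centrally.
    return {"site": site_id, "total": dict(total), "severe": dict(severe)}


def validate(summary):
    """Central check (illustrative rule only): severe counts cannot
    exceed total counts. Returns messages to send back to the site."""
    problems = []
    for group, n_severe in summary["severe"].items():
        n_total = summary["total"].get(group, 0)
        if n_severe > n_total:
            problems.append(
                f"{summary['site']}: severe ({n_severe}) exceeds "
                f"total ({n_total}) in age group {group}"
            )
    return problems


def pool(summaries):
    """Combine validated site summaries by summing their counts."""
    total = Counter()
    severe = Counter()
    for summary in summaries:
        total.update(summary["total"])
        severe.update(summary["severe"])
    return {"total": dict(total), "severe": dict(severe)}
```

In this sketch, a site whose summary fails `validate` would correct its local data or extraction query and rerun `local_aggregate`, mirroring the repeated verification cycles the abstract describes; only summaries with no reported problems would be passed to `pool`.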

Public Health Relevance

Data in electronic health records (EHRs) should be widely available to study COVID-19 but have not yet been effectively shared across clinical sites, with public health agencies, or with policy makers. The Consortium for Clinical Characterization of COVID-19 by EHR (4CE) addresses this problem by running analyses locally at more than 100 hospitals worldwide and sharing the aggregate results with the public through interactive data visualizations.

National Institutes of Health (NIH)
National Human Genome Research Institute (NHGRI)
Research Project--Cooperative Agreements (U01)
Program Officer
Wiley, Kenneth L
Brigham and Women's Hospital
United States