We propose to create a massively scalable toolkit to enable large, multi-center Patient-centered Information Commons (PIC) at local, regional, and national scale, where the focus is the alignment of all available biomedical data per individual. Such a Commons is a prerequisite for conducting the large-N, Big Data, longitudinal studies essential for understanding causation in the Precision Medicine framework while simultaneously addressing key complexities of the Patient Centric Outcome Research studies required under the Affordable Care Act (ACA). This agenda entails the following four aims:
Aim 1 : Create an individual patient data identification and retrieval toolkit that is robust across distributed data that are widely varied and geographically scattered. Robustness with regard to a variety of organizational structures and national scalability is emphasized.
Aim 2 : Generate a complete diagnostic and prognostic 'data' picture of a patient across multiple sources of data, some of which are noisy and sparse.
Aim 3 : Enable robust decentralized computation on large-scale data with the Patient-centered Information Commons Big Data Science Platform (PIC-DSP), particularly in configurations where data are generated in locations other than where computational resources are most available.
Aim 4 : Create three patient-centered information commons instances (PICIs) to test all aspects of the toolkit developed. We have selected neurodevelopmental disorders as our first PICI, as it fulfills several criteria (wide variety of data types and scales, collaborator engagement, multiple healthcare institutions, and opportunity to rigorously test and refine features of the tool).

Public Health Relevance

The proposed Patient-centered Information Commons will allow investigators to link and analyze patient-level data at large scale, in both population size and data variety: from clinical health records to prospectively gathered research data, survey and administrative data, and genomic, imaging, socio-behavioral, and environmental data. This will allow these investigators to achieve new levels of precision in diagnosis and prognosis, as well as in measuring the conduct and quality of medical practice.

National Institutes of Health (NIH)
National Human Genome Research Institute (NHGRI)
Specialized Center--Cooperative Agreements (U54)
Study Section
Special Emphasis Panel (ZRG1-BST-R (52))
Program Officer
Brooks, Lisa
Harvard Medical School
United States
Luo, Yuan; Szolovits, Peter (2016) Efficient Queries of Stand-off Annotations for Natural Language Processing on Electronic Medical Records. Biomed Inform Insights 8:29-38
Brown, Adam S; Patel, Chirag J (2016) MeSHDD: Literature-based drug-drug similarity for drug repositioning. J Am Med Inform Assoc :
McGinnis, Denise P; Brownstein, John S; Patel, Chirag J (2016) Environment-Wide Association Study of Blood Pressure in the National Health and Nutrition Examination Survey (1999-2012). Sci Rep 6:30373
Patel, Chirag J; Manrai, Arjun K; Corona, Erik et al. (2016) Systematic correlation of environmental exposure and physiological and self-reported behaviour factors with leukocyte telomere length. Int J Epidemiol :
Patel, Chirag J; Pho, Nam; McDuffie, Michael et al. (2016) A database of human exposomes and phenomes from the US National Health and Nutrition Examination Survey. Sci Data 3:160096
Brown, Adam S; Kong, Sek Won; Kohane, Isaac S et al. (2016) ksRepo: a generalized platform for computational drug repositioning. BMC Bioinformatics 17:78
Leppert, John; Patel, Chirag (2016) Perspective: Beyond the genome. Nature 537:S105
Manrai, Arjun K; Wang, Brice L; Patel, Chirag J et al. (2016) Reproducible and Shareable Quantifications of Pathogenicity. Pac Symp Biocomput 21:231-42
Hoogendoorn, Mark; Szolovits, Peter; Moons, Leon M G et al. (2016) Utilizing uncoded consultation notes from electronic medical records for predictive modeling of colorectal cancer. Artif Intell Med 69:53-61
Li, Junlong; Zhao, Lihui; Tian, Lu et al. (2016) A predictive enrichment procedure to identify potential responders to a new therapy for randomized, comparative controlled clinical studies. Biometrics 72:877-87

Showing the most recent 10 out of 14 publications