The Training Component of the Patient-centered Information Commons (PIC) focuses on three major elements that draw on (1) the strength of this team's existing infrastructure at the Center for Biomedical Informatics at Harvard Medical School and (2) the new science proposed in the Data Science Research component of this proposal, in support of the overall goals of the Big Data to Knowledge initiative. First, direct training of the next generation of leaders is offered in two forms: a pre-doctoral distributed training initiative and an undergraduate research internship. To attract students to the field of big data science, the competitive Distributed Pre-doctoral Program will target students enrolled in quantitatively focused graduate programs across the country who have passed their qualifying exams and wish to undertake a remote collaborative project with PIC faculty, exposing them to opportunities not available at their home institutions. The undergraduate research internship (Summer Institute in Bioinformatics and Integrative Genomics) will offer a nine-week intensive immersion that combines didactic lectures by leading big data scientists with a mentored research project under PIC faculty. Second, the team will develop a series of instructional "Big Data" videos that will be publicly available to the community, with topics chosen in consultation with the Consortium members. Lastly, the PIC training and science teams will host an annual Big Data Conference and a monthly Lecture series, both made available to the community, via videography for the Conference and via WebEx for the Lectures. The success of these initiatives will be evaluated against a defined set of metrics, including surveys and outcomes assessment.

Public Health Relevance

Ensuring a next generation of scientists who can understand and apply the cutting-edge technologies needed to acquire and manage ever-growing volumes of data is essential to the rapid advancement of biomedical research in general and precision medicine in particular, because technological advances now generate data faster than we are able to fully use it.

National Institutes of Health (NIH)
National Human Genome Research Institute (NHGRI)
Specialized Center--Cooperative Agreements (U54)
Study Section: Special Emphasis Panel (ZRG1-BST-R (52))
Program Officer: Brooks, Lisa
Harvard Medical School
United States
Luo, Yuan; Szolovits, Peter (2016) Efficient Queries of Stand-off Annotations for Natural Language Processing on Electronic Medical Records. Biomed Inform Insights 8:29-38
Brown, Adam S; Patel, Chirag J (2016) MeSHDD: Literature-based drug-drug similarity for drug repositioning. J Am Med Inform Assoc
McGinnis, Denise P; Brownstein, John S; Patel, Chirag J (2016) Environment-Wide Association Study of Blood Pressure in the National Health and Nutrition Examination Survey (1999-2012). Sci Rep 6:30373
Patel, Chirag J; Manrai, Arjun K; Corona, Erik et al. (2016) Systematic correlation of environmental exposure and physiological and self-reported behaviour factors with leukocyte telomere length. Int J Epidemiol
Patel, Chirag J; Pho, Nam; McDuffie, Michael et al. (2016) A database of human exposomes and phenomes from the US National Health and Nutrition Examination Survey. Sci Data 3:160096
Brown, Adam S; Kong, Sek Won; Kohane, Isaac S et al. (2016) ksRepo: a generalized platform for computational drug repositioning. BMC Bioinformatics 17:78
Leppert, John; Patel, Chirag (2016) Perspective: Beyond the genome. Nature 537:S105
Manrai, Arjun K; Wang, Brice L; Patel, Chirag J et al. (2016) Reproducible and shareable quantifications of pathogenicity. Pac Symp Biocomput 21:231-42
Hoogendoorn, Mark; Szolovits, Peter; Moons, Leon M G et al. (2016) Utilizing uncoded consultation notes from electronic medical records for predictive modeling of colorectal cancer. Artif Intell Med 69:53-61
Li, Junlong; Zhao, Lihui; Tian, Lu et al. (2016) A predictive enrichment procedure to identify potential responders to a new therapy for randomized, comparative controlled clinical studies. Biometrics 72:877-87

The 10 most recent of the project's 14 publications are listed above.