This program presents the opportunity to provide a rich genomic data resource to propel pediatric disease research. Key elements of a successful program will be the provision of high-quality genome sequence data on well-phenotyped patients and their families; the collection of data and its delivery to the research community in an intuitive, accessible manner; and the integration of genetic data with phenotypic information, both within this program and in comparison to other large data resources. The ultimate goal is to assemble a complete catalogue of the genes that underlie structural birth defects and pediatric cancer, and to enable the use of this information to better understand disease mechanisms, diagnostic opportunities, and therapeutic directions. We propose to establish a sequencing center at the Broad Institute to serve as a resource for the Gabriella Miller Kids First (GMKF) Research Program, as we have done in support of other large flagship NIH genome projects. Our center brings the domain expertise in high-throughput data generation, processing, and analysis, and in disease gene discovery, required to meet the objectives of the GMKF Program. We will generate deep, high-quality, phased whole-genome sequencing data on selected samples. We are prepared to apply our well-tested methods for extraction of DNA from a range of sample types, most importantly saliva samples and paraffin-embedded material, which are key to pediatric and cancer research. Over the three-year period we will process at least 18,000 samples, pushing the boundary on new data types and lower cost. We can accommodate a mix of cohort types, whether trios (for structural birth defects) or quads (in cancer studies). We will work with study PIs to introduce new data types from 10X Genomics that we have shown enable phasing of variants into distinct haplotypes and discovery of structural variation. We will also work with investigators to perform follow-up and functional validation as needed.
A key feature of our center is our implementation of a robust analytical framework for variant assessment and disease gene discovery, which takes advantage of Broad investigators' world-leading roles in statistical genetics, functional annotation, and clinical variant interpretation, as well as access to exome and genome data from over 250,000 reference samples. This has enabled us to build a systematic pipeline for gene discovery that will be made freely available to the GMKF program and collaborators. With data produced and processed in a consistent way, we can offer seamless integration of GMKF data into our analytic framework. For many of the diseases targeted by the pediatric research community, confident discovery of causal genes will require aggregation of cases across centers around the world. We propose to enable a new standard for data sharing in clinical genomics by rapidly releasing genetic and phenotype data, thereby accelerating collaboration and facilitating robust disease gene discovery.
The overall goal of this project is to generate high-quality sequence data to help researchers understand the underlying mechanisms of disease, leading to more refined diagnostic capabilities and ultimately to more targeted therapies or interventions.