Bioinformatics is the application of statistics and computer science to the field of molecular biology. It has emerged as a field unto itself, as the datasets generated by modern biomedical researchers easily exceed what can be directly analyzed. Core C will work with the data generated by massively parallel sequencing of human, mouse and zebrafish samples to extract variants that have the potential to cause disease. The PIs of Cores A, B and C have worked together extensively in the past, and have an established track record of productivity in the area of next generation sequencing (NGS) data analysis. Dr. Bafna has worked broadly in bioinformatics and genomics, developing computational methodologies that employ novel algorithms and statistical techniques for NGS datasets. We envision that the whole exome sequencing (WES) data generated by Core B will be delivered to Core C for extraction of the potentially deleterious sequence variants (PDSVs), which will be delivered back to each of the Projects for segregation analysis and further validation. This will be accomplished by developing the four key pipelines of Core C: 1) WES data tracking and storage pipeline, 2) WES data analysis pipeline, 3) Mutation identification pipeline, 4) Comparative genomics pipeline. The analysis of WES datasets is presented in this application as a series of filters that are applied to the primary sequence to extract all relevant variants, followed by a heuristic ranking strategy to detect the PDSVs most likely associated with the phenotype. The output of the FILTER and PRIORITIZE programs is then reported as both SNPs and INDELs in a ranked fashion, for later validation and segregation testing. Further analysis, using other software we have developed, will help uncover the contribution of these genes to common disease as well as genome-wide gene-gene interactions.
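The filter-then-rank strategy described above can be illustrated with a minimal Python sketch. It is not the Core's FILTER and PRIORITIZE software: the `Variant` fields, the depth and population-frequency thresholds, and the effect weights are all hypothetical values chosen only to show the shape of the approach.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    gene: str        # gene symbol (placeholder names in any example)
    kind: str        # "SNP" or "INDEL"
    effect: str      # predicted consequence, e.g. "missense"
    depth: int       # read depth at the variant site
    pop_freq: float  # allele frequency in a reference population


# Hypothetical heuristic weights: more disruptive effects score higher.
EFFECT_SCORE = {"nonsense": 3, "frameshift": 3, "splice": 2, "missense": 1}


def filter_variants(variants, min_depth=10, max_pop_freq=0.01):
    """Keep well-covered, rare variants with a potentially damaging effect."""
    return [
        v for v in variants
        if v.depth >= min_depth
        and v.pop_freq <= max_pop_freq
        and v.effect in EFFECT_SCORE
    ]


def score(variant):
    """Heuristic score: effect weight plus a bonus for novel variants."""
    novelty_bonus = 1 if variant.pop_freq == 0.0 else 0
    return EFFECT_SCORE[variant.effect] + novelty_bonus


def prioritize(variants):
    """Rank variants from most to least likely deleterious."""
    return sorted(variants, key=score, reverse=True)


def report(variants):
    """Filter, rank, and split the result into SNP and INDEL lists."""
    ranked = prioritize(filter_variants(variants))
    return {
        "SNP": [v for v in ranked if v.kind == "SNP"],
        "INDEL": [v for v in ranked if v.kind == "INDEL"],
    }
```

Calling `report` on a list of `Variant` records returns two ranked lists, one per variant type, which mirrors the ranked SNP and INDEL output described above. In practice the filters and weights would be tuned to the sequencing platform and the phenotype under study.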
We are also well-positioned to take full advantage of third-generation DNA sequencers, and are excited that UCSD will serve as one of the national HHMI PacBio Sequencing Centers. These tools, together with the outstanding and unique human and animal resources, will form a powerful combination for investigating new causes of structural brain disorders.
The Bioinformatics Core (Core C) will work with all Projects and Cores to integrate large datasets for analysis and ranking of likely causative variants. Core C will also maintain key metrics of next generation sequencing and report back deficiencies in coverage or systematic trends in data recovery. Core C will integrate new sequencing approaches in Core B and comparative genomic approaches to unify all projects.