Cancer is caused by somatic mutations within the genome of an initiating cell. These mutations take many forms, including single-base substitutions, large insertions and deletions, and chromosomal rearrangements. Mutations also vary with respect to their position relative to annotated gene loci. Some mutations occur within exons and have direct and readily predicted effects on protein sequence and function. Other mutations affect gene function indirectly by occurring within regulatory regions that influence gene expression and RNA splicing. Next-generation sequencing has transformed the potential to explore the mutational landscapes of human cancers. However, rapid creation of massive, complex datasets and a dearth of established methods for integrated analysis of these data have resulted in a critical research bottleneck. To date, research has focused heavily on the most easily detected and interpreted coding mutations occurring within known exons. Mutations in non-coding genes and regulatory elements that govern gene expression and splicing have been largely overlooked. Similarly, interpretation of the clinical significance of mutations has been limited to a handful of the best-characterized, recurrently mutated 'hot-spots' of certain genes. The proposed project will develop new tools to identify and characterize mutations with regulatory rather than protein-coding consequences. Furthermore, we will develop resources to help the research community interpret the possible clinical relevance of these mutations. To explore these knowledge gaps and test our new tools, we will apply them to a cohort of tumor samples from ongoing large-scale genome/transcriptome sequencing projects at the Genome Institute. We have preliminary data to suggest that progression of these tumors may be driven by currently unknown regulatory mutations and that a subset of these may suggest novel therapeutic strategies.
The Genome Institute at Washington University School of Medicine is one of the few places in the world that successfully combines close interaction of physician scientists with a large-scale genome sequencing facility and world-class computing infrastructure. The Genome Institute is a leader in the development of sequencing methods and bioinformatics tools needed for the proposed work. This is demonstrated by the candidate's comprehensive preliminary results. The candidate's mentor, Dr. Richard Wilson, has an established track record of mentoring genomics scientists. Dr. Wilson has helped the candidate to establish an outstanding mentoring committee with the interdisciplinary skills needed to guide him in the proposed research. Dr. Wilson, along with these additional mentors, will collaboratively support and guide the candidate towards a successful independent career. The first specific aim, to be completed during the mentored phase, will create new methods for integration of whole genome and transcriptome data as well as annotation and prioritization of somatic events. Particular emphasis will be placed on the characterization of non-coding mutations that affect gene regulation and splicing. The independent phase will move towards development of novel resources to help researchers interpret mutations in a clinical context. In both phases, the candidate's research will focus heavily on the bioinformatics aspects of these problems in a way that has minimal overlap with, but is highly complementary to, the mentor's research program. In the long term, the candidate hopes to fill a growing need for bioinformatics investigators working in the area of cancer genomics. A K99 Pathway to Independence award will be invaluable to establishing him as an independent investigator in a field that is in need of experts specializing in bioinformatics and data analysis.
Recent advances in sequencing technology have allowed for rapid generation of massive whole genome, exome, and transcriptome datasets from patient tumors. Initial analysis of these data has focused almost entirely on mutations predicted to affect protein-coding sequences, while non-coding mutations affecting regulation of gene expression have been largely overlooked. Furthermore, the clinical consequence of most mutations, coding or non-coding, remains poorly understood, and the resources needed to elucidate these relationships are lacking. The research proposed here will address these knowledge gaps by creating methods, resources, and tools to identify novel regulatory mutations driving progression of breast, liver, AML, and other cancer types.