Our leadership spans the translational spectrum: clinical (Chute, Robinson, Koeller, Hamosh), biological (Haendel, Hoatlin, Doheny), and computational (Mungall, Su, Liu, McWeeney, Overby), with expertise in a wide variety of data sources, types, and models as well as data integration strategies, standards, and algorithms. We are invested in open science and reproducibility, and we lead efforts to develop open software, data standards, and crowdsourced curation platforms. Our vision is to demonstrate connectivity between rare and common diseases via genes, pathways, and pathophysiology. We will include disease-phenotype associations, enriched with temporal information and decomposed into biological units. Innovative integration of mechanism and function will allow creation of candidate mechanistic graphs for each rare disease. We will use graph matching and probabilistic techniques to support basic research hypothesis testing as well as clinical inquiry (diagnosis, prognosis, and treatment selection). Finally, our team is deeply committed to enabling the collective use of all public biomedical data by making it interoperable and openly accessible for all users, in all contexts.

Semantics matter to integration. The figure highlights the landscape of existing data resources, each containing a portion of the data with specific, relevant meaning (A). Aggregation alone often results in loss of meaning (B). Semantic and probabilistic integration approaches provide more advanced query-answering capabilities (C).

We have first-hand experience overcoming the challenges of large-scale integration projects in general and, more importantly, we are very familiar with the data sources and types this proposal aims to integrate. For example, knockdown of TP53 in zebrafish is used to reduce apoptosis; naive use of such data might incorrectly attribute phenotypic effects to targeted genes.
Other challenges lie in knowing when and how to integrate data whose associations between entities are not equivalent, such as when one source annotates a disease to a gene and another to a variant. Our existing infrastructure has successfully integrated and leveraged multimodal data for rare disease diagnosis. Here we extend these systems with new data types and new methodologies that generalize across diseases and contexts.

The TransMed Knowledge Graph will have an intelligent, adaptive scaffolding for managing and linking the phenomenological worldview of clinical elements with the mechanistic emphases of basic science. Connections between biological entities and events will be represented either directly or through chaining, enabling the use of powerful algorithms for query and inference. Elements in the graph will be stratified either by classical, rigid taxonomies (disease nosologies, tissue and cell type) or by dynamic groups based on shared mechanisms of molecular pathophysiology. External data can then be compared using different criteria. For instance, two patients (one with a rare disease, one with a common disease) may be distant in classical nosology but neighbors in pathway space, which may suggest a treatment. The graph will be seeded from open data sources containing diverse data types, supplemented by knowledge from the literature and clinical data. TransMed will also be readily connected to other data stores using a robust identifier strategy, methods that predict the probability of equivalence from associated metadata, and algorithms that match graphs based on similar members.

Summary. Our familiarity with the data, combined with our technical experience and connection to real-world use cases, positions us well to be both relevant and successful in realizing our vision.
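As an illustration of the comparison described above (two diseases that are distant in classical nosology yet neighbors in pathway space), the following is a minimal sketch, not the project's actual method. It assumes Jaccard similarity as the comparison measure, and the disease names, nosology terms, and pathway labels are hypothetical placeholders rather than real annotations.

```python
from typing import Dict, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two sets; defined as 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical toy data: each disease maps to the set of ancestor terms
# on its path through a classical disease taxonomy (nosology).
NOSOLOGY: Dict[str, Set[str]] = {
    "rare_disease_X": {"root", "genetic_disorder", "skeletal_dysplasia"},
    "common_disease_Y": {"root", "metabolic_disorder", "endocrine_disorder"},
}

# Hypothetical toy data: each disease maps to the pathways implicated in it.
PATHWAYS: Dict[str, Set[str]] = {
    "rare_disease_X": {"pathway_A", "pathway_B", "pathway_C"},
    "common_disease_Y": {"pathway_A", "pathway_B", "pathway_D"},
}


def compare(d1: str, d2: str) -> Dict[str, float]:
    """Score the same pair of diseases under two different criteria."""
    return {
        "nosology": jaccard(NOSOLOGY[d1], NOSOLOGY[d2]),
        "pathway": jaccard(PATHWAYS[d1], PATHWAYS[d2]),
    }


scores = compare("rare_disease_X", "common_disease_Y")
print(scores)
```

In this toy data the pair shares only the taxonomy root but two of four pathways, so the pathway score is higher than the nosology score. A real knowledge graph would likely use richer semantic or probabilistic similarity measures than plain set overlap.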
Shen, Feichen; Liu, Sijia; Wang, Yanshan et al. (2017) Leveraging Collaborative Filtering to Accelerate Rare Disease Diagnosis. AMIA Annu Symp Proc 2017:1554-1563