Studies that analyze the effect of molecular probes at the level of the whole organism generate data that poses an emerging challenge at the intersection of informatics and the life sciences. This complex data comprises two distinct components: structural descriptions of the molecules and image- or video-based records of the phenotypic responses exhibited by the organism(s) being studied. Such structure-phenotype data is increasingly being generated in both basic and applied sciences, across research activities ranging from studies that aim to understand the functional role of genes in model organisms to studies that seek to discover new drugs against diseases. There is, however, a notable lack of methods and systems in the current state of the art that allow integrated storage and analysis of such structure-phenotype data. This research will address this problem by designing algorithms and systems for content-based representation, storage, querying, and analysis of structure-phenotype information. In particular, scientists working on both basic and translational problems where structure-phenotype data is generated will be able to query and reason with the information within a single framework, a capability that does not currently exist. The methods and systems developed under this award will be validated through collaborations with domain specialists. Furthermore, these systems will be made publicly available. Finally, this project will attract and train students to work at the interface of computer science, biology, and chemistry, with a focus on broadening participation at both the undergraduate and graduate levels, and will engage public audiences in high-quality learning experiences.
This project will pursue an integrated research, educational, and outreach program to develop the area of structure-phenotype (SP) data management and analysis, encompassing algorithmic techniques in structure representation, image analysis, query-retrieval, and analysis of information related to conjugated SP data. Reasoning with SP data is becoming increasingly critical in the biochemical sciences and plays an important role in understanding the basis of life as well as in the development and improvement of therapeutics against diseases. From the computer science perspective, SP data consists of complex, multidimensional entities that present critical challenges for the effective and efficient design of representation, retrieval, modeling, and analysis techniques. The research envisaged in this proposal will investigate a series of interrelated problems under three foci: (1) designing techniques for robust representation of structure-phenotype data, (2) developing a unified indexing approach to support content-based query-retrieval of structural and phenotypic data, and (3) designing techniques for analysis of structure-phenotype data from whole-organism microscopy studies, as well as building publicly available proof-of-concept systems for focused biological domains and datasets. The techniques, systems, and results developed as part of this investigation will be tested and verified in collaboration with domain specialists. The educational component of the proposal includes activities aimed at attracting and intensively mentoring women and underrepresented minority students. Finally, the plan includes curriculum development to educate students about SP data analysis and activities to engage public audiences in high-quality learning experiences.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.