This Small Business Innovation Research Phase II project proposes to develop a system for automated classification of biological samples and discovery of biomarkers. The system will perform comprehensive pattern analysis of state-of-the-art biochemical separations generated by comprehensive two-dimensional gas chromatography (GCxGC) with high-resolution mass spectrometry (HRMS). Pairing GCxGC with HRMS combines highly effective molecular separations with precise elemental analysis. A critical challenge in using GCxGC-HRMS effectively for biochemical sample classification and biomarker discovery is the difficulty of analyzing and interpreting the massive, complex data for metabolomic features. The quantity and complexity of the data, the high dimensionality of the metabolome, and the possibility that significant chemical characteristics may be subtle and involve patterns of multiple constituents together necessitate the investigation and development of new bioinformatics methods. The principal technical objective is an innovative framework for comprehensive feature matching and analysis across many samples. Specifically, the framework will incorporate advanced methods for multidimensional peak detection, peak-pattern matching across large sample sets, data alignment, comprehensive feature matching, and multi-sample analyses (e.g., classification and biomarker discovery) of large sample sets. The anticipated result is a commercial system for automated multi-sample analysis.
The broader impact/commercial potential of this project will be realized through improved informatics for biological classification and biomarker discovery. These tools will enable researchers to better understand biochemical processes and to discover metabolic biomarkers, which could lead to improved methods for disease diagnosis and treatment. These information technologies will foster the use of advanced GCxGC-HRMS instrumentation, thereby adding to the impetus for future instrument development. The informatics developed in this project will also be relevant to other classification problems involving multidimensional, multispectral data, including other applications (such as biofuels), other types of chemical analyses (such as multidimensional spectroscopy), and other fields (such as remote sensing with multispectral geospatial imagers). This project will contribute to national competitiveness in the global market for analytical technologies and to workforce development by involving students in research experiences through internships and student projects. Software developed in the project and an example dataset will be made available to educational institutions so that students can more easily explore biochemical complexity.