Recent technological advances in high-throughput experimental analysis enable modern biologists to collect data at the genome scale and to use them to decipher biological principles and uncover the elements of complex behaviors in living organisms. These advances, and the accompanying change in research paradigm, require the development of a set of tools that biologists need for scientific discovery. Among these, computational and data analysis tools are essential, and are largely provided by the fields of data mining, bioinformatics and statistics. We propose a new approach to functional genomics studies and hypothesize that the global expression profile of any organism could provide a universal phenotype for direct prediction of biological function. We will develop a set of computational tools to treat such phenotypes, perform the corresponding data analyses and infer predictive functional models from experimental data. Our efforts will be based on an arsenal of state-of-the-art data mining approaches. We will adapt existing tools and develop new ones to help us infer reliable predictions, to identify the biological changes that follow an environmental or genetic change, and to explain the relevant biological background. Our methods will infer interactions between global expression profiles, mutant fitness, gene function and annotation, and classical biological phenotypes, such as chemotaxis, morphogenesis and development. Using correlation studies, the methods will decompose expression profiles into biologically meaningful components that will enable us to reason about functional changes and their relations at the genome scale. Most importantly, we will test and adjust these tools in collaboration with the biological projects, ensuring their practical utility. We will package our methods into open-source toolboxes, using component-based design and a visual programming paradigm to make the tools accessible to users who are not programmers or computer experts.
We will make this package freely available to the research community. This project will also define and maintain the information infrastructure for the entire program, and will provide databases that store experimental information and related data on hundreds of mutants. Finally, we will develop server-based software to provide public access, through the World Wide Web, to the vast amounts of data produced by this program and to selected data analysis tools.