The aim of this Program Project is to discover the function of most genes in the Dictyostelium genome. In Project III we will develop computational techniques to infer gene function and reconstruct gene networks from the high-throughput phenotyping, transcriptional profiling, and chromatin footprinting data collected in Projects I and II. Our hypothesis is that the increased precision and completeness of these new data sets, made possible by next-generation sequencing, will allow us to infer powerful predictive models. First, we will design and implement PIPA, a high-throughput sequencing data analysis pipeline. PIPA will be component-based and will integrate emerging tools from the community (R, Galaxy, bowtie, top-hat, etc.). It will provide unified, easy-to-use, web-based access to the Program's experimental data. Next, we will devise methods that query PIPA and consider transcription, competitive growth, and chromatin binding information to infer gene function. Integrative data mining will fuse these emerging hypotheses into consensus gene network models while considering available external data from other organisms. We will use the consensus gene networks as scaffolds upon which we can predict gene function, propose additional experiments, and add layers of information from other experiments. We will also use the gene networks as background knowledge for experiment prioritization, the proposal of new mutant-based screens, and the development of new phenotype prediction models. Finally, we propose to implement the new methods within a modern server-based software architecture with visualization-rich interactive interfaces. The most significant aspect of this part of the project is the design of an infrastructure and interfaces that will make the entire planned data analytics transparent to, and operable by, biologists with no computer science background. 
Our software will be freely available to the research community and well integrated with dictyBase, a primary Dictyostelium community resource.
The lack of appropriate analytical methods reduces the utility of high-dimensional, genome-scale biological data. Using diverse, rich, high-quality phenotypic and transcriptional profiling data sets, we will devise new computational methods to accurately infer gene function, helping us to better understand biological processes and equipping other researchers with improved means to analyze their own biomedical data.