This subproject is one of many research subprojects utilizing the resources provided by a Center grant funded by NIH/NCRR. The subproject and investigator (PI) may have received primary funding from another NIH source, and thus could be represented in other CRISP entries. The institution listed is for the Center, which is not necessarily the institution for the investigator.
The specific aims of this grant are as stated in the original proposal. Since the grant was funded for 3 years rather than 5, the investigator will focus more on developing specific analysis and visualization techniques and less on building general data-handling infrastructure than was originally proposed. Year 1 of the grant was to be devoted to acquisition of baseline knowledge and implementation of basic software infrastructure. Future work will continue as outlined in the proposal.

Personnel: Funding for the grant did not officially arrive until November 2004, delaying the initial hiring of staff until spring 2005. However, during the fall 2004 semester, the PI worked with two minority students on projects specifically addressing two aspects of the specific aims as described below. To compensate for the time lost to these delays, the investigator hired two half-time graduate research assistants (Egle Pilipaviciute and Dragana Veljkovic) and two undergraduate programmers (Jason Edwards and James Packer) in January of 2005. The full-time research software developer position was filled at the beginning of March 2005 by Cory Burkhardt, a summa cum laude graduate of UTSA. The investigator also hired David Bigham, a recent UTSA graduate who is entering the CS Master's program in the fall of 2005, to work for three months during the summer of 2005. David will be writing scripts to access microarray databases from Matlab. The investigator will also devote a significant portion of her time in June and July of 2005 to this project.

Software development: During the course of this year, it was determined that software for this project will be developed on two platforms: Matlab and Davis. We evaluated several alternative platforms, including GeneSpring and R, and decided that neither was flexible enough to support the types of visualizations needed for this work.
Matlab, which has an enormous library of sophisticated algorithms, has undergone major improvements in its bioinformatics toolbox and data-handling capabilities. Matlab supports user-developed GUIs (graphical user interfaces). After an application has been developed in Matlab, it can be compiled into a standalone application that does not require a Matlab license. Dragana Veljkovic worked on Matlab wavelet implementations and did some prototype development with Matlab GUIs.

Davis (Data Viewing System) is a data visualization platform written in Java by the investigator and her students. Davis provides a data-handling infrastructure that is not available in Matlab. In particular, it supports multiple simultaneous synchronized views and can be used to view data from the web. It is also a good platform for developing streaming algorithms for handling large data sets. Considerable personnel time during this grant year has been devoted to stabilizing the Davis platform to enable future development. The program was reorganized by the investigator during the period from November to April so that it would be easier to add new visualizations and new types of data. Cory Burkhardt rewrote the underlying synchronization and timing mechanisms for Davis and has begun documenting the program, implementing configuration profiles, and writing a user's guide. Jason Edwards and James Packer reworked the preferences that allow the user to set visualization parameters such as color maps. Egle Pilipaviciute worked on the implementations of global KL decomposition techniques. The investigator implemented a wavelet capability in Davis for doing multi-resolution visualizations as well as techniques that combined KL decomposition with wavelets.

Acquisition of baseline knowledge: The investigator continued to acquire background knowledge. She regularly attended the bioinformatics seminar, a weekly meeting in which graduate students present important papers in computational bioinformatics.
She attended a tutorial in microarray technology at the RCMI meeting in December. She is planning to attend the IEEE Computational Systems Bioinformatics Conference, Aug. 8-11, 2005, along with tutorials associated with that meeting. She is also a member of the doctoral dissertation committees of two students who are working in bioinformatics. She met with her RCMI mentor twice this year: at Coupled60, a conference that was held in Houston, TX in February, and at the NSF Collaborative Research in Computational Neuroscience Meeting in April in Washington, DC.

Summary of progress on specific aims:
Specific aim 1: Develop new approaches for the visual analysis of microarray data sets.
A. Apply dimension-reducing techniques such as KL decomposition. The investigator jointly supervised (with Yufeng Wang of Biology) MBRS-RISE student Maribel Sanchez in an independent study. Using Matlab and GeneSpring, Maribel applied KL decomposition to analyze cell cycles in the malaria challenge data set produced by the DeRisi lab at UC San Francisco for the CAMDA 2004 (Critical Assessment of Microarray Data Analysis) contest. She found that KL decomposition captured the cell cycle and was able to predict the phase of the data.
B. Develop and apply general techniques for the analysis of waves. No work was done on this specific aim beyond the cell cycle analysis of part A and the development of general wave techniques as part of Davis.
C. Develop visual techniques for understanding gene cluster relationships. CS PhD student Robert Baltimore did an independent study on visualization of clustering for microarray data. In particular, he looked at techniques for clustering microarray data in restricted directions.
D. Develop techniques for structural analysis of microarray data. MBRS-RISE student Magdaliz Gorritz did an independent study in which she gathered a large number of microarray data sets as well as links to other information. She used R to run analyses on a large number of these data sets simultaneously. These data sets will be used as test data for the techniques being developed.
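As an illustration of the KL decomposition idea in part A of specific aim 1, the following minimal sketch extracts the two leading KL modes of a small expression matrix and reads a phase for each gene from its projection onto those modes. It is not the analysis performed in Matlab and GeneSpring. The data are synthetic (one cosine per gene, an assumption made only for this example), the function names are hypothetical, and a simple power iteration with deflation stands in for a library eigensolver.

```python
import math

def synthetic_cycle(n_genes, n_times):
    """Hypothetical test data: each gene is a cosine over one full cycle with its own phase."""
    return [[math.cos(2 * math.pi * t / n_times - 2 * math.pi * g / n_genes)
             for t in range(n_times)] for g in range(n_genes)]

def kl_modes(data, n_modes=2, iters=200):
    """Leading KL modes of data (rows = genes, columns = time points)."""
    n_rows, n_cols = len(data), len(data[0])
    # Center each time point across genes.
    means = [sum(row[j] for row in data) / n_rows for j in range(n_cols)]
    centered = [[row[j] - means[j] for j in range(n_cols)] for row in data]
    # Covariance between time points.
    cov = [[sum(r[i] * r[j] for r in centered) / (n_rows - 1)
            for j in range(n_cols)] for i in range(n_cols)]
    modes = []
    for idx in range(n_modes):
        # Start from a coordinate vector; assumes it is not orthogonal to the mode sought.
        v = [1.0 if k == idx else 0.0 for k in range(n_cols)]
        for _ in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(n_cols)) for i in range(n_cols)]
            # Deflation: remove the components along modes already found.
            for m in modes:
                dot = sum(a * b for a, b in zip(w, m))
                w = [a - dot * b for a, b in zip(w, m)]
            norm = math.sqrt(sum(a * a for a in w))
            if norm < 1e-12:
                raise ValueError("start vector has no component along the next mode")
            v = [a / norm for a in w]
        modes.append(v)
    return centered, modes

def estimate_phases(data):
    """Phase of each gene, taken as the angle of its projection onto the first two modes."""
    centered, (m1, m2) = kl_modes(data, n_modes=2)
    phases = []
    for row in centered:
        p1 = sum(a * b for a, b in zip(row, m1))
        p2 = sum(a * b for a, b in zip(row, m2))
        phases.append(math.atan2(p2, p1))
    return phases

phases = estimate_phases(synthetic_cycle(8, 12))
print([round(p, 2) for p in phases])
```

For periodic data of this kind the two leading modes span the cycle, so genes fall on a circle in the projected plane and the angle around that circle orders them by phase. The recovered angles are defined only up to a rotation or reflection of the plane.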
Specific aim 2: Develop new visualization techniques for multi-scale analysis and exploration of microarray data sets.
A. Create a web-based data browser for navigating microarray data sets at multiple levels. No specific progress was made on navigating microarray data. However, wavelet analysis for multi-resolution analysis was implemented in Davis.
B. Integrate online databases with the data browser. In the spring of 2005, the investigator supervised an independent study with Li Zhao, a CS master's student interested in bioinformatics. Li worked with the COG databases and the supplemental data provided in the paper "Use of Logic Relationships to Decipher Protein Network Organization" by Bowers et al. (Science 306:2246-2249, 2004). She implemented their algorithm, which uses phylogenetic profiles of triplets of proteins to infer network relationships. We plan to use these logic relationships to annotate clusters in microarray data. This is work in progress.
C. Use 3D technology and navigation to explore microarray data. Master's thesis student Mark Robinson continued to work on the development of techniques for overlaying two surfaces in 3D in order to compare scalar data sets. Master's thesis student Rachel Smith is developing algorithms and an implementation to use a data glove to navigate through data in 3D. Both Mark and Rachel are conducting user studies and have approved human subjects forms. Undergraduate student Jason Johnson is investigating the feasibility of using VTK (Visualization Toolkit) to do 3D visualization in Davis. We have made progress in using these technologies but are not yet at the stage of applying them to microarray data.

New collaborations: Another aspect of this development grant is the formation of new collaborations in bioinformatics. The investigator has started three new research collaborations this year as a direct result of her involvement with the RCMI program:
1) Nicholas Hatsopoulos,
University of Chicago, performs multi-electrode recordings in monkey motor cortex. These recordings produce large amounts of spatiotemporal data with wave-like activity. His data will be useful for looking at data handling and multi-resolution issues. The investigator has formatted this data for Davis visualization. One of his undergraduate honors thesis students, Doug Rubio, visited and worked with the investigator for two days in December to learn the wave techniques and to discuss what analysis will be applicable to this data. Nicholas came as an RCMI seminar speaker in February, and the collaboration will continue this summer through Doug on an analysis of the spatial dependence of directionality in the data.
2) Colleen Witt, a postdoctoral fellow from Berkeley, works on cell motility during T-cell development. She is a former student of Richard LaBaron, and Richard suggested the collaboration. The investigator has written a suite of analysis tools in Matlab to look at cell motility characteristics in two-photon microscopy data. This experience will allow the investigator to assist other researchers who will be using the two-photon microscopy RCMI core facility that should come online next year.
3) Matthew Gdovin, UTSA RCMI project director, works on central respiratory chemoreception. After a discussion of his data at the April RCMI meeting in Houston, the two investigators realized that the wavelet signal analysis techniques would be applicable to the respiration data. The two investigators will collaborate directly and through their graduate students, Vonnie Veit and Dragana Veljkovic.
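The report does not say which wavelets the Davis and Matlab implementations use. As a rough illustration of what multi-resolution analysis of a sampled signal involves, the sketch below assumes the simplest case, an averaging Haar transform, and builds a pyramid of successively coarser views together with the detail needed to recover each finer level. All function names are hypothetical.

```python
def haar_step(signal):
    """One Haar level: pairwise averages (coarse view) and half-differences (detail)."""
    approx = [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    return approx, detail

def haar_pyramid(signal):
    """All resolution levels of a signal whose length is a power of two."""
    n = len(signal)
    if n == 0 or n & (n - 1):
        raise ValueError("signal length must be a power of two")
    levels = [list(signal)]   # levels[0] is the full-resolution signal
    details = []              # details[k] refines levels[k + 1] back into levels[k]
    while len(levels[-1]) > 1:
        approx, detail = haar_step(levels[-1])
        levels.append(approx)
        details.append(detail)
    return levels, details

def haar_reconstruct(coarsest, details):
    """Rebuild the full-resolution signal from the coarsest level and the stored details."""
    signal = list(coarsest)
    for detail in reversed(details):
        signal = [x for a, d in zip(signal, detail) for x in (a + d, a - d)]
    return signal

levels, details = haar_pyramid([4, 2, 5, 7, 1, 1, 0, 8])
for level in levels:
    print(level)
```

A viewer can display a coarse level first and add detail only where the user zooms in, which is one reason such transforms suit large data sets.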