Structural Visualization in Bioinformatics

Robbins, Kay

Abstract

This subproject is one of many research subprojects utilizing the resources provided by a Center grant funded by NIH/NCRR. The subproject and investigator (PI) may have received primary funding from another NIH source, and thus could be represented in other CRISP entries. The institution listed is for the Center, which is not necessarily the institution for the investigator.

A. Specific Aims

The specific aims of this grant are as stated in the original proposal.

B. Studies and Results

Year 3 of the grant focused on tools to handle large data sets and multiple scales. We have made progress in three areas: 1) assembling and handling of microarray datasets, 2) analysis with workflows and hypothesis testing, and 3) navigation and multiple-scale analysis by lifting to abstract feature spaces. Progress on these efforts is described below. I also wrote a proposal for, and was awarded, a UTSA faculty development leave. This leave will allow me to work on research without teaching or service responsibilities during the fall semester of 2007.

1) Assembling and handling of microarray datasets. An essential step in performing structural visualization across large groups of microarray data sets is translating the data into a common format for comparison. The NCBI GEO (Gene Expression Omnibus) has gathered a large collection of microarray experiments in a single database. GEO provides the data in two formats: XML (MINiML) and SOFT. We have developed programs to download and parse the data into individual data files and to automatically assemble these data into comma-separated spreadsheets for visualization and analysis.

A major logistical problem with comparing datasets across platforms is that different microarray platforms use different, and sometimes inconsistent, methods for identifying genes on the microarrays. We have decided to use the NCBI Gene ID as the standard identifier in our work. Unfortunately, some common microarray platforms use the GenBank Accession Number instead of the Gene ID. These platforms require a circuitous translation to Gene ID, which Cory Burkhardt has semi-automated. He has downloaded, reformatted, and identified most of the microarray experiments from the top 20 platforms in the NCBI GEO database.
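The two-step translation described above (platform probe ID to GenBank accession, then accession to Gene ID) can be illustrated with a minimal Java sketch. This is not the project's actual code: the class name GeneIdTranslator, its methods, and both lookup tables are hypothetical, and the identifiers in main are placeholders rather than real GEO annotations. The sketch assumes that accessions may carry a version suffix (for example ".2") that should be ignored during lookup.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: maps platform probe IDs to NCBI Gene IDs via GenBank accessions. */
final class GeneIdTranslator {
    private final Map<String, String> probeToAccession;   // platform annotation (assumed input)
    private final Map<String, Integer> accessionToGeneId; // accession lookup table (assumed input)

    GeneIdTranslator(Map<String, String> probeToAccession,
                     Map<String, Integer> accessionToGeneId) {
        this.probeToAccession = probeToAccession;
        this.accessionToGeneId = accessionToGeneId;
    }

    /** Strips a trailing version suffix, so "XX000001.2" is looked up as "XX000001". */
    static String stripVersion(String accession) {
        int dot = accession.lastIndexOf('.');
        return dot < 0 ? accession : accession.substring(0, dot);
    }

    /** Returns the Gene ID for a probe, or null when either lookup step fails. */
    Integer toGeneId(String probeId) {
        String accession = probeToAccession.get(probeId);
        if (accession == null) {
            return null;
        }
        return accessionToGeneId.get(stripVersion(accession.trim()));
    }

    /** Translates probes in order; probes without a Gene ID are appended to unmapped. */
    Map<String, Integer> translateAll(List<String> probeIds, List<String> unmapped) {
        Map<String, Integer> result = new LinkedHashMap<>();
        for (String probe : probeIds) {
            Integer geneId = toGeneId(probe);
            if (geneId == null) {
                unmapped.add(probe);
            } else {
                result.put(probe, geneId);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Placeholder identifiers for illustration only.
        Map<String, String> probes = Map.of("probe_1", "XX000001.2", "probe_2", "XX000002");
        Map<String, Integer> genes = Map.of("XX000001", 1001);
        GeneIdTranslator translator = new GeneIdTranslator(probes, genes);
        List<String> unmapped = new ArrayList<>();
        System.out.println(translator.translateAll(List.of("probe_1", "probe_2", "probe_3"), unmapped));
        System.out.println("unmapped: " + unmapped);
    }
}
```

Collecting the unmapped probes separately, rather than dropping them silently, reflects the semi-automated nature of the translation: the leftover probes are the ones that would need manual review.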
We have also developed various programs for parsing the data from the XML specifications, and have developed software that uses Java SoftReference technology to serialize data that is not in use during program execution. This allows us to deal with more data than will fit into memory in a clean, object-oriented way. Jason Edwards and James Packer developed a CacheManager architecture for handling large microarray datasets. This architecture will be deployed in the microarray analysis tool that we have under development. We will also investigate converting Davis (our Java-based data visualization platform) to use this caching technique in the coming year.

2) Analysis with workflows and hypothesis testing. This year we focused on developing a workflow-based tool infrastructure based on wizards. Wizards are a familiar application interface in which the user navigates through a procedure step by step using Next and Back buttons. In their undergraduate honors theses, Jason Edwards and James Packer developed the infrastructure to support a simple prototype workflow for comparing the behavior of two genes across many microarray platforms. Their prototype application is written in Java and is called MicroMetal (Microarray Meta Analysis). This is the starting point for a more general workflow and hypothesis-testing approach.

3) Navigation and multiple-scale analysis by lifting to abstract feature spaces. Doctoral student Dragana Veljkovic and I continue to work on the development of techniques for abstracting features for comprehensive multi-scale navigation of datasets. The idea is that a time window of a spatiotemporal data set (e.g., a multi-electrode recording or a series of microarray experiments) is represented by a low-dimensional subspace (for example, the two-dimensional space spanned by its largest two principal components). The data in each time window then becomes a single point in feature space.
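To make the feature-space idea concrete, the following Java sketch compares two time windows through the subspaces that represent them. It is a sketch under stated assumptions, not the project's implementation: the report does not name its subspace metric, so the chordal (projection) distance used here is only one common choice, and the class SubspaceDistance and its methods are hypothetical. The sketch also assumes that the spanning vectors of each window (for example, its two leading principal components) have already been computed elsewhere.

```java
/** Hypothetical sketch: chordal distance between subspaces given by spanning vectors. */
final class SubspaceDistance {

    /** Dot product of two vectors of equal length. */
    static double dot(double[] a, double[] b) {
        double sum = 0.0;
        for (int d = 0; d < a.length; d++) {
            sum += a[d] * b[d];
        }
        return sum;
    }

    /** Orthonormalizes the spanning vectors (one per row) by modified Gram-Schmidt. */
    static double[][] orthonormalize(double[][] vectors) {
        double[][] q = new double[vectors.length][];
        for (int i = 0; i < vectors.length; i++) {
            double[] v = vectors[i].clone();
            for (int j = 0; j < i; j++) {
                double proj = dot(q[j], v);
                for (int d = 0; d < v.length; d++) {
                    v[d] -= proj * q[j][d];
                }
            }
            double norm = Math.sqrt(dot(v, v));
            if (norm < 1e-12) {
                throw new IllegalArgumentException("spanning vectors are linearly dependent");
            }
            for (int d = 0; d < v.length; d++) {
                v[d] /= norm;
            }
            q[i] = v;
        }
        return q;
    }

    /**
     * Chordal distance sqrt(k - ||U^T V||_F^2) between two k-dimensional subspaces,
     * where U and V hold orthonormal bases. It depends only on the subspaces,
     * not on which spanning vectors were supplied.
     */
    static double chordalDistance(double[][] spanA, double[][] spanB) {
        if (spanA.length != spanB.length) {
            throw new IllegalArgumentException("subspaces must have the same dimension");
        }
        double[][] u = orthonormalize(spanA);
        double[][] v = orthonormalize(spanB);
        double sumCosSquared = 0.0;
        for (double[] ui : u) {
            for (double[] vj : v) {
                double c = dot(ui, vj);
                sumCosSquared += c * c;
            }
        }
        // Clamp at zero to absorb floating-point round-off.
        return Math.sqrt(Math.max(0.0, u.length - sumCosSquared));
    }

    public static void main(String[] args) {
        double[][] xyPlane = {{1, 0, 0}, {0, 1, 0}};
        double[][] xzPlane = {{1, 0, 0}, {0, 0, 1}};
        System.out.println(chordalDistance(xyPlane, xyPlane));
        System.out.println(chordalDistance(xyPlane, xzPlane));
    }
}
```

A pairwise distance matrix built this way over all windows is the kind of input that a manifold learning method would take when projecting the feature points into a plane.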
The distance between two feature points is computed using a distance metric on the subspaces. We can then project this feature space into a plane using a manifold learning algorithm such as ISOMAP. We are developing navigation and summary techniques that describe the distribution of very large datasets by summarizing their abstract features. A paper describing a MATLAB tool that implements these navigation techniques is in preparation.

Other activities: We continue to develop and test Davis (DAta VIewing System). Our collaborators, particularly David Senseman and his students, are using Davis extensively for their research. We have documented Davis and created video tutorials to make Davis easier to learn.

Update on collaborations formed because of this grant: Another aspect of this development grant is the formation of collaborations in the biosciences. The following collaborations, formed last year, have continued:

1) Nicholas Hatsopoulos, University of Chicago, with Doug Rubino from his laboratory: Our paper entitled 'Propagating waves mediate information transfer in the motor cortex' was published in Nature Neuroscience in December. Doug, who was an undergraduate when this collaboration started, has visited several times. He entered a PhD program in neuroscience at the University of San Diego in the fall semester of 2006.

2) Colleen Witt, director of the RCMI imaging facility at UTSA, and I are continuing to discuss potential integration of visualization and modeling with imaging. We are planning to revise and resubmit our Texas ARP research program grant proposal if the program is offered this year. Dr. Witt supervised minority student Alejandro Montelongo's undergraduate honors thesis on this work (completed spring 2007). I am a member of this student's thesis committee.

C. Significance

Accessible tools for analysis of microarray data are needed to fully realize the potential of high-throughput, data-driven biology. In addition, other types of data (such as phenotype information) must be integrated in order to apply the results to health care. Current tools tend to be simplistic in their data handling and present results that are difficult to relate to actual biological questions. The development of general data-handling infrastructure, meaningful visualizations, and fusion of diverse types of information is critical. The goal of this project is to build tools that allow hypothesis-driven inquiry of structure across multiple data sets at multiple scales in order to derive higher-level insight into fundamental mechanisms. We have made some initial progress towards this type of deployment.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Center for Research Resources (NCRR)
Type: Research Centers in Minority Institutions Award (G12)
Project #: 5G12RR013646-10
Application #: 7715341
Study Section: National Center for Research Resources Initial Review Group (RIRG)

Project Start: 2008-08-01
Project End: 2009-07-31
Budget Start: 2008-08-01
Budget End: 2009-07-31
Support Year: 10
Fiscal Year: 2008
Total Cost: $153,507
Indirect Cost

Institution

Name: University of Texas Health Science Center San Antonio
Department
Type: Other Domestic Higher Education
DUNS #: 800189185

City: San Antonio
State: TX
Country: United States
Zip Code: 78249

Publications

Ye, Ruquan; Dong, Juncai; Wang, Luqing et al. (2018) Manganese deception on graphene and implications in catalysis. Carbon N Y 132:623-631

Everett, James; Collingwood, Joanna F; Tjendana-Tjhin, Vindy et al. (2018) Nanoscale synchrotron X-ray speciation of iron and calcium compounds in amyloid plaque cores from Alzheimer's disease subjects. Nanoscale 10:11782-11796

Ortega, Eduardo; Ponce, Arturo; Santiago, Ulises et al. (2017) Structural damage reduction in protected gold clusters by electron diffraction methods. Adv Struct Chem Imaging 2:12

Rodriguez, Roberto A; Chen, Liao Y; Plascencia-Villa, Germán et al. (2017) Elongation affinity, activation barrier, and stability of Aβ42 oligomers/fibrils in physiological saline. Biochem Biophys Res Commun 487:444-449

Mimun, L Christopher; Ajithkumar, G; Rightsell, Chris et al. (2017) Synthesis and characterization of Na(Gd0.5Lu0.5)F4: Nd3+, a core-shell free multifunctional contrast agent. J Alloys Compd 695:280-285

Raphael, Itay; Webb, Johanna; Gomez-Rivera, Francisco et al. (2017) Serum Neuroinflammatory Disease-Induced Central Nervous System Proteins Predict Clinical Onset of Experimental Autoimmune Encephalomyelitis. Front Immunol 8:812

Srinivasan, Anand; Torres, Nelson S; Leung, Kai P et al. (2017) nBioChip, a Lab-on-a-Chip Platform of Mono- and Polymicrobial Biofilms for High-Throughput Downstream Applications. mSphere 2:

Rozinek, Sarah C; Thomas, Robert J; Brancaleon, Lorenzo (2016) Biophysical characterization of the interaction of human albumin with an anionic porphyrin. Biochem Biophys Rep 7:295-302

Plascencia-Villa, Germán; Ponce, Arturo; Collingwood, Joanna F et al. (2016) High-resolution analytical imaging and electron holography of magnetite particles in amyloid cores of Alzheimer's disease. Sci Rep 6:24873

Mendoza-Cruz, Rubén; Bazán-Díaz, Lourdes; Velázquez-Salazar, J Jesús et al. (2016) Helical Growth of Ultrathin Gold-Copper Nanowires. Nano Lett 16:1568-73

Showing the most recent 10 out of 181 publications
