Meta-analyses of microarray experiments require meta-data annotations; however, these annotations are often a barrier because using them usually entails significant manual evaluation. Combining data from assays in different experiments is challenging because the data must be suitably transformed so that they are on an equal footing. In particular, extra care must be taken to ensure that the results are driven by biologically relevant factors rather than by confounding ones. Consideration of annotations can improve meta-analyses by guiding the choice of experiments, assays (within each experiment), data transformations, and analysis procedures. We propose to develop software that will extract annotations for use in meta-analyses and that should motivate better annotation of microarray experiments using established standards. Standardized experiment annotations can be generated using the MGED Ontology (MO) and can be extracted from files that are based on the MAGE (MicroArray Gene Expression) standard and that contain information covering the MIAME (Minimum Information About a Microarray Experiment) checklist. Standardized MO-based assay annotations are also available from MAGE-based files, but further relevant information (such as treatment descriptions) resides in free-text annotation fields in these files. Thus, obtaining fully standardized annotation for assays requires more work than just extracting the MO terms associated with them.
Our first aim is to develop software that will either extract annotations directly from the appropriate MAGE fields or parse them as needed from free-text descriptions. The annotations will be used to generate dissimilarity measures between experiments and between assays based on shared annotation. The software will need to recognize synonymous terms when the terms used for the same annotation (e.g., organism part) in different experiments or assays are drawn from different sources.
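As an illustration of the first aim, the following is a minimal sketch of one possible annotation-based dissimilarity measure. It assumes that each assay's annotations have already been extracted into a mapping from annotation category to term, and it uses the Jaccard dissimilarity purely as an example; the proposal does not commit to a particular measure. The synonym table and all function names are hypothetical placeholders.

```python
from typing import Dict, Set, Tuple

# Hypothetical synonym table: maps variant terms to one canonical term.
# In practice this would be derived from an ontology or curated mapping.
SYNONYMS: Dict[str, str] = {
    "hepatic tissue": "liver",
    "homo sapiens": "human",
}

# An annotation set is a set of (category, canonical term) pairs.
AnnotationSet = Set[Tuple[str, str]]


def normalize(term: str) -> str:
    """Lower-case, collapse whitespace, and map synonyms to a canonical term."""
    key = " ".join(term.lower().split())
    return SYNONYMS.get(key, key)


def canonical(raw: Dict[str, str]) -> AnnotationSet:
    """Convert a category -> term mapping into a set of canonical pairs."""
    return {(category.strip().lower(), normalize(term))
            for category, term in raw.items()}


def jaccard_dissimilarity(a: AnnotationSet, b: AnnotationSet) -> float:
    """Return 1 - |a & b| / |a | b|; two empty sets are treated as identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


# Illustrative (made-up) assay annotations.
assay_a = canonical({"organism": "Homo sapiens", "organism part": "Liver"})
assay_b = canonical({"organism": "human", "organism part": "hepatic tissue"})
assay_c = canonical({"organism": "human", "organism part": "kidney"})

d_ab = jaccard_dissimilarity(assay_a, assay_b)  # synonyms resolve to the same pairs
d_ac = jaccard_dissimilarity(assay_a, assay_c)  # one shared pair out of three
```

In this sketch, assays whose terms differ only by synonym end up with a dissimilarity of zero, whereas assays that differ in organism part do not. A real implementation would likely weight annotation categories differently and exploit the ontology hierarchy, neither of which is attempted here.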
Our second aim is to develop software to compute with annotations based on these measures, e.g., to find experiments or assays related to a query experiment or assay, or to cluster experiments or assays based on their annotation (as opposed to clustering based on gene expression profiles). These clusters can be used as the basis for organizing experiments and assays and for performing meta-analyses of gene expression profiles. Additionally, annotation-based dissimilarity measures can be used to evaluate existing clusters of experiments or assays that were derived from gene expression profiles, and the annotation itself can be used as input to analyses aimed at identifying enriched terms.
Narrative: Microarray technology has been used to understand the molecular basis of diseases, including heart, lung, blood, and sleep disorders as well as cancer. We will develop software applications to demonstrate the feasibility and utility of using microarray annotations to drive meta-analyses and quality control (QC) of experiments. The applications will be tested on files from the public repository ArrayExpress but are meant to work with appropriate files from any source. To the best of our knowledge, the use of annotations for this purpose has not been explored previously; the risk of the proposal is therefore that it explores uncharted territory, where the type and severity of pitfalls are unclear. The potential high impact of the proposed applications for the bench biologist is the ability to generate additional insights from their microarray results. Moreover, these methods would eventually be extensible to annotated experiments that employ high-throughput technologies other than microarrays, which can further facilitate the integration of data of different types to address a scientific question of interest. These benefits may encourage better annotation of experiments and wider use of standards.