Micro-RNAs (miRNAs) are small non-coding RNAs that control gene expression by regulating messenger RNA (mRNA) translation. miRNAs can act as oncogenes or tumor suppressors, and the differential expression of miRNAs has been correlated with cancer diagnosis, staging, and prognosis, and has been used to nominate therapeutic targets. Much of what is known about miRNAs in cancer comes from genome-scale miRNA expression profiling. Unfortunately, despite hundreds of published studies using miRNA profiling data, at present a basic cancer biologist lacks tools to survey the differential expression of one or more miRNAs in a specific cancer type or across the global collection of miRNA expression profiling studies. Barriers include identifying available data, reconciling heterogeneous platforms, choosing analysis methods, and presenting results meaningfully. Thus, a useful solution must address a number of challenges, including the growing number of miRNAs, the multiple technologies and reporters used to measure miRNA expression, disparate clinical and experimental annotations, and the need to produce biologically meaningful analyses. Here we propose to develop a solution for a biologist seeking to explore a single miRNA or a miRNA signature across the global collection of cancer miRNA data sets. To accomplish this, our overall goal is to collect all publicly available cancer-related high-throughput miRNA data, to standardize the disparate data at three levels - sample data, expression data, and statistical analyses - and to present the data in a consistent, comparable format that is also fully integrated with the existing mRNA and DNA copy number data in Oncomine.
In Phase I we will: 1) establish and implement a sample metadata curation strategy for 3 micro-RNA profiling datasets to demonstrate the feasibility of applying a controlled vocabulary and ontology to miRNA sample metadata; 2) establish and implement a platform mapping strategy for 3 micro-RNA profiling datasets to demonstrate the feasibility of standardizing disparate miRNA platforms into a single, unified format; and 3) perform differential expression analysis on 3 micro-RNA profiling datasets to demonstrate the feasibility of creating automatically standardized analyses following the curation and mapping steps conducted in Aims 1 and 2. Upon successful completion of Phase I, we propose the following Phase II aims: 1) development and implementation of a scalable process for capturing and curating micro-RNA genomics data and integrating it into Oncomine, by developing software to support the scalable cataloging and capture of miRNA sample data; 2) development of a scalable micro-RNA genomics platform mapping and data warehouse strategy and its integration into Oncomine, by developing tools to accommodate dynamic naming conventions for micro-RNAs and mapping to common identifiers; and 3) development of automated methods for analyzing micro-RNA profiling datasets and their integration into Oncomine, by developing automated differential expression, co-expression, outlier, and meta-analysis capability across the miRNA database and integrating it within the established Oncomine database.
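As an illustration only, the Phase I platform mapping and differential expression steps might be sketched as follows. This is a minimal sketch under stated assumptions, not the proposed Oncomine implementation: the function names, the regular expression, the assumption that all reporters are human (`hsa`), and the choice of a Welch t-statistic are all hypothetical and chosen for brevity. A production mapping would more likely resolve reporters against a curated identifier registry than against a naming pattern.

```python
import math
import re
from statistics import mean, variance


def normalize_mirna_name(reporter: str) -> str:
    """Map a platform-specific reporter label to a common key.

    Hypothetical rule: lowercase the label and rewrite any leading
    'mir'/'microrna' prefix (with or without 'hsa') to 'hsa-mir-'.
    Assumes every reporter is a human miRNA.
    """
    name = reporter.strip().lower()
    return re.sub(r"^(hsa[-_])?(mir|microrna)[-_ ]?", "hsa-mir-", name)


def map_dataset(raw: dict[str, list[float]]) -> dict[str, list[float]]:
    """Re-key one dataset's expression values by the common identifier."""
    return {normalize_mirna_name(k): v for k, v in raw.items()}


def welch_t(group_a: list[float], group_b: list[float]) -> float:
    """Welch's t-statistic for two groups with unequal variances."""
    var_a, var_b = variance(group_a), variance(group_b)
    standard_error = math.sqrt(var_a / len(group_a) + var_b / len(group_b))
    return (mean(group_a) - mean(group_b)) / standard_error


# Toy values (invented for illustration): the same miRNA measured on two
# platforms that label it differently.
tumor = map_dataset({"hsa-miR-21": [8.0, 9.0, 10.0]})
normal = map_dataset({"MIR21": [4.0, 5.0, 6.0]})

# After mapping, both datasets share the key 'hsa-mir-21'.
t_stat = welch_t(tumor["hsa-mir-21"], normal["hsa-mir-21"])
```

Once both platforms share a key, the same statistic can be computed for every dataset in the same way, which is the property the standardized analyses in Aim 3 depend on.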
Despite the substantial efforts of scientists to understand the molecular basis of cancer, these research gains have been difficult to translate into clinical practice, and cancer remains a leading cause of mortality in the United States. This proposal seeks to make micro-RNA expression data, which have been correlated with cancer diagnosis, staging, and prognosis, easily accessible to cancer researchers via the cancer genomics portal Oncomine. If successful, this effort will improve public health by providing researchers with additional data and tools to understand and treat this biologically complex disease.