I. IUTA: a tool for effectively detecting differential isoform usage from RNA-Seq data

Most genes in mammals generate, through alternative splicing, several transcript isoforms that differ in stability and translational efficiency. Such alternative splicing can be tissue- and developmental stage-specific, and this specificity is sometimes associated with disease. Thus, detecting differential isoform usage for a gene between tissues or cell lines/types (that is, differences in the fraction of the gene's total expression contributed by each of its isoforms) is potentially important for cell and developmental biology. We present a new method, IUTA, designed to test each gene in the genome for differential isoform usage between two groups of samples. IUTA also estimates isoform usage for each gene in each sample as well as averaged across samples within each group. IUTA is the first method to formulate the testing problem as testing for equal means of two probability distributions under the Aitchison geometry, which is widely recognized as the most appropriate geometry for compositional data (vectors that contain the relative amount of each component comprising the whole). Evaluation using simulated data showed that IUTA provided test results for many more genes than did Cuffdiff2 (version 2.2.0, released in March 2014) and that IUTA performed better than Cuffdiff2 on the limited number of genes that Cuffdiff2 did analyze. When applied to actual mouse RNA-Seq datasets from six tissues, IUTA identified 2,073 significant genes with clear patterns of differential isoform usage between a pair of tissues. Both simulation and real-data results suggest that IUTA accurately detects differential isoform usage. We believe that our analysis of RNA-Seq data from six mouse tissues represents the first comprehensive characterization of isoform usage in these tissues.
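To make the notion of Aitchison geometry for isoform usage concrete, the following minimal Python sketch shows the centered log-ratio (clr) transform, the Aitchison distance between two usage vectors, and the compositional (closed geometric) mean of a group. It is an illustration only and not IUTA's implementation: the function names and the example usage vectors are hypothetical, and every component is assumed to be strictly positive.

```python
import math

def closure(x):
    """Rescale a positive vector so that its components sum to 1."""
    total = sum(x)
    return [v / total for v in x]

def clr(x):
    """Centered log-ratio transform of a strictly positive composition."""
    logs = [math.log(v) for v in x]
    mean_log = sum(logs) / len(logs)
    return [l - mean_log for l in logs]

def aitchison_distance(x, y):
    """Euclidean distance between the clr images of two compositions."""
    cx, cy = clr(closure(x)), clr(closure(y))
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(cx, cy)))

def aitchison_mean(samples):
    """Compositional mean: component-wise geometric mean, then closure."""
    closed = [closure(s) for s in samples]
    n = len(closed)
    parts = len(closed[0])
    geo = [math.exp(sum(math.log(s[j]) for s in closed) / n) for j in range(parts)]
    return closure(geo)

# Hypothetical isoform usage vectors for one three-isoform gene in two groups.
group_a = [[0.70, 0.20, 0.10], [0.60, 0.30, 0.10]]
group_b = [[0.30, 0.50, 0.20], [0.25, 0.55, 0.20]]

mean_a = aitchison_mean(group_a)
mean_b = aitchison_mean(group_b)
between_group_distance = aitchison_distance(mean_a, mean_b)
```

A test for differential isoform usage in this geometry asks whether the distance between the two group means is larger than expected by chance. The sketch computes only the distance and does not reproduce the test statistic or its null distribution.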
IUTA will be a valuable resource for those who study the roles of alternative transcripts in cell development and disease.

II. Analysis of large-scale gene expression and DNA methylation data from The Cancer Genome Atlas

Melanoma is highly aggressive, and its incidence has been increasing worldwide; both genetics and environmental exposure contribute significantly to its etiology. Melanoma has often metastasized to a distal site before it is diagnosed, and it therefore causes the majority of skin cancer deaths. The Cancer Genome Atlas (TCGA) consortium measured genome-wide gene expression using RNA-seq for 336 skin cutaneous melanoma (SKCM) samples, of which 272 were clinically classified as metastatic SKCM tumors and the remaining 64 as primary SKCM tumors.
We aimed to identify gene signatures that separate the primary SKCM samples from the metastatic SKCM samples. Our initial analysis showed that the primary and metastatic SKCM samples were similar enough at the gene expression level that misclassification rates were unacceptably high. We reasoned that some of the primary SKCM tumors might plausibly have evolved to resemble the metastatic tumors in their gene expression. This idea led us to propose an alternative computational method that finds a gene signature for accurate classification while making explicit allowance for allegiance switching, that is, moving samples from one group to the other (primary to metastatic or vice versa). Based on an iterative stochastic search algorithm that delivers nearly optimal gene signatures for classification, our alternative algorithm is rooted in the groups defined by clinical classification but allows a sample to switch groups when it is clearly discordant with other group members based on its gene expression profile. We began by seeking such a near-optimal partitioning of the 336 samples into the primary and metastatic groups based on the gene expression data, using the clinical classification as the starting point. Specifically, our algorithm gives each of the 336 samples a small but equal probability of being switched to the other group at each iteration (e.g., from metastatic to primary or vice versa). We carried out a massive computational search for gene signatures (sets of 20 genes) that provide a near-optimal partitioning of the groups while keeping the clinical classification for most of the 336 samples but reassigning a few to the other group. Distinguishing between the newly reassigned primary and metastatic groups is now possible based on gene expression data.
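The allegiance-switching step described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the algorithm used in the project: the scoring function (`centroid_agreement`), the greedy acceptance rule, the single expression value per sample, and all names and parameter values are hypothetical stand-ins. The actual method searches jointly over 20-gene signatures, which the sketch omits.

```python
import random

def propose_switches(labels, p_switch, rng):
    """Copy the labels, flipping each sample (0 <-> 1) with equal probability p_switch."""
    return [1 - lab if rng.random() < p_switch else lab for lab in labels]

def centroid_agreement(values, labels):
    """Toy score: fraction of samples at least as close to their own group mean as to the other."""
    groups = {0: [], 1: []}
    for v, lab in zip(values, labels):
        groups[lab].append(v)
    if not groups[0] or not groups[1]:
        return 0.0  # reject partitions that empty one group
    means = {g: sum(vs) / len(vs) for g, vs in groups.items()}
    hits = sum(abs(v - means[lab]) <= abs(v - means[1 - lab])
               for v, lab in zip(values, labels))
    return hits / len(values)

def search_partition(values, clinical_labels, p_switch=0.02, n_iter=2000, seed=0):
    """Start from the clinical labels and keep a proposal only if the score improves."""
    rng = random.Random(seed)
    best = list(clinical_labels)
    best_score = centroid_agreement(values, best)
    for _ in range(n_iter):
        candidate = propose_switches(best, p_switch, rng)
        score = centroid_agreement(values, candidate)
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score

# Hypothetical data: 0 = primary, 1 = metastatic; the fourth sample is
# clinically primary but has a metastatic-like expression value.
expression = [0.10, 0.20, 0.15, 0.90, 1.00, 0.95, 0.85]
clinical = [0, 0, 0, 0, 1, 1, 1]
initial_score = centroid_agreement(expression, clinical)
final_labels, final_score = search_partition(expression, clinical)
```

Because the search starts from the clinical labels and the switching probability is small, most samples keep their clinical assignment, and only samples that are clearly discordant with their group tend to be moved.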
The search comprised 5,000 independent runs of our alternative stochastic search algorithm, generating 5,000 near-optimal gene signatures and 5,000 near-optimal partitionings of the samples. By examining how often each sample was assigned to the primary and metastatic groups, we could estimate the proportion of runs in which the sample was classified as a primary or a metastatic SKCM tumor. We found that nearly all the clinically classified metastatic tumors were consistently assigned to the metastatic group in 90-100% of the runs, whereas the clinically classified primary SKCM tumors were often reassigned to the metastatic group, in proportions ranging from 2% to 80%. This result suggests that the gene expression profiles of many primary tumors resemble those of metastatic tumors to various degrees. Gene ontology analysis of the 500 genes appearing most frequently in the 5,000 gene signatures suggested that the top-ranked genes are enriched in ectoderm and epidermis development, epithelial and epidermal cell differentiation, keratinization, and regulation of inflammatory and defense responses. In summary, we have developed a unique computational method that not only assesses the relevance of genes in sample classification but also classifies each sample probabilistically to uncover the true tumor status. Our analysis may provide useful information for treatment and disease management.

We also have several long-standing collaborations with intramural investigators. Specifically: a) identifying differentially expressed genes in wild-type and Zfp36l3 knockout (KO) mouse placentas using Affymetrix and Agilent arrays and deep sequencing (mRNA-seq) (PI Blackshear); b) identifying Zfp36l3 targets by RNA-seq analysis (PI Blackshear); c) the role of Med13 in embryo development (PI Williams); d) genome-wide tamoxifen-induced ER alpha binding specificity (PI Korach).

U.S. National Inst of Environ Hlth Scis
Ungewitter, Erica K; Rotgers, Emmi; Kang, Hong Soon et al. (2018) Loss of Glis3 causes dysregulation of retrotransposon silencing and germ cell demise in fetal mouse testis. Sci Rep 8:9662
Roy, Sumedha; Moore, Amanda J; Love, Cassandra et al. (2018) Id Proteins Suppress E2A-Driven Invariant Natural Killer T Cell Development prior to TCR Selection. Front Immunol 9:42
Miao, Yi-Liang; Gambini, Andrés; Zhang, Yingpei et al. (2018) Mediator complex component MED13 regulates zygotic genome activation and is required for postimplantation development in the mouse. Biol Reprod 98:449-464
Nguyen, Thuy-Ai T; Grimm, Sara A; Bushel, Pierre R et al. (2018) Revealing a human p53 universe. Nucleic Acids Res :
Li, Yuanyuan; Umbach, David M; Li, Leping (2017) Putative genomic characteristics of BRAF V600K versus V600E cutaneous melanoma. Melanoma Res 27:527-535
Fan, Zheng; Ahn, Mihye; Roth, Heidi L et al. (2017) Sleep Apnea and Hypoventilation in Patients with Down Syndrome: Analysis of 144 Polysomnogram Studies. Children (Basel) 4:
Ren, Natalie S X; Ji, Ming; Tokar, Erik J et al. (2017) Haploinsufficiency of SIRT1 Enhances Glutamine Metabolism and Promotes Cancer Development. Curr Biol 27:483-494
Li, Yuanyuan; Kang, Kai; Krahn, Juno M et al. (2017) A comprehensive genomic pan-cancer classification using The Cancer Genome Atlas gene expression data. BMC Genomics 18:508
Lowe, Julie M; Nguyen, Thuy-Ai; Grimm, Sara A et al. (2017) The novel p53 target TNFAIP8 variant 2 is increased in cancer and offsets p53-dependent tumor suppression. Cell Death Differ 24:181-191
Stumpo, Deborah J; Trempus, Carol S; Tucker, Charles J et al. (2016) Deficiency of the placenta- and yolk sac-specific tristetraprolin family member ZFP36L3 identifies likely mRNA targets and an unexpected link to placental iron metabolism. Development 143:1424-33

Showing the most recent 10 out of 36 publications