RNA-Seq is a recently developed technology capable of providing a comprehensive, nucleotide sequence-level survey of the RNA population in a sample of cells. The purpose of this project is to develop rigorous statistical methods and efficient computer programs for the effective analysis of the massive amounts of data produced by RNA-Seq experiments. Specifically, we will conduct research with the following aims.
Aim 1: Modeling non-uniformity of read rates: It is known that read rates can vary substantially depending on the position of the reads on the same transcript, and that such non-uniformity can induce biases in expression quantification. We will model how the read rate may depend on local sequence context, and design methods to correct for biases caused by non-uniform rates.
Aim 2: Inference of isoform-specific expression: Even when the isoforms are known, how to incorporate paired-end data into the statistical framework for quantitative inference of isoform expression remains an open problem. We will develop the necessary statistical theory and methods to resolve this important issue.
Aim 3: Mapping, alignment and detection of splice junctions: We will design computational methods to map and align the reads to the reference genome, and will develop methods for the detection of splice junctions based on the alignment results.
Aim 4: De novo inference of isoforms: The results of the previous aims will be integrated and extended to develop a statistical framework for inferring the set of expressed isoforms in a genetic locus. Based on this framework, we will design algorithms to discover the set of expressed isoforms and to quantify their expression.
Aim 5: Development of software for RNA-Seq data analysis: We will create a software application to support the analysis of RNA-Seq data. Starting from raw sequence reads as input, this software will support mapping reads to known transcript databases, discovering and displaying new transcripts or isoforms, visualizing reads, and computing isoform-specific expression and associated statistical summaries. By creating the statistical and computational tools to enable extraction of useful information from RNA-Seq data, this project will accelerate many areas of research relevant to human health.

Public Health Relevance

Dr. Wong and his lab members will conduct research on several problems related to the analysis of mRNA data produced by massively parallel sequencing technologies. They will develop statistical models for the inference of isoforms and isoform-specific expression. By creating the tools to enable extraction of useful information from RNA-Seq data, this project will accelerate many areas of research relevant to human health.

National Institutes of Health (NIH)
National Human Genome Research Institute (NHGRI)
Research Project (R01)
Study Section
Genomics, Computational Biology and Technology Study Section (GCAT)
Program Officer
Pazin, Michael J
Stanford University
Biostatistics & Other Math Sci
Schools of Arts and Sciences
United States
Qiu, Haiyan; Lee, Sebum; Shang, Yulei et al. (2014) ALS-associated mutation FUS-R521C causes DNA damage and RNA splicing defects. J Clin Invest 124:981-99
Lo, Wing-Sze; Gardiner, Elisabeth; Xu, Zhiwen et al. (2014) Human tRNA synthetase catalytic nulls with diverse functions. Science 345:328-32
Au, Kin Fai; Sebastiano, Vittorio; Afshar, Pegah Tootoonchi et al. (2013) Characterization of the human ESC transcriptome by hybrid sequencing. Proc Natl Acad Sci U S A 110:E4821-30
Hiller, David; Wong, Wing Hung (2013) Simultaneous isoform discovery and quantification from RNA-seq. Stat Biosci 5:100-118
Tan, Meng How; Au, Kin Fai; Yablonovitch, Arielle L et al. (2013) RNA sequencing reveals a diverse and dynamic repertoire of the Xenopus tropicalis transcriptome over development. Genome Res 23:201-16
Brady, Jennifer J; Li, Mavis; Suthram, Silpa et al. (2013) Early role for IL-6 signalling during generation of induced pluripotent stem cells revealed by heterokaryon RNA-Seq. Nat Cell Biol 15:1244-52
Mu, John C; Jiang, Hui; Kiani, Amirhossein et al. (2012) Fast and accurate read alignment for resequencing. Bioinformatics 28:2366-73
Ma, Li; Wong, Wing Hung; Owen, Art B (2012) A sparse transmission disequilibrium test for haplotypes based on Bradley-Terry graphs. Hum Hered 73:52-61
Jia, Yichang; Mu, John C; Ackerman, Susan L (2012) Mutation of a U2 snRNA gene causes global disruption of alternative splicing and neurodegeneration. Cell 148:296-308
Peterson, Kevin A; Nishi, Yuichi; Ma, Wenxiu et al. (2012) Neural-specific Sox2 input and differential Gli-binding affinity provide context and positional information in Shh-directed neural patterning. Genes Dev 26:2802-16
