RNA-Seq is a recently developed technology capable of providing a comprehensive, nucleotide sequence-level survey of the RNA population in a sample of cells. The purpose of this project is to develop rigorous statistical methods and efficient computer programs that will allow the effective analysis of the massive amount of data produced by RNA-Seq experiments. Specifically, we will conduct research with the following aims.
Aim 1 : Modeling non-uniformity of read rates: It is known that read rates can vary substantially depending on the position of the reads on the same transcript and that such non-uniformity can induce biases in expression quantification. We will model how the read rate may depend on local sequence context, and design methods to correct for biases caused by non-uniform rates.
Aim 2 : Inference of isoform-specific expression: Even when the isoforms are known, the issue of how paired-end data can be incorporated into the statistical framework for quantitative inference of isoform expression is an open problem. We will develop the necessary statistical theory and methods to resolve this important issue.
Aim 3 : Mapping, alignment and detection of splice junctions: We will design computational methods to map and align the reads to the reference genome, and will develop methods for the detection of splice junctions based on the alignment results.
Aim 4 : De novo inference of isoforms: The results of the previous aims will be integrated and extended to develop a statistical framework for inferring the set of expressed isoforms in a genetic locus. Based on this framework, we will design algorithms to discover the set of expressed isoforms and to quantify their expression.
Aim 5 : Development of software for RNA-Seq data analysis: We will create a software application to support the analysis of RNA-Seq data. Starting from raw sequence reads as input, this software will allow mapping to known transcript databases, discovery and display of new transcripts or isoforms, visualization of reads, and computation of isoform-specific expression and associated statistical summaries. By creating the statistical and computational tools to enable extraction of useful information from RNA-Seq data, this project will accelerate many areas of research relevant to human health.

Public Health Relevance

Dr. Wong and his lab members will conduct research on several problems related to the analysis of mRNA data produced by massively parallel sequencing technologies. They will develop statistical models for the inference of isoforms and isoform-specific expression. By creating the tools to enable extraction of useful information from RNA-Seq data, this project will accelerate many areas of research relevant to human health.

National Institutes of Health (NIH)
National Human Genome Research Institute (NHGRI)
Research Project (R01)
Study Section: Genomics, Computational Biology and Technology Study Section (GCAT)
Program Officer: Li, Rongling
Stanford University
Biostatistics & Other Math Sci
Schools of Arts and Sciences
United States
Qiu, Haiyan; Lee, Sebum; Shang, Yulei et al. (2014) ALS-associated mutation FUS-R521C causes DNA damage and RNA splicing defects. J Clin Invest 124:981-99
Lo, Wing-Sze; Gardiner, Elisabeth; Xu, Zhiwen et al. (2014) Human tRNA synthetase catalytic nulls with diverse functions. Science 345:328-32
Brady, Jennifer J; Li, Mavis; Suthram, Silpa et al. (2013) Early role for IL-6 signalling during generation of induced pluripotent stem cells revealed by heterokaryon RNA-Seq. Nat Cell Biol 15:1244-52
Hiller, David; Wong, Wing Hung (2013) Simultaneous isoform discovery and quantification from RNA-seq. Stat Biosci 5:100-118
Tan, Meng How; Au, Kin Fai; Yablonovitch, Arielle L et al. (2013) RNA sequencing reveals a diverse and dynamic repertoire of the Xenopus tropicalis transcriptome over development. Genome Res 23:201-16
Ma, Li; Wong, Wing Hung; Owen, Art B (2012) A sparse transmission disequilibrium test for haplotypes based on Bradley-Terry graphs. Hum Hered 73:52-61