The Song Lab consists of computer scientists, statisticians, and mathematicians who are fully committed to advancing biology. We develop efficient computational tools and robust statistical methods to facilitate the research of the broad biomedical community, while also getting deeply involved in data analysis to make new biological discoveries. In particular, we have been making notable contributions to the field of population genomics, where we have obtained significant theoretical results and developed useful inference tools that are generalizable to complex models and scalable to big data. In the past five years, our research has branched out to other areas of genomics, including bulk and single-cell gene expression analysis; mRNA translation dynamics; structural biology; immunology; and metagenomics.
Technological advances in sequencing and experimental assays have greatly increased the availability of various kinds of genomic data, enabling us to catalog genetic and epigenetic variation in diverse populations, and to probe fundamental biological processes (e.g., transcription and translation) in unprecedented detail. This development is providing a number of new opportunities for basic and biomedical research, but often the data are noisy and multifaceted, while the underlying biology is very complex, thus presenting both theoretical and computational challenges for analysis and interpretation. New efficient and robust statistical inference tools, as well as theoretical analysis of mathematical models, are much needed to bring the promise of the big data era in biology to full fruition. The central goal of our research program is to meet these important challenges.
Over the next five years, we will continue to carry out basic research in both population genomics and computational genomics, and develop a suite of useful analytical tools, paying attention to sound mathematical modeling, rigorous statistical estimation, and computational scalability. In particular, we will tackle several key technical challenges in population genomics, and develop both likelihood-based and likelihood-free methods to enable inference under more complicated, realistic models than previously possible. We will also develop novel inference methods to analyze, integrate, and interpret various types of genomic data, and carry out theoretical analysis of mathematical models to elucidate the intricate details of both transcription and translation processes. In addition, we will continue to collaborate with empirical and experimental biologists to pursue basic research questions in biology, as we have done fruitfully in the past.

Public Health Relevance

This project will develop a suite of statistical and computational methods for genomics that are robust and efficient. This research will enable inference under complex population genetic models, and help biomedical researchers integrate information from various types of genomic data to reveal fundamental biological processes, thereby broadly facilitating efforts to understand the genetic basis of human biology and disease risk.

National Institutes of Health (NIH)
National Institute of General Medical Sciences (NIGMS)
Unknown (R35)
Study Section: Special Emphasis Panel (ZRG1)
Program Officer: Janes, Daniel E
University of California Berkeley
Engineering (All Types)
Biomed Engr/Col Engr/Engr Sta
United States