The Song Lab consists of computer scientists, statisticians, and mathematicians who are fully committed to advancing biology. We develop efficient computational tools and robust statistical methods to facilitate the research of the broad biomedical community, while also getting deeply involved in data analysis to make new biological discoveries. In particular, we have been making notable contributions to the field of population genomics, where we have obtained significant theoretical results and developed useful inference tools that are generalizable to complex models and scalable to big data. In the past five years, our research has branched out to other areas of genomics, including bulk and single-cell gene expression analysis; mRNA translation dynamics; structural biology; immunology; and metagenomics.
Technological advances in sequencing and experimental assays have greatly increased the availability of various kinds of genomic data, enabling us to catalog genetic and epigenetic variation in diverse populations, and to probe fundamental biological processes (e.g., transcription and translation) in unprecedented detail. This development is providing a number of new opportunities for basic and biomedical research, but often the data are noisy and multifaceted, while the underlying biology is very complex, thus presenting both theoretical and computational challenges for analysis and interpretation. New efficient and robust statistical inference tools, as well as theoretical analysis of mathematical models, are much in need of development to bring the promise of the big data era in biology to full fruition. The central goal of our research program is to meet these important challenges.
Over the next five years, we will continue to carry out basic research in both population genomics and computational genomics, and develop a suite of useful analytical tools, paying attention to sound mathematical modeling, rigorous statistical estimation, and computational scalability. In particular, we will tackle several key technical challenges in population genomics, and develop both likelihood-based and likelihood-free methods to enable inference under more complicated, realistic models than previously possible. We will also develop novel inference methods to analyze, integrate, and interpret various types of genomic data, and carry out theoretical analysis of mathematical models to elucidate the intricate details of both transcription and translation processes. In addition, we will continue to collaborate with empirical and experimental biologists to pursue basic research questions in biology, as we have done fruitfully in the past.

Public Health Relevance

This project will develop a suite of statistical and computational methods for genomics that are robust and efficient. This research will enable inference under complex population genetic models, and help biomedical researchers integrate information from various types of genomic data to reveal fundamental biological processes, thereby broadly facilitating efforts to understand the genetic basis of human biology and disease risk.

Agency
National Institutes of Health (NIH)
Institute
National Institute of General Medical Sciences (NIGMS)
Type
Unknown (R35)
Project #
5R35GM134922-02
Application #
10063943
Study Section
Special Emphasis Panel (ZRG1)
Program Officer
Janes, Daniel E
Project Start
2019-12-01
Project End
2024-11-30
Budget Start
2020-12-01
Budget End
2021-11-30
Support Year
2
Fiscal Year
2021
Total Cost
Indirect Cost
Name
University of California Berkeley
Department
Engineering (All Types)
Type
Biomed Engr/Col Engr/Engr Sta
DUNS #
124726725
City
Berkeley
State
CA
Country
United States
Zip Code
94710