This action funds an NSF Postdoctoral Research Fellowship in Biology for FY 2011, Intersections of Biology and Mathematical and Physical Sciences. The fellowship supports a research and training plan in a host laboratory for the Fellow at the intersection of biology and statistics. The title of the research and training plan for this fellowship to Rori Rohlfs is "Developing statistical tools to examine the evolution of gene expression." The host institution for this fellowship is the University of California, Berkeley, and the sponsoring scientists are Drs. Rasmus Nielsen and Sandrine Dudoit.

The regulation of the degree to which a particular gene is used, called gene expression, has been proposed as an important mechanism explaining much of species diversity, yet a rigorous statistical framework that models gene expression evolution has not been established. This research uses the Ornstein-Uhlenbeck process to build a model of the evolution of gene expression. The model takes into account the existing phylogeny, relationships between genes and between species, constraints on gene expression, and an optimally fit expression level. After demonstrating the model's biological and statistical validity, the research applies it to a data set of mammalian gene expression measured with new RNA-Seq technology. Specific hypotheses regarding the evolution of gene expression levels in mammals are tested. For instance, a likelihood ratio test based on the model determines whether genes expressed in human brain are undergoing rapid expression adaptation along the human species lineage. A goal is to produce an open-source software package implementing the model for broad use in studies of expression evolution across biological disciplines.
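
As a rough illustration of the two statistical ingredients named above, the sketch below computes the mean and variance of a single-lineage Ornstein-Uhlenbeck process and the p-value of a likelihood ratio test with one extra parameter. The parameter names (`theta` for the optimal expression level, `alpha` for the strength of the pull toward it, `sigma` for the magnitude of random drift) follow the common textbook parameterization and are assumptions, as is the use of the asymptotic chi-square distribution. The fellowship's actual model works on a full phylogeny with within-species variance and is not reproduced here.

```python
import math

def ou_mean(x0, theta, alpha, t):
    """Expected expression level after time t, starting from x0.

    The mean decays exponentially from x0 toward the optimum theta.
    """
    return theta + (x0 - theta) * math.exp(-alpha * t)

def ou_variance(sigma, alpha, t):
    """Variance after time t; approaches sigma**2 / (2 * alpha) as t grows."""
    return sigma ** 2 / (2.0 * alpha) * (1.0 - math.exp(-2.0 * alpha * t))

def lrt_pvalue_1df(loglik_null, loglik_alt):
    """Likelihood ratio statistic and p-value for one extra free parameter.

    Assumes the statistic is chi-square distributed with 1 degree of
    freedom under the null; that survival function equals erfc(sqrt(x / 2)).
    """
    stat = max(0.0, 2.0 * (loglik_alt - loglik_null))
    return stat, math.erfc(math.sqrt(stat / 2.0))

if __name__ == "__main__":
    # Hypothetical numbers, chosen only to exercise the functions.
    print(ou_mean(x0=0.0, theta=5.0, alpha=1.0, t=2.0))
    print(ou_variance(sigma=2.0, alpha=1.0, t=2.0))
    print(lrt_pvalue_1df(loglik_null=-100.0, loglik_alt=-97.0))
```

A larger `alpha` corresponds to stronger stabilizing selection, while `alpha` near zero approaches unconstrained drift. A test for an adaptive shift would compare a model with a single `theta` against one in which a chosen lineage has its own optimum.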

Training goals include strengthening scientific collaborations and developing and distributing tools that will be widely accessible to the scientific community. Broader impacts include outreach to the public through collaborations with science media and public museums on the underlying causes of differences between species. Educational outreach includes designing and teaching a course at K-12 schools and community colleges.

Project Report

Long-standing theories in evolutionary biology posit that changes in gene regulation contribute substantially to physical differences between species. Direct evolutionary analyses of gene usage (or expression) were impeded by the difficulty of obtaining comparable gene expression measurements across species. Next-generation sequencing technologies have made these sorts of measurements accessible through RNA-Seq. Early evolutionary gene expression studies produced a number of interesting results, but were limited by statistical methods that lacked an evolutionary framework (i.e., these methods failed to account for different evolutionary relationships between species).

In this project, I developed and implemented an evolutionary framework for gene expression between species. This method directly accounts for expression variation both within and between species, making it possible to test a number of specific hypotheses, including:

- non-evolutionary expression variance (for example, differences in expression due to environmental conditions)
- expression drift (expression levels change randomly without constraint over evolutionary time)
- stabilizing selection on expression level (when a particular expression level is more evolutionarily fit than others, resulting in species maintaining similar expression levels over time, with some variance)
- adaptive shifts in expression level (while expression levels are stable over evolutionary time, for a particular species a change in expression level becomes advantageous, perhaps in response to a change of niche)
- expression level divergence between species (unusually high expression level differences between species, based on what is expected given within-species expression variance, likely evidence of expression level adaptation)
- expression level diversity within species (unusually high expression level differences within species, based on what is expected given between-species expression variance, likely evidence of short-term expression level response to environmental factors)

Specifically testing these hypotheses enables rigorous analyses of expression level evolution, greatly expanding our understanding of how changes in expression level contribute to adaptive differences between species. The software I developed implementing this method is being made freely available.

I applied my model to a publicly available expression data set across 15 mammals (mostly primates). I identified candidate genes for environmental response by testing for expression diversity, including HSPA8, a heat shock gene which varies in expression level according to environmental conditions. I also identified candidate genes for expression level adaptation; for example, F10, a blood coagulation factor that has extremely high expression in armadillos as compared to other mammals. It has previously been found that armadillo blood coagulates remarkably quickly compared to that of other mammals. This suggests that selective pressure on F10 expression level in armadillos caused an expression increase, resulting in a physiological difference.

Alongside this evolutionary expression level research, I investigated population genetic assumptions in forensic genetic identification methods. I specifically examined familial searching, where a database is queried for a partial genetic match to a sample of unknown origin, which may be caused by a close genetic relationship between the individual who left the sample and the partially matching individual in the database. I implemented the familial searching method used in California and quantified how often unrelated individuals and distant relatives (half-sibs, half-cousins, second cousins) are mistaken for first-degree relatives (siblings, parent-offspring). In the case of such a misidentification, the family of the mistaken individual would be investigated, which would not result in an identification of the individual who left the sample.

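
To make the idea of a partial genetic match concrete, the sketch below counts shared alleles between two genetic profiles, locus by locus. It is a simplified illustration only and is not the California familial searching method, whose details are not given in this report. The profile layout (a dictionary from locus name to a pair of alleles) and the function names are assumptions made for the example.

```python
from collections import Counter

def shared_alleles(genotype_a, genotype_b):
    """Count alleles two genotypes at one locus have in common (0, 1, or 2)."""
    overlap = Counter(genotype_a) & Counter(genotype_b)
    return sum(overlap.values())

def sharing_summary(profile_a, profile_b):
    """Tally loci by number of shared alleles, over loci typed in both profiles."""
    tally = {0: 0, 1: 0, 2: 0}
    for locus in profile_a.keys() & profile_b.keys():
        tally[shared_alleles(profile_a[locus], profile_b[locus])] += 1
    return tally

def shares_allele_at_every_locus(profile_a, profile_b):
    """True when no common locus has zero shared alleles.

    A parent and child always satisfy this (barring mutation), but so can
    unrelated people by chance, which is the source of false leads.
    """
    return sharing_summary(profile_a, profile_b)[0] == 0

if __name__ == "__main__":
    # Made-up allele values, for illustration only.
    sample = {"D8S1179": (12, 14), "TH01": (6, 9.3), "FGA": (21, 24)}
    candidate = {"D8S1179": (14, 15), "TH01": (6, 6), "FGA": (21, 24)}
    print(sharing_summary(sample, candidate))
    print(shares_allele_at_every_locus(sample, candidate))
```

With only a few loci, chance sharing between unrelated individuals is common, which is why how often such pairs are mistaken for first-degree relatives has to be quantified rather than assumed.
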
In similar work, I started a collaboration to investigate the validity of basic population genetic assumptions in a large database (>100,000 individuals). I found that the usual way of estimating how often a pair of unrelated individuals has partially matching genetic profiles may not accurately reflect the frequency of partial matches. I showed that this is ameliorated by better accounting for how the frequency of a genetic variant differs between population groups.

At the beginning of my time on this grant, I established a collaboration with a high school biology teacher at Berkeley High School, where I taught interactive lessons on contemporary population genetics and evolution research and organized other postdocs and graduate students to do the same. These lessons center on active learning about well-motivated topics in biology to engage high school students in the scientific process, provide an excellent resource for high school teachers, and allow researchers an opportunity to see the impact of their work. Over the years this program grew to include seven lessons taught each year to approximately 120 students in the classes of three teachers. I designed the program to be sustainable for both researchers and teachers, so that now, as my time as a postdoc is ending, the program is being carried on.

I developed curriculum for and taught a course on forensic genetics for incoming biology majors to spark curiosity about the scientific process and make students comfortable with quantitative reasoning. This course leveraged the appealing and relevant application of forensic genetics to motivate students to practice quantitative scientific analysis with population genetics principles applied in forensic genetics.
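
The point above about variant frequencies differing between population groups can be illustrated with a small worked calculation. The sketch below compares the homozygote frequency predicted from a single pooled allele frequency with the average of the per-group predictions. It is a generic textbook illustration with made-up numbers, not the analysis performed on the database, and the function names are hypothetical.

```python
def pooled_homozygote_freq(group_freqs, group_weights):
    """Homozygote frequency predicted from one pooled allele frequency."""
    pooled = sum(p * w for p, w in zip(group_freqs, group_weights))
    return pooled ** 2

def structured_homozygote_freq(group_freqs, group_weights):
    """Homozygote frequency when each group is treated separately."""
    return sum(w * p ** 2 for p, w in zip(group_freqs, group_weights))

if __name__ == "__main__":
    # Hypothetical allele frequencies in two equally sized groups.
    freqs = [0.1, 0.3]
    weights = [0.5, 0.5]
    print(pooled_homozygote_freq(freqs, weights))      # about 0.04
    print(structured_homozygote_freq(freqs, weights))  # about 0.05
```

Ignoring the group structure here understates how often two people carry the same homozygous genotype, and the gap grows as the groups' allele frequencies diverge.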

Agency: National Science Foundation (NSF)
Institute: Division of Biological Infrastructure (DBI)
Application #: 1103767
Program Officer: Michael Vanni
Project Start:
Project End:
Budget Start: 2012-01-01
Budget End: 2015-01-31
Support Year:
Fiscal Year: 2011
Total Cost: $189,000
Indirect Cost:
Name: Rohlfs Rori V
Department:
Type:
DUNS #:
City: Seattle
State: WA
Country: United States
Zip Code: 98195