Differences in our individual genomes give rise to most of human diversity. A decade removed from the Human Genome Project, much of how the genome directs phenotype remains a mystery. Consortia like ENCODE seek to identify functional DNA elements in humans and model organisms by correlating functional outputs with sequence using genome-wide data sets. However, these studies do not necessarily improve our ability to interpret how DNA elements act in new contexts or when mutated. Such an understanding will be critical for predicting the effects of sequence alterations on phenotype and for engineering biology for future medical or technological purposes. Combinations of DNA elements act as codes controlling particular functions such as transcription, splicing, localization, and silencing. Deciphering these codes is difficult, as the limited set of natural variants is typically insufficient to control for variables such as sequence composition or element combinations. Proving that particular sequences have causative effects on gene expression requires carefully controlled reverse genetic studies. Conducting such experiments genome-wide is difficult because of our inability to (1) rapidly alter the sequence and context of individual genetic elements and (2) quantify the consequences of thousands of such changes. My central vision is to decipher the cis-regulatory codes controlling gene expression by bringing reverse genetics experiments to genomic scale using multiplexed measurements of defined synthetic DNA libraries. I will build upon my work developing next-generation gene synthesis technologies and multiplexed reporter assays to systematically determine how sequences governing mammalian gene expression act in concert, by performing thousands of controlled experiments simultaneously. Here, we will apply these technological developments to study how genetic regulatory elements control the process of pre-mRNA splicing.
The major sequence elements controlling splicing, namely the splice donor, splice acceptor, and branch sites, do not by themselves convey enough information to specify exon inclusion or exclusion. Other regulatory elements, such as exonic or intronic splicing enhancers and suppressors, are known to affect splicing through a complex code that can vary by tissue or cell type. We will systematically interrogate and refine the splicing code by leveraging the new technological developments proposed here. Studying splicing will help focus development of a complete suite of tools and technologies, which will later let us attack other classes of cis-elements that regulate gene expression.
Any individual's genome has several million deviations from the consensus human genome sequence. Determining whether these mutations are relevant or important is difficult because most mutations are uncommon and we lack a complete understanding of how sequence affects function. Here we will develop new high-throughput methodologies to determine how, and to what extent, this genetic variation affects function, with the goal of enabling better genetic diagnostics and therapeutic interventions.