Causal inference of gene regulatory networks with application to breast cancer

Fu, Audrey

Abstract

In the investigation of the mechanisms behind gene regulation and its impact on diseases, two lines of research have largely proceeded separately in recent years. On the one hand, gene regulatory networks and protein interaction networks have been under extensive study, especially in systems biology, where genetic variation is usually ignored. On the other hand, mutations, indels (insertions and deletions), and copy number variants have been identified for many diseases in genome-wide association studies. It is therefore of immense interest to understand how genetic variation influences disease through gene regulatory networks. To construct these networks, at least three key pieces of information are important: gene expression, transcription factor (TF) binding, and genotypes (especially at expression quantitative trait loci, or eQTLs). In particular, the latter two enable causal inference in the network construction, although how to use them in a probabilistic and rigorous way has not been systematically explored. With my extensive experience in Bayesian statistics, I aim to develop statistical models and efficient computational strategies, drawing on recent advances in graphical models and causal inference, to construct causal regulatory networks involving genetic variation and TF binding. I will use breast cancer as a disease model and apply the proposed methodologies to different subtypes. Topological features of the inferred regulatory networks may suggest potentially different mechanisms in breast cancer subtypes. With the proposed research, I will not only develop general analysis methodologies to integrate various types of high-throughput genomics data and provide open-source software, but also establish their relevance to disease studies.
With solid training in theoretical and applied statistics, as well as extensive experience collaborating with experimental biologists and working with a variety of biological data, I aim to make the transition from statistician to computational biologist and to become an independent investigator. I aspire to be not only an expert in developing sophisticated and rigorous statistical models and supplementing these models with efficient algorithms, but also a scientist capable of generating and testing my own hypotheses, either by myself or in collaboration with experimental biologists. The proposed K99/R00 award, involving one year of the mentored phase and three years of the independent phase, would greatly facilitate this transition. It would provide a unique opportunity for me to gain not only experience in human genomic research, but also skills and experience in the wet lab, such that I can conduct some experiments on my own and eventually run an independent lab that focuses on computational research but also allows for experimental exploration.

Public Health Relevance

Genes interact with each other, forming gene regulatory networks, to produce and influence phenotypes. Many genetic variations in the genome, including mutations, indels (insertions and deletions), and copy number variants, have been identified for many diseases in genome-wide association studies. It is of immense interest to understand how these genetic variations influence disease through the networks. I will develop general statistical models and computational strategies for this purpose, and apply these methods specifically to breast cancer in order to understand the mechanisms behind different subtypes.