Gene expression patterns are determined by a complex network of cis-regulatory elements and trans-acting factors. RNA-seq quantification of allele-specific expression, together with genotype data from matched individuals, provides an opportunity to understand how this gene regulatory network is wired and modified by genetic variants. So far, analyses of such datasets have been performed only on a single-gene basis, ignoring the complex network of many interacting genes, and only with data collected in batch, treating each sample as equally valuable even though the RNA-seq and genome sequence data from a given sample are informative only in specific circumstances. To address these limitations, we propose to combine allele-specific expression quantitative trait locus (eQTL) mapping with a genetical genomics approach to reconstruct gene networks by treating genetic variants as naturally occurring perturbations of allele-specific expression, and to actively guide the data collection process so that it efficiently captures the most informative naturally occurring perturbations. The computational framework we propose to develop is the first to address this problem and will include 1) probabilistic graphical models for representing and learning gene networks perturbed by cis- and trans-acting eQTLs and 2) active sample selection algorithms for determining which samples warrant additional RNA-seq or genotype data and for updating the current network model with the new samples. We will apply our computational technique to simulated data, mouse intercross data, and eQTLGen Consortium data to reconstruct gene networks perturbed by genetic variants and to compare the performance of active and batch learning strategies. In particular, we will explore the possibility of implementing an active data collection strategy in rodent studies through a collaboration between a computational biologist and a mouse geneticist.
The proposed research will provide biomedical researchers with a general computational framework, paired with cost-effective data collection strategies, for unraveling the gene regulatory mechanisms and cis-/trans-acting eQTLs that give rise to diseases.

Public Health Relevance

Our proposed research will provide biomedical researchers and clinicians with a powerful statistical framework for collecting and analyzing RNA-seq and genome sequencing data to understand the gene regulatory mechanisms and genetic architecture of various tissue types and diseases. Our computational tool will enhance our understanding of disease-related expression quantitative trait loci (eQTLs) by distinguishing between cis- and trans-acting eQTLs with high accuracy, while using active learning to achieve this accuracy with as few samples as possible. Our analysis of data from LGSM AIL mice and the eQTLGen Consortium will generate a set of candidate cis- and trans-acting eQTLs that explain the gene expression variability in the given tissue type, and will produce an active learning framework that researchers can adopt to design their own studies involving active data collection.

National Institutes of Health (NIH)
National Human Genome Research Institute (NHGRI)
Exploratory/Developmental Grants (R21)
Study Section
Biodata Management and Analysis Study Section (BDMA)
Program Officer
Sofia, Heidi J
Carnegie-Mellon University
Biostatistics & Other Math Sci
Schools of Arts and Sciences
United States