This award is funded under the American Recovery and Reinvestment Act of 2009 (Public Law 111-5).
A current challenge for genetics and genomics research is discovering DNA sequences that alter the expression of genes operating during development and various physiological processes. This project will develop the statistical method of structural equation modeling to mine genome-wide DNA sequence and gene expression data to identify these connections. The research draws on methods from computational Bayesian statistics to overcome the difficulties that the scale and complexity of genomic data pose for extracting relevant biological information.
This research will develop software for analyzing genomic data that researchers in medical, agricultural, and conservation genetics will use to discover important biological network relationships between DNA sequences and the expression of genes. The software will serve as a starting point for investigating why and how DNA sequences produce differences among individuals in both appearance and susceptibility to genetic diseases. The statistical methods underlying the software will make computational network analysis faster and more efficient for data collected at the scale of the entire genome.