Genomes evolve and diversify through several mechanisms, including small point mutations and large structural variations (SVs). As entire populations of individuals are sequenced, a complex mosaic of patterns emerges. Some of these patterns are characteristic of a selective constraint, such as tolerance to low oxygen (in high-altitude populations) or lactose tolerance. One aim of the proposal is to develop computational techniques that detect these characteristic genetic patterns and thereby identify genes that are adapting to selective constraints. The other aim is to reconstruct regions with complex variation patterns, such as the Killer cell Immunoglobulin-like Receptor (KIR) region. KIR diversity plays a significant role in mediating immune response; understanding it helps explain diseases including rheumatoid arthritis, the control of HIV disease progression, and the success rate of cell replacement therapy for certain leukemias (blood cancers). The investigators will use a mix of techniques from combinatorial algorithms, machine learning, and population genetics to decode the genetic patterns. The proposal has broader impact in the field as part of a larger effort to develop efficient computational tools for genetic analysis, a critical problem in the modern era of inexpensive sequencing. The tools and technologies described here will have a direct impact on understanding the genetic diversity of populations, and on progress towards a personalized approach to healthcare.
The proposal seeks to decipher the observed genetic variation across populations using two thrusts. In the first thrust, it seeks to haplotype genomic structural variation and to discover the genomic architecture of complex immunological regions such as KIR and HLA. In the second thrust, the investigators analyze patterns of variation that are indicative of selective constraints. For selection signatures, the investigators will provide a better understanding of currently available tests using the scaled site frequency spectrum, and will use an algorithmic approach to identify a better discriminator. For the rearranged genomic regions, the investigators will use optimization algorithms to adjust read coverage in highly repetitive regions. The tools and technologies described here will have a direct impact on understanding the genetic diversity of under-represented populations, and on progress towards a personalized approach to healthcare. The proposed research is tightly connected to undergraduate and graduate education, as all research here will be directly incorporated into interdisciplinary classes. The PI has a strong track record of mentoring women and other under-represented students in Computer Science.