As one of the three major components of the adaptive immune system, antibodies are essential to mounting successful disease responses. Though conservative estimates of the size of an individual's germline antibody repertoire are on the order of 10^14, immune responses to chronic disease states suggest that the true space of antibody diversity is, in fact, far larger. Given the complexity of the organization and behavior of the immunoglobulin heavy chain (IGH) locus, direct connections between disease responses and IGH function have been elusive. To date, only a small fraction of the antibodies generated by the immune system have been linked to specific disease stimuli. Furthermore, until recently there was only a single complete assembly of the IGH variable genomic region, which served as the sole reference sequence for the locus. Only in 2013 was a second assembly completed, this time from a single haplotype. The contrast between the two assemblies, with the newer assembly containing more than 100 kb of previously uncharacterized sequence, several completely new IGHV genes, and numerous structural variations, makes it clear that the degree of diversity at this locus is under-appreciated. We will develop new algorithms and experimental approaches, using recent advances in sequencing and molecular biology, to enable high-throughput extraction and resolution of the IGH locus and other hypervariable genomic regions. We have developed "hybrid" approaches that combine third-generation long-read sequencing technology with short reads from second-generation platforms. Our group and others have shown that such approaches can far outperform previous strategies in resolving complex structural variation, improving assembly contiguity, and phasing haplotypes. We will build on these approaches to examine the IGH locus using selective targeting as well as existing whole-genome sequencing (WGS) data. All informatics tools for performing these analyses will be released as open-source software for the community.
To target such large loci effectively and at reasonable cost, we have developed new enrichment strategies that are highly specific, do not rely on amplification (and therefore maintain epigenetic modifications), and maintain DNA contiguity in the region (to enable accurate reconstruction of constituent haplotypes). We will apply this methodology to available cell lines; this will be the first application of non-clone-based targeting of any genomic locus of this size. In addition, given the heavy reliance of the genomics community on short-read technologies for high-throughput targeted genotyping, we will develop a molecular protocol and bioinformatics approaches to pair the new 10X Genomics technology with a custom IGH capture panel as a means to generate long-range haplotypes from short-read sequencing data. Together, the work in this proposal will not only improve our understanding of IGH, potentially leading to better diagnostics and therapeutics, but will also provide a framework for studying other hypervariable and biomedically important gene regions.
Immunoglobulins are an essential component of the adaptive immune system, but due to their complex genomic organization, direct connections between antibody response and disease stimuli have been elusive. We propose new algorithmic and experimental approaches, using recent advances in molecular biology and sequencing, that will enable extraction and assembly of IGH haplotypes within an individual. These methods will dramatically improve our understanding of IGH while providing a framework for resolving other complex loci.