During the course of an immune response, B cells that initially bind antigen with low affinity through their immunoglobulin (Ig) receptor are modified through cycles of somatic hypermutation and affinity-dependent selection to produce high-affinity memory and plasma cells. This affinity maturation is a critical component of T cell-dependent adaptive immune responses. It helps guard against rapidly mutating pathogens and forms the basis for many vaccines. The ability to detect selection in experimentally derived Ig sequences is a critical part of many studies. Such techniques are useful not only for understanding the response to pathogens, but also for determining the role of antigen-driven selection in autoimmunity, B cell cancers, and the diversification of pre-immune repertoires in certain species. Despite its importance, quantifying selection in B cell Ig sequences is fraught with difficulties. The necessary parameters for statistical tests (such as the expected frequency of replacement mutations in the absence of selection) are non-trivial to calculate, and results are not easily interpretable when analyzing more than a handful of sequences. In this proposal, we bring together a multidisciplinary group to develop and make available new computational methods for quantifying and visualizing immune selection in somatically hypermutated B cells. Existing models of somatic hypermutation targeting will be improved based on new experiments that significantly expand the available number of unselected mutations. In addition, we will extend models for the mutability of each nucleotide by determining the most informative neighboring positions. This will identify hot/cold-spot motifs that extend beyond two neighboring bases and allow for gapped motifs. Next, building on these improved models for mutation targeting, we will implement and validate new statistical tests for detecting selection by analyzing somatic hypermutation patterns.
Current methods that consider the frequency of replacement mutations will be extended to account for amino acid traits. Furthermore, new methods based on lineage tree analysis will be developed for clonally-related sequences. Finally, we will develop methods to visualize large-scale datasets from emerging technologies that allow comprehensive analysis of entire B cell repertoires. All of these methods will be made available through our existing Ig sequence analysis website, which will also be significantly expanded to automate virtually all steps of the analysis pipeline.
The analysis of somatic mutation patterns in B cell Ig sequences to detect selection is a critical part of many studies. Such techniques are useful not only for understanding the response to pathogens, but also for determining the role of antigen-driven selection in autoimmunity, B cell cancers, and the diversification of pre-immune repertoires in certain species. In this proposal, we develop several computational methods to detect selection with higher sensitivity and specificity, as well as visualization methods that will help analyze the large-scale sequence data sets made possible by new sequencing techniques.