High-throughput informatics for antibodies

MacCarthy, Thomas

Abstract

Somatic hypermutation (SHM) of the Immunoglobulin (Ig) loci is a fundamental process in generating antibody diversity. High-throughput methods for profiling mutation spectra of Ig genes, such as Roche 454 deep sequencing, remain inaccurate and expensive, in part because of the specialized bioinformatics required for post-processing. Recent improvements to the Illumina MiSeq platform that allow paired-end 2x300 nt reads will enable more accurate IgV sequencing at a cost more than 60 times lower than that of the 454 platform. As costs fall further, platforms such as MiSeq will become common lab equipment, but they will only be useful if the appropriate bioinformatics tools are available. Accurate deep sequencing of the Ig loci is rapidly becoming the standard in a broad range of clinical applications, including determining prognosis and detecting Minimal Residual Disease in B cell malignancies, characterizing autoimmune diseases and evaluating vaccine responses.
In Aim 1 we will develop a user-friendly bioinformatics pipeline (SHMPrep) to improve mutation calls for IgV sequences from the Illumina MiSeq platform. We hypothesize that statistical modeling of independent PCR vs sequencing error effects can improve the quality of MiSeq IgV sequences to levels comparable to Sanger sequencing. The pipeline will be integrated with a previously developed analysis tool (SHMTool) to allow users without computing expertise, such as most clinicians, to process MiSeq IgV datasets on an ordinary desktop computer. As high-throughput data accumulate, it becomes more important to have methods for analyzing the data we already have than to produce yet more data. IgV mutation spectra depend on many factors, including base composition, the abundance and location of activation-induced deaminase (AID) hot and cold spots, Pol-η hot spot composition and overall mutation frequency. This complexity makes it difficult to compare mutation spectra from different IgV regions.
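
To make the sequence-composition factors listed above concrete, here is a minimal Python sketch that counts three motif classes in a V-region sequence: AID hot spots (WRC, reverse complement GYW), AID cold spots (SYC, reverse complement GRS) and polymerase eta hot spots (WA, reverse complement TW). These motif definitions are the ones commonly used in the SHM literature and are an assumption here, since the abstract does not spell them out. The function name count_motifs and the dictionary keys are hypothetical and are not part of SHMPrep or SHMTool.

```python
import re

# IUPAC codes used below: W = A/T, R = A/G, Y = C/T, S = C/G.
# Each entry holds (forward-strand motif, reverse-complement motif).
# These definitions are assumptions taken from common usage, not from the grant text.
MOTIFS = {
    "aid_hot": ("[AT][AG]C", "G[CT][AT]"),   # WRC / GYW
    "aid_cold": ("[CG][CT]C", "G[AG][CG]"),  # SYC / GRS
    "pol_eta_hot": ("[AT]A", "T[AT]"),       # WA / TW
}


def count_motifs(seq):
    """Return {motif_class: (forward_count, reverse_count)} for a DNA sequence.

    Overlapping occurrences are counted, because overlapping hot spots
    (for example AGCT) are biologically meaningful.
    """
    seq = seq.upper()
    counts = {}
    for name, (fwd, rev) in MOTIFS.items():
        # A zero-width lookahead lets the regex engine report overlapping matches.
        n_fwd = len(re.findall(f"(?=({fwd}))", seq))
        n_rev = len(re.findall(f"(?=({rev}))", seq))
        counts[name] = (n_fwd, n_rev)
    return counts
```

For the four-base sequence AGCT, a classic overlapping hot spot, this sketch reports one WRC match and one GYW match and no cold spots. Counting each strand separately keeps the information needed later for strand-bias comparisons.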
In Aim 2 we will develop statistical methods for comparing different IgV regions that take into account sequence composition as well as mutation saturation and strand bias, which is important in identifying repair defects in immunodeficiencies such as AIDS and in B-cell malignancies and other cancers. We still understand little about the differences between the IGHV genes. Why are there so many V regions, and why are there such strong associations between particular Ig genes and immune responses? In Aim 3 we will develop a statistical model for predicting mutation frequencies that will allow known molecular interactions to be represented, for example the interaction between AID targeting and error-prone mismatch repair. Predicted mutation frequencies from the model will be used by SHMTool to provide a comparative benchmark in situations where no control dataset is available. The model will be used to characterize each IGHV gene at a deeper level than was previously possible, allowing cross-species comparisons. In the longer term, such a model will facilitate a better understanding of evolutionary changes in the IGHV genes and repertoire.
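
As a rough illustration of what accounting for sequence composition and strand bias can mean when comparing IgV regions, the Python sketch below normalizes mutation counts by the number of germline sites of each base and then computes a simple bias index between complementary bases. This is not the statistical method proposed in Aim 2, which the abstract does not describe. The function names, the bias index (f_a - f_b) / (f_a + f_b) and the assumption that reads are already aligned to the germline without gaps are all hypothetical choices made for the example.

```python
from collections import Counter


def normalized_spectrum(germline, reads):
    """Mutation frequency per germline base, corrected for base composition.

    Assumes every read is aligned to the germline with no gaps, so that
    position i in a read corresponds to position i in the germline.
    Returns {base: mutations / (germline sites of that base * number of reads)}.
    """
    germline = germline.upper()
    base_sites = Counter(germline)  # how many sites of each base the germline has
    mutated = Counter()
    for read in reads:
        for g, r in zip(germline, read.upper()):
            # Count only unambiguous substitutions; skip N and other symbols.
            if g in "ACGT" and r in "ACGT" and r != g:
                mutated[g] += 1
    n = len(reads)
    return {
        b: mutated[b] / (base_sites[b] * n)
        for b in "ACGT"
        if base_sites[b] and n
    }


def strand_bias(freq, a, b):
    """Bias index in [-1, 1] between complementary bases a and b (e.g. G vs C).

    Returns None when neither base shows any mutation, since the index
    is undefined in that case.
    """
    fa, fb = freq.get(a, 0.0), freq.get(b, 0.0)
    if fa + fb == 0:
        return None
    return (fa - fb) / (fa + fb)
```

Because the frequencies are divided by the number of available sites, two regions with very different base composition can be compared on the same scale. An index near zero indicates that mutations are balanced between the two strands, while values near 1 or -1 indicate strong bias.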

Public Health Relevance

Somatic hypermutation (SHM) of Immunoglobulin variable (V) regions is fundamental to the generation of antibody diversity. High-throughput V-region sequencing has a broad range of clinical applications, including determining prognosis and detecting Minimal Residual Disease in B cell malignancies, characterizing autoimmune diseases and evaluating responses to infections and vaccines. We will develop the bioinformatics tools necessary for processing SHM data obtained from high-throughput platforms, as well as methods for comparing mutations from different V regions and even different organisms.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Institute of General Medical Sciences (NIGMS)
Type: Research Project (R01)
Project #: 5R01GM111741-04
Application #: 9392565
Study Section: Cellular and Molecular Immunology - B Study Section (CMIB)
Program Officer: Marino, Pamela

Project Start: 2015-01-01
Project End: 2019-11-30
Budget Start: 2017-12-01
Budget End: 2019-11-30
Support Year: 4
Fiscal Year: 2018

Institution

Name: State University New York Stony Brook
Department: Biostatistics & Other Math Sci
Type: Biomed Engr/Col Engr/Engr Sta
DUNS #: 804878247

City: Stony Brook
State: NY
Country: United States
Zip Code: 11794

Related projects


NIH 2018 R01 GM	High-throughput informatics for antibodies	MacCarthy, Thomas / State University New York Stony Brook
NIH 2017 R01 GM	High-throughput informatics for antibodies	MacCarthy, Thomas / State University New York Stony Brook
NIH 2016 R01 GM	High-throughput informatics for antibodies	MacCarthy, Thomas / State University New York Stony Brook
NIH 2015 R01 GM	High-throughput informatics for antibodies	MacCarthy, Thomas / State University New York Stony Brook	$296,508

Publications

Shapiro, Maxwell; Meier, Stephen; MacCarthy, Thomas (2018) The cytidine deaminase under-representation reporter (CDUR) as a tool to study evolution of sequences under deaminase mutational pressure. BMC Bioinformatics 19:163

Shapiro, Maxwell; Meier, Stephen; MacCarthy, Thomas (2018) Correction to: The cytidine deaminase under-representation reporter (CDUR) as a tool to study evolution of sequences under deaminase mutational pressure. BMC Bioinformatics 19:256

Dong, Qiwen; Smith, Kyle R; Oldenburg, Darby G et al. (2018) Combinatorial Loss of the Enzymatic Activities of Viral Uracil-DNA Glycosylase and Viral dUTPase Impairs Murine Gammaherpesvirus Pathogenesis and Leads to Increased Recombination-Based Deletion in the Viral Genome. MBio 9:

Yuan, Chaohui; Chu, Charles C; Yan, Xiao-Jie et al. (2017) The Number of Overlapping AID Hotspots in Germline IGHV Genes Is Inversely Correlated with Mutation Frequency in Chronic Lymphocytic Leukemia. PLoS One 12:e0167602

Chen, Jeffrey; MacCarthy, Thomas (2017) The preferred nucleotide contexts of the AID/APOBEC cytidine deaminases have differential effects when mutating retrotransposon and virus sequences compared to host genes. PLoS Comput Biol 13:e1005471

Patten, Piers E M; Ferrer, Gerardo; Chen, Shih-Shih et al. (2016) Chronic lymphocytic leukemia cells diversify and differentiate in vivo via a nonclassical Th1-dependent, Bcl-6-deficient process. JCI Insight 1:

Maul, Robert W; MacCarthy, Thomas; Frank, Ekaterina G et al. (2016) DNA polymerase ι functions in the generation of tandem mutations during somatic hypermutation of antibody genes. J Exp Med 213:1675-83

Shin, Jeewoen; MacCarthy, Thomas (2016) Potential for evolution of complex defense strategies in a multi-scale model of virus-host coevolution. BMC Evol Biol 16:233

Shin, Jeewoen; MacCarthy, Thomas (2015) Antagonistic Coevolution Drives Whack-a-Mole Sensitivity in Gene Regulatory Networks. PLoS Comput Biol 11:e1004432

Wei, Lirong; Chahwan, Richard; Wang, Shanzhi et al. (2015) Overlapping hotspots in CDRs are critical sites for V region diversification. Proc Natl Acad Sci U S A 112:E728-37
