Scientific understanding of adaptive immune receptors (i.e. antibodies and T cell receptors) has the potential to revolutionize prophylaxis, diagnosis, and treatment of disease. High-throughput DNA sequencing and functional experiments have now brought the study of adaptive immune receptors into the big-data era. To realize the potential of these data, they must be matched with appropriately powerful analytical techniques. Existing probabilistic and mechanistic models are insufficient to capture the complexities of these data, while a naive application of machine learning cannot leverage our profound existing knowledge of the immune system. The goal of this project is to blend deep learning with mechanistic modeling in order to predict and understand the evolution and function of adaptive immune receptors.
Aim 1: Develop generative models of immune receptor sequences that capture the complexity of real adaptive immune receptor repertoires. These will combine deep learning with our knowledge of VDJ recombination, and provide a rigorous platform for detailed repertoire comparison.
Aim 2: Develop quantitative mechanistic models of antibody somatic hypermutation that incorporate the underlying biochemical processes. Estimate intractable likelihoods using deep learning to infer important latent variables, and validate models using knock-out experiments in cell lines.
Aim 3: Develop hybrid deep learning models to predict binding properties from sequence data, combining large experimentally-derived binding data with even larger sets of immune sequences from human immune memory samples. Incorporate structural information via 3D convolution or distance-based penalties.
These tools will reveal the full power of immune repertoire data for medical applications. We will obtain more rigorous comparisons of repertoires via their distribution in a relevant space. These will reveal the effects of immune perturbations such as vaccination and disease, allowing us to pick out sequences that are impacted by these perturbations. We will have a greater quantitative understanding of somatic hypermutation in vivo, and statistical models that appropriately capture long-range effects of collections of mutations. We will also have algorithms that will be able to combine repertoire data and sparse binding data to predict binding properties. Put together, these advances will enable rational vaccine design, treatment for autoimmune disease, and identification of T cells that are promising candidates for cancer immunotherapy.

Public Health Relevance

Adaptive immune receptors (i.e. antibodies and T cell receptors) enable our body to fight off disease, "remember" pathogens, and train the immune system through vaccination. Immunologists have learned via high-throughput sequencing that adaptive immune receptors have a truly remarkable diversity. In this proposal, we develop machine-learning methods for these sequence data, which will allow us to predict the maturation, statistical distribution, and binding properties of adaptive immune receptors, and thus to better design vaccinations, autoimmune disease treatment, and immunotherapy treatment for cancer.

National Institutes of Health (NIH)
National Institute of Allergy and Infectious Diseases (NIAID)
Research Project (R01)
Study Section: Genetic Variation and Evolution Study Section (GVE)
Program Officer: Gondre-Lewis, Timothy A
Fred Hutchinson Cancer Research Center
United States