The natural immune response to foreign pathogens involves a complex coordination of cells, including an adaptive response that selects antibodies and secretes them into circulation. Individuals who have recovered from a pathogenic infection retain immune memory and continue to circulate pathogen-specific antibodies. For many pathogens, such as respiratory syncytial virus, Ebola virus, and poxviruses, antibodies with neutralizing activity against multiple viral strains have been discovered in human survivors. These antibodies are highly valuable as potential biologic therapeutics for the broader population because they have been naturally optimized to defend against human pathogens. Efforts to discover antibodies from humans recovering from coronavirus infection, SARS-CoV-2 in particular, are underway but are hampered by the limitations of existing antibody discovery platforms. Current approaches require screening for live B cells that are actively producing pathogen-specific antibodies; these cells are prone to cell death and are rarely found in blood. In contrast, antibody protein is stable, and pathogen-reactive antibodies are abundant in serum. While protein is the ideal starting material, characterization of polyclonal antibody (pAb) protein presents new challenges. Recent advances in mass spectrometry and data analysis have shown that individual antibody candidates can be derived from affinity-purified pAb proteins when a sufficiently matched B-cell genetic antibody repertoire is provided.
We aim to develop algorithms that eliminate the need for a genetic antibody repertoire and identify antibody candidates de novo from limited-complexity pAb samples. This will be achieved through machine-learning improvements to de novo peptide sequencing and through targeted assembly of complementarity-determining regions (CDR3s) and antibody frameworks using de Bruijn graphs. The proposed software will provide estimates of clonal diversity and candidate sequences that can be synthesized and tested for reactivity. In addition to addressing needs in infectious disease, as demonstrated by the urgent unmet need to stop the COVID-19 pandemic, the software also applies to clinical and biomedical research needs in autoimmune disease and to commercial interests in replacing polyclonal antibody reagents with highly reproducible monoclonal antibody equivalents.
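To make the de Bruijn graph assembly step concrete, the following is a minimal illustrative sketch, not the proposed software. It assumes de novo peptide reads are already available as amino-acid strings, builds a graph whose nodes are (k-1)-mers, and greedily extends a contig from a seed node by following the best-supported edge. The function names `build_de_bruijn` and `extend_from_seed`, the choice of k = 4, the seed, and the toy peptides are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def build_de_bruijn(peptides, k):
    """Build a de Bruijn graph from peptide reads.

    Nodes are (k-1)-mers. Each k-mer observed in a read adds one unit of
    support to the edge from its (k-1)-mer prefix to its (k-1)-mer suffix.
    """
    graph = defaultdict(lambda: defaultdict(int))
    for pep in peptides:
        for i in range(len(pep) - k + 1):
            kmer = pep[i:i + k]
            graph[kmer[:-1]][kmer[1:]] += 1
    return graph


def extend_from_seed(graph, seed, max_len=200):
    """Greedily extend a contig from a seed node.

    At each step the successor with the highest edge count is taken
    (ties broken alphabetically). The walk stops at a dead end, when a
    node would be revisited, or when max_len residues are reached.
    """
    contig = seed
    node = seed
    visited = {seed}
    while len(contig) < max_len:
        successors = graph.get(node)
        if not successors:
            break  # dead end: no read supports an extension
        nxt = max(successors.items(), key=lambda kv: (kv[1], kv[0]))[0]
        if nxt in visited:
            break  # avoid looping through a repeat
        contig += nxt[-1]
        visited.add(nxt)
        node = nxt
    return contig


# Toy overlapping peptide reads (made up for illustration, not real data).
peptides = ["CARDYYGSS", "DYYGSSYWYF", "SSYWYFDVWG", "YFDVWGQGT"]
graph = build_de_bruijn(peptides, k=4)
contig = extend_from_seed(graph, seed="CAR")
print(contig)
```

A greedy walk is the simplest possible traversal and is used here only to show the data structure. In a real polyclonal sample, branch points in the graph would correspond to sequence variants or sequencing errors, so a practical method would need to weigh read support and handle branches rather than follow a single path.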
The antibody repertoire circulating in serum is the most active component of an immune response, yet it cannot be queried directly without the aid of the genetic antibody repertoire obtained from cells. Our innovative algorithms enable querying and sequencing of antibodies directly from protein, opening the door to previously inaccessible autoimmune and infectious disease research and therapeutic antibody discovery.