Recent advances in high-throughput technologies have made immunogenomics broadly applicable to the study of the adaptive immune repertoire. These technologies generate large-scale datasets that can be used across a wide range of biological domains, including immunology. The project will provide systematic computational resources for understanding the mechanisms and evolution of adaptive immune systems. This will be achieved by delivering robust, easy-to-use open-source software, as well as empirical results in the form of accessible databases assembled by applying the proposed bioinformatics methods to diverse, large-scale genomic datasets. The project will foster collaboration across disciplines, bringing together researchers and students from computer science, life science, and bioinformatics and strengthening interactions among these communities. Additionally, the project will develop an interactive educational platform for learning and training in big data analytics using Python-based interactive notebooks. The online platform will be tailored specifically to students with limited prior exposure to computational sciences and will be made available nationally to faculty and students at teaching-focused institutions.

The project will develop efficient and scalable bioinformatics methods for improving current V(D)J reference databases and characterizing T and B cell receptor repertoires across a variety of vertebrate species. Specifically, the project will develop 1) robust and scalable methods to assemble V(D)J alleles from next-generation sequencing data, and 2) accurate, species- and strain-specific methods to assemble B and T cell receptor repertoires from next-generation sequencing data. Additionally, the project will enrich existing immunogenomics databases of V(D)J alleles and receptor sequences across vertebrate species by applying the developed methods to hundreds of thousands of samples. To promote dissemination of the results, the assembled immune receptor sequences will be shared as an easy-to-use database with a rich set of functionalities. This database will allow life science researchers to systematically compare the somatic events that give rise to receptor variation across vertebrate species and will provide novel insight into the evolution of adaptive immunity. Results of the project can be found at

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

National Science Foundation (NSF)
Division of Biological Infrastructure (DBI)
Program Officer: Jean Gao
University of Southern California, Los Angeles, United States