We propose to build a cloud-based, integrated solution for scalable, customizable, privacy-preserving, and interactive antibody repertoire analysis. Immune repertoire sequencing (IR-seq) has become a useful tool in both basic research and clinical settings. Because the B cell receptor (BCR) repertoire is at the heart of adaptive immunity to infection and of the response to many vaccines, its abundance, its diversity composition, and its dynamic changes in health and disease carry information on how to evaluate immune health, perform disease diagnosis and prognosis, and measure vaccination effects. However, there is a computational bottleneck in large-scale antibody lineage construction, a lack of decomposable pipeline modules that preserve privacy and ownership, and a gap in interactive lineage analysis and visualization. In this Phase I grant, we will (1) break the bottlenecks of pipeline processing and scale up the core algorithms to handle large sequence data sets; (2) protect private data and proprietary processing algorithms with a modularized pipeline, integrated cloud-local processing, and data perturbation methods; and (3) develop end-to-end web services for pipeline composition and interactive analysis and visualization in a cloud-based deployment solution. Existing commercial efforts mostly focus on cancer-related IR-seq analysis that aims to trace the disappearance of cancer cells after therapy, and they concentrate solely on cataloging sequence species and abundance. This kind of analysis is much simpler than analyzing IR-seq data in infection and vaccination. Providing insights into host immune responses is a much more challenging but much-needed task. Once the pipeline is built, it can be readily adapted to analyze cancer IR-seq data. In addition, existing algorithms and optimizations developed for other big data analyses can be further developed and applied to IR-seq data analysis.
We will use a publicly available BCR repertoire data set from an influenza vaccination cohort and a TCR repertoire data set from an aging cohort to test the feasibility of the project. The long-term goal of this proposal is to build cloud-based, accessible, and customizable services for experts as well as non-specialists.
We aim to provide an integrated solution for the ingestion, processing, analysis, exploration, visualization, interpretation, and sharing of data generated by deep sequencing of full-length antibody and TCR repertoires. The success of this Phase I SBIR will provide a solid foundation for the product launch of a commercial cloud-based solution in Phase II, during which we will continue our investigation of big data security and privacy to facilitate compliance with institutional policies, and will design and develop programmable APIs and tools to facilitate integration with more third-party modules and services.

Public Health Relevance

Fueled by the exponential decline in sequencing costs, next-generation sequencing (NGS) of the immune repertoire is increasingly being applied to gain insights into the adaptive immune response in healthy individuals and in those with a wide range of diseases, such as autoimmune disease, allergy, HIV, and cancer. We are building cloud-based data processing and analysis tools with a service-oriented architecture that can easily integrate existing and future analysis components, whether developed by us or by other teams, as well as data sources and services distributed over the web and the cloud.

Agency
National Institutes of Health (NIH)
Institute
National Institute of Allergy and Infectious Diseases (NIAID)
Type
Small Business Innovation Research Grants (SBIR) - Phase I (R43)
Project #
5R43AI136357-02
Application #
9644003
Study Section
Special Emphasis Panel (ZRG1)
Program Officer
Minnicozzi, Michael
Project Start
2018-02-06
Project End
2021-01-31
Budget Start
2019-02-01
Budget End
2021-01-31
Support Year
2
Fiscal Year
2019
Total Cost
Indirect Cost
Name
Immudx, LLC
Department
Type
DUNS #
080509967
City
Austin
State
TX
Country
United States
Zip Code
78750