We propose to build a cloud-based platform and website for the processing, storage, analysis, comparison, and visualization of data generated by deep sequencing of the T cell receptor (TCR) repertoire. Profiling the abundance and expansion of clonal T cell subpopulations (as determined by TCR sequence) fills an unmet diagnostic need for monitoring the immune system during disease progression or in response to immunotherapy. Although several groups have used next-generation sequencing (NGS) to profile the TCR repertoire, widespread adoption of this and other NGS methods is hampered by a lack of bioinformatics tools and resources (Kahn, 2011; Pollack, 2011). Analysis tools do exist for generic NGS applications, but a technology gap remains for specialized applications such as TCR repertoire sequencing. GigaGen performs TCR repertoire sequencing as a service, and offering analysis tools would give us a competitive advantage over other providers while broadening the market for immune signature profiling methods in general. We have already developed a computational pipeline to process raw sequencer output, but we see a clear customer need for tools to interpret and analyze TCR repertoire data. In this Phase I grant, we will: (1) expand our existing pipeline to fully automate NGS data processing and transfer; (2) build a website for users to access and manage their data; and (3) develop algorithms, tools, and workflows that allow users to analyze complex data using cloud computing. The test of feasibility will be whether immunology researchers with little computational expertise or resources can perform the following operations on their data through a standard web browser: (1) track abundance and expansion of user-specified TCR sequences across multiple data sets; (2) compare TCR repertoire data sets using resampling-based statistical tests that run in the cloud; and (3) create and configure informative, publication-quality data visualizations.
The infrastructure we develop in Phase I will enable broad commercialization of TCR repertoire sequencing in Phase II. We also plan to use this infrastructure as the basis for a public website and repository for immune sequencing data, to be co-hosted with an educational institution or government entity.
New DNA sequencing methods have the potential to revolutionize how we monitor a patient's immune system during disease progression and treatment. We are building web-based data processing and analysis tools that allow clinicians and researchers to use DNA sequencing data to better understand the immune system.