The Specific Aim of this Phase II SBIR grant application is to launch ClonoByte, a website and cloud-based platform for the processing, storage, analysis, comparison, and visualization of immune signature data sets generated by deep sequencing of immune repertoires. Demand for T cell receptor (TCR) and immunoglobulin (Ig) repertoire sequencing as a routine clinical and research practice is growing rapidly, but the lack of computational resources and tools is blocking widespread adoption. There is therefore a strong need for resources that translate immune repertoire data generated by next-generation sequencing (NGS) into a format useful to immunology researchers and clinicians. The key innovation of ClonoByte is a cloud-based platform that enables researchers with little or no bioinformatics expertise or resources to analyze and interpret immune repertoire sequencing data. In Phase I, we first built individual computational algorithms and components for processing, analysis, and visualization of immune repertoire data, specifically for (i) primary processing of sequencing data for V/J gene identification and complementarity-determining region 3 (CDR3) extraction, (ii) calculation of sample oligoclonality and sample-to-sample similarity, (iii) visualization of comparisons across multiple data sets, and (iv) statistical significance testing of findings via a Monte Carlo method. We tested these computational modules on actual immune repertoire data generated in-house for internal research and for external customers, and then integrated several of the key components into the alpha version of ClonoByte. In Phase II, we will take the following steps to launch ClonoByte: (i) integrate the remaining standalone computational modules into ClonoByte for beta release; (ii) perform software quality control (QC), quality assurance (QA), and front-end usability testing on ClonoByte beta; and (iii) refine ClonoByte beta into the launch version.
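The Phase I analysis steps (ii) and (iv), oligoclonality, sample-to-sample similarity, and Monte Carlo significance testing, can be illustrated with a minimal sketch. The abstract does not specify which metrics ClonoByte uses, so the choices here are assumptions: oligoclonality is computed as 1 minus normalized Shannon entropy (one common definition), similarity as the Morisita-Horn overlap index, and the Monte Carlo test as a permutation test that pools clonotype reads from both samples and randomly reassigns them. The function names and CDR3 sequences are hypothetical and illustrative only; this is not ClonoByte's implementation.

```python
import math
import random
from collections import Counter


def clonality(counts: Counter) -> float:
    """Assumed oligoclonality metric: 1 - Shannon entropy / log(number of clonotypes).
    0 means all clonotypes are equally abundant; 1 means a single clonotype."""
    total = sum(counts.values())
    freqs = [c / total for c in counts.values() if c > 0]
    if len(freqs) <= 1:
        return 1.0
    entropy = -sum(p * math.log(p) for p in freqs)
    return 1.0 - entropy / math.log(len(freqs))


def morisita_horn(a: Counter, b: Counter) -> float:
    """Morisita-Horn overlap between two non-empty clonotype count tables (0 = disjoint, 1 = identical)."""
    na, nb = sum(a.values()), sum(b.values())
    da = sum(c * c for c in a.values()) / (na * na)
    db = sum(c * c for c in b.values()) / (nb * nb)
    shared = sum(a[k] * b[k] for k in a.keys() & b.keys())
    return 2 * shared / ((da + db) * na * nb)


def permutation_test(a: Counter, b: Counter, n_iter: int = 1000, seed: int = 0):
    """Hypothetical Monte Carlo test. Null hypothesis: both samples are drawn from
    one shared repertoire. Returns the observed similarity and the fraction of
    random reassignments whose similarity is at least as low as observed."""
    rng = random.Random(seed)
    observed = morisita_horn(a, b)
    pooled = list(a.elements()) + list(b.elements())  # one entry per read
    na = sum(a.values())
    hits = 0
    for _ in range(n_iter):
        rng.shuffle(pooled)
        # Split the shuffled reads back into two samples of the original sizes.
        if morisita_horn(Counter(pooled[:na]), Counter(pooled[na:])) <= observed:
            hits += 1
    return observed, (hits + 1) / (n_iter + 1)


if __name__ == "__main__":
    # Made-up CDR3 amino acid clonotype counts for two samples.
    s1 = Counter({"CASSLGQGYEQYF": 50, "CASSPDRGNTEAFF": 30, "CASSYSTDTQYF": 20})
    s2 = Counter({"CASSLGQGYEQYF": 45, "CASSPDRGNTEAFF": 35, "CASRGQGNYGYTF": 20})
    print("clonality s1:", round(clonality(s1), 3))
    sim, p = permutation_test(s1, s2, n_iter=500)
    print("Morisita-Horn:", round(sim, 3), "p =", round(p, 3))
```

A permutation test is only one kind of Monte Carlo method; it is used here because it needs no distributional assumptions about clonotype abundances, and the add-one correction keeps the estimated p-value strictly above zero.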
We will consider Phase II successful if end users can use the website to (i) securely upload raw immune repertoire sequencing FASTQ files for automated primary processing; (ii) track the abundance and expansion of clonotype sequences across multiple annotated data sets; and (iii) compare data sets on the basis of whole-repertoire similarity, sample diversity, and the behavior of specific clonotypes. After Phase II, the website will be fully available for commercial use by immune repertoire researchers worldwide. GigaGen's pricing plan is to offer free use of ClonoByte for the first 10 samples and to charge a $50/year subscription fee to users who upload more than 10 samples. Although ClonoByte will support analysis of immune repertoire sequencing data generated by most amplification kits or third-party services, GigaGen will use ClonoByte's best-in-class features and ease of use to promote its own GigaMune® lab services and amplification kits for immune repertoire sequencing.
New DNA sequencing methods have the potential to revolutionize how we monitor a patient's immune system during disease progression and treatment. We are building web-based data processing and analysis tools that allow clinicians and researchers to use DNA sequencing data to better understand the immune system.