The first reference human genome sequence was completed in 2000 and published in early 2001. That landmark accomplishment was greeted with great enthusiasm about the possibility that a catalog of human genes could rapidly yield insight into the molecular basis of human disease. However, it soon became clear that nearly every disease represents a complex phenotype and that unraveling its potential genetic causes would require the analysis of patient populations and the integration of clinical and phenotypic information with genomic data. Recognizing the need for such data and for access to biospecimens, many of the institutes at the National Institutes of Health, including the National Institute on Drug Abuse (NIDA), along with public and private hospitals, universities, healthcare organizations, private and public charities, and private companies, began amassing biorepositories (or biobanks). Indeed, a 2012 study identified 636 biobanks in the US, and we estimate that there are thousands of biobanks worldwide. While hundreds of millions of samples have been archived, and diverse clinical and phenotypic data have been collected on these samples, accessing the underlying data presents particular challenges. Enabling broader use of these samples by providing tools to analyze the associated clinical, survey, and phenotype data will be essential to gaining the greatest value from the already large investment in assembling the data. Indeed, NIDA has identified the creation of a data repository and software tools for addiction-related clinical research data as a priority area for funded development. At GenoSpace, we have developed advanced, user-friendly tools for the secure, scalable, and robust storage of complex clinical, survey, phenotype, and multi-omic data.
In working on a variety of projects, we recognized that these same tools could be adapted to the information stored in biobanks, providing users not only with a simple, intuitive way of defining cohorts and selecting samples for further study, but also with a means of analyzing the existing data, and any additional information that is collected, to search for correlations that could shed light on disease. In this Fast Track application, we propose to adapt our existing research and cohort-identification platform to capture the complex data associated with the banked samples in the NIDA Center for Genetic Studies (NIDA CGS; http://zork.wustl.edu/nida/) biobank (Phase I) and to extend the platform to allow users to perform complex analyses of the existing data, as well as to securely upload and integrate their own data, in order to search for genetic and multi-omic correlations with clinically relevant endpoints (Phase II).
The National Institute on Drug Abuse (NIDA) has identified the creation of a data repository and software tools for addiction-related clinical research data as one of its key priorities for investment in Small Business Innovation Research (SBIR). GenoSpace, LLC proposes to adapt its robust, scalable research platform to aggregate the broad-based clinical, phenotype, and survey data collected for samples in the NIDA Center for Genetic Studies Biobank and to support intuitive, graphically driven complex queries of those data (Phase I). Having accomplished that task, we propose to develop advanced analytical tools to support queries across data types and to enable users to securely integrate their own data for analysis (Phase II).