Core B: Biobanking, Data Management and Bioinformatics

Abstract

The first round of NIH H3Africa funding for AWI-Gen enabled us to build up significant infrastructure and human capacity for biobanking, data management and bioinformatics. The core for this activity is the central hub at the Sydney Brenner Institute for Molecular Bioscience (SBIMB), University of the Witwatersrand, Johannesburg (Wits), but the participating centers all have significant data management capabilities as well.

The SBIMB Biobank is now an approved biobanking facility for the processing, storage, handling and management of DNA samples, and has been approved by the relevant ethics committee of the University. We have a dedicated team of laboratory scientists who have gained much experience in dealing with variable sample quality to produce DNA samples with high yields and low levels of protein contamination. For sample tracking and management, we have implemented a customized open-source laboratory information management system (LIMS) called the Ark. The Ark supports our specific requirements, and our data administrator has been integral to its development and customization for our purposes. We curate samples not only from the AWI-Gen study but also from many smaller research projects.

The primary tool for collecting demographic and phenotypic data is REDCap, which facilitates data capture at our collaborating centers across four African countries: Burkina Faso, Ghana, Kenya, and South Africa (Klipin et al., 2014). Each site has its own REDCap system; data are backed up at Wits and then imported into a SQL database. The quality control regimen for data collection and curation in phase 1 of AWI-Gen worked well, and we will make further improvements based on our experience. A critical outcome of phase 1 was building an effective cross-site data management team through data management workshops and considerable informal interaction.
Maintaining and building on these good working relationships with the data management teams from each of the collaborating centers will be a priority for phase 2. In phase 1, we worked closely with the Data Management Task Force of the H3A Bioinformatics Network (H3ABioNet), both in training (giving and receiving) and in building H3A consortium-wide SOPs and infrastructure, and we commit to doing so again in phase 2.

Our computational facilities at Wits are sufficient for data storage and the bulk of the analytic work. In addition, we have access to resources at the SA Centre for High Performance Computing, and our collaborator Bhatt has pledged access to her computational resources at Stanford. We have experience in cloud computing and are working with H3ABioNet on strategies for deployment in H3A. We have satisfactory internet connectivity and have demonstrated the capacity to move >100TB of data for a large project.

Our bioinformatics team now has significant experience in several GWAS, NGS and metagenomics projects. The Wits node of H3ABioNet was the first to receive accreditation in a bioinformatics workflow (GWAS) through an independent, international scientific review process. We have strength in analyzing population diversity and structure and propose to develop new tools. We are part of the team designing the H3A Custom Array, and our group has taken the lead in two components of the planned H3A-wide genome analysis paper, which uses the novel genomes that were generated in phase 1 of H3A. Our collaborators Bhatt, McQueen and Zeggini bring great depth of bioinformatics experience to the project. We will organize training and analysis activities both to deepen our advanced capacity and, critically, to broaden the bioinformatics skills within our collaborating centers so that they have independent capacity for bioinformatics in the future.
(Reference in support of our role in the implementation and support of REDCap at the University of the Witwatersrand: Klipin M, Mare I, Hazelhurst S, Kramer B. The Process of Installing REDCap, a Web Based Database Supporting Biomedical Research: The First Year. Applied Clinical Informatics 5(4), pp. 916-929, 2014. DOI: 10.4338/ACI-2014-06-CR-0054.)