Cutting-edge computational bioinformatics would be aided by a true database management system (DBMS) that is extensible and fast, and that accesses and manages heterogeneous data sources under the local control of the researcher. VisualMetrics' (VMC) Phase I efforts resulted in the innovative design of a "field stream" DBMS that is: inherently extensible; 40 times smaller and 60 to 350 times faster than a relational DBMS; and smaller than, and as fast as, tailored data formats. VMC's Phase II efforts will: 1) broaden the number of molecular biology data repositories linked to the system; 2) use field streams for managing and reconciling heterogeneous data schemas and metadata; 3) incorporate semi-automated techniques for "data scrubbing" and annotation; and 4) package these capabilities in a graphical interface with built-in query and analysis capabilities suitable for the computational bioinformatics researcher with limited programming expertise. VMC's system is specifically designed to incorporate and integrate new data characterizations and annotations easily, without change to existing data. The system supports the richest data characterizations possible (ASN.1 and beyond) and diverse data types, including multimedia. Superior performance results from direct data access. Our field stream DBMS also has broad applicability to data warehousing and other commercial markets where extensibility, flexibility, and performance are critical.
Data warehousing and genome informatics are both high-growth, multi-billion-dollar markets. Both are fueling exponential growth in data volumes, with demands for ever more sophisticated computations. VMC's technology is well suited to meet these requirements. Success in this Phase II effort will bring visibility to VMC and position the genetic research community to influence, and become early adopters of, this next-generation data management system tailored to bioinformatics.