Well-known and publicly available bioinformatics and biology databases include the NCBI [NCBI] and the NCI [NCI] databases. NCBI is a national resource for molecular biology information, and includes a genetic sequence database (GenBank) [Benson2002], a database of human genes and genetic disorders (OMIM) [McK1998], and a molecular modeling database (MMDB) containing 3-D macromolecular structures [Wang2002]. The CBCTR database [CBCTR] in NCI provides clinical data for specimens distributed to clinical researchers. Examples of other bioinformatics databases include PDB [Sch2002], BioSig [Par2000a, Par2000b], CCDB [CCDB], and ECHBD [ECHBDa, ECHBDb]. PDB (Protein Data Bank) is the single worldwide repository for the processing and distribution of 3-D biological macromolecular structure data.

Physical sciences spatial databases include SkyServer [Sza2002], ADEPT [Jan2002, Smith2001], and DIMES [Yang2001]. SkyServer provides online access to the public Sloan Digital Sky Survey [Sza2000] data. ADEPT is a distributed digital library of spatial map sets covering most of the world and includes images from satellite, space shuttle, aerial, and other sources. DIMES is an earth science data system that accepts metadata submissions in any valid XML format, thus placing no restrictions on metadata entries.

Extensive environmental data management databases are available at the Environmental Protection Agency's Databases and Software website [EPA2008]. The Integrated Risk Information System (IRIS) is a database system designed to manage information on human health effects that may result from exposure to various substances in the environment. IRIS was developed and is currently maintained by the EPA's National Center for Environmental Assessment (NCEA) within the Office of Research and Development (ORD) [IRIS2008]. TOXMAP [TOXMAP], hosted by the NIH's National Library of Medicine, is an existing online system for tracking toxic waste facilities in the US.
This site maps toxic waste sites and can overlay layers for US census data, income data, and particularly detailed cancer occurrence data. This system provides us with a model to improve upon. We anticipate that, when our study is completed, we can host a similar site that maps contaminated well sites and preterm birth rates.

What distinguishes the PRoTECT database system that will be developed is its capability to integrate multi-disciplinary datasets. In the CenSSIS database we worked with a wide range of applications, including embryo viability, humanitarian demining, and many forms of cancer. While this diversity may present challenges to other database systems, our experience managing more than 70,000 datasets from biomedical, marine, geophysical, and environmental applications has led us to develop a systematic approach, and we are confident that it can handle the heterogeneity of the data that will be produced in PRoTECT.