Biosense is rapidly incorporating new data sources and data types. As Biosense grows, sensitivity and specificity of detection will depend on how the data integration problems are addressed, including data delays, sparse data from some regions, and heterogeneous input signals. Without automated approaches to these fundamental problems, it will be difficult for Biosense to scale. We will develop a systematic pipeline of PHIN-compliant methods that will automate the process of evaluating and integrating new signals into Biosense in a manner that maximizes sensitivity and specificity. The three main stages of the pipeline are:

1) Assessing and adjusting for data availability - Biosense data acquisition is continually subjected to crippling systemic delays that drastically reduce the timeliness and radically undermine the sensitivity of the system. We will increase sensitivity and specificity of detection by evaluating data completeness and compensating for missing data using model-based extrapolation. We will also use a multivariate approach to help distinguish between changes in data availability and changes in actual event counts.

2) Determining optimal aggregation approaches - The approach to data aggregation directly affects sensitivity and specificity of detection. We will increase sensitivity and specificity of detection by systematically determining the best level of aggregation at which to model the data. We will also use unsupervised clustering approaches to group data in the manner that maximizes sensitivity and specificity.

3) Integrating multiple signals - As Biosense grows to include additional data sources and analytic methods, the number of signals that need to be tracked will quickly grow to a level that overwhelms the Biosense Biointelligence Monitors. We will increase sensitivity and specificity by optimally integrating multiple signals using a nonparametric multivariate modeling approach. We will also develop empirically optimized multivariate threshold functions to integrate multiple univariate test statistics.

The PHIN-compliant methods developed will be released into open source for the benefit of the public health community. These tools can be used by Biosense professionals to evaluate new and existing data sources, assess and adjust for data delays, and optimally aggregate the data and integrate it into the existing Biosense system.
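As a rough illustration of the first stage (adjusting for data availability), the sketch below scales a partially reported daily count by the fraction of records that have historically arrived by the same reporting lag. The abstract does not specify the extrapolation model, so this ratio-based adjustment is only one simple possibility, not the project's method. The function names, the cumulative-by-lag data layout, and the example numbers are all hypothetical.

```python
from typing import List, Sequence


def completeness_by_lag(history: Sequence[Sequence[int]]) -> List[float]:
    """Estimate the fraction of records received by each reporting lag.

    Assumed layout: history[i][d] is the cumulative number of records
    received for past day i by d days after the event date; the last
    column of each row is treated as that day's final count.
    """
    if not history:
        raise ValueError("history must contain at least one day")
    n_lags = len(history[0])
    final_total = sum(row[-1] for row in history)
    if final_total == 0:
        raise ValueError("history contains no records")
    return [sum(row[d] for row in history) / final_total for d in range(n_lags)]


def adjust_count(observed: int, lag: int, fractions: Sequence[float]) -> float:
    """Extrapolate a partial count to an estimated final count."""
    # Lags beyond the observed history are treated as fully reported.
    fraction = fractions[min(lag, len(fractions) - 1)]
    if fraction <= 0:
        raise ValueError("no historical data received at this lag")
    return observed / fraction


# Hypothetical history: two past days, cumulative counts at lags 0, 1, 2.
history = [[50, 80, 100], [40, 90, 100]]
fractions = completeness_by_lag(history)

# A count of 45 seen on the event day itself (lag 0) is scaled up by the
# historical lag-0 completeness before being passed to a detector.
estimate = adjust_count(45, 0, fractions)
```

Dividing by the historical completeness fraction keeps the adjusted series on the same scale as fully reported days, which is what a downstream detection algorithm would be calibrated against. A single ratio per lag ignores day-of-week and source-specific effects, which a model-based approach would need to handle.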

Agency: National Institutes of Health (NIH)
Institute: Public Health Practice Program Office (PHPPO)
Type: Research Project (R01)
Project #: 1R01PH000040-01
Application #: 7098592
Study Section: Special Emphasis Panel (ZPH1-SRC (99))
Program Officer: Cyril, Juliana K
Project Start: 2005-09-30
Project End: 2008-09-29
Budget Start: 2005-09-30
Budget End: 2006-09-29
Support Year: 1
Fiscal Year: 2005
Total Cost: $459,329
Indirect Cost:
Name: Children's Hospital Boston
Department:
Type:
DUNS #: 076593722
City: Boston
State: MA
Country: United States
Zip Code: 02115
Reis, Ben Y; Kohane, Isaac S; Mandl, Kenneth D (2009) Longitudinal histories as predictors of future diagnoses of domestic abuse: modelling study. BMJ 339:b3677
Reis, Ben Y; Kirby, Chaim; Hadden, Lucy E et al. (2007) AEGIS: a robust and scalable real-time public health surveillance system. J Am Med Inform Assoc 14:581-8
Reis, Ben Y; Kohane, Isaac S; Mandl, Kenneth D (2007) An epidemiological network model for disease outbreak detection. PLoS Med 4:e210