Biosense is rapidly incorporating new data sources and data types. As Biosense grows, the sensitivity and specificity of detection will depend on how data integration problems are addressed, including data delays, sparse data from some regions, and heterogeneous input signals. Without automated approaches to these fundamental problems, it will be difficult for Biosense to scale. We will develop a systematic pipeline of PHIN-compliant methods that automates the process of evaluating and integrating new signals into Biosense in a manner that maximizes sensitivity and specificity. The three main stages of the pipeline are:
1) Assessing and adjusting for data availability: Biosense data acquisition is continually subject to crippling systemic delays that drastically reduce the timeliness and undermine the sensitivity of the system. We will increase the sensitivity and specificity of detection by evaluating data completeness and compensating for missing data using model-based extrapolation. We will also use a multivariate approach to help distinguish between changes in data availability and changes in actual event counts.
2) Determining optimal aggregation approaches: The approach to data aggregation directly affects the sensitivity and specificity of detection. We will increase the sensitivity and specificity of detection by systematically determining the best level of aggregation at which to model the data. We will also use unsupervised clustering approaches to group data in the manner that maximizes sensitivity and specificity.
3) Integrating multiple signals: As Biosense grows to include additional data sources and analytic methods, the number of signals that need to be tracked will quickly grow to a level that overwhelms the Biosense Biointelligence Monitors. We will increase sensitivity and specificity by optimally integrating multiple signals using a nonparametric multivariate modeling approach. We will also develop empirically optimized multivariate threshold functions to integrate multiple univariate test statistics.
The PHIN-compliant methods developed will be released as open source for the benefit of the public health community. These tools can be used by Biosense professionals to evaluate new and existing data sources, assess and adjust for data delays, and optimally aggregate the data and integrate it into the existing Biosense system.
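As a rough illustration of the first stage (compensating for reporting delays), the sketch below estimates, from past days whose counts are now complete, what fraction of the final count had typically arrived after each lag, and then scales up a recent partial count accordingly. This simple ratio-based estimator is a stand-in chosen for brevity and is not the project's model-based extrapolation method; the function names, the `history` layout, and the `floor` parameter are all hypothetical.

```python
from statistics import mean


def completeness_by_lag(history):
    """Estimate the typical fraction of the final count received at each lag.

    history: list of (partial_counts_by_lag, final_count) pairs for past days
    (hypothetical layout). partial_counts_by_lag[k] is the cumulative count
    that had arrived k days after the event date.
    """
    usable = [(partials, final) for partials, final in history if final > 0]
    if not usable:
        raise ValueError("need at least one past day with a non-zero final count")
    # Only lags observed for every usable day can be averaged.
    max_lag = min(len(partials) for partials, _ in usable)
    return [
        mean(partials[k] / final for partials, final in usable)
        for k in range(max_lag)
    ]


def adjust_for_delay(observed, lag, fractions, floor=0.05):
    """Scale a partial count up to an estimate of the eventual final count.

    Lags beyond the estimated range reuse the last fraction. The floor
    (an assumed safeguard) prevents division by a near-zero completeness.
    """
    fraction = fractions[min(lag, len(fractions) - 1)]
    return observed / max(fraction, floor)


if __name__ == "__main__":
    # Made-up illustrative history: three past days, counts at lags 0, 1, 2.
    history = [
        ([50, 80, 100], 100),
        ([30, 48, 60], 60),
        ([100, 160, 200], 200),
    ]
    fractions = completeness_by_lag(history)
    print("estimated completeness by lag:", fractions)
    print("adjusted same-day count:", adjust_for_delay(40, 0, fractions))
```

A detector fed the adjusted counts rather than the raw partial counts would be less likely to mistake a late-arriving feed for a drop in events, although a realistic adjustment would also need to carry the uncertainty of the estimate forward.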
Reis, Ben Y; Kohane, Isaac S; Mandl, Kenneth D (2009) Longitudinal histories as predictors of future diagnoses of domestic abuse: modelling study. BMJ 339:b3677
Reis, Ben Y; Kirby, Chaim; Hadden, Lucy E et al. (2007) AEGIS: a robust and scalable real-time public health surveillance system. J Am Med Inform Assoc 14:581-8
Reis, Ben Y; Kohane, Isaac S; Mandl, Kenneth D (2007) An epidemiological network model for disease outbreak detection. PLoS Med 4:e210