The primary goal of this project is to develop novel statistical methods for handling spatial uncertainty in event locations when estimating cancer risk. We consider two types of spatial uncertainty: 1) coarsening, which arises when location information is released at the area level rather than the point level, and 2) geocoding error, which results from using geographic information systems software to convert residential addresses to geographic coordinates (i.e., longitudes and latitudes). Cancer epidemiologists can extract data from many different sources, such as the census, statewide health surveys, tumor registries, and population-based case-control studies, and each source may yield data with a different type of spatial uncertainty. Analytic methods are usually adversely affected by spatial uncertainty, which can result in biased parameter estimates, inflated standard errors, and reduced statistical power to detect spatial clustering and trends. To address these challenges, we propose a set of highly versatile estimation procedures that account for the spatial uncertainty and efficiently combine data obtained from multiple sources. These procedures are based on the established theory of estimating equations, and as such they can be easily implemented in practice. Compared with existing methods, the proposed methods are novel because 1) they permit the inclusion of individual-level risk factors for subjects with spatially uncertain locations, 2) the proposed intensity model admits a flexible semiparametric form and hence removes potentially restrictive assumptions, such as the population density being constant over small geographic areas, and 3) they explicitly account for spatial correlation in the disease locations in both parameter estimation and statistical inference.
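To make the two types of spatial uncertainty concrete, the following is a minimal illustrative sketch, not the project's estimation procedure. It simulates point locations on a unit square, coarsens them to grid-cell centroids, and perturbs them with independent Gaussian noise as a stand-in for geocoding error. The function names (`coarsen`, `add_geocoding_error`, `mean_displacement`), the square grid, the Gaussian error model, and all parameter values are assumptions introduced here for illustration only.

```python
import math
import random


def coarsen(points, cell_size):
    """Replace each point by the centroid of the square grid cell that contains it.

    This mimics releasing locations at an area level (hypothetical square areas).
    """
    out = []
    for x, y in points:
        cx = (x // cell_size + 0.5) * cell_size
        cy = (y // cell_size + 0.5) * cell_size
        out.append((cx, cy))
    return out


def add_geocoding_error(points, sd, rng):
    """Perturb each point with independent Gaussian noise (an assumed error model)."""
    return [(x + rng.gauss(0.0, sd), y + rng.gauss(0.0, sd)) for x, y in points]


def mean_displacement(a, b):
    """Average Euclidean distance between paired points in a and b."""
    total = sum(math.hypot(x1 - x2, y1 - y2) for (x1, y1), (x2, y2) in zip(a, b))
    return total / len(a)


rng = random.Random(0)  # fixed seed so the simulation is reproducible
true_points = [(rng.random(), rng.random()) for _ in range(1000)]

coarse_points = coarsen(true_points, cell_size=0.25)           # area-level release
noisy_points = add_geocoding_error(true_points, sd=0.02, rng=rng)  # geocoding error

print("distinct coarsened locations:", len(set(coarse_points)))
print("mean displacement, coarsening:", mean_displacement(true_points, coarse_points))
print("mean displacement, geocoding :", mean_displacement(true_points, noisy_points))
```

In this toy setting, coarsening collapses all points in a cell onto a single location, whereas the assumed geocoding error shifts each point individually; an analysis that treats either set of recorded locations as exact would be working with displaced data.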
In the substantive applications, we propose to supplement population-based case-control data with tumor registry data, census data, and statewide health survey data. To the best of our knowledge, such an approach would be the first of its kind in the field. We will implement our proposed methods in a free, user-friendly R package. Our package will provide much-needed tools for more objective investigations of cancer risk factors by accounting for spatial uncertainty in the event locations. It will allow researchers to take advantage of the full spectrum of available data and to use the data more effectively to reduce the burden of disease.
This project will develop novel statistical methods to handle spatial uncertainty in the event locations when conducting cancer risk estimation and to combine data obtained from multiple sources. The proposed methods provide much-needed tools for more objective investigations of cancer risk factors by accounting for the spatial uncertainty. They also allow researchers to take advantage of the full spectrum of available data and use the data more effectively to reduce the burden of disease.
Chang, Xiaohui; Waagepetersen, Rasmus; Yu, Herbert et al. (2015) Disease risk estimation by combining case-control data with aggregated information on the population at risk. Biometrics 71:114-121
Huang, Hui; Ma, Xiaomei; Waagepetersen, Rasmus et al. (2014) A new estimation approach for combining epidemiological data from multiple sources. J Am Stat Assoc 109:11-23
Li, Yehua; Guan, Yongtao (2014) Functional Principal Component Analysis of Spatio-Temporal Point Processes with Applications in Disease Surveillance. J Am Stat Assoc 109:1205-1215