The primary goal of this project is to develop novel statistical methods that handle spatial uncertainty in event locations when conducting cancer risk estimation. We consider two types of spatial uncertainty: 1) coarsening, which results from the practice of releasing location information at the area level rather than the point level, and 2) geocoding error, which results from using geographic information system (GIS) software to convert residential addresses to geographic coordinates (i.e., longitude and latitude). Cancer epidemiologists can extract data from many different sources, such as the census, statewide health surveys, tumor registries and population-based case-control studies, and each source may yield data with a different type of spatial uncertainty. Standard analytic methods are often adversely affected by spatial uncertainty. The consequences include biased parameter estimates, inflated standard errors, and reduced statistical power to detect spatial clustering and trends. To address these challenges, we propose a set of highly versatile estimation procedures that account for spatial uncertainty and efficiently combine data obtained from multiple sources. These procedures build on established estimating-equation theory and, as such, can be readily implemented in practice. Compared with existing methods, the proposed methods are novel because 1) they permit the inclusion of individual-level risk factors for subjects with spatially uncertain locations, 2) the proposed intensity model admits a flexible semiparametric form and hence removes potentially restrictive assumptions, such as the population density being constant over small geographic areas, and 3) they explicitly account for spatial correlation in the disease locations in both parameter estimation and statistical inference.
In the substantive applications, we propose to supplement population-based case-control data with tumor registry data, census data and statewide health survey data. To the best of our knowledge, such an approach would be the first of its kind in the field. We will implement the proposed methods in a free, user-friendly R package. The package will provide much-needed tools for more objective investigations of cancer risk factors by accounting for spatial uncertainty in event locations. It will allow researchers to take advantage of the full spectrum of available data and to use the data more effectively to reduce the burden of disease.

Public Health Relevance

This project will develop novel statistical methods that handle spatial uncertainty in event locations when conducting cancer risk estimation and that combine data obtained from multiple sources. The proposed methods provide much-needed tools for more objective investigations of cancer risk factors by accounting for spatial uncertainty. They also allow researchers to take advantage of the full spectrum of available data and to use the data more effectively to reduce the burden of disease.

Agency
National Institutes of Health (NIH)
Institute
National Cancer Institute (NCI)
Type
Research Project (R01)
Project #
1R01CA169043-01A1
Application #
8496431
Study Section
Biostatistical Methods and Research Design Study Section (BMRD)
Program Officer
Zhu, Li
Project Start
2013-06-20
Project End
2017-04-30
Budget Start
2013-06-20
Budget End
2014-04-30
Support Year
1
Fiscal Year
2013
Total Cost
$448,311
Indirect Cost
$87,444
Name
University of Miami Coral Gables
Department
Miscellaneous
Type
Other Domestic Higher Education
DUNS #
625174149
City
Coral Gables
State
FL
Country
United States
Zip Code
33146