Compared with other diseases such as cardiovascular disease and diabetes, cancer is relatively rare. Analyses of cancer incidence therefore often suffer from the small-population problem, which shows up as unreliable rate estimates, sensitivity to missing data and other data errors, and data suppression in sparsely populated areas. When mapping cancer incidence, the choice of areal unit of analysis (e.g., county or parish, ZIP code, census tract) and of the geographic region of interest determines whether each area will have enough cases. For example, the State Cancer Profiles website maps cancer rates at the county or parish level. A map of Louisiana's parish-level incidence rates for cancer of the brain and other nervous system (brain/ONS) would have rates suppressed for 43 (67%) of 64 parishes, while a map of childhood cancer incidence would have rates suppressed for 53 (83%) parishes (see the companion proposal from the Louisiana Tumor Registry (LTR)). In contrast, in California, brain/ONS and childhood cancer rates would be suppressed in only 13 (22%) and 21 (36%) of the state's 58 counties, respectively. Meanwhile, county- or parish-level maps do not reveal rate variation within the largest counties or parishes, such as Orleans, Jefferson, and East Baton Rouge in Louisiana and Los Angeles, San Diego, Alameda, and Santa Clara in California. Rates for these areas are of limited value to researchers and concerned citizens interested in describing cancer incidence patterns at finer geographic scales. Furthermore, these county boundaries contain areas with distinct concentrations of racial/ethnic groups and of high and low socioeconomic status that may have different cancer rates. Incidence rates could instead be generated for smaller, more homogeneous geographic units such as census tracts.
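To illustrate why small case counts produce unstable and often suppressed rates, the following Python sketch computes a crude incidence rate per 100,000 and an approximate relative standard error (RSE), treating the case count as Poisson so that the RSE is roughly 1/sqrt(cases). The suppression threshold `MIN_CASES = 16` and the two areas are assumptions chosen for illustration, not necessarily the rule used by any particular website and not real registry data.

```python
import math

# Assumed suppression threshold for illustration only: rates based on fewer
# than MIN_CASES cases are withheld.
MIN_CASES = 16


def incidence_rate(cases: int, population: int, per: int = 100_000) -> float:
    """Crude incidence rate per `per` persons."""
    return cases / population * per


def relative_standard_error(cases: int) -> float:
    """Approximate RSE of a crude rate, treating the case count as Poisson."""
    return math.inf if cases == 0 else 1 / math.sqrt(cases)


def report(area: str, cases: int, population: int) -> str:
    """Return a display string, suppressing rates built on too few cases."""
    if cases < MIN_CASES:
        return f"{area}: suppressed ({cases} cases)"
    rate = incidence_rate(cases, population)
    rse = relative_standard_error(cases)
    return f"{area}: {rate:.1f} per 100,000 (RSE {rse:.0%})"


# Hypothetical toy areas, not real registry data.
for name, cases, pop in [("Area A", 4, 12_000), ("Area B", 120, 900_000)]:
    print(report(name, cases, pop))
```

With 4 cases the RSE would be 50%, so a rate for Area A would be too unstable to publish even if confidentiality were not a concern.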
The total population of a census tract (2000 Census), however, ranges between 1,500 and 8,000, with an optimal size of 4,000. That is too small to yield reliable tract-level incidence rates without jeopardizing patients' privacy and confidentiality. Several geographic strategies have been proposed to mitigate this problem. Spatial smoothing computes an average rate for each area of interest by incorporating rates in adjacent areas. Spatial smoothing methods include the floating catchment area method, kernel density estimation, empirical Bayes estimation, locally weighted average approaches, and adaptive spatial filtering. While spatial smoothing helps reveal the overall trend of spatial patterns (see www.uiowa.edu/iowacancermaps for an example), the result is an average rate derived from the area of interest and its surrounding areas, and it may not reflect the true rate for the area of interest. This proposal instead seeks to construct larger geographic areas from smaller ones so that each has a base population large enough to generate reliable incidence rates. Geography has a long tradition of grouping areas together for the purposes of "regionalization" or identifying "spatial clustering". Traditional methods place first priority on attribute similarity (e.g., of sociodemographic characteristics) within areas, and most are implemented manually or semi-automatically: attribute information is first used to form initial regions, and several subjective rules and local knowledge are then applied to adjust the region boundaries. Advances in geographic information systems (GIS) technology have enabled researchers to develop methods that automate this process.
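The floating catchment area method, one of the smoothing approaches listed above, can be sketched as follows: each area's smoothed rate pools the cases and population of every area whose centroid lies within a fixed radius of its own centroid. The `Area` type, the coordinates, the radius, and the counts are hypothetical; this is a minimal illustration, not a production smoother.

```python
import math
from dataclasses import dataclass


@dataclass
class Area:
    name: str
    x: float          # centroid x (hypothetical projected units, e.g. km)
    y: float          # centroid y
    cases: int
    population: int


def fca_rates(areas: list[Area], radius: float, per: int = 100_000) -> dict[str, float]:
    """Floating catchment area smoothing: pool cases and population of all
    areas whose centroids lie within `radius` of each area's centroid."""
    smoothed = {}
    for a in areas:
        cases = population = 0
        for b in areas:  # includes `a` itself (distance 0)
            if math.hypot(a.x - b.x, a.y - b.y) <= radius:
                cases += b.cases
                population += b.population
        smoothed[a.name] = cases / population * per
    return smoothed


# Hypothetical toy data, not real registry counts.
areas = [
    Area("A", 0.0, 0.0, cases=2, population=5_000),
    Area("B", 3.0, 0.0, cases=10, population=20_000),
    Area("C", 20.0, 0.0, cases=50, population=100_000),
]
for name, rate in fca_rates(areas, radius=5.0).items():
    print(f"{name}: {rate:.1f} per 100,000")
```

Areas A and B both receive a smoothed rate of 48.0 per 100,000, although A's crude rate is 40.0. This is the limitation noted above: the smoothed value is a neighborhood average and may not reflect the true rate of the area itself.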
Two earlier methods emphasized spatial proximity. The first used space-filling curves to measure the nearness, or spatial order, of areal units and then grouped areas consecutively until a capacity constraint was reached. The second constructed regions of approximately equal population size by beginning with an area and adding the nearest areas until each region reached the desired threshold population. Neither method, however, accounts for within-area homogeneity of the attribute. More recent work aims to develop automated GIS-based methods that account for both spatial contiguity and attribute homogeneity within the derived areas. A preliminary assessment has identified two promising methods. The first is a family of methods termed "Regionalization with dynamically constrained agglomerative clustering and partitioning (REDCAP)", developed to identify clusters of areas. By combining three distance definitions for measuring attribute dissimilarity with two constraining strategies for enforcing spatial contiguity, REDCAP comprises six methods. REDCAP allows users to specify the desired spatial contiguity, attribute dissimilarity, number of derived regions, and other parameters. The second, a modified scale-space clustering (MSSC) method, was devised to form a series of geographic areas. Scale-space theory is based on the notion that an image contains structures at different scales and that its more significant structures are preserved as the scale of observation becomes coarser. Analogously, the MSSC method merges, or "melts", areas of higher value with surrounding areas of lower value but similar structure to form larger areas. The process is guided by the explicit objective of minimizing loss of information. The method does not depend on any probability distribution of the data and is robust for unsupervised hierarchical classification. Like REDCAP, however, the MSSC method does not guarantee that newly formed areas have a minimum population.
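The second proximity-based approach described above (grow a region from a starting area by adding the nearest areas until a population threshold is met) might be sketched as below. The seed rule (lowest unassigned index), the use of distance to the seed's centroid, and the toy data are assumptions, not the published algorithm. As the text notes, the procedure ignores attribute homogeneity. This sketch also ignores contiguity, and the last region may fall short of the threshold when no areas remain.

```python
import math


def grow_regions(centroids, populations, threshold):
    """Greedy region growing (illustrative sketch): start a region at an
    unassigned seed and add the unassigned area nearest to the seed until the
    region's population reaches `threshold`. Contiguity and attribute
    homogeneity are ignored."""
    unassigned = set(range(len(centroids)))
    regions = []
    while unassigned:
        seed = min(unassigned)  # hypothetical seed rule: lowest index
        unassigned.remove(seed)
        region, total = [seed], populations[seed]
        while total < threshold and unassigned:
            # nearest to the seed centroid; ties broken by index
            nearest = min(unassigned,
                          key=lambda j: (math.dist(centroids[seed], centroids[j]), j))
            unassigned.remove(nearest)
            region.append(nearest)
            total += populations[nearest]
        regions.append(region)
    return regions


# Hypothetical toy data: five areas along a line.
centroids = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0)]
populations = [3_000, 2_000, 1_000, 4_000, 500]
print(grow_regions(centroids, populations, threshold=5_000))
```

Here the result is `[[0, 1], [2, 3], [4]]`: the final region holds only 500 people, and region `[2, 3]` spans a wide gap. These are the kinds of shortcomings that motivate methods that also weigh contiguity and homogeneity.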
Both the REDCAP and MSSC methods account for attribute similarity when grouping contiguous areas. The major difference lies in the objective function optimized during clustering: REDCAP minimizes total heterogeneity (i.e., the sum, over all regions, of squared deviations from each region's mean), while MSSC attempts to preserve the overall spatial structure by grouping areas around local maxima. Both methods have demonstrated advantages over existing methods when evaluated for total heterogeneity, region size balance, internal variation, preservation of the data distribution, and spatial compactness. However, neither method has yet been applied to cancer studies. Cancer data warrant special attention because of confidentiality and privacy concerns, and they pose unique challenges such as additional constraints (e.g., creating areas above a threshold population and respecting important geopolitical boundaries). The proposed project will evaluate and modify these two methods to enhance the presentation and visualization of cancer surveillance data by geographic area. The study will combine similar adjacent small areas to protect patient identity, while keeping intact areas that already have a sufficient number of cancer cases (e.g., ≥15) and a sufficient population (≥50,000).
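To make the total-heterogeneity objective and the minimum-population constraint concrete, here is a simplified, hypothetical contiguity-constrained merging sketch in Python. It is not REDCAP or MSSC. It repeatedly takes the least-populated region below `min_pop` and merges it with the adjacent region whose merger increases the sum of squared deviations the least (Ward's criterion, `n_a * n_b / (n_a + n_b) * (m_a - m_b)**2`). Areas are weighted equally rather than by population, and the attribute values, populations, adjacency lists, and use of the 50,000 threshold on toy data are illustrative assumptions. A case-count constraint such as ≥15 could be checked in the same loop.

```python
from statistics import fmean


def total_heterogeneity(regions, values):
    """Sum over regions of squared deviations of area values from the region mean."""
    ssd = 0.0
    for members in regions:
        m = fmean(values[i] for i in members)
        ssd += sum((values[i] - m) ** 2 for i in members)
    return ssd


def merge_cost(a, b, values):
    """Increase in total SSD if regions a and b merge (Ward's criterion)."""
    ma = fmean(values[i] for i in a)
    mb = fmean(values[i] for i in b)
    return len(a) * len(b) / (len(a) + len(b)) * (ma - mb) ** 2


def merge_to_threshold(values, populations, adjacency, min_pop):
    """Merge under-populated regions into their most similar contiguous
    neighbor until every region has population >= min_pop (when possible)."""
    regions = [frozenset([i]) for i in range(len(values))]
    stuck = set()  # regions with no neighbors left to merge with

    def pop(r):
        return sum(populations[i] for i in r)

    def touches(r, s):
        return any(j in s for i in r for j in adjacency[i])

    while True:
        small = [r for r in regions if pop(r) < min_pop and r not in stuck]
        if not small:
            return regions
        r = min(small, key=lambda q: (pop(q), min(q)))
        neighbors = [s for s in regions if s != r and touches(r, s)]
        if not neighbors:
            stuck.add(r)
            continue
        s = min(neighbors, key=lambda q: (merge_cost(r, q, values), min(q)))
        regions = [q for q in regions if q != r and q != s] + [r | s]


# Hypothetical toy data: five areas in a chain 0-1-2-3-4.
values = [10.0, 11.0, 30.0, 31.0, 12.0]   # e.g. a sociodemographic attribute
populations = [30_000, 25_000, 30_000, 25_000, 60_000]
adjacency = [[1], [0, 2], [1, 3], [2, 4], [3]]
regions = merge_to_threshold(values, populations, adjacency, min_pop=50_000)
print(sorted(sorted(r) for r in regions), total_heterogeneity(regions, values))
```

On this toy chain, areas 0 and 1 merge with each other, as do areas 2 and 3, because each pair is the most similar contiguous option. Area 4 already exceeds the threshold and stays intact, which mirrors the proposal's intent to leave sufficiently large areas unchanged.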

Agency: National Institutes of Health (NIH)
Institute: National Cancer Institute (NCI)
Type: Research and Development Contracts (N01)
Project #: N01PC54402-13-0-5
Application #: 7952665
Project Start: 2005-08-01
Project End: 2010-07-31
Fiscal Year: 2009
Total Cost: $63,529
Organization Name: Louisiana State Univ Hsc New Orleans
DUNS #: 782627814
City: New Orleans
State: LA
Country: United States
Zip Code: 70112