Citizen-generated 311 reports are used by cities to identify service needs such as infrastructure repair, rodent infestations, heating outages, and illegal building use. Because citizen reports provide real-time condition assessment, city agencies analyze these data to understand and forecast problems and service demands. However, citizen reporting in response to conditions is not uniform; instead, reporting frequency varies with socioeconomic and demographic group, cultural differences, trust in government, and access to e-government systems. That is, such reporting data carry systematic biases resulting from persistent spatial, racial, and economic inequalities. Consequently, predictive urban analytics based on citizen complaint data can result in discriminatory urban policy, planning, and decision-making, and in misallocation of city resources, further reinforcing biases about neighborhood quality. This project seeks to improve the efficacy of urban analytics based on citizen complaints (through 311 reports) by building statistical machine learning models to estimate reporting rate biases; providing tools to city decision makers, policy makers, and planners to visualize the spatial and socio-economic dependence of biases; and correcting for the biases in responding to complaints, leading to more just resource allocation.
This project involves three inter-related objectives: (1) to analyze the socio-spatial variance in the propensity to complain through the 311 system, (2) to understand the relationship between socioeconomic, demographic, and cultural factors and complaint behavior, and (3) to provide a methodology for city agencies to account for observed reporting biases, both in terms of reporting rate and potential severity of problems. To do so, the investigators develop a new methodological framework, integrating multiple data sources and incorporating approaches from machine learning and economics, for assessing, quantifying, and correcting reporting bias. Leveraging collaborations with New York City 311 (NYC311) and the Kansas City Office of Performance Management (DataKC), the research team will use more than 8,000,000 geo-located 311 reports annually from NYC and Kansas City from 2012 to 2017, code enforcement and building violation records (as validation data), neighborhood condition assessments, and a detailed citizen satisfaction survey of 21,046 individual responses from 2014 to 2017 covering all of Kansas City. These datasets will be integrated with detailed building and property data, socioeconomic and demographic data, and measures of community organization, social infrastructure, and political participation. Project outputs include: (1) a model to assess the probability of citizen reporting based on demographic, socioeconomic, cultural, and neighborhood factors, (2) a model to estimate under- and over-reporting behavior by neighborhood and to weight self-reported data for model training in a way that accounts for observed biases, and (3) an interactive visualization tool to assist city managers, community organizations, and the general public in understanding spatial patterns of complaint reporting, the nature of reported problems, and the likelihood of under- and over-reporting.
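As a purely illustrative sketch of what weighting self-reported data by reporting behavior could look like, the example below divides observed complaint counts by an estimated reporting propensity for each neighborhood. This is a generic inverse-propensity adjustment, not the investigators' model; the function name `adjust_counts`, the `floor` parameter, and all numbers are hypothetical.

```python
def adjust_counts(observed, propensity, floor=0.05):
    """Scale observed complaint counts by the inverse of reporting propensity.

    observed:   dict of neighborhood -> number of 311 complaints received
    propensity: dict of neighborhood -> estimated probability (0 to 1) that
                an existing problem is reported (assumed given here)
    floor:      hypothetical lower bound on propensity, so that very small
                estimates do not inflate the adjusted counts without limit
    """
    adjusted = {}
    for area, count in observed.items():
        p = max(propensity[area], floor)
        adjusted[area] = count / p
    return adjusted


# Made-up numbers: neighborhood B files fewer complaints than A, but is
# assumed to report a much smaller share of its problems.
observed = {"A": 120, "B": 40}
propensity = {"A": 0.8, "B": 0.2}
adjusted = adjust_counts(observed, propensity)
```

Under these assumed propensities, neighborhood B ends up with a larger adjusted count than A despite filing fewer complaints, which is the kind of under-reporting effect the weighting is meant to surface.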
The insights of this project will form the basis for identifying, evaluating, and accounting for bias in citizen self-reported data, and produce transformative results that can contribute to the efficient and fair delivery of city services by leveraging predictive analytics and artificial intelligence. By modeling and improving the quality of citizen-generated data, the project provides a methodological basis for increasing citizens' participation (e.g. in governance, citizen science, and collaborative knowledge production) while ensuring that the data produced by such participation is representative, reliable, and useful.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.