The urban heat island is a phenomenon in which urban areas have higher surface and atmospheric temperatures than surrounding suburban and rural regions. These higher temperatures increase building cooling energy use during summer and can worsen urban air pollution, reduce human thermal comfort, and intensify heat waves. This project unites expertise in three fields (machine learning, civil and environmental engineering, and earth science) to develop a novel framework that integrates data-driven approaches and physics-based climate simulation models to better understand the physical processes that drive urban heat islands around the world. Understanding the main drivers of urban heat islands in a given city is critical for identifying appropriate engineering solutions for mitigating urban warming. This research will have broad societal impact by changing the ways that people participate in scientific data analysis tasks, and will build a stronger body of research in computational sustainability.

The proposed framework will integrate data-driven approaches and physically based models in a single discovery process. It consists of three steps: (i) latent feature discovery, which automatically infers high-level feature representations from large-scale observational data via deep networks, where the latent features capture complex nonlinear transformations of the observed variables and serve as proxies for latent physical processes; (ii) latent feature interpretation, which generates candidate urban heat island drivers from the latent features using a compiled dataset and provides insight into how these features are associated with physical processes; and (iii) urban heat island driver identification, which designs experiments that vary observed variables in simulation models to identify the causes of heat islands. The project is expected to advance knowledge of the key physical processes that cause urban heat islands in cities. In addition, the framework has the potential to advance machine learning, including feature learning, causal analysis with confounders, and causal experiment design. The source code for the computational tools and the data sets collected through the project will be freely disseminated to the broader research and educational community.
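As a rough illustration of how the three steps fit together, the following minimal Python sketch runs them on synthetic data. It is not the project's implementation. Everything in it is an assumption made for illustration: the function names, the variables "vegetation" and "albedo", the coefficients in the toy simulator, and above all the use of principal component analysis (a linear method) as a simple stand-in for the deep networks named in step (i).

```python
import numpy as np


def toy_simulator(vegetation, albedo, rng=None, noise=0.0):
    """Hypothetical stand-in for a physics-based climate model.

    Returns a heat island intensity (urban minus rural temperature, in K).
    The linear form and its coefficients are invented for illustration.
    """
    eps = rng.normal(0.0, noise, size=np.shape(vegetation)) if rng is not None else 0.0
    return 4.0 - 3.0 * vegetation - 1.0 * albedo + eps


def discover_latent_features(X, k):
    """Step (i), simplified: PCA via SVD instead of a deep network."""
    Xc = (X - X.mean(axis=0)) / X.std(axis=0)  # standardize each column
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T  # project onto the top-k principal directions


def interpret_latent_features(Z, candidates):
    """Step (ii), simplified: correlate each latent feature with each candidate driver."""
    return {
        name: [float(np.corrcoef(Z[:, j], values)[0, 1]) for j in range(Z.shape[1])]
        for name, values in candidates.items()
    }


def intervention_effect(simulator, base, name, delta):
    """Step (iii), simplified: change one input, hold the rest fixed, compare outputs."""
    perturbed = dict(base)
    perturbed[name] = base[name] + delta
    return float(simulator(**perturbed) - simulator(**base))


# Synthetic "observations" (assumed; no real data is used here).
rng = np.random.default_rng(0)
n = 500
vegetation = rng.uniform(0.0, 1.0, n)   # vegetation fraction
albedo = rng.uniform(0.1, 0.4, n)       # surface albedo
uhi = toy_simulator(vegetation, albedo, rng=rng, noise=0.1)

X = np.column_stack([vegetation, albedo, uhi])
Z = discover_latent_features(X, k=2)
associations = interpret_latent_features(
    Z, {"vegetation": vegetation, "albedo": albedo}
)

# Controlled experiments on the simulator, one candidate driver at a time.
base = {"vegetation": 0.3, "albedo": 0.2}
effects = {name: intervention_effect(toy_simulator, base, name, 0.1) for name in base}

print("latent/driver correlations:", associations)
print("simulated effect of +0.1 in each driver (K):", effects)
```

The point of the sketch is the division of labor: correlations in step (ii) only nominate candidate drivers, while the controlled perturbations in step (iii) test them one at a time in a model where all other inputs can be held fixed.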

University of Southern California
Los Angeles
United States