The specific aims of the parent grant (R01 ES028033) are to develop new methods to assess the association between long-term exposure to multiple environmental agents (air pollution and weather) and several health outcomes (e.g., risk of cardiovascular disease, mortality), and to identify vulnerable populations. As part of the work conducted in Aim 1 of the parent grant, we developed two scientific R packages that address two common challenges in environmental health: 1) how to flexibly estimate the exposure-response curve under a statistical approach that allows users to assess causality from analyses of observational data¹ (GPSmatching); and 2) how to identify subgroups of the population that are more or less vulnerable to adverse health effects of environmental exposure² (denovo). While most environmental health research traditionally relies on well-established statistical approaches, there is increasing interest in the community in adopting new cutting-edge statistical tools such as GPSmatching and denovo. GPSmatching and denovo are valuable R packages that offer innovative functionality to researchers. However, they do not follow software engineering best practices well enough to be accessible to a broader scientific community, and they are not optimized for larger datasets or efficient use on the cloud. With this administrative proposal, we plan to establish a sustainable collaboration with professional software engineers to redeploy our packages and disseminate them to a wider research community. More specifically, our aims are to: refactor and improve the source code of both the GPSmatching and denovo R packages to meet software engineering best practices and support community open-source development; improve cloud readiness by optimizing and parallelizing the R packages; and port the packages to Python.
The refactoring of the GPSmatching and denovo software packages will enable a very large user base of researchers to use the tools efficiently, add capabilities to the codebase, and offer improvements to the algorithms. This proposal will make these tools widely accessible to the community and will establish workflows and procedures that ease the future dissemination of other innovative statistical methods.

Public Health Relevance

As part of the work conducted in Aim 1 of the parent grant (R01 ES028033), we developed two scientific R packages that address two common challenges in environmental health: 1) how to flexibly estimate the exposure-response curve under a statistical approach that allows users to assess causality from analyses of observational data¹ (GPSmatching); and 2) how to identify subgroups of the population that are more or less vulnerable to adverse health effects of environmental exposure² (denovo). While most environmental health research traditionally relies on well-established statistical approaches, there is increasing interest in the community in adopting new cutting-edge statistical tools such as GPSmatching and denovo. However, these packages do not follow software engineering best practices well enough to be accessible to a broader scientific community, and they are not optimized for larger datasets or efficient use on the cloud. We plan to establish a sustainable collaboration with professional software engineers to redeploy our packages, which will enable a very large user base of researchers to use the tools efficiently, add capabilities to the codebase, and offer improvements to the algorithms.

Agency
National Institutes of Health (NIH)
Institute
National Institute of Environmental Health Sciences (NIEHS)
Type
Research Project (R01)
Project #
3R01ES028033-03S1
Application #
10163485
Study Section
Program Officer
Joubert, Bonnie
Project Start
2020-09-18
Project End
2022-11-30
Budget Start
2020-09-18
Budget End
2020-11-30
Support Year
3
Fiscal Year
2020
Total Cost
Indirect Cost
Name
Harvard University
Department
Public Health & Prev Medicine
Type
Schools of Public Health
DUNS #
149617367
City
Boston
State
MA
Country
United States
Zip Code
02115