Even though computational scientists have high-speed computers and powerful software tools at their disposal, their ability to make scientific discoveries is held back because much of their time is spent inefficiently on data exploration, manipulation and visualization, preliminary analyses, and hit-and-miss attempts to find the right settings for their software tools, rather than on running their computations at scale. For example, researchers doing gene sequencing (a process in which the individual base nucleotides in an organism's DNA are identified) have to figure out which sequencing method to use, which of several software packages to use for that method, what settings to use for the software package, and so on. Researchers would benefit greatly from learning from prior use of the software tools at their disposal, that is, from past experiences and mistakes, both their own and those of other researchers in the field. However, only a small percentage of them are able to do so. This exploratory project will investigate whether effective, intelligent guidance can be extracted from past experience on specific applications using artificial intelligence techniques and then provided in a tailored manner to the individual researcher. This guidance will be delivered within the software tools being used by the researcher, and provided by the underlying cyberinfrastructure itself, so that anyone who uses these tools can benefit. The project will explore the feasibility of AI-based approaches for providing such assistance; prototype, pilot, and validate a range of capabilities; and widely disseminate the results.

Current research and development in cyberinfrastructure (CI) for science focuses on scaling the CI: developing algorithms that scale better, automating pipelines, building scalable computational and data systems, and optimizing system resource allocation. This project, on the other hand, focuses on scaling the individual researcher, i.e., making her more effective, through the application of a portfolio of techniques: observational studies of scientist-CI interactions, end-to-end instrumentation for recording and tracking these interactions across the CI, machine-learned models built from these interactions, and the embedding of and experimentation with these models within human-machine teaming paradigms. These techniques will be given the best chance to succeed by applying them in a methodological and technical framework that focuses on specific applications. The proposed methods will vary participant expertise, test hypotheses of performance, and use transparent "guide me" methodologies to establish dimensions of individual differences in designing guidance. The exemplar domain science is genomics, where the goal is to sequence and assemble the complete DNA of selected species, and where the work in this project could be particularly transformative. The project team is integrative, interdisciplinary, and convergent; the investigators have expertise in genomics, software engineering, systems, data science, project management, and human-machine systems, are working with a key CI provider in the Ohio Supercomputer Center, and are collectively advising a small team of graduate students.
The project is aligned with the NSF Big Ideas of Harnessing the Data Revolution, Growing Convergence Research, and the Future of Work, and with the Office of Advanced Cyberinfrastructure criteria for software cyberinfrastructure, since it is driven by the domain and computer sciences, innovative, collaborative and convergent, and strategically managed, and since it builds on significant prior investments by NSF, which gives it a clear path to sustainability. This work could make computational scientists from many science domains transformatively more productive, leading to accelerated discovery. Additional broader impacts come through educational case studies for computational science, contributions to instrumentation standards, and observational and empirical study methods for CI. A side contribution of this work will be in assessing the usability of CI tools, which will directly enable tool designers to build more usable tools. Broadening participation has been emphasized: one of the principal investigators is a woman, and all three PIs have a history of recruiting and working with women students. At least one of the funded graduate students will be an incoming female student. The project team will additionally collaborate closely with two women students and a woman post-doctoral researcher in the genomics laboratory.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Agency: National Science Foundation (NSF)
Institute: Division of Advanced CyberInfrastructure (ACI)
Type: Standard Grant (Standard)
Application #: 1945347
Program Officer: Seung-Jong Park
Project Start:
Project End:
Budget Start: 2020-01-01
Budget End: 2021-12-31
Support Year:
Fiscal Year: 2019
Total Cost: $299,687
Indirect Cost:
Name: Ohio State University
Department:
Type:
DUNS #:
City: Columbus
State: OH
Country: United States
Zip Code: 43210