This action funds an NSF National Plant Genome Initiative Postdoctoral Research Fellowship in Biology for FY 2020. The fellowship supports a research and training plan in a host laboratory for the Fellow, who also presents a plan to broaden participation in biology. The title of the research and training plan for this fellowship to Dr. Andy Zhou is "Building quantitative models for plant transcription using convolutional neural networks for de novo promoter design for crop plant engineering". The host institutions for the fellowship are the Joint BioEnergy Institute, the Joint Genome Institute, and the University of California, Davis. The sponsoring scientists are Dr. Patrick M. Shih and Dr. Ronan C. O'Malley.

Uncovering the inner workings of plants through DNA and RNA sequencing can provide valuable insight into how to better engineer crop plants and is essential to meeting the increasing societal demand for renewable energy and environmental sustainability. However, these sequencing datasets contain a myriad of analyzable patterns and are often difficult to decipher without sophisticated computational analysis. The project will develop machine-learning models for plant biology and apply these models directly to develop agricultural biotechnology. These technologies will help enhance the economic value of crop plants and benefit society by improving food, fuel, or chemical production using plants. The project emphasizes training of the fellow in communicating scientific findings to the general public by creating an interactive website to explore the project findings. The fellow will engage in mentorship roles to garner interest in scientific careers among local high school and college students of diverse and underrepresented backgrounds.

The project aims to develop state-of-the-art machine learning models for transcription initiation and chromatin accessibility in both model and crop plant systems in order to redefine core mechanisms of plant gene expression and circumvent the challenges of engineering plant systems. The project leverages Assay for Transposase-Accessible Chromatin (ATAC) and Cap Analysis of Gene Expression (CAGE) sequencing data to train deep convolutional neural networks to predict features important to gene expression in Arabidopsis and tomato. These machine learning models will be used to find open chromatin that facilitates transgenic crop transformation, to identify tunable promoters for transgene expression, and to guide the design of de novo DNA parts for synthetic biology in model and crop plant systems. All sequencing data generated will be hosted on publicly available repositories such as NCBI. Analysis code and machine learning models will be uploaded to GitHub and the Kipoi model zoo for genomics.
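
As a rough illustration of the core operation in a convolutional network applied to DNA sequence, the sketch below one-hot encodes a sequence, slides a single filter along it, and reduces the result with a ReLU and global max pooling. It is a minimal NumPy sketch under stated assumptions, not the project's model: the function names, the single hand-built filter, and the example sequences are all hypothetical, and a trained network would learn many filters from ATAC or CAGE signal rather than use a fixed one.

```python
import numpy as np

BASES = "ACGT"


def one_hot(seq):
    """Encode a DNA string as a (length, 4) array; unknown bases (e.g. N) stay all-zero."""
    out = np.zeros((len(seq), 4), dtype=np.float32)
    for i, base in enumerate(seq.upper()):
        j = BASES.find(base)
        if j >= 0:
            out[i, j] = 1.0
    return out


def conv1d_scan(x, kernel):
    """Slide a (width, 4) kernel along a (length, 4) sequence with no padding."""
    width = kernel.shape[0]
    n_positions = x.shape[0] - width + 1
    return np.array([np.sum(x[i:i + width] * kernel) for i in range(n_positions)])


def motif_score(seq, kernel):
    """Convolution, ReLU, then global max pooling: one scalar per filter.

    Assumes the sequence is at least as long as the kernel.
    """
    activations = np.maximum(conv1d_scan(one_hot(seq), kernel), 0.0)
    return float(activations.max())


# Hypothetical filter: an exact-match detector for the TATA-box-like string "TATAAA".
tata_kernel = one_hot("TATAAA")

with_motif = motif_score("GGCTATAAAGGC", tata_kernel)
without_motif = motif_score("GGGGGGGGGGGG", tata_kernel)
print(with_motif, without_motif)
```

With this hand-built filter, the score counts the best number of matching positions in any window, so a sequence containing the motif scores higher than one that does not. In a trained network the filter weights are real-valued and learned, which is what lets such models capture degenerate motifs.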

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Agency: National Science Foundation (NSF)
Institute: Division of Integrative Organismal Systems (IOS)
Application #: 2009093
Program Officer: Gerald Schoenknecht
Project Start:
Project End:
Budget Start: 2020-11-01
Budget End: 2023-10-31
Support Year:
Fiscal Year: 2020
Total Cost: $216,000
Indirect Cost:
Name: Zhou, Andy
Department:
Type:
DUNS #:
City: Albany
State: CA
Country: United States
Zip Code: 94706