This is an NSF Postdoctoral Research Fellowship in Biology under the program Broadening Participation of Groups Under-represented in Biology. The fellow, Nuri Theresa Pierce, is conducting research and receiving training that is increasing the participation of groups underrepresented in biology. The fellow is being mentored by C. Titus Brown at the University of California, Davis. The goals of this project are to improve RNA sequencing (RNA-Seq) analysis and to teach workshops that increase access to the field of bioinformatics. Marine microbial diversity remains difficult to study with classical ecological techniques, but next-generation sequencing of seawater samples now makes such study feasible. As sequencing output grows, there is an urgent need for tools that process large amounts of data quickly and accurately while using minimal computing resources. The fellow is addressing this need by developing improved methods for analyzing sequencing datasets and by making reference data more accessible and useful. Access to these new databases and analysis techniques will allow biologists to use sequencing data to tackle larger problems, and has the potential to revolutionize our understanding of the diversity and function of ocean communities. Beyond these scientific goals, the fellow is conducting data science training and outreach to expose a wide range of students to bioinformatics research and to help build a more equitable, diverse, and inclusive academe. The fellow is teaching basic coding, a skill that is critical for recruiting the next generation of marine biologists, for whom bioinformatics will be an essential tool.

This project leverages two computational techniques, de Bruijn graphs and indexing via Sequence Bloom Trees and MinHash (minimum hashing), to improve RNA-Seq analysis. Aim 1 of the project is to improve genome-wide RNA-Seq (transcriptome) analysis by using the de Bruijn graph, rather than the consensus transcriptome, as the reference for annotation and expression quantification. Computational experiments with data from the Marine Microbial Eukaryotic Transcriptome Sequencing Project are being used to assess improvements. The fellow is also using these indexing techniques to build a reduced representation of existing reference data, enabling rapid queries of new data for sequence similarity (Aim 2) and expression similarity (Aim 3). Throughout the project, the fellow is receiving training in software development, project management, open science practices, and grant writing to strengthen her technical and academic skill sets. In addition, the fellow is training a diverse set of students in data-intensive biology by developing teaching materials and running a series of coding workshops.
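To make the idea of a "reduced representation" for rapid similarity queries more concrete, here is a minimal Python sketch of bottom-s MinHash sketching, one common way to summarize sequence data compactly and compare it quickly. It is an illustration under stated assumptions, not the fellow's implementation: the function names (`canonical_kmers`, `hash64`, `bottom_sketch`, `jaccard_estimate`), the k-mer size of 21, the sketch size of 100, and the use of BLAKE2b as the hash function are all hypothetical choices. Each sequence is reduced to the smallest hash values of its k-mers, and the overlap between two such sketches estimates the Jaccard similarity of the full k-mer sets.

```python
import hashlib
import heapq
import random

# Maps each base to its complement; used to build reverse complements.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical_kmers(seq, k):
    """Yield each k-mer or its reverse complement, whichever sorts first,
    so a sequence and its reverse strand yield the same k-mers."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        rc = kmer.translate(COMPLEMENT)[::-1]
        yield min(kmer, rc)


def hash64(kmer):
    """Hash a k-mer string to a 64-bit integer (BLAKE2b, an assumed choice)."""
    digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "little")


def bottom_sketch(seq, k=21, size=100):
    """Hypothetical helper: keep the `size` smallest distinct k-mer hashes."""
    hashes = {hash64(kmer) for kmer in canonical_kmers(seq, k)}
    return set(heapq.nsmallest(size, hashes))


def jaccard_estimate(sketch_a, sketch_b, size=100):
    """Estimate the Jaccard similarity of the two underlying k-mer sets.

    The `size` smallest hashes of the combined sketches are exactly the
    `size` smallest hashes of the union of the full k-mer sets; the fraction
    of them present in both sketches estimates |A & B| / |A | B|.
    """
    union_sketch = set(heapq.nsmallest(size, sketch_a | sketch_b))
    if not union_sketch:
        return 0.0
    shared = union_sketch & sketch_a & sketch_b
    return len(shared) / len(union_sketch)


if __name__ == "__main__":
    # Synthetic example: two overlapping windows of a random sequence.
    random.seed(0)
    genome = "".join(random.choice("ACGT") for _ in range(5000))
    a = bottom_sketch(genome[:3000])
    b = bottom_sketch(genome[1000:4000])
    # The true k-mer Jaccard similarity of these two windows is about 0.5.
    print(f"estimated Jaccard similarity: {jaccard_estimate(a, b):.2f}")
```

Because each sketch holds at most `size` hashes regardless of how long the input is, comparisons stay cheap as reference collections grow; the trade-off is sampling error in the estimate, which shrinks as the sketch size increases.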

Agency: National Science Foundation (NSF)
Institute: Division of Biological Infrastructure (DBI)
Application #: 1711984
Program Officer: Daniel Marenda
Budget Start: 2017-09-01
Budget End: 2021-08-31
Fiscal Year: 2017
Total Cost: $207,000
Name: Pierce, Nuri T
City: La Jolla
State: CA
Country: United States
Zip Code: 92037