The African malaria mosquito Anopheles gambiae, because of its epidemiological importance, was the first disease vector to have its genome sequenced, in 2002. Since then, the PEST strain assembly has remained the only available chromosome-level genome reference for this major African malaria vector. Although this assembly has been the workhorse for functional and population genomic studies of malaria mosquitoes for almost two decades, it no longer supports analyses of the highest possible quality because it is highly imperfect by modern standards. The assembly has serious deficiencies, such as a large portion of unmapped contigs, sequencing and physical gaps, incorrect order and orientation of some scaffolds, and the presence of haplotypes derived from the sister species An. coluzzii. Moreover, the PEST strain of An. gambiae is no longer available, so the existing assembly cannot be validated or improved with additional sequencing. As a result, a complete annotation and an accurate functional characterization of the An. gambiae genome cannot be performed. The lack of a reliable reference also represents a major impediment to population genomics studies, especially those dealing with structural genomic variations. For a long time, the high cost of sequencing and the sheer difficulty of genome assembly made major improvements of the mosquito genome prohibitive. Novel long-read sequencing technologies and innovative scaffolding approaches now allow the development of de novo chromosome-level genome assemblies of superior quality at a reasonable cost. In addition, the availability of polytene chromosomes ensures high-resolution genome mapping in An. gambiae. The main goal of this R21 project is to develop a chromosome-level genome assembly and to explore structural genomic variations in the An. gambiae complex.
This timely project will meet the demand for a new, highly finished genome assembly for the major African malaria vector, drawing on appropriate innovative tools and the expertise of the PI and Co-I. Briefly, the project's specific aims are to (1) obtain a contiguous genome assembly for An. gambiae using Oxford Nanopore sequencing, Illumina sequencing, and chromosome-scale Hi-C scaffolding; (2) validate the obtained assembly and construct a high-resolution physical genome map for An. gambiae using fluorescence in situ hybridization; and (3) characterize structural genomic variations in the An. gambiae complex. A new chromosome-level genome assembly for An. gambiae will transform research because it will allow the most complete functional annotation and the most detailed population analysis of malaria mosquitoes to date. The more complete assembly of heterochromatic sequences will improve our understanding of the genomic "dark matter" and will stimulate epigenomic studies of this disease vector. The scientific community will have free access to the new assembly through VEuPathDB and NCBI.

Public Health Relevance

An important requirement for the success of genome-based strategies for malaria vector control is the availability of high-quality assemblies of mosquito genomes, which will enable researchers to identify gene targets that could be manipulated. Toward this goal, the proposed project will produce a chromosome-level genome assembly for Anopheles gambiae using Oxford Nanopore and Illumina sequencing, Hi-C scaffolding, and physical mapping.

National Institutes of Health (NIH)
National Institute of Allergy and Infectious Diseases (NIAID)
Exploratory/Developmental Grants (R21)
Study Section
Vector Biology Study Section (VB)
Program Officer
Costero-Saint Denis, Adriana
Virginia Polytechnic Institute and State University
Earth Sciences/Resources
United States