It is proposed that an integrated mapping and sequencing strategy be developed and implemented to sequence cloned genomic DNA more efficiently. The assembly of a given sequencing project will be based on several data sets, including restriction digest map data and dual-end sequence data from random plasmid subclones, as well as random M13 subclone sequence data. Initially, this integrated strategy will be tested using computer simulations of cosmids whose sequences have already been completed. Subsequently, methods to generate restriction enzyme map data will be investigated, and computer programs that generate maps of overlapping subclones will be tested. This strategy will be applied to sequence several cosmid and human cosmid DNA clones, and through collaboration with the informatics group at the Genome Sequencing Center, a modified sequence assembly program will be developed to use both relational and sequence data as assembly constraints. The relational map data will provide indirect assembly verification, which can be especially useful for sequences containing inverted or tandem repeat elements. In addition to reducing the number of clones necessary to complete a sequencing project, this information will aid in the selection of strategies and templates for gap closure. Because random plasmid subclone libraries do not show the same bias against inverted repeat loop sequences as random M13 subclone libraries, an additional advantage is that this strategy would provide mapped plasmid DNA templates in regions of inverted repeat elements, which would allow more efficient gap closure. Because the sequence database would contain both relational and sequence data, the ultimate goal of this research is to enable fully automated sequence closure. Finally, this method will be used to determine the feasibility of YAC-based sequence assembly in a region of the C. elegans genome map that lacks cosmid coverage. If successful, this strategy will be applied to sequence other C. elegans YAC DNA as well as large-insert human DNA, to serve as a model for the sequence assembly of large regions of genomic DNA.
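
The use of relational data as an assembly constraint can be illustrated with a minimal sketch. It assumes that the two end reads of a plasmid subclone should face each other in the assembled sequence and should lie within an expected insert-size range. The names `Placement`, `pair_is_consistent`, and `violated_pairs`, and the size limits used below, are hypothetical and are not taken from the proposed assembly program.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Placement:
    """Position of one end read in a candidate assembly (hypothetical record)."""
    start: int     # leftmost base of the read, in consensus coordinates
    length: int    # number of consensus bases covered by the read
    forward: bool  # True if the read aligns to the forward strand


def pair_is_consistent(first, second, min_insert, max_insert):
    """Return True if two end reads of one subclone agree with the layout.

    Assumed constraint: the leftmost read is on the forward strand, the
    rightmost read is on the reverse strand, and the distance spanned by
    the pair falls inside the expected insert-size range.
    """
    left, right = sorted((first, second), key=lambda p: p.start)
    if not (left.forward and not right.forward):
        return False
    span = (right.start + right.length) - left.start
    return min_insert <= span <= max_insert


def violated_pairs(placements, pairs, min_insert, max_insert):
    """List the read pairs whose placement contradicts the subclone data."""
    bad = []
    for name_a, name_b in pairs:
        # Pairs with an unplaced end carry no constraint yet; skip them.
        if name_a not in placements or name_b not in placements:
            continue
        if not pair_is_consistent(placements[name_a], placements[name_b],
                                  min_insert, max_insert):
            bad.append((name_a, name_b))
    return bad
```

A layout in which many pairs are reported by `violated_pairs` would be a candidate misassembly, for example a collapsed tandem repeat, whereas consistent pairs that span a gap point to a subclone that could serve as a template for closing it.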

Agency
National Institutes of Health (NIH)
Institute
National Human Genome Research Institute (NHGRI)
Type
Postdoctoral Individual National Research Service Award (F32)
Project #
1F32HG000138-01
Application #
2208551
Study Section
Special Emphasis Panel (ZRG2-GNM (02))
Project Start
1995-08-31
Project End
Budget Start
1995-03-01
Budget End
1996-02-29
Support Year
1
Fiscal Year
1995
Total Cost
Indirect Cost
Name
Washington University
Department
Genetics
Type
Schools of Medicine
DUNS #
062761671
City
Saint Louis
State
MO
Country
United States
Zip Code
63130