The investigator will develop a software tool for assembling DNA fragments generated in megabase-scale shotgun sequencing projects. The software will be tested first on simulated DNA fragments generated by computer from megabase DNA sequences and then on real DNA fragments from large-scale sequencing projects. The software will be freely distributed to nonprofit organizations, and the investigator will assist with integrating it into the sequencing environments at genome centers. The objective of this project will be achieved by making two major improvements to a previously developed DNA sequence assembly program. The first improvement is a strategy for solving the problems caused by repetitive sequences. In this strategy, all the fragments from a repetitive sequence are identified, and the uncertainties in assembling those fragments are resolved using additional information on the fragments that flank copies of the repetitive sequence. The second improvement is to increase the capacity of the assembly program by developing a parallel version in the PVM (Parallel Virtual Machine) programming environment on a local network of computers. The investigator will parallelize the two most time-consuming parts of the sequential program: the detection of overlaps among fragments and the construction of fragment alignments for contigs. The parallel sequence assembly program will be able to use the computing power of many computers to assemble tens of thousands of DNA fragments into sequences with a low error rate. The investigator will also improve the multiple sequence alignment program by addressing reading-frame shifts in comparisons of protein, cDNA, and genomic DNA sequences.
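As a rough illustration of why overlap detection lends itself to parallelization, the sketch below splits the fragment pairs into independent chunks and checks each chunk in a separate worker. It is not the investigator's program: exact suffix-prefix matching stands in for the error-tolerant alignment a real assembler would need, Python threads stand in for PVM tasks on networked machines, and the names `suffix_prefix_overlap`, `overlaps_for_chunk`, `detect_overlaps`, `workers`, and `min_len` are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import permutations


def suffix_prefix_overlap(a, b, min_len=3):
    """Return the length of the longest suffix of `a` that exactly equals
    a prefix of `b`, or 0 if no such match of at least `min_len` exists."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a[-k:] == b[:k]:
            return k
    return 0


def overlaps_for_chunk(fragments, pairs, min_len):
    """Check one chunk of ordered fragment pairs; this is the unit of work
    that would be handed to one worker (assumption: chunks are independent)."""
    hits = []
    for i, j in pairs:
        k = suffix_prefix_overlap(fragments[i], fragments[j], min_len)
        if k:
            hits.append((i, j, k))
    return hits


def detect_overlaps(fragments, workers=2, min_len=3):
    """Partition all ordered pairs round-robin across `workers` and merge
    the per-chunk results into one sorted list of (i, j, overlap_length)."""
    pairs = list(permutations(range(len(fragments)), 2))
    chunks = [pairs[w::workers] for w in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(
            lambda chunk: overlaps_for_chunk(fragments, chunk, min_len), chunks
        )
        return sorted(hit for part in parts for hit in part)


if __name__ == "__main__":
    reads = ["ACGTAC", "TACGGA", "GGATTT"]
    print(detect_overlaps(reads))
```

Because each pair is examined independently, the chunks need no communication until the results are merged, which is the property that makes this step a natural candidate for distribution across many computers. Threads are used here only to show the partitioning; they do not give true CPU parallelism in standard Python.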

Agency
National Institutes of Health (NIH)
Institute
National Human Genome Research Institute (NHGRI)
Type
Research Project (R01)
Project #
5R01HG001502-02
Application #
2459845
Study Section
Special Emphasis Panel (ZRG2-GNM (03))
Project Start
1996-08-01
Project End
1999-07-31
Budget Start
1997-08-01
Budget End
1998-07-31
Support Year
2
Fiscal Year
1997
Total Cost
Indirect Cost
Name
Michigan Technological University
Department
Engineering (All Types)
Type
Schools of Engineering
DUNS #
065453268
City
Houghton
State
MI
Country
United States
Zip Code
49931
Huang, Xiaoqiu; Brutlag, Douglas L (2007) Dynamic use of multiple parameter sets in sequence alignment. Nucleic Acids Res 35:678-86
Wang, Jianmin; Huang, Xiaoqiu (2005) A method for finding single-nucleotide polymorphisms with allele frequencies in sequences of deep coverage. BMC Bioinformatics 6:220
Ye, Liang; Huang, Xiaoqiu (2005) MAP2: multiple alignment of syntenic genomic sequences. Nucleic Acids Res 33:162-70
Huang, Xiaoqiu; Ye, Liang; Chou, Hui-Hsien et al. (2004) Efficient combination of multiple word models for improved sequence comparison. Bioinformatics 20:2529-33
Huang, Xiaoqiu; Wang, Jianmin; Aluru, Srinivas et al. (2003) PCAP: a whole-genome assembly program. Genome Res 13:2164-70
Lin, Yaw-Ling; Huang, Xiaoqiu; Jiang, Tao et al. (2003) MAVG: locating non-overlapping maximum average segments in a given sequence. Bioinformatics 19:151-2
Huang, Xiaoqiu; Chao, Kun-Mao (2003) A generalized global alignment algorithm. Bioinformatics 19:228-33
Huang, X; Madan, A (1999) CAP3: A DNA sequence assembly program. Genome Res 9:868-77
Huang, X; Adams, M D; Zhou, H et al. (1997) A tool for analyzing and annotating genomic sequences. Genomics 46:37-45