The investigator will develop a software tool for assembling DNA fragments generated in megabase-scale shotgun sequencing projects. The software will be tested first on simulated fragments generated by computer from megabase DNA sequences, and then on real DNA fragments from large-scale sequencing projects. The software will be freely distributed to nonprofit organizations, and the investigator will assist with integrating it into the sequencing environments at genome centers. The objective of this project will be achieved by making two major improvements to a previously developed DNA sequence assembly program. The first improvement is a strategy for solving the problems caused by repetitive sequences. In this strategy, all the fragments from a repetitive sequence are identified, and the uncertainties in assembling those fragments are resolved using additional information on the fragments that flank copies of the repetitive sequence. The second improvement is to increase the capacity of the assembly program by developing a parallel version of it in the PVM parallel programming environment on a local network of computers. The investigator will parallelize the two most time-consuming parts of the sequential program: the detection of overlaps among fragments and the construction of fragment alignments for contigs. The parallel sequence assembly program will be able to use the computational power of many computers to assemble tens of thousands of DNA fragments into sequences with low error rates. The investigator will also improve the multiple sequence alignment program by addressing reading frame shifts in comparisons of protein, cDNA, and genomic DNA sequences.
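As a rough illustration of the overlap detection step named above, the following minimal sketch finds exact suffix-prefix overlaps between every ordered pair of fragments. It is not the investigator's program: the function names `suffix_prefix_overlap` and `detect_overlaps` and the `min_len` parameter are hypothetical, and exact matching is a simplifying assumption, since a practical assembler has to tolerate sequencing errors and reverse-complement fragments.

```python
from itertools import permutations


def suffix_prefix_overlap(a: str, b: str, min_len: int = 3) -> int:
    """Return the length of the longest suffix of `a` that exactly
    matches a prefix of `b`, or 0 if it is shorter than `min_len`."""
    min_len = max(min_len, 1)  # a zero-length overlap is meaningless
    longest_possible = min(len(a), len(b))
    # Try the longest candidate first so the first hit is the best one.
    for k in range(longest_possible, min_len - 1, -1):
        if a[-k:] == b[:k]:
            return k
    return 0


def detect_overlaps(fragments, min_len: int = 3):
    """Map each ordered pair (i, j) of fragment indices to its overlap
    length, keeping only pairs that overlap by at least `min_len`."""
    overlaps = {}
    for i, j in permutations(range(len(fragments)), 2):
        k = suffix_prefix_overlap(fragments[i], fragments[j], min_len)
        if k:
            overlaps[(i, j)] = k
    return overlaps


if __name__ == "__main__":
    # Toy fragments (illustrative only, not real sequencing data).
    reads = ["ACGTAC", "TACGGA", "GGATTT"]
    print(detect_overlaps(reads))
```

Each pair is compared independently of every other pair, which is one reason this step lends itself to being divided among several computers. The all-pairs loop is quadratic in the number of fragments, so this sketch is only suitable for small inputs.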
Huang, Xiaoqiu; Brutlag, Douglas L (2007) Dynamic use of multiple parameter sets in sequence alignment. Nucleic Acids Res 35:678-86
Wang, Jianmin; Huang, Xiaoqiu (2005) A method for finding single-nucleotide polymorphisms with allele frequencies in sequences of deep coverage. BMC Bioinformatics 6:220
Ye, Liang; Huang, Xiaoqiu (2005) MAP2: multiple alignment of syntenic genomic sequences. Nucleic Acids Res 33:162-70
Huang, Xiaoqiu; Ye, Liang; Chou, Hui-Hsien et al. (2004) Efficient combination of multiple word models for improved sequence comparison. Bioinformatics 20:2529-33
Lin, Yaw-Ling; Huang, Xiaoqiu; Jiang, Tao et al. (2003) MAVG: locating non-overlapping maximum average segments in a given sequence. Bioinformatics 19:151-2
Huang, Xiaoqiu; Chao, Kun-Mao (2003) A generalized global alignment algorithm. Bioinformatics 19:228-33
Huang, Xiaoqiu; Wang, Jianmin; Aluru, Srinivas et al. (2003) PCAP: a whole-genome assembly program. Genome Res 13:2164-70
Huang, X; Madan, A (1999) CAP3: A DNA sequence assembly program. Genome Res 9:868-77
Huang, X; Adams, M D; Zhou, H et al. (1997) A tool for analyzing and annotating genomic sequences. Genomics 46:37-45