The major objective of this project is to develop algorithms and software for automated annotation of the coding regions of large genomic DNA sequences. The investigators will improve an analysis and annotation tool (AAT) that uses fast database searching and rigorous alignment to locate exons in a genomic sequence and to define intron-exon boundaries. New annotation software will be developed by integrating the improved AAT with gene prediction programs. The software will assemble the exons produced by the improved AAT and the exons predicted by the gene prediction programs into gene structures, using some of the AAT exons as constraints in the assembly. A second goal of this project is to develop a rigorous program for producing an optimal alignment between two DNA sequences. A novel feature of the program is that coding-frame information will be incorporated into the alignment model. An optimal alignment produced by the program will show the correspondence between the codons of the two sequences, so the alignment remains meaningful when the codons are translated into amino acids.
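To make the idea of a codon-level correspondence concrete, the following is a minimal sketch and not the investigators' program. It assumes both inputs are already in frame, have lengths that are multiples of three, and contain only A, C, G, and T. It runs a standard global (Needleman-Wunsch) alignment in which the unit of comparison is a whole codon, so gaps are always three bases long. The function names (`codons`, `codon_score`, `align_codons`, `translate`) and the scoring values are hypothetical choices for illustration; the alignment model described above is not specified in enough detail here to reproduce, and this sketch does not model frameshifts.

```python
from itertools import product

# Standard genetic code, built in TCAG order; '*' marks a stop codon.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

GAP = -2  # assumed penalty for one whole-codon gap


def codons(seq):
    """Split an in-frame DNA sequence into a list of codons."""
    seq = seq.upper()
    if len(seq) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def codon_score(c1, c2):
    """Assumed scores: identical codon 2, synonymous codon 1, otherwise -1."""
    if c1 == c2:
        return 2
    if CODON_TABLE[c1] == CODON_TABLE[c2]:
        return 1
    return -1


def align_codons(a, b):
    """Global alignment of two coding sequences, one codon per column."""
    x, y = codons(a), codons(b)
    n, m = len(x), len(y)
    # S[i][j] = best score for aligning the first i codons of x with the first j of y
    S = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        S[i][0] = i * GAP
    for j in range(1, m + 1):
        S[0][j] = j * GAP
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            S[i][j] = max(
                S[i - 1][j - 1] + codon_score(x[i - 1], y[j - 1]),
                S[i - 1][j] + GAP,
                S[i][j - 1] + GAP,
            )
    # Trace back from the bottom-right cell to recover one optimal alignment.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and S[i][j] == S[i - 1][j - 1] + codon_score(x[i - 1], y[j - 1]):
            top.append(x[i - 1]); bottom.append(y[j - 1]); i -= 1; j -= 1
        elif i > 0 and S[i][j] == S[i - 1][j] + GAP:
            top.append(x[i - 1]); bottom.append("---"); i -= 1
        else:
            top.append("---"); bottom.append(y[j - 1]); j -= 1
    return S[n][m], top[::-1], bottom[::-1]


def translate(aligned):
    """Translate aligned codons to amino acids, writing '-' for a gap column."""
    return "".join("-" if c == "---" else CODON_TABLE[c] for c in aligned)


if __name__ == "__main__":
    score, top, bottom = align_codons("ATGGCTAAA", "ATGAAA")
    print(score)
    print(" ".join(top), translate(top))
    print(" ".join(bottom), translate(bottom))
```

Because every column holds a whole codon or a whole-codon gap, translating each row column by column gives a protein-level alignment with the same columns, which is the property the abstract describes.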
|Wang, Jianmin; Huang, Xiaoqiu (2005) A method for finding single-nucleotide polymorphisms with allele frequencies in sequences of deep coverage. BMC Bioinformatics 6:220|
|Ye, Liang; Huang, Xiaoqiu (2005) MAP2: multiple alignment of syntenic genomic sequences. Nucleic Acids Res 33:162-70|
|Huang, Xiaoqiu; Ye, Liang; Chou, Hui-Hsien et al. (2004) Efficient combination of multiple word models for improved sequence comparison. Bioinformatics 20:2529-33|
|Huang, Xiaoqiu; Wang, Jianmin; Aluru, Srinivas et al. (2003) PCAP: a whole-genome assembly program. Genome Res 13:2164-70|
|Huang, Xiaoqiu; Chao, Kun-Mao (2003) A generalized global alignment algorithm. Bioinformatics 19:228-33|