Text mining applications seek to alleviate the problems of identifying, searching, and extracting relevant information from large bodies of literature. The goal of this proposal is to create a text mining application that extracts protein point mutations from the biomedical literature. The application will be used to extract point mutations from literature on human genetic disorders, and the resulting database of point mutations will be used to design a DNA microarray chip for prenatal genetic diagnosis. First, the text mining application will be developed using machine learning and statistical natural language processing techniques. Second, the application will be run on a large body of literature related to genetic disorders, and the retrieved point mutations will be deposited in an electronic database. This collection of point mutations will then be examined to find polymorphisms in genes that serve as markers for genetic disorders. These polymorphic positions in the human genome will be gathered and used to design a DNA microarray chip that can genotype tens of thousands of single nucleotide polymorphisms (SNPs). The microarray chip can then be used in prenatal genetic diagnosis to screen for hundreds of genetic disorders.
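The abstract does not say how point mutations are recognized in text beyond naming machine learning and statistical NLP. The sketch below illustrates only the first step, finding candidate mentions. It substitutes a simpler rule-based pattern match for the proposal's learned approach. It assumes mutations are written as whole tokens in one-letter (e.g. `R117H`) or three-letter (e.g. `Gly551Asp`) amino acid notation. The names `extract_point_mutations` and `PointMutation` are hypothetical. A statistical model could then filter these candidates and link each one to its protein.

```python
import re
from typing import List, NamedTuple

# Standard three-letter to one-letter amino acid codes (20 residues).
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}
ONE_LETTER = "ACDEFGHIKLMNPQRSTVWY"
THREE = "|".join(THREE_TO_ONE)

# Assumed notation: wild-type residue, position, mutant residue as one token,
# e.g. "R117H" or "Gly551Asp". Case-sensitive; other notations are not covered.
PATTERN = re.compile(
    rf"\b(?:(?P<wt1>[{ONE_LETTER}])(?P<pos1>[1-9]\d*)(?P<mt1>[{ONE_LETTER}])"
    rf"|(?P<wt3>{THREE})(?P<pos3>[1-9]\d*)(?P<mt3>{THREE}))\b"
)


class PointMutation(NamedTuple):
    wild_type: str  # one-letter code of the original residue
    position: int   # residue number as written in the text
    mutant: str     # one-letter code of the substituted residue


def extract_point_mutations(text: str) -> List[PointMutation]:
    """Return candidate point mutations in the order they appear in `text`."""
    found = []
    for m in PATTERN.finditer(text):
        if m.group("wt1"):
            wt, pos, mt = m.group("wt1"), m.group("pos1"), m.group("mt1")
        else:
            # Normalize three-letter codes to one-letter codes.
            wt = THREE_TO_ONE[m.group("wt3")]
            pos = m.group("pos3")
            mt = THREE_TO_ONE[m.group("mt3")]
        if wt != mt:  # drop non-substitutions such as "A12A"
            found.append(PointMutation(wt, int(pos), mt))
    return found


print(extract_point_mutations(
    "The R117H and Gly551Asp substitutions, but not the A12A control, were reported."
))
```

A pure pattern match like this will also fire on some gene or protein names that happen to fit the notation. That is one reason the proposal relies on learned models rather than rules alone. Mapping a protein-level mutation to a genomic SNP position for microarray design would require further steps that are not shown here.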
Lee, Lawrence C.; Horn, Florence; Cohen, Fred E. (2007) Automatic extraction of protein point mutations using a graph bigram association. PLoS Comput Biol 3:e16.