Text mining applications seek to alleviate the problems of identifying, searching, and extracting relevant information from large sets of literature. The goal of this proposal is to create a text mining application that extracts protein point mutations from biomedical literature. The application will be used to extract point mutations from literature discussing human genetic disorders, and the resulting database of point mutations will be used to design a DNA microarray chip for prenatal genetic diagnosis. First, the text mining application will be developed using machine learning and statistical natural language processing techniques. Second, the application will be applied to a large set of literature related to genetic disorders, and the retrieved point mutations will be deposited in an electronic database. This collection of point mutations will be examined to find polymorphisms in genes that are markers for genetic disorders. These polymorphic positions in the human genome will be gathered and used to design a DNA microarray chip that can genotype tens of thousands of single nucleotide polymorphisms. The microarray chip can then be used in prenatal genetic diagnosis to screen for hundreds of genetic disorders.
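As a rough illustration of what extracting a point mutation mention involves, the sketch below uses plain regular expressions to find mentions written as wild-type residue, position, mutant residue (for example "A123T" or "Gly12Val"). This is only a baseline under stated assumptions, not the machine learning and statistical approach the proposal describes; the function name `find_point_mutations` and both patterns are hypothetical, and a pattern this simple will also match unrelated tokens such as cell line names.

```python
import re

# Standard three-letter to one-letter amino acid codes (20 residues).
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}
ONE = "".join(sorted(THREE_TO_ONE.values()))
THREE = "|".join(THREE_TO_ONE)

# Hypothetical patterns: wild-type residue, sequence position, mutant residue.
SHORT = re.compile(rf"\b([{ONE}])(\d+)([{ONE}])\b")
LONG = re.compile(rf"\b({THREE})(\d+)({THREE})\b")


def find_point_mutations(text):
    """Return sorted (wild_type, position, mutant) tuples found in text.

    Residues are normalized to one-letter codes. Mentions whose wild-type
    and mutant residues are identical are skipped.
    """
    hits = set()
    for wt, pos, mut in SHORT.findall(text):
        if wt != mut:
            hits.add((wt, int(pos), mut))
    for wt, pos, mut in LONG.findall(text):
        if wt != mut:
            hits.add((THREE_TO_ONE[wt], int(pos), THREE_TO_ONE[mut]))
    return sorted(hits)


if __name__ == "__main__":
    sentence = "The A123T and Gly12Val substitutions were observed."
    print(find_point_mutations(sentence))
```

A pattern-only approach like this cannot tell which protein a mutation belongs to, which is the kind of association step that statistical methods are meant to address.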

Agency
National Institutes of Health (NIH)
Institute
National Library of Medicine (NLM)
Type
Medical Informatics Fellowships (F37)
Project #
5F37LM008883-02
Application #
7156197
Study Section
Biomedical Library and Informatics Review Committee (BLR)
Program Officer
Ye, Jane
Project Start
2005-10-01
Project End
2007-09-30
Budget Start
2006-10-01
Budget End
2007-09-30
Support Year
2
Fiscal Year
2007
Total Cost
$45,812
Indirect Cost
Name
University of California San Francisco
Department
Pharmacology
Type
Schools of Medicine
DUNS #
094878337
City
San Francisco
State
CA
Country
United States
Zip Code
94143
Lee, Lawrence C; Horn, Florence; Cohen, Fred E (2007) Automatic extraction of protein point mutations using a graph bigram association. PLoS Comput Biol 3:e16