This proposal seeks funding to acquire computing infrastructure for cross-disciplinary research in artificial intelligence, natural language processing, bioinformatics, social networks and related fields. In particular, the projects that the infrastructure will support include developing algorithms and architectures for large-scale ontology alignment, particularly in the biomedical domain; mining Wikipedia and similar Web-based sources for geographical, temporal and ontological information; performing named entity recognition in biomedical and other domains; performing computational motivational and content analysis of socially generated content such as blogs and micro-blogs; undertaking corpus-based computational linguistics research in under-studied and possibly endangered languages from India and other locations; developing better-performing algorithms for gene expression analysis; and developing, implementing and comparing algorithms for protein structure prediction, particularly for proteins that contain coiled-coil structures. These projects deal with large amounts of data and information, and processing them requires large amounts of computing power. Our proposal seeks to acquire adequate and flexible computing hardware to facilitate problem-solving in these and other areas, so that current and future problems can be solved effectively.

The infrastructure acquired will enable cutting-edge research by Ph.D., Master's and undergraduate students, including REU site students, in a variety of cross-disciplinary topics that employ ideas and innovations in artificial intelligence, machine learning and information retrieval. For example, the results of our research may enable the creation of systems that discover overlaps and matches among large medical ontologies, so that painstakingly created domain-specific information can be fused, compared and utilized better; and may help create programs that assist in automatically understanding and/or visualizing the content of socially generated websites such as Wikipedia and Twitter.

For further information see the project web site at the URL: www.cs.uccs.edu/~kalita.

Project Report

The RUI grant enabled the PI to acquire infrastructure to establish a new lab called the LINC (Language, Information and Computation) Lab. The College of Engineering and Applied Science at the University of Colorado, Colorado Springs provided about 400 square feet of space to create this lab. It met a crucial need for the fast-growing campus and the college, which is transitioning from pure teaching to research. Thus, the RUI grant has had a great institutional impact on this transition.

With the grant, the newly established lab has bought several mid-range virtualization servers, each with 4 six-core AMD Opteron 2.6 GHz processors, 64 GB of RAM and a mirrored 80 GB hard drive; 5 general servers with 2 quad-core Intel Xeon 2.83 GHz processors, 16 GB of RAM and a mirrored 500 GB hard drive; and 43 TB of unified storage. The equipment has been used by graduate and undergraduate students for research in computational linguistics, information retrieval and machine learning.

The lab has been used for a multitude of projects that would not have been possible without it. Here are some examples:

1. Development and testing of machine learning algorithms, such as multi-class SVM classifiers and reducing training time for linear SVMs.
2. Information retrieval research, such as streaming trend detection in Twitter, summarizing Twitter posts, syntactic normalization of Twitter messages, and extracting event and temporal information from Wikipedia and news articles.
3. Natural language processing, such as aligning Wiktionary with natural language processing resources, combining lexical resources for text analysis of game walkthroughs, and creating multi-lingual dictionaries from existing online dictionaries and web corpora.

Findings: Some of the findings include the following:

1. We have developed our own algorithms for summarizing microblog posts, compared them with a large number of existing multi-document summarization algorithms, and found that very simple algorithms work as well as complex algorithms in the domain of short informal messages.
2. We have developed an algorithm for summarizing articles with high temporal content, such as historical articles or some news articles, using timestamps and temporal clustering.
3. We have developed genetic algorithms for designing optimized soft keyboards for 8 Indian languages (Assamese, Bengali, Hindi, Gujarati, Punjabi, Telugu, Oriya and Kannada) and English. We also developed these keyboards for mobile devices.
4. We hypothesized that Twitter language can be considered a dialect of English, and that translation tools can be used to normalize the syntax of Twitter messages so that tools developed for natural language processing can be used to process them further. We were able to show that this hypothesis works well in building a syntax normalizer.
5. We have developed a couple of algorithms for multi-class SVM training that scale up better than recently published algorithms and perform as well as them.
6. We have developed an ontology alignment tool for biomedical ontologies that performs better than recently published algorithms.

Training and Development: Before we bought the equipment provided for by this grant, there was no AI lab at the University of Colorado, Colorado Springs. When necessary, research relied on 5- to 10-year-old machines retired from teaching labs. This grant has helped modernize our equipment. 24 NSF-funded REU students and at least another dozen students at the undergraduate, MS and PhD levels have used the resources of the lab during the lifetime of the project.

Agency: National Science Foundation (NSF)
Institute: Division of Computer and Network Systems (CNS)
Type: Standard Grant (Standard)
Application #: 0958576
Program Officer: Vijayalakshmi Atluri
Project Start:
Project End:
Budget Start: 2010-05-01
Budget End: 2012-04-30
Support Year:
Fiscal Year: 2009
Total Cost: $195,051
Indirect Cost:
Name: University of Colorado at Colorado Springs
Department:
Type:
DUNS #:
City: Colorado Springs
State: CO
Country: United States
Zip Code: 80918