Efficient Representation and Manipulation of Large-Scale Biological Sequence Data

Aluru, Srinivas; Schnable, Patrick

Abstract

Storing biomolecular sequences and accessing them to determine sequence homologies are central to the current revolution in bioinformatics and computational biology. Beyond search tools, the large size of the biological data used by some important applications underscores the need for efficient out-of-core algorithms. The goal of the project is to design storage structures, algorithmic techniques, and software for disk-resident sequence data, and to apply them to important applications in computational biology. To achieve this goal, a three-pronged strategy is used. First, application requirements identified in collaboration with domain experts are being used to design fundamental storage structures for sequence data. This research spans the development of efficient out-of-core algorithms for well-known in-core data structures and the design of new data structures suited to targeted applications. Second, efficient algorithms for queries on disk-resident sequence data are being developed. Finally, the out-of-core techniques developed are being integrated with application software in computational genomics, such as EST clustering and fragment assembly. Depending on the application, the aim is to develop faster algorithms, reduce exorbitant main-memory requirements, or enable the solution of larger problem instances.

The results of the research will be made accessible to computer scientists in the form of software libraries and to molecular biologists in the form of application software. Efforts are being made to integrate these results into popular tools used by molecular biologists. The interdisciplinary nature of the project provides unique training opportunities for graduate students.

Funding Agency

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Type: Standard Grant (Standard)
Application #: 0430853
Program Officer: Sylvia J. Spengler

Project Start
Project End
Budget Start: 2004-09-01
Budget End: 2008-08-31
Support Year
Fiscal Year: 2004
Total Cost: $440,494
Indirect Cost

Institution

Iowa State University, Ames, IA, United States
