Completing the sequence of the human genome will require very high throughput sequencing centers that differ quantitatively and qualitatively from even the largest of today's "large-scale" sequencing laboratories. Increasing the size and throughput of a center by more than an order of magnitude will substantially alter its computing support needs, imposing new requirements for adaptability and reliability as well as for overall capacity. This project will examine these scaling issues in detail and will design an informatics architecture to support very high throughput sequencing. The cost of DNA sequence analysis depends strongly on the accuracy required of the finished data. Models will be developed to predict expected error rates for different sequencing strategies and to assess how likely error rates affect the utility of the data. The Genome Sequencing Center at Washington University will serve as a case study and test-bed. The objectives of this proposal are:
Specific Aim 1. Designing a modular architecture to support very high throughput sequencing
Specific Aim 2. Understanding the determinants of DNA sequence accuracy
Specific Aim 3. Analyzing error-prone DNA sequence
  a. Improved tools for the analysis of error-prone DNA sequence
  b. Understanding the utility of error-prone DNA sequence
Specific Aim 4. Automating accurate consensus sequence generation
Specific Aim 5. Developing quality assurance procedures for very high throughput DNA sequencing
Specific Aim 6. Disseminating results

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Research Project (R01)
Project #: 5R01HG001391-03
Application #: 2459840
Study Section: Special Emphasis Panel (SRC (04))
Project Start: 1995-09-30
Project End: 1999-07-31
Budget Start: 1997-08-01
Budget End: 1999-07-31
Support Year: 3
Fiscal Year: 1997
Organization: Washington University
Organization Type: Other Domestic Higher Education
DUNS #: 062761671
City: Saint Louis
State: MO
Country: United States
Zip Code: 63130
Liu, Rongxiang; McEachin, Richard C; States, David J (2003) Computationally identifying novel NF-kappa B-regulated immune genes in the human genome. Genome Res 13:654-61
Rouchka, Eric C; Gish, Warren; States, David J (2002) Comparison of whole genome assemblies of the human genome. Nucleic Acids Res 30:5004-14
Liu, Rongxiang; States, David J (2002) Consensus promoter identification in the human genome utilizing expressed gene markers and gene modeling. Genome Res 12:462-9
Kan, Zhengyan; States, David; Gish, Warren (2002) Selecting for functional alternative splices in ESTs. Genome Res 12:1837-45
States, D J; Nowotny, V; Blackwell, T W (2001) Probabilistic approaches to the use of higher order clone relationships in physical map assembly. Bioinformatics 17 Suppl 1:S262-9
Kan, Z; Rouchka, E C; Gish, W R et al. (2001) Gene structure prediction and alternative splicing analysis using genomically aligned ESTs. Genome Res 11:889-900
Liu, R; Blackwell, T W; States, D J (2001) Conformational model for binding site recognition by the E.coli MetJ transcription factor. Bioinformatics 17:622-33
Kan, Z; Gish, W; Rouchka, E et al. (2000) UTR reconstruction and analysis using genomically aligned EST sequences. Proc Int Conf Intell Syst Mol Biol 8:218-27
Blackwell, T W; Rouchka, E; States, D J (1999) Identity by descent genome segmentation based on single nucleotide polymorphism distributions. Proc Int Conf Intell Syst Mol Biol :54-9
Huang, W; Fuhrmann, D R; Politte, D G et al. (1998) Filter matrix estimation in automated DNA sequencing. IEEE Trans Biomed Eng 45:422-8

Showing the most recent 10 out of 12 publications