This RFA calls for 30,000 to 300,000 genetic markers, based on single-nucleotide polymorphisms (SNPs), to be created over the next few years. In response, we propose a pilot program that we believe can be scaled up to the whole-genome level, and that will provide a particularly important category of SNPs: those occurring in cDNA sequences (cSNPs). Many cSNPs can be extracted from the EST databases, and we will exploit these as much as possible. However, we will also fill in the gaps in the databases, which come in two forms: for many of the less abundantly expressed genes, few individuals have been sampled, and many genes are incompletely covered by ESTs. We will therefore scan for cSNPs along the entire length of selected cDNA sequences, across a sample of 25 ethnically diverse individuals, deriving the full-length consensus cDNA sequence when one is not already available. Our cSNP discovery process will be sequence-driven. Although this is probably the most expensive approach, it is also the most comprehensive, as it ensures that nearly all common cSNPs will be found. Such a thorough approach is justifiable because most common cSNPs are likely to be useful for the population-based association studies that are being planned in the growing efforts to understand genetically complex diseases. Over the course of this 3-year grant, we will re-sequence 500 genes and create markers for all the common cSNPs that are found in this combination of new sequence data and existing EST data. We will create cDNA resources to better sample the full length of the cDNA sequence. We will evaluate a scoring technology (TDI) that has the potential to be arrayed. We will also develop software to facilitate the process of cSNP discovery, marker creation, and TDI scoring.
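
The claim that re-sequencing 25 individuals finds nearly all common cSNPs can be illustrated with a short calculation. The sketch below is not taken from the proposal. It assumes that each diploid individual contributes two independently sampled chromosomes, that every chromosome is fully covered, and that sequencing detects any variant that is present. The function name and the example frequencies are hypothetical.

```python
def detection_probability(allele_freq: float,
                          n_individuals: int = 25,
                          ploidy: int = 2) -> float:
    """Chance that at least one sampled chromosome carries the minor allele.

    Assumes independent chromosomes and perfect detection (illustrative only).
    """
    if not 0.0 <= allele_freq <= 1.0:
        raise ValueError("allele_freq must be between 0 and 1")
    n_chromosomes = n_individuals * ploidy
    # Probability that every sampled chromosome lacks the allele, complemented.
    return 1.0 - (1.0 - allele_freq) ** n_chromosomes


if __name__ == "__main__":
    # Example minor-allele frequencies, chosen only for illustration.
    for freq in (0.01, 0.05, 0.10, 0.20):
        print(f"frequency {freq:.2f}: detection probability "
              f"{detection_probability(freq):.3f}")
```

Under these assumptions, an allele at 10% frequency is almost certain to appear at least once among the 50 sampled chromosomes, while an allele at 1% frequency is missed more often than it is found. This is consistent with the proposal's focus on common rather than rare cSNPs.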

Agency: National Institutes of Health (NIH)
Institute: National Institute of Environmental Health Sciences (NIEHS)
Type: Research Project (R01)
Project #: 5R01ES009909-03
Application #: 6178684
Study Section: Special Emphasis Panel (ZHG1-HGR-P (O1))
Program Officer: Velazquez, Jose M
Project Start: 1998-09-30
Project End: 2002-09-29
Budget Start: 2000-09-30
Budget End: 2002-09-29
Support Year: 3
Fiscal Year: 2000
Total Cost: $624,891
Indirect Cost:

Name: University of Washington
Department: Biochemistry
Type: Schools of Medicine
DUNS #: 605799469
City: Seattle
State: WA
Country: United States
Zip Code: 98195

Wong, Gane Ka-Shu; Yang, Zhiyong; Passey, Douglas A et al. (2003) A population threshold for functional polymorphisms. Genome Res 13:1873-9
Wong, Gane Ka-Shu; Wang, Jun; Tao, Lin et al. (2002) Compositional gradients in Gramineae genes. Genome Res 12:851-6
Wang, Jun; Wong, Gane Ka-Shu; Ni, Peixiang et al. (2002) RePS: a sequence assembler that masks exact repeats identified from the shotgun data. Genome Res 12:824-31
Yu, Jun; Hu, Songnian; Wang, Jun et al. (2002) A draft sequence of the rice genome (Oryza sativa L. ssp. indica). Science 296:79-92
Yu, Jun; Yang, Zhiyong; Kibukawa, Miho et al. (2002) Minimal introns are not "junk". Genome Res 12:1185-9
Wong, G K; Passey, D A; Yu, J (2001) Most of the human genome is transcribed. Genome Res 11:1975-7
Wong, G K; Passey, D A; Huang, Y et al. (2000) Is "junk" DNA mostly intron DNA? Genome Res 10:1672-8