The goals of this proposal are to determine about 150 kb of DNA sequence spanning each of 20 or more genes that are known to be involved in cancer, to use the sequence information to help identify 10-15 biallelic variants within the 150 kb segments, and to score these variants in individuals from known pedigrees and in a large random sample to determine haplotypes and allele frequencies. Two rationales are given for this study: such haplotype information will be very valuable for studying the genetics of cancer susceptibility if there are founder alleles that predispose to cancer, and the approaches used by the investigator's group can serve as a model for similar studies of other diseases. In addition, this project will produce a non-trivial amount of high-quality human genomic sequence data at a competitive cost. Particular effort will be made to avoid duplication of sequencing targets, and various mechanisms for data release and communication with the sequencing community will allow target priorities to be updated and altered during the course of the project. PCR- and hybridization-based methods will be used to identify several BAC clones for each of the 20 or more gene targets to be sequenced. Because the number of clones will be relatively small, the investigators will perform fingerprinting and other analyses on the BACs to maximize the chances that the clones they choose to sequence are faithful representations of the genome. These genomic clones will be sequenced by a sequential shotgun method that has been in operation at Baylor for several years, and the data will be used to assemble complete, contiguous, and annotated sequence of each 150 kb region.
This finished sequence information will be used to develop PCR assays that amplify products from regions spaced throughout the 150 kb. The sequence of variants identified in this way will be determined directly, and allele-specific oligonucleotide hybridization will be used to genotype 300 ethnically diverse individuals from various populations as well as individuals in 20 CEPH pedigrees. This information will be used to establish allele frequencies for the polymorphic variants and to determine, as far as is possible with the 20 extended families, the phases of the polymorphisms so that haplotypes can be derived. Haplotype information for some of the common homologues will come from the initial sequencing of the large-insert clone, as the phase of the variants present on a clone is established upon sequencing. The data will be placed in a user-friendly format accessible on the World Wide Web for use by the research community. The investigators plan to update this public database quarterly.
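As an illustration of the allele-frequency and phasing steps described above, the following minimal Python sketch estimates allele frequencies at a biallelic site by gene counting and reports an individual's haplotypes only when phase is unambiguous from genotypes alone. The function names, the two-character genotype encoding, and the sample data are assumptions made for this example; they are not taken from the proposal and do not represent the investigators' actual analysis.

```python
from collections import Counter

def allele_frequencies(genotypes):
    """Estimate allele frequencies at one biallelic site by gene counting.

    `genotypes` is a list of 2-character strings such as "AA", "AG", "GG"
    (a hypothetical encoding); each individual contributes two alleles.
    """
    counts = Counter(allele for g in genotypes for allele in g)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}

def unambiguous_haplotypes(genotype):
    """Return the two haplotypes of one individual if phase is unambiguous.

    `genotype` is a list of per-site genotypes. Without pedigree or clone
    data, phase is only certain when at most one site is heterozygous;
    otherwise None is returned.
    """
    het_sites = [i for i, g in enumerate(genotype) if g[0] != g[1]]
    if len(het_sites) > 1:
        return None
    hap1 = "".join(g[0] for g in genotype)
    hap2 = "".join(g[1] for g in genotype)
    return hap1, hap2

# Hypothetical genotypes for four individuals at a single A/G site.
freqs = allele_frequencies(["AA", "AG", "GG", "AG"])

# One heterozygous site: phase is known. Two heterozygous sites: it is not.
phased = unambiguous_haplotypes(["AA", "CT", "GG"])
unphased = unambiguous_haplotypes(["AG", "CT"])
```

Individuals with two or more heterozygous sites are exactly the cases in which pedigree data or the sequence of a single large-insert clone would be needed to resolve phase.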

Agency
National Institutes of Health (NIH)
Institute
National Cancer Institute (NCI)
Type
Research Project (R01)
Project #
5R01CA075432-03
Application #
2896153
Study Section
Genome Study Section (GNM)
Program Officer
Couch, Jennifer A
Project Start
1997-09-22
Project End
2000-08-31
Budget Start
1999-09-01
Budget End
2000-08-31
Support Year
3
Fiscal Year
1999
Total Cost
Indirect Cost
Name
Baylor College of Medicine
Department
Genetics
Type
Schools of Medicine
DUNS #
074615394
City
Houston
State
TX
Country
United States
Zip Code
77030
Peng, Bo; Amos, Christopher I; Kimmel, Marek (2007) Forward-time simulations of human populations with complex diseases. PLoS Genet 3:e47
Peng, Bo; Kimmel, Marek (2007) Simulations provide support for the common disease-common variant hypothesis. Genetics 175:763-76
Gorlov, Ivan P; Kimmel, Marek; Amos, Christopher I (2006) Strength of the purifying selection against different categories of the point mutations in the coding regions of the human genome. Hum Mol Genet 15:1143-50
Polanska, Joanna; Kimmel, Marek (2005) A simple model of linkage disequilibrium and genetic drift in human genomic SNPs: importance of demography and SNP age. Hum Hered 60:181-95
Polanski, A; Kimmel, M (2003) New explicit expressions for relative frequencies of single-nucleotide polymorphisms with application to statistical inference on population growth. Genetics 165:427-36
Polanski, A; Bobrowski, A; Kimmel, M (2003) A note on distributions of times to coalescence, under time-dependent population size. Theor Popul Biol 63:33-40
Bobrowski, Adam; Wang, Ning; Chakraborty, Ranajit et al. (2002) Non-homogeneous infinite sites model under demographic change: mathematical description and asymptotic behavior of pairwise distributions. Math Biosci 175:83-115
Bonnen, Penelope E; Wang, Peggy J; Kimmel, Marek et al. (2002) Haplotype and linkage disequilibrium architecture for human cancer-associated genes. Genome Res 12:1846-53
Olofsson, Peter; Shaw, Chad A (2002) Exact sampling formulas for multi-type Galton-Watson processes. J Math Biol 45:279-93
Trikka, Dimitra; Fang, Zhe; Renwick, Alex et al. (2002) Complex SNP-based haplotypes in three human helicases: implications for cancer association studies. Genome Res 12:627-39

Showing the most recent 10 out of 11 publications