The goals of this proposal are to determine about 150 kb of genomic DNA sequence at each of 20 or more genes known to be involved in cancer, to use the sequence information to identify 10-15 biallelic variants within each 150 kb segment, and to score these variants in individuals from known pedigrees and in a large random sample to determine haplotypes and allele frequencies. Two rationales are given for this study: such haplotype information will be very valuable for studying the genetics of cancer susceptibility if there are founder alleles that predispose to cancer, and the approaches used by the investigators' group can serve as a model for similar studies of other diseases. In addition, the project will produce a non-trivial amount of high-quality human genomic sequence at a competitive cost. Particular effort will be made to avoid duplicating sequencing targets, and several mechanisms for data release and communication with the sequencing community will allow target priorities to be updated and altered during the course of the project. PCR- and hybridization-based methods will be used to identify several BAC clones for the 20 or more gene targets to be sequenced. Because the number of clones will be relatively small, the investigators will fingerprint and otherwise analyze the BACs to maximize the chance that the clones chosen for sequencing faithfully represent the genome. These genomic clones will be sequenced by a sequential shotgun method that has been in operation at Baylor for several years, and the data will be used to assemble complete, contiguous, annotated sequence for each 150 kb region.
This finished sequence will be used to design PCR assays that amplify products from regions spaced throughout each 150 kb segment. Variants will be identified by direct sequencing of the amplified products, and allele-specific oligonucleotide hybridization will then be used to genotype 300 ethnically diverse individuals from various populations as well as members of 20 CEPH pedigrees. These genotypes will be used to estimate allele frequencies for the polymorphic variants and to determine, as far as the 20 extended families allow, the phase of the polymorphisms so that haplotypes can be derived. Haplotype information for some of the common homologues will also come from the initial sequencing of the large-insert clones: because each clone derives from a single chromosome, sequencing it establishes the phase of the variant alleles it carries. The data will be made available on the World Wide Web in a user-friendly format for the research community, and the investigators plan to update this public database quarterly.
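The abstract does not specify how allele frequencies or phases will be computed. As a minimal sketch, assuming each biallelic site is coded as the number of alternate alleles an individual carries (0, 1, or 2) and that complete parent-child trios are available (as CEPH pedigrees provide), allele frequency can be estimated by simple gene counting, and phase can be resolved site by site from Mendelian transmission. The function names `allele_frequency`, `phase_site`, and `phase_trio` are hypothetical; analyses of real extended pedigrees would normally use dedicated linkage or phasing software.

```python
from typing import List, Optional, Tuple

# Hypothetical coding: genotype = count of alternate alleles at a biallelic site (0, 1, 2).
Genotype = int


def allele_frequency(genotypes: List[Optional[Genotype]]) -> float:
    """Alternate-allele frequency by gene counting; missing calls (None) are skipped."""
    called = [g for g in genotypes if g is not None]
    if not called:
        raise ValueError("no genotype calls at this site")
    return sum(called) / (2 * len(called))


def transmissible(g: Genotype) -> Tuple[int, ...]:
    """Alleles (0 = reference, 1 = alternate) a parent with genotype g can transmit."""
    return {0: (0,), 1: (0, 1), 2: (1,)}[g]


def phase_site(child: Genotype, father: Genotype,
               mother: Genotype) -> Optional[Tuple[int, int]]:
    """Return (paternal allele, maternal allele) for the child at one site,
    or None when the trio cannot resolve phase (all three heterozygous)."""
    options = [(p, m)
               for p in transmissible(father)
               for m in transmissible(mother)
               if p + m == child]
    if not options:
        raise ValueError("Mendelian inconsistency at this site")
    return options[0] if len(options) == 1 else None


def phase_trio(child: List[Genotype], father: List[Genotype],
               mother: List[Genotype]) -> Tuple[List[Optional[int]], List[Optional[int]]]:
    """Build the child's paternal and maternal haplotypes across sites;
    unresolved sites are left as None."""
    paternal: List[Optional[int]] = []
    maternal: List[Optional[int]] = []
    for c, f, m in zip(child, father, mother):
        result = phase_site(c, f, m)
        paternal.append(None if result is None else result[0])
        maternal.append(None if result is None else result[1])
    return paternal, maternal


if __name__ == "__main__":
    print(allele_frequency([0, 1, 2, 1, None]))  # 0.5
    # Four sites in one trio: child, father, mother.
    print(phase_trio([1, 2, 1, 0], [1, 2, 0, 1], [0, 1, 1, 0]))
    # ([1, 1, 0, 0], [0, 1, 1, 0])
```

Sites where the child and both parents are heterozygous stay unphased in this sketch, which is why additional relatives in extended families, or phase read directly from a sequenced clone, can fill in the remaining gaps.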