To date, we have tested numerous single nucleotide polymorphisms (SNPs) in more than 118 genes, including less established but intriguing candidates such as PRODH, RGS4, CHRNA7, PIP5K2A, and PPP3CC. For dysbindin, we have fully sequenced the 10 exons and 2.5 kb of 5' flanking sequence in 180 proband chromosomes; we have also sequenced two exons of MRDS1 and 1.5 kb of the GAD1 upstream region. A total of 21 new SNPs were discovered in these genes, 15 of which were genotyped in the clinical samples. Last year, we re-sequenced the exons and splice sites of GRM3 in 180 chromosomes, which led to the discovery of a few rare SNPs. We have likewise re-sequenced risk regions of KCNH2, ErbB4, PIk3d, FGF20, DAARP, and COMT and identified novel variants in these genes as well. We routinely subject our TaqMan genotyping assays to reproducibility checks by re-genotyping (average accuracy >99%) and to spot accuracy checks by double-stranded sequencing (average >99% for most SNP assays). Genotypes are called manually within the ABI SDS software and then confirmed. We perform Mendelian checks and higher-order error checking (e.g., for multiple recombinants) with the program MERLIN. Microsatellite genotyping has been performed in collaboration with the NIMH Mood and Anxiety Program.

We measure linkage disequilibrium (LD) between markers with the D' and r² statistics, computed in parallel for cases and controls using the GOLD software package. All SNPs are tested for departures from Hardy-Weinberg equilibrium. For large numbers of loci, we use SNPHAP to reconstruct haplotypes and estimate their frequencies in unrelated individuals. For family-based association studies of the discrete clinical phenotype, we use the programs FBAT, TDTPHASE, and TRANSMIT, which handle haplotypes of unknown phase. Case-control analysis of individual SNPs and SNP haplotypes is done with logistic regression in Stata and with COCAPHASE.
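As an illustration of the LD and Hardy-Weinberg statistics named above, the following is a minimal Python sketch of the textbook definitions. It is not the GOLD implementation, and the function names (ld_stats, hwe_chi_square) are hypothetical. It assumes two biallelic loci with known haplotype and allele frequencies; in practice, haplotype frequencies usually have to be estimated from unphased genotypes.

```python
def ld_stats(p_ab, p_a, p_b):
    """Return (D, |D'|, r^2) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A at locus 1
          and allele B at locus 2 (assumed known or pre-estimated)
    p_a, p_b: frequencies of allele A and allele B
    """
    d = p_ab - p_a * p_b  # raw disequilibrium coefficient
    # D' rescales D by its maximum attainable magnitude given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    # r^2 is the squared correlation between the alleles at the two loci.
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Pearson chi-square statistic (1 df) for departure from
    Hardy-Weinberg proportions, given observed genotype counts."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes supplied")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


if __name__ == "__main__":
    print(ld_stats(0.2, 0.5, 0.2))
    print(hwe_chi_square(25, 50, 25))
```

A chi-square value above 3.84 corresponds to P < 0.05 with one degree of freedom. For rare alleles and small samples, an exact test is generally preferable to this approximation.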
All P values are computed empirically, using 10,000 permutations or bootstraps as the programs provide. Tests of association with quantitative traits, such as the intermediate phenotypes, are performed with FBAT and with QTDT, which allows variance-components testing of family-based samples for association and transmission disequilibrium. The orthogonal model used is robust to population stratification because, like the conventional TDT, it considers only transmissions from heterozygous parents. To control for possible artifacts due to allele frequency differences across ethnic groups, an analysis limited to Caucasians is performed in parallel. We have also established a panel of unlinked SNPs to serve as a potential genomic control panel for case-control association studies, including intermediate phenotype analyses, to address potential population admixture artifacts.

In our genomics project, we acquire extensive genetic variation data for our susceptibility genes and work to complete the catalog of genetic risk genes in our datasets. As part of the GCAP program, we have greatly increased genotyping throughput by outsourcing. We project that about every 4 months for the next 2 years we will genotype a minimum of 768 SNPs, perform follow-up work on established genes, and test novel genes. In addition, we outsource the majority of re-sequencing for SNP detection to DNA sequencing companies. All exons, splice sites, and 10 kb of 5' upstream region will be re-sequenced in an initial pass; selected regions of some genes (e.g., introns or positive haplotypes) will then be sequenced further, in more individuals, or both. Because most functional SNPs and mutations are not in protein-coding regions, it is critical to fully characterize transcript species in several regions of postmortem human brain.
To accomplish this, we routinely apply basic mRNA transcript characterization techniques such as 5' and 3' RACE and screening of full-length, normalized cDNA libraries from multiple brain regions. This work also serves to guide quantitative RT-PCR and in situ hybridization expression studies.

Another project of central importance is the statistical analysis of gene-gene interactions. It is likely that certain gene and allele combinations interact epistatically to produce risk greater than that predicted by the individual odds ratios. It is also likely that some gene combinations will increase risk even in the absence of main effects of each gene. We are using multifactor dimensionality reduction (MDR), a data-driven analytic approach developed at Vanderbilt, in an attempt to detect sets of interacting alleles that predict disease status. We also engage in collaborative discussions with Salford Systems, originator of the programs CART, MARS, and TREENET, to explore and execute other data mining strategies. Our statistical geneticist uses this wealth of data to model and test complex gene-gene and gene-environment interactions and to establish objective criteria for integrating statistical genetic data (disease and intermediate phenotype) with convergent biological data, both to gauge the overall significance of a given genotype/haplotype-phenotype correlation and to evaluate attributable risk.
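The core idea of MDR can be sketched in a few lines of Python. This is a simplified illustration, not the MDR software itself: the function names are hypothetical, genotypes are assumed to be pre-coded as small integers, and the cross-validation and permutation testing that a real MDR analysis relies on are omitted. Each multilocus genotype cell is labeled high risk when its case:control ratio meets or exceeds the overall ratio, and each locus combination is scored by the balanced accuracy of that labeling.

```python
from collections import defaultdict
from itertools import combinations


def mdr_balanced_accuracy(genotypes, status, loci):
    """Score one locus combination.

    genotypes: list of tuples, one per individual, of genotype codes
    status:    list of 1 (case) / 0 (control), same order as genotypes
    loci:      tuple of column indices defining the combination
    """
    n_cases = sum(status)
    n_controls = len(status) - n_cases
    if n_cases == 0 or n_controls == 0:
        raise ValueError("need at least one case and one control")
    threshold = n_cases / n_controls  # overall case:control ratio

    # Pool individuals into multilocus genotype cells: cell -> [controls, cases]
    counts = defaultdict(lambda: [0, 0])
    for g, s in zip(genotypes, status):
        counts[tuple(g[i] for i in loci)][s] += 1

    true_pos = true_neg = 0
    for controls, cases in counts.values():
        # High risk if the cell's case:control ratio >= the overall ratio.
        if cases >= threshold * controls:
            true_pos += cases
        else:
            true_neg += controls
    return 0.5 * (true_pos / n_cases + true_neg / n_controls)


def best_model(genotypes, status, order=2):
    """Return the locus combination of the given order with the
    highest (training) balanced accuracy."""
    n_loci = len(genotypes[0])
    return max(
        combinations(range(n_loci), order),
        key=lambda loci: mdr_balanced_accuracy(genotypes, status, loci),
    )


if __name__ == "__main__":
    # Toy data: disease status depends only on the combination of loci 0 and 1
    # (an XOR pattern), so neither locus shows a main effect on its own.
    genos = [(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1)]
    status = [a ^ b for a, b, c in genos]
    print(best_model(genos, status, order=2))
```

The toy data show why such a method is of interest here: each locus alone classifies no better than chance, yet the pair classifies perfectly, which is the case of interaction without main effects described above.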