Methods

1) Software Development

Because the non-independence of marker data is particularly relevant in next-generation sequencing data, much of the theoretical work during the past year has focused on the testing, implementation and extension of tiled regression, a linear-regression-based method for intra-familial tests of association that addresses non-independence at both the marker and the observational level. Extensions implemented during the past year have focused on incorporating penalized regression methods and on using simulation experiments to compare the statistical properties of these methods with those of stepwise regression. The tiled regression methodology has been implemented in the Tiled Regression Analysis Package (TRAP); version 2.0 of the software includes additional penalized regression models and will be released in September 2015. The package is freely available on the NHGRI website: http://research.nhgri.nih.gov/software/TRAP.

2) Inflated Type I error rate and non-normally distributed traits

In this study, the effects of the minor allele frequency of the single nucleotide variant (SNV), the degree of departure from normality of the trait, and the position of the SNVs on type I error rates were investigated in the Genetic Analysis Workshop (GAW) 19 whole exome sequence data. To test the distribution of the type I error rate, five simulated traits were considered: standard normal and gamma distributed traits; two transformed versions of the gamma trait (log10 and rank-based inverse normal transformations); and trait Q1 provided by GAW 19. Tests of association were performed with standard linear regression, and average type I error rates were determined for minor allele frequency classes. Rare SNVs (minor allele frequency < 0.05) showed inflated type I error rates for non-normally distributed traits, and the inflation increased as the minor allele frequency decreased.
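The design described above can be illustrated with a minimal sketch. It is not the analysis performed on the GAW 19 data: the genotypes here are simulated under Hardy-Weinberg proportions rather than taken from exome sequence data, and the sample size, gamma shape parameter, significance threshold, replicate count, and the function names `rank_inverse_normal` and `type1_error_rate` are all assumptions chosen for illustration.

```python
import numpy as np
from scipy import stats


def rank_inverse_normal(y):
    """Rank-based inverse normal transformation (Blom offset 3/8, an assumed choice)."""
    ranks = stats.rankdata(y)  # ties receive average ranks
    return stats.norm.ppf((ranks - 0.375) / (len(y) + 0.25))


def type1_error_rate(draw_trait, maf, n=500, n_rep=1000, alpha=0.01, seed=0):
    """Empirical type I error of simple linear regression under the null.

    Genotype and trait are drawn independently, so every rejection at
    level `alpha` is a false positive.
    """
    rng = np.random.default_rng(seed)
    rejections = 0
    tested = 0
    for _ in range(n_rep):
        g = rng.binomial(2, maf, size=n)  # additive coding: 0, 1 or 2 minor alleles
        if g.min() == g.max():            # monomorphic in this sample: skip
            continue
        y = draw_trait(rng, n)
        p_value = stats.linregress(g, y).pvalue
        rejections += int(p_value < alpha)
        tested += 1
    return rejections / tested if tested else float("nan")


# Hypothetical trait generators mirroring four of the simulated traits.
def normal_trait(rng, n):
    return rng.standard_normal(n)


def gamma_trait(rng, n):
    return rng.gamma(shape=1.0, scale=1.0, size=n)  # shape = 1 is an assumption


def gamma_log10_trait(rng, n):
    return np.log10(gamma_trait(rng, n))


def gamma_int_trait(rng, n):
    return rank_inverse_normal(gamma_trait(rng, n))


if __name__ == "__main__":
    traits = {
        "normal": normal_trait,
        "gamma": gamma_trait,
        "log10(gamma)": gamma_log10_trait,
        "INT(gamma)": gamma_int_trait,
    }
    for name, draw in traits.items():
        for maf in (0.01, 0.05, 0.20):
            rate = type1_error_rate(draw, maf)
            print(f"{name:>13s}  MAF={maf:.2f}  empirical type I error={rate:.4f}")
```

Comparing the printed rates against the nominal `alpha` across minor allele frequency classes gives the kind of summary reported in this study; the values from this toy setup should not be read as reproducing the published results.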
The inflation of the average type I error rates increased as the significance threshold decreased. Normally distributed traits did not show inflated type I error rates with respect to the minor allele frequency for rare SNVs. There was no consistent effect of transformation on the uniformity of the distribution of the location of SNVs with a type I error (Schwantes-An et al. BMC Proceed 2015, in press).

Collaborations

1) Clinical characterization of NF1 (Dr. Douglas Stewart, NIH/NCI)

Neurofibromatosis type 1 (NF1) is an autosomal dominant disorder of neuro-cutaneous tissue growth. Phenotype complexity in NF1 is hypothesized to derive from genetic modifiers unlinked to the NF1 locus. In this study, gene expression is hypothesized to confer risk for certain phenotypes in NF1. In a set of 79 individuals with NF1, gene expression in lymphoblastoid cell lines was tested for association with NF1-associated phenotypes, and select genes with significant phenotype/expression correlations were sequenced. Sequence variants in these genes were tested for association with cafe-au-lait macule (CALM) count in a discovery cohort of 89 European-Americans with NF1. Two correlated, common SNPs (rs4660761 and rs7161, located between DPH2 and ATP6V0B) were significantly associated with the CALM count. Analysis with tiled regression (see methods development above) also found rs4660761 to be significantly associated with CALM count. In addition, rs1800934 and 12 other rare variants in the mismatch repair gene MSH6 were associated with CALM count. In a mega-analysis of a combined cohort of 180 European-Americans, both rs7161 and rs4660761 were highly significant (Pemov et al. PLOS Genet 2014).

2) Craniosynostosis

Justice et al. (Nat Genet 2012), as part of a long-term collaboration with Dr.
Simeon (Boyd) Boyadjiev at UC Davis, reported a genome-wide association study (GWAS) of non-syndromic sagittal craniosynostosis; the associations were replicated in an independent Caucasian population of 186 unrelated probands with non-syndromic sagittal craniosynostosis and 564 unaffected controls. During the past year, zebrafish were used to test the expression of the previously identified conserved non-coding regulatory elements, in order to determine whether expression of the identified sequence variants differed from wild-type expression. To accomplish this, a putative regulatory element was created with site-directed mutagenesis and inserted into the Zebrafish Enhancer Detection (ZED) vector construct. Embryos were screened by fluorescence microscopy for red and green fluorescent protein (RFP and GFP, respectively) expression. Embryos demonstrating RFP/GFP expression were grown to adulthood and bred with wild-type fish. Several germline-transmitting founders were identified for each ZED vector construct, and their progeny were screened for patterns of RFP/GFP expression, again by fluorescence microscopy. The variant showed substantially enhanced GFP expression in the mid-brain compared with wild-type expression (Justice et al., 2015 submitted).

3) Variation in metabolites in the Irish Trinity Student Study (Dr. Larry Brody, NHGRI)

Two studies were completed as part of a large collaboration with Dr. Larry Brody on the analysis of metabolic data from the Irish Trinity Student Study. In the first, SNPs in candidate genes thought to influence vitamin B-6 metabolites were tested for association with measured B-6 metabolites. Plasma pyridoxal 5'-phosphate (PLP), pyridoxal (PL), and 4-pyridoxic acid (PA) were measured, and 66 SNPs were genotyped. Seventeen SNPs in ALPL were associated with altered plasma PLP in candidate gene analyses. Five additional SNPs in ALPL were associated with altered plasma PLP.
The association of the minor CC genotype of one ALPL SNP, rs1256341, with reduced ALPL expression in the HapMap Northern European ancestry population was consistent with the positive association between the CC genotype and plasma PLP in this study. No SNP was associated with altered plasma PL or PA (Carter et al. J Nutr 2015).

In the second, multivariate functional linear models were developed to connect genetic variant data to multiple quantitative traits, adjusting for covariates, in a unified analysis (Drs. Wang and Fan). Three types of approximate F-distribution tests, based on the Pillai-Bartlett trace, the Hotelling-Lawley trace, and Wilks's Lambda, were introduced to test for association between multiple quantitative traits and multiple genetic variants in one genetic region. The approximate F-distribution tests provided more significant results than the F-tests of univariate analysis and the optimal sequence kernel association test (SKAT-O). Extensive simulations were performed to evaluate the false positive rates and power of the proposed models and tests. The approximate F-distribution tests were shown to control the type I error rates very well. The proposed methods were applied to four lipid traits in eight European cohorts and to three biochemical traits in the Trinity Student Study. For the three biochemical traits, the approximate F-distribution tests provided more significant results than the univariate F-tests and SKAT-O. The approximate F-distribution tests of the proposed functional linear models were more sensitive than those of the traditional multivariate linear models, which in turn were more sensitive than SKAT-O in the univariate case. The analysis of the four lipid traits and the three biochemical traits detected more associations than SKAT-O in the univariate case (Wang et al. Genet Epidemiol 2015).

Other ongoing collaborations include:
1) The ClinSeq project (Les Biesecker, NIH/NHGRI)
2) The Familial Scoliosis Project (Dr.
Nancy Miller, University of Colorado)
3) Genetic analysis of neuro-anatomic quantitative traits in patients with ADHD (Dr. Philip Shaw, NIH/NHGRI)
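
For reference, the three multivariate statistics named in the second Trinity Student Study analysis have standard textbook definitions, summarized below. The report itself does not give formulas, so the notation is an assumption: H denotes the hypothesis sum-of-squares-and-cross-products matrix, E the error (residual) matrix, and \lambda_1, ..., \lambda_s the non-zero eigenvalues of E^{-1}H. The approximate F-distribution tests are obtained by transforming each statistic to an approximate F variate; those transformations are not reproduced here.

```latex
\begin{align*}
\text{Wilks's Lambda:} \quad
  \Lambda &= \frac{\det(E)}{\det(H + E)} = \prod_{i=1}^{s} \frac{1}{1 + \lambda_i} \\
\text{Pillai-Bartlett trace:} \quad
  V &= \operatorname{tr}\!\left[ H (H + E)^{-1} \right] = \sum_{i=1}^{s} \frac{\lambda_i}{1 + \lambda_i} \\
\text{Hotelling-Lawley trace:} \quad
  U &= \operatorname{tr}\!\left( H E^{-1} \right) = \sum_{i=1}^{s} \lambda_i
\end{align*}
```

Small values of \Lambda, and large values of V and U, count as evidence against the null hypothesis of no association.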