Methods
1) Software Development
The tiled regression methodology has been implemented in the Tiled Regression Analysis Package (TRAP). Version 2.0 of the software, released in June 2016, includes additional penalized regression models. The package is freely available on the NHGRI website: http://research.nhgri.nih.gov/software/TRAP.
2) Inflated Type I error rate and non-normally distributed traits
This study used the Genetic Analysis Workshop (GAW) 19 whole exome sequence data to investigate how type I error rates are affected by the minor allele frequency (MAF) of the single nucleotide variant (SNV), the degree of departure from normality of the trait, and the position of the SNV. To examine the distribution of the type I error rate, five simulated traits were considered: standard normal and gamma distributed traits; two transformed versions of the gamma trait (log10 and rank-based inverse normal transformations); and trait Q1 provided by GAW 19. Tests of association were performed with standard linear regression, and average type I error rates were determined for MAF classes. For non-normally distributed traits, rare SNVs (MAF < 0.05) showed substantially inflated type I error rates, and the inflation increased as the MAF decreased. The inflation of average type I error rates also increased as the significance threshold decreased. For normally distributed traits, type I error rates for rare SNVs were not inflated with respect to MAF (Schwantes-An et al. 2015, in press).
Extending the work of Schwantes-An et al. (2015) to tiled regression, Sung et al. (2015) investigated the effects of the MAF of SNVs and the degree of departure from normality of a quantitative trait on type I error rates in the GAW 17 exome sequence data.
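The null-trait simulation design used in these studies can be illustrated with a minimal sketch. It is not the code used in the studies and is not part of TRAP. Several pieces are assumptions made only for illustration: genotypes are simulated under Hardy-Weinberg proportions as a stand-in for the real GAW exome genotypes, the gamma shape and scale are arbitrary, the rank offset of 0.5 in the inverse normal transformation is one common choice among several, and the p-value uses a normal approximation to the t distribution. All function names are hypothetical.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()

def rank_inverse_normal(values, offset=0.5):
    """Map each value's rank r (1..n) to the normal quantile of (r - offset) / n."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    out = [0.0] * n
    for rank, i in enumerate(order, start=1):
        out[i] = STD_NORMAL.inv_cdf((rank - offset) / n)
    return out

def gamma_trait(rng, n):
    # Assumed shape = 1, scale = 1; the source does not state the parameters.
    return [rng.gammavariate(1.0, 1.0) for _ in range(n)]

# Null traits: each is generated independently of the genotypes.
TRAITS = {
    "normal": lambda rng, n: [rng.gauss(0.0, 1.0) for _ in range(n)],
    "gamma": gamma_trait,
    "log10(gamma)": lambda rng, n: [math.log10(v) for v in gamma_trait(rng, n)],
    "rank-INT(gamma)": lambda rng, n: rank_inverse_normal(gamma_trait(rng, n)),
}

def slr_pvalue(x, y):
    """Two-sided p-value for the slope of y ~ x (normal approximation to t).

    Returns None when x has no variation (monomorphic variant).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    if sxx == 0:
        return None
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    beta = sxy / sxx
    rss = sum((b - my - beta * (a - mx)) ** 2 for a, b in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    if se == 0:
        return 0.0
    return 2.0 * (1.0 - STD_NORMAL.cdf(abs(beta / se)))

def simulate_type1(maf, trait, n=300, reps=200, alpha=0.01, seed=1):
    """Fraction of null replicates with p < alpha for one MAF and trait type."""
    rng = random.Random(seed)
    hits = tested = 0
    for _ in range(reps):
        # Genotype = number of minor alleles (0, 1, or 2), drawn under HWE.
        g = [(rng.random() < maf) + (rng.random() < maf) for _ in range(n)]
        p = slr_pvalue(g, trait(rng, n))
        if p is None:
            continue  # skip replicates where the variant is monomorphic
        tested += 1
        hits += p < alpha
    return hits / tested if tested else float("nan")

if __name__ == "__main__":
    for maf in (0.01, 0.2):
        for name, trait in TRAITS.items():
            rate = simulate_type1(maf, trait)
            print(f"MAF={maf:<5} {name:<16} type I error ~ {rate:.3f}")
```

With so few replicates the estimated rates are noisy; a realistic study would use far more replicates and real genotype data, and would average the rates within MAF classes.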
As in the study above, four simulated traits were generated: standard normal and gamma distributed traits, and two transformations of the gamma distributed trait (log10 and rank-based inverse normal). Again, average type I error rates were obtained for minor allele frequency (MAF) classes. Type I error rates from simple linear regression were compared to those from tiled regression. Type I error rates were substantially lower for tiled regression, suggesting that this approach adequately handles the effects of variant correlation and multicollinearity, and even the non-normality of the trait distribution.
Finally, Sung et al. (2016) investigated the effects of different sets of critical values on type I error rates in tiled regression, using genotype data from the Trinity Student Study (TSS). Two hundred replicates of null traits simulated from the standard normal distribution were analyzed using four different sets of critical values for stepwise regression at each stage of tiled regression. The results indicate that the multicollinearity among the SNPs considered and the aggregate type I error rates decreased through the three tiling stages; that the region-specific type I error rates were slightly lower than the nominal critical values at the tile level; and that the critical value at the tile level lay between the two aggregate type I error rates defined under two different assumptions about the number of tests (the number of SVs and the number of tiles).
Collaborations
1) Familial Idiopathic Scoliosis
In a previous study of Familial Idiopathic Scoliosis/Kyphoscoliosis (FIS/KS), a genome-wide linkage analysis of seven families with at least two individuals with kyphoscoliosis found linkage (P-value = 0.002) to a 3.5-Mb region on 5p13.3 containing only three known genes: IRX1, IRX2, and IRX4. In that study, the exons of IRX1, IRX2, and IRX4, the conserved non-coding elements (NCEs) in the region, and the exons of a non-protein-coding RNA, LOC285577, were sequenced.
No functional sequence variants were identified. An intrafamilial test of association found several associated non-coding single nucleotide variants. The strongest association was with rs12517904 (P = 0.00004), located 6.5 kb downstream from IRX1. In one family, the genotypes of nine variants differed from the reference allele in all individuals with kyphoscoliosis and in two of three individuals with scoliosis, but did not differ from the reference allele in any other genotyped individual. One of these variants, rs117273909, was located in a conserved NCE that functions as an enhancer in mice. To test whether the variant allele at rs117273909 had an effect on enhancer activity, zebrafish transgenesis was performed with overlapping fragments of 198 and 687 bp containing either the wild-type or the variant allele. Our data suggest that this region acts as a regulatory element; however, its size and target gene(s) need to be identified to determine its role in idiopathic scoliosis.
2) Craniosynostosis
Justice et al. (2012) reported a genome-wide association study (GWAS) of non-syndromic sagittal craniosynostosis; the associations were replicated in an independent Caucasian population of 186 unrelated probands with non-syndromic sagittal craniosynostosis and 564 unaffected controls. Zebrafish were used to test expression driven by the previously identified conserved non-coding regulatory elements, in order to determine whether expression with the identified sequence variants differed from wild-type expression. To accomplish this, a putative regulatory element was created with site-directed mutagenesis and inserted into the Zebrafish Enhancer Detection (ZED) vector construct. The embryos were screened with fluorescence microscopy for red and green fluorescent protein (RFP and GFP, respectively) positive embryos. Embryos demonstrating RFP/GFP expression were grown to adulthood and bred with wild-type fish.
Several germline-transmitting founders were identified for each ZED vector construct, and their progeny were screened for patterns of RFP/GFP expression, again using fluorescence microscopy. In fish with the risk allele (C), GFP expression appears to occur in the midbrain and hindbrain, whereas in fish with the wild-type allele (T), GFP expression was observed at the midbrain-hindbrain boundary (Justice et al. 2016, submitted).
3) Variation in metabolites in the Irish Trinity Student Study (Dr. Larry Brody, NHGRI)
The TSS includes 38 quantitative traits related to folate and vitamin B12 metabolism. Molloy et al. (2016) used a GWAS approach to determine whether there were significant associations between levels of plasma methylmalonic acid (MMA), a product of vitamin B12 metabolism, and over 750,000 SNPs in the TSS. Traditional methods for the statistical genetic analysis of quantitative traits were used to determine whether genetic effects were responsible for at least a portion of the variation in MMA; associations were tested with simple linear regression (SLR). Significant associations were found between plasma MMA levels and variants in the 3-hydroxyisobutyryl-CoA hydrolase (HIBCH) and acyl-CoA synthetase family member 3 (ACSF3) genes, accounting for about 12% of the phenotypic variance of MMA. The most significant SNP (rs291466) results in a missense change from methionine to threonine in the initiator codon. The association between plasma MMA levels and rs291466 in HIBCH was replicated in an independent sample of 1,481 older individuals, also of Irish descent.
Other ongoing collaborations
1) The ClinSeq project (Dr. Les Biesecker, NIH/NHGRI)
2) Genetics of Brain Growth (Dr. Philip Shaw, NIH/NHGRI)