Methods Development

Because the non-independence of marker data is particularly relevant in next-generation sequencing data, most of the theoretical work during the past year has focused on the testing, implementation and extension of tiled regression, a linear-regression-based method for intra-familial tests of association that addresses non-independence at both the marker and the observational level. Although most of the major methodological development has been completed, extensions implemented during the past year have focused on the incorporation of penalized regression methods, the use of family data, and the use of simulation to compare the statistical properties of these methods in tiled regression with those of stepwise regression. The tiled regression methodology has been implemented in the Tiled Regression Analysis Package (TRAP), a software package written in the R programming language. The package is freely available on the NHGRI website: http://research.nhgri.nih.gov/software/TRAP.

Simulation experiments to test the statistical properties of tiled regression

Two simulation projects were completed during the past year and the results are being prepared for submission (Suktitipat et al.; Kim et al.). The Suktitipat et al. project focused on the statistical properties of tiled regression compared with those of simple linear regression. In a GWAS setting, tiled regression had comparable power, a more conservative type I error rate and a lower false discovery rate (FDR) than simple linear regression of single markers. Kim et al. investigated penalized regression methods as an alternative to stepwise regression. Results from this study suggested that stepwise regression outperformed penalized regression when the causal variants were present in the genotyping data, but that penalized regression methods outperformed stepwise methods when the causal variants were not among the variants genotyped.
Thus, penalized methods may be more appropriate for a GWAS, whereas stepwise methods may be the preferred approach for next-generation whole-genome data.

A third simulation study investigated the effects of boundary definition on the type I error rate and power in tiled regression (Sorant et al., in preparation). This project is being presented at the International Genetic Epidemiology Society meeting in late August. In this project, several criteria for defining hot spot boundaries are evaluated, and the power and type I error rate are determined for each criterion. At the genome level, there do not appear to be substantial differences among the criteria for boundary selection. However, at the level of tile definition, where the actual recombination hot spot blocks and the intervening cold spot blocks are delineated, there are differences. Although boundary definition is arbitrary from a mathematical standpoint, the criteria used to define recombination hot spots appear to be important in identifying local regions that are biologically relevant rather than simply statistically relevant.

Additional simulations are being performed on data provided as part of the Genetic Analysis Workshop 19 (GAW 19), to be held in Vienna, Austria in late August (An et al., in preparation). Inflation of type I error had previously been reported in linkage analysis with short tandem repeat polymorphisms (STRPs) in the telomeric regions of the chromosomes, possibly due in part to the increased density of STRPs in these regions and the corresponding possibility of duplications and increased correlations between STRPs. In this project, the effects of variant position and of the distribution of the trait phenotype on the distribution of type I error were examined in standard tests of association with next-generation sequence data. The GAW 19 data were used to test this concept on both common and rare next-generation sequence variants.
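As a rough illustration of how an empirical type I error rate for a single-variant association test can be estimated by simulation, the following Python sketch generates traits that are independent of genotype and counts how often the test rejects at a nominal level. It is not the analysis performed on the GAW 19 data: the genotypes are simulated under Hardy-Weinberg proportions, the test is a simple linear regression with a normal approximation to the test statistic, the skewed trait is lognormal, and the rank-based inverse normal transformation is only one assumed example of a normalizing transformation. All function names are hypothetical.

```python
import math
import random
from statistics import NormalDist


def wald_p_value(genotypes, trait):
    """Two-sided p-value for the slope of trait regressed on genotype.

    Uses a normal approximation to the t statistic (an assumption that is
    reasonable only for large samples). Returns None if the test is undefined.
    """
    n = len(trait)
    mean_g = sum(genotypes) / n
    mean_y = sum(trait) / n
    sxx = sum((g - mean_g) ** 2 for g in genotypes)
    if sxx == 0.0:
        return None  # monomorphic variant in this sample
    sxy = sum((g - mean_g) * (y - mean_y) for g, y in zip(genotypes, trait))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_g
    rss = sum((y - intercept - slope * g) ** 2 for g, y in zip(genotypes, trait))
    se = math.sqrt(rss / (n - 2) / sxx)
    if se == 0.0:
        return None
    return math.erfc(abs(slope / se) / math.sqrt(2.0))


def simulate_genotypes(n, maf, rng):
    """Minor-allele counts (0, 1 or 2) under Hardy-Weinberg proportions."""
    return [(rng.random() < maf) + (rng.random() < maf) for _ in range(n)]


def inverse_normal_transform(trait):
    """Rank-based inverse normal transformation (assumed example transform)."""
    n = len(trait)
    order = sorted(range(n), key=lambda i: trait[i])
    std_normal = NormalDist()
    out = [0.0] * n
    for rank, i in enumerate(order, start=1):
        out[i] = std_normal.inv_cdf((rank - 0.5) / n)
    return out


def empirical_type_i_error(n, maf, draw_trait, reps, alpha, rng, transform=None):
    """Fraction of null replicates in which the test rejects at level alpha."""
    rejections = 0
    tested = 0
    for _ in range(reps):
        g = simulate_genotypes(n, maf, rng)
        y = [draw_trait(rng) for _ in range(n)]  # trait is independent of g
        if transform is not None:
            y = transform(y)
        p = wald_p_value(g, y)
        if p is None:
            continue
        tested += 1
        if p < alpha:
            rejections += 1
    return rejections / tested if tested else float("nan")


def lognormal_trait(rng):
    """A right-skewed null trait (illustrative choice only)."""
    return rng.lognormvariate(0.0, 1.0)


if __name__ == "__main__":
    rng = random.Random(2014)
    for label, maf in (("common", 0.20), ("rare", 0.01)):
        raw = empirical_type_i_error(300, maf, lognormal_trait, 500, 0.05, rng)
        tr = empirical_type_i_error(300, maf, lognormal_trait, 500, 0.05, rng,
                                    transform=inverse_normal_transform)
        print(f"{label} variant (MAF {maf}): raw {raw:.3f}, transformed {tr:.3f}")
```

The sample size, replicate count and nominal level in the usage block are arbitrary; a study-scale simulation would use many more replicates and more stringent significance thresholds.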
With respect to the physical position of the type I errors, there does not appear to be any consistent pattern of inflated type I error at the telomeres, although isolated areas of increased type I error were observed. The data are being more thoroughly annotated, and the distribution of type I error will be considered within several different ENCODE-defined functional groups. The effect of the distribution of the trait phenotype was also examined. Although it is well known that the type I error rate for rare variants is substantially inflated, the reason for this is not clear. As expected, non-normally distributed traits had inflated type I error rates for rare variants; when these traits were transformed to be more normally distributed, the inflation was reduced and eventually disappeared, depending on the strength of the transformation. More common variants (with MAFs > 0.05) do not appear to have inflated type I error rates for any of the non-normally distributed traits considered, and transformation of the trait appears to be unnecessary for these variants.

Collaborations

Craniosynostosis

Justice et al., as part of a long-term collaboration with Dr. Simeon (Boyd) Boyadjiev at UC Davis, reported a genome-wide association study (GWAS) of non-syndromic sagittal craniosynostosis; the associations were replicated in an independent Caucasian population of 186 unrelated probands with non-syndromic sagittal craniosynostosis and 564 unaffected controls (Nat Genet 2012). During the past year, zebrafish were used to test the expression of the conserved non-coding regulatory elements previously identified, in order to determine whether the expression of the identified sequence variants differed from wildtype expression. To accomplish this, a putative regulatory element was created with site-directed mutagenesis and inserted into the Zebrafish Enhancer Detection (ZED) vector construct. The ZED vector was microinjected into one-cell-stage zebrafish embryos.
The embryos were screened with fluorescence microscopy for red and green fluorescent protein (RFP and GFP, respectively) expression. Embryos demonstrating RFP/GFP expression were grown to adulthood and bred with wildtype fish. Several germline-transmitting founders were identified for each ZED vector construct, and their progeny were screened for patterns of RFP/GFP expression, again using fluorescence microscopy. The variant showed substantially enhanced GFP expression in the mid-brain when compared with wildtype expression (Justice et al., in preparation).

Methods development

Two methods development manuscripts focusing on generalized linear models for gene-based case-control association studies were published during 2013-2014. Both were authored or co-authored by Dr. Ruzong Fan (NICHD) and published in Genetic Epidemiology. Dr. Wilson is both a collaborator and a member of Dr. Fan's mentoring committee.

Other ongoing collaborations include:
1) Clinical characterization of NF1 (Dr. Douglas Stewart, NIH/NCI)
2) The ClinSeq project (Les Biesecker, NIH/NHGRI)
3) Variation in metabolites in the Irish Trinity Student Study (Dr. Larry Brody, NHGRI)