Methods Development

Because the non-independence of marker data is particularly relevant in next-generation sequencing data, most of the theoretical work during the past year has focused on the testing, implementation and extension of tiled regression, a linear-regression-based method for intra-familial tests of association that addresses non-independence at both the marker and the observational level. Although most of the major methodological development has been completed, extensions implemented during the past year have focused on the incorporation of penalized regression methods, the use of family data, and the use of simulation to compare the statistical properties of these methods in tiled regression with those of stepwise regression. The tiled regression methodology has been implemented in the Tiled Regression Analysis Package (TRAP), a software package written in the R programming language. The package is freely available on the NHGRI website.

Simulation experiments to test the statistical properties of tiled regression

Two simulation projects were completed during the past year, and the results are being prepared for submission (Suktitipat et al.; Kim et al.). The Suktitipat et al. project focused on the statistical properties of tiled regression compared to those of simple linear regression. Tiled regression had comparable power, a more conservative type I error rate and a lower false discovery rate (FDR) than simple linear regression of single markers in a GWAS setting. Kim et al. investigated penalized regression methods as an alternative to stepwise regression. Results from this study suggested that stepwise regression outperformed penalized regression when the causal variants were present in the genotyping data, but that penalized regression methods outperformed stepwise methods when the causal variants were not among the variants genotyped.
Thus, penalized methods may be more appropriate for a GWAS, whereas stepwise methods may be the preferred approach for next-generation whole-genome data.

In a related simulation project, the use of generalized estimating equations (GEE) as a method for including family information in a linear regression model was investigated and compared to a variance component approach (VCA) (Suktitipat et al. 2012, Hum Hered). Although the VCA makes complete use of phenotyping, genotyping and family relationships, the computational time for VCA in whole-genome data in families is considerable. The power and type I error rate for a linear model with GEE clustering and a robust variance estimator, with clusters based either on extended family structure (GEEExt) or on nuclear families split from the original extended families (GEESpl), were compared to those of VCA. The type I error rate for GEEExt was marginally higher than the nominal rate when the minor allele frequency (MAF) was <0.1, and close to the nominal rate when the MAF was ≥0.2. All methods gave consistent effect estimates and had similar power. The GEE extension to a linear model with a robust variance estimator was computationally the fastest and provided a reasonable alternative to the VCA for screening family data.

Collaborations

Familial Idiopathic Scoliosis

Two analyses focusing on candidate regions and phenotypic subsets in the Familial Idiopathic Scoliosis (FIS) project were completed during the past year.

1) Candidate regions on 9q and 16p-16q, previously identified as linked to FIS in a study of 202 families (Miller et al. 2005), were genotyped with a custom high-density map of SNPs in order to identify candidate genes and prioritize them for next-generation sequence analysis. Nominally significant linkage results were found for markers in both candidate regions.
Results from intra-familial tests of association and tiled regression corroborated the linkage findings and identified possible candidate genes suitable for follow-up with next-generation sequencing in these same families (Miller et al. Hum Hered 2012).

2) Tilley et al. (Spine 2013) used the family data from Miller et al. 2001 in an attempt to replicate an association between FIS and the CHD7 gene on 8q12.2 in an independent sample of families of European descent. Model-independent linkage analysis and tests of association were performed for the 22 previously reported significant single nucleotide polymorphisms (SNPs) in the CHD7 gene in 244 families with FIS. Results from the tests of association from this study and the previous study were combined in a weighted meta-analysis. No significant results (P <0.01) were found for linkage analysis or tests of association between genetic variants of CHD7 and FIS in this study sample. Furthermore, no significant results (P <0.01) were found from the meta-analysis of the tests of association from this sample and from the previous sample. Thus, no association between the SNPs in the CHD7 gene and FIS was found within this study sample, failing to replicate the earlier findings.

Sagittal craniosynostosis

Justice et al. (Nat Genet 2012), as part of a long-term collaboration with Dr. Simeon (Boyd) Boyadjiev at UC Davis, reported a genome-wide association study (GWAS) for non-syndromic sagittal craniosynostosis using 130 European American case-parent trios. The strongest association was observed in a 120 kb region in the 3' UTR of BMP2 on chromosome 20, flanked by rs1884302 and rs6140226. The second strongest association was found in a 167 kb region of BBS9 on chromosome 7 between rs10262453 and rs17724206, with the strongest association being to rs10262453.
These associations were replicated for rs1884302 and rs10262453 in an independent Caucasian population of 186 unrelated probands with non-syndromic sagittal craniosynostosis and 564 unaffected controls. These findings suggest that the BMP2 and/or BBS9 genes may be involved in the etiology of sagittal craniosynostosis.

The GeneSTAR project (Drs. Diane and Lewis Becker, Johns Hopkins University School of Medicine)

In this study, Kim et al. (PLoS One 2013) used a sequencing approach to identify additional exonic variants in PEAR1 that may also determine variability in platelet aggregation. A target region on chromosome 1q23.1 including the entire PEAR1 gene was Sanger sequenced in 104 subjects selected on the basis of hyper- and hypo-aggregation across three different platelet agonists. Single-variant and collapsed multi-variant burden tests for association were performed. Of the 235 variants identified through sequencing, 104 were novel, and ten of these were missense variants. More rare variants (MAF <5%) were noted in African Americans than in European Americans (108 vs. 45). The common intronic GWAS-identified variant (rs12041331) demonstrated the most significant association signal in the African American sample; no association was seen for additional exonic variants in this group. Sequencing approaches confirm that a common intronic variant has the strongest association in African Americans, and show that additional exonic variants play a role in platelet aggregation in European Americans.

Other ongoing collaborations include:
1) Clinical characterization of NF1 (Dr. Douglas Stewart, NIH/NCI)
2) The ClinSeq project (Dr. Les Biesecker, NIH/NHGRI)
3) Behçet disease (Dr. Daniel Kastner, NIH/NHGRI)
4) Variation in metabolites in the Irish Trinity Student Study (Dr. Larry Brody, NHGRI)
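
As an illustration of the weighted meta-analysis step mentioned for the CHD7 replication study, the following is a minimal sketch of one common approach, a sample-size-weighted Z-score (Stouffer) combination of two-sided P values. The report does not state which weighting scheme was used, so the method, the function name weighted_z_meta, and all input values are assumptions chosen for illustration only; none of the numbers are results from the studies described above.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def weighted_z_meta(p_values, directions, sample_sizes):
    """Combine per-study two-sided P values with a weighted Z-score method.

    p_values     : two-sided P value from each study (0 < p <= 1)
    directions   : +1 or -1, sign of the effect of the reference allele
    sample_sizes : study sizes; the weight of each study is sqrt(size)
    Returns (combined_z, combined_two_sided_p).
    """
    weights = [sqrt(n) for n in sample_sizes]
    # Convert each two-sided P value back to a signed Z score.
    z_scores = [
        d * _STD_NORMAL.inv_cdf(1.0 - p / 2.0)
        for p, d in zip(p_values, directions)
    ]
    numerator = sum(w * z for w, z in zip(weights, z_scores))
    denominator = sqrt(sum(w * w for w in weights))
    combined_z = numerator / denominator
    combined_p = 2.0 * (1.0 - _STD_NORMAL.cdf(abs(combined_z)))
    return combined_z, combined_p


if __name__ == "__main__":
    # Hypothetical inputs for a single SNP tested in two samples:
    # made-up P values, opposite effect directions, made-up sample sizes.
    z, p = weighted_z_meta([0.20, 0.04], [+1, -1], [244, 50])
    print(f"combined Z = {z:.3f}, combined P = {p:.3f}")
```

Weighting by the square root of the sample size gives the larger sample more influence, and carrying the effect direction through the Z scores means that signals pointing in opposite directions in the two samples partly cancel rather than reinforce each other.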

National Human Genome Research Institute
Ziegler, A; Wilson, A F; Gagnon, F (2014) Informatics and genetic epidemiology. Methods Inf Med 53:1-2
Kirino, Yohei; Zhou, Qing; Ishigatsubo, Yoshiaki et al. (2013) Targeted resequencing implicates the familial Mediterranean fever gene MEFV and the toll-like receptor 4 gene TLR4 in Behcet disease. Proc Natl Acad Sci U S A 110:8134-9
Tilley, Mera K; Justice, Cristina M; Swindle, Kandice et al. (2013) CHD7 gene polymorphisms and familial idiopathic scoliosis. Spine (Phila Pa 1976) 38:E1432-6
Cheng, Ching-Yu; Lee, Kristine E; Duggal, Priya et al. (2010) Genome-wide linkage analysis of multiple metabolic factors: evidence of genetic heterogeneity. Obesity (Silver Spring) 18:146-52
Biesecker, Leslie G; Mullikin, James C; Facio, Flavia M et al. (2009) The ClinSeq Project: piloting large-scale genome sequencing for research in genomic medicine. Genome Res 19:1665-74
Mathias, Rasika A; Deepa, Mohan; Deepa, Raj et al. (2009) Heritability of quantitative traits associated with type 2 diabetes mellitus in large multiplex families from South India. Metabolism 58:1439-45