Methods Development
Because non-independence of marker data is particularly relevant to next-generation sequencing data, most of the theoretical work during the past year has focused on the development, testing and implementation of tiled regression, a linear-regression-based method for intrafamilial tests of association that addresses non-independence at both the marker and the observation level. Tiled regression applies multiple, stepwise and other regression methods within predefined segments of the genome (tiles), defined by hotspot blocks, to select the independent sequence variants in each tile that account for variation in quantitative traits or susceptibility to qualitative traits. Higher-order regressions are then used to identify significant variants across tiles, chromosomes and the entire genome. With this approach, it becomes practical to analyze hundreds of thousands or millions of markers together with their significant gene × gene interaction terms, and the total number of tests is reduced to a number closer to the number of tiles than to the number of markers. Furthermore, the tiled approach can be incorporated into a linear regression framework that allows for non-independence between observations by incorporating features of the Regression of Offspring on Mid-Parent (ROMP) and Generalized Estimating Equations (GEE) approaches. The tiled regression method was tested with simulated mini-exome sequence data as part of Genetic Analysis Workshop 17 (GAW17), and the results are presented in detail in Sung et al., BMC Proc, 2011.
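The two-level selection described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the TRAP implementation: the function names (`ols_pvalues`, `forward_stepwise`, `tiled_regression`) are hypothetical, tiles are given as fixed column ranges rather than hotspot blocks, only forward stepwise selection is used within a tile, the entry threshold `alpha_tile = 1e-4` is arbitrary, and individuals are treated as unrelated (no ROMP or GEE adjustment for family structure). On toy data with one causal marker and a correlated proxy in the same tile, only the causal marker should be kept, and only kept markers are tested at the second level.

```python
import math

import numpy as np


def ols_pvalues(X, y):
    """Fit y ~ 1 + X by ordinary least squares.

    Returns slope estimates and two-sided p-values for the columns of X.
    P-values use a normal approximation to the t distribution (fine for large n).
    """
    n = len(y)
    design = np.column_stack([np.ones(n), X])
    coef, _, _, _ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    sigma2 = resid @ resid / (n - design.shape[1])
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(design.T @ design)))
    pvals = np.array([math.erfc(abs(b / s) / math.sqrt(2)) for b, s in zip(coef, se)])
    return coef[1:], pvals[1:]


def forward_stepwise(G, y, alpha):
    """Within one tile, greedily add the marker with the smallest conditional
    p-value until no remaining marker enters at p < alpha."""
    selected, remaining = [], list(range(G.shape[1]))
    while remaining:
        best_j, best_p = None, alpha
        for j in remaining:
            _, p = ols_pvalues(G[:, selected + [j]], y)
            # p[-1] is the candidate, adjusted for markers already selected
            if p[-1] < best_p:
                best_j, best_p = j, p[-1]
        if best_j is None:
            break
        selected.append(best_j)
        remaining.remove(best_j)
    return selected


def tiled_regression(G, y, tiles, alpha_tile=1e-4):
    """Level 1: stepwise selection inside each tile (tiles are [start, end) column ranges).
    Level 2: one multiple regression on all markers kept at level 1."""
    chosen = []
    for start, end in tiles:
        chosen.extend(start + j for j in forward_stepwise(G[:, start:end], y, alpha_tile))
    if not chosen:
        return [], np.array([])
    _, pvals = ols_pvalues(G[:, chosen], y)
    return chosen, pvals


# Toy data (hypothetical): marker 0 is causal, marker 1 is a proxy that copies
# marker 0 for ~80% of individuals, markers 2-6 are unrelated to the trait.
rng = np.random.default_rng(2011)
n = 2000
causal = rng.binomial(2, 0.3, n)
proxy = np.where(rng.random(n) < 0.8, causal, rng.binomial(2, 0.3, n))
others = rng.binomial(2, [0.2, 0.4, 0.3, 0.25, 0.35], size=(n, 5))
G = np.column_stack([causal, proxy, others])  # tile A: cols 0-3, tile B: cols 4-6
y = 0.5 * causal + rng.normal(size=n)

chosen, pvals = tiled_regression(G, y, tiles=[(0, 4), (4, 7)])
print("markers kept:", chosen, "second-level p-values:", pvals)
```

Because the proxy is tested conditional on the marker already selected in its tile, its residual association is essentially zero and it does not enter, whereas a marginal single-marker scan would flag both markers.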
The most striking finding from the GAW17 analysis was that methods using simple linear regression without accounting for correlations between markers had estimated type I error rates (false-positive rates) up to three orders of magnitude (1000-fold) higher than the expected rate, depending on the underlying genetic model. The magnitude of the inflation appears to be related to the correlations due to unknown causal variants that contribute to a quantitative trait. This suggests that, even with permutation tests, the type I error rate for analyses of sequence data with GWAS methods may be substantially inflated if marker-marker correlations are ignored, generating thousands of false-positive results. Because the tiled regression method identifies only independent sequence variants, its type I error rate is stable regardless of the underlying genetic model, and permutation tests using the tiled regression method should yield appropriate type I error rates.
This approach has been applied both to SNP data from fine-mapping studies of scoliosis, in collaboration with Dr. Nancy Miller (University of Colorado; two manuscripts submitted, 2011), and to two targeted candidate-gene sequencing projects: an NF1 project in collaboration with Dr. Douglas Stewart and the ClinSeq project in collaboration with Dr. Les Biesecker. In 2011 the tiled regression methodology was implemented in TRAP, a software package written in R and freely available on the NHGRI website: http://research.nhgri.nih.gov/software/TRAP.
Two other projects used the simulated mini-exome sequence data from GAW17, and their findings are now in press. In the first project (Simpson et al., BMC Proc, 2011), we evaluated intrafamilial tests of association to compare the statistical properties of likelihood-based and regression of offspring on mid-parent (ROMP) methods.
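To make the ROMP idea concrete, the sketch below assumes the simplest model the name suggests: each offspring's trait is regressed on the mid-parent trait (the mean of the two parents' values) plus the offspring's genotype, with the mid-parent term absorbing familial resemblance. It is a hedged illustration, not the published ROMP specification, which may include additional terms. The simulated trios, effect sizes and helper names (`transmit`, `ols`, `noise`) are hypothetical, and only one offspring per family is simulated.

```python
import math

import numpy as np

rng = np.random.default_rng(17)


def transmit(g_parent):
    """Allele passed on by each parent: homozygotes (0 or 2) transmit
    deterministically, heterozygotes transmit the minor allele with probability 1/2."""
    het_passes = (g_parent == 1) & (rng.random(g_parent.shape) < 0.5)
    return ((g_parent == 2) | het_passes).astype(int)


def ols(X, y):
    """OLS of y on [1, X]; returns slopes and normal-approximation two-sided p-values."""
    n_obs = len(y)
    design = np.column_stack([np.ones(n_obs), X])
    coef, _, _, _ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    sigma2 = resid @ resid / (n_obs - design.shape[1])
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(design.T @ design)))
    pvals = np.array([math.erfc(abs(b / s) / math.sqrt(2)) for b, s in zip(coef, se)])
    return coef[1:], pvals[1:]


# Hypothetical trios: one additive causal SNP (effect 0.5, minor allele frequency 0.3)
# plus a polygenic background that offspring inherit as the parental mean plus noise.
n, beta, maf = 2000, 0.5, 0.3
g_mom, g_dad = rng.binomial(2, maf, n), rng.binomial(2, maf, n)
g_kid = transmit(g_mom) + transmit(g_dad)
a_mom, a_dad = rng.normal(0, math.sqrt(0.5), n), rng.normal(0, math.sqrt(0.5), n)
a_kid = (a_mom + a_dad) / 2 + rng.normal(0, math.sqrt(0.25), n)


def noise():
    """Individual environmental noise (variance 0.5)."""
    return rng.normal(0, math.sqrt(0.5), n)


y_mom = beta * g_mom + a_mom + noise()
y_dad = beta * g_dad + a_dad + noise()
y_kid = beta * g_kid + a_kid + noise()

# Assumed ROMP-style model: offspring trait ~ mid-parent trait + offspring genotype.
mid_parent = (y_mom + y_dad) / 2
(b_mid, b_geno), (p_mid, p_geno) = ols(np.column_stack([mid_parent, g_kid]), y_kid)
print(f"mid-parent slope {b_mid:.3f} (p = {p_mid:.1e}); "
      f"genotype slope {b_geno:.3f} (p = {p_geno:.1e})")
```

In this toy setup the genotype slope is attenuated relative to the simulated 0.5, because the mid-parent phenotype also carries part of the parents' genotypic effect. The slope is therefore best read as a test of association rather than as an unbiased estimate of the allelic effect.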
In the samples considered, both the likelihood-based and ROMP methods were able to detect causal sequence variants with locus-specific heritabilities greater than about 0.1, but neither method was able to detect causal variants with locus-specific heritabilities near 0. There was some inflation of the type I error rates for both methods. In the second project (Kim et al., BMC Proc, 2011), we evaluated machine learning methods for detecting associations in the GAW17 simulated data. These methods did not provide any substantial advantage over more traditional methods, although interaction effects, the strength of machine learning methods, were not included in the underlying simulation model.
Collaborations
Familial Idiopathic Scoliosis
Several analyses focusing on candidate regions and phenotypic subsets have been completed, and manuscripts have either been submitted or are in preparation. These include:
1) Statistical genetic analysis of two sets of families with familial idiopathic scoliosis whose characteristics are nearly identical to those of the sample analyzed in Miller et al. 2005. Linkage analysis and tests of association were performed in two regions on chromosome 1 previously identified as primary candidate regions. We have identified several regions of interest for subsequent next-generation sequencing (Behnemann, doctoral thesis, anticipated 2011).
2) Targeted sequencing of the IRX gene family in families with kyphoscoliosis. We have identified an association between kyphoscoliosis and a sequence variant in an upstream conserved region of one of the IRX genes. Association analysis yielded 12 SNPs with p-values <0.025, of which 11, including the most significant SNP (p = 0.000382), are 500 kb from IRX1. One of these SNPs lies in a highly conserved noncoding region (HCNR) that shares 87% sequence identity with an HCNR upstream of IRX3 on 16q12 (Justice et al., submitted).
3) Statistical genetic analysis of STRPs and SNPs on chromosomes 9 and 16.
Fine mapping on chromosomes 9 and 16 was performed to narrow previously identified candidate regions. Linkage and association studies identified several highly significant regions that are candidates for next-generation sequencing (Miller et al., submitted).
4) A study of the subset of families containing males with severe scoliosis (Miller et al., submitted). This subset comprised 25 families (207 individuals) in which at least one male was diagnosed in adolescence with a lateral curvature of ≥30°. Genome-wide linkage analysis of the qualitative and quantitative traits yielded significant results (defined as 2 adjacent markers with p-values <0.01) on chromosomes 2, 16 and 22. Significant SNPs lie primarily in introns of LARGE, a gene integral to the development and maintenance of skeletal muscle, and of SFI1, which is responsible for the integrity of the chromosomal centromere complex.
Other large ongoing collaborations include:
1) Clinical characterization of NF1 (Dr. Douglas Stewart, NIH/NCI)
2) The ClinSeq project (Dr. Les Biesecker, NIH/NHGRI)
3) The GeneSTAR project (Drs. Diane and Lewis Becker, Johns Hopkins University School of Medicine; Mathias et al., 2010)
4) Variation in metabolites in an Irish population (Dr. Larry Brody, NIH/NHGRI)

Support Year: 10
Fiscal Year: 2011
Total Cost: $1,705,216
Name: National Human Genome Research Institute
Szekely, Eszter; Schwantes-An, Tae-Hwi Linus; Justice, Cristina M et al. (2018) Genetic associations with childhood brain growth, defined in two longitudinal cohorts. Genet Epidemiol 42:405-414
Justice, Cristina M; Kim, Jinoh; Kim, Sun-Don et al. (2017) A variant associated with sagittal nonsyndromic craniosynostosis alters the regulatory function of a non-coding element. Am J Med Genet A 173:2893-2897
Chiu, Chi-Yang; Jung, Jeesun; Wang, Yifan et al. (2017) A comparison study of multivariate fixed models and Gene Association with Multiple Traits (GAMuT) for next-generation sequencing. Genet Epidemiol 41:18-34
Justice, Cristina M; Kim, Jinoh; Kim, Sun-Don et al. (2017) Cover Image, Volume 173A, Number 11, November 2017. Am J Med Genet A 173:i
Velkova, Aneliya; Diaz, Jennifer E L; Pangilinan, Faith et al. (2017) The FUT2 secretor variant p.Trp154Ter influences serum vitamin B12 concentration via holo-haptocorrin, but not holo-transcobalamin, and is associated with haptocorrin glycosylation. Hum Mol Genet 26:4975-4988
Molloy, Anne M; Pangilinan, Faith; Mills, James L et al. (2016) A Common Polymorphism in HIBCH Influences Methylmalonic Acid Concentrations in Blood Independently of Cobalamin. Am J Hum Genet 98:869-82
Fan, Ruzong; Chiu, Chi-Yang; Jung, Jeesun et al. (2016) A Comparison Study of Fixed and Mixed Effect Models for Gene Level Association Studies of Complex Traits. Genet Epidemiol 40:702-721
Justice, Cristina M; Bishop, Kevin; Carrington, Blake et al. (2016) Evaluation of IRX Genes and Conserved Noncoding Elements in a Region on 5p13.3 Linked to Families with Familial Idiopathic Scoliosis and Kyphosis. G3 (Bethesda) 6:1707-12
Carter, Tonia C; Pangilinan, Faith; Molloy, Anne M et al. (2015) Common Variants at Putative Regulatory Sites of the Tissue Nonspecific Alkaline Phosphatase Gene Influence Circulating Pyridoxal 5'-Phosphate Concentration in Healthy Adults. J Nutr 145:1386-93
Wang, Yifan; Liu, Aiyi; Mills, James L et al. (2015) Pleiotropy analysis of quantitative traits at gene level by multivariate functional linear models. Genet Epidemiol 39:259-75

Showing the most recent 10 out of 35 publications