Dyslexia and other language-based reading disorders (RD) account for nearly 85% of children receiving special education services in the U.S. RD affect between 5% and 17% of school-age children, with boys at higher risk than girls. Children with RD are at high risk for academic failure and future underemployment. The current diagnosis of RD relies on behavioral symptoms, so RD cannot be identified until after the child has begun to learn to read. By this time, a potentially opportune window for intervention has been missed. Because RD have a high heritability rate, early identification and prevention are possible using genetic markers. Existing genetic studies of RD are limited in that the gene-RD association has been evaluated on a one-gene-at-a-time basis. This approach is not efficient, because hundreds of thousands of genes need to be evaluated; nor is it effective, because it runs a high risk of missing important genes when overly strict multiple-testing corrections are applied to so many genes. In addition, the one-gene-at-a-time approach examines only the marginal association of each gene with RD, without accounting for the joint effect of multiple genes and their interactions in relation to RD. For a complex phenotype like RD, a multi-gene interactive mechanism is more plausible and has been supported by recent studies. Despite this, little research has been done to discover the genes simultaneously and characterize their interactions. This is partly because the field has not been able to take advantage of modern machine learning developments that provide effective and efficient approaches for modeling and analyzing large-scale genetic data. Another limitation of existing studies is that they have focused on single behavioral deficits and have not accounted for demographic differences.
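To make the multiple-testing concern concrete, the following minimal sketch shows how a per-test significance threshold shrinks as the number of tests grows. It assumes a Bonferroni correction, which the proposal does not name; the function name `bonferroni_threshold`, the family-wise level of 0.05, and the test counts are illustrative choices only.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test p-value cutoff that keeps the family-wise error rate at alpha.

    Assumes a Bonferroni correction (an illustrative choice, not taken
    from the proposal).
    """
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


# Hypothetical numbers of one-at-a-time tests, chosen only for illustration.
for n_tests in (1, 100, 500_000):
    cutoff = bonferroni_threshold(0.05, n_tests)
    print(f"{n_tests:>7} tests -> per-test cutoff {cutoff:.1e}")
```

Under this assumption, an association must reach a p-value near 1e-07 to survive 500,000 tests, so a gene with a modest but real effect can easily fall short of the cutoff.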
The short-term goals of the proposed project are to identify the gene sets associated with RD, link these gene sets to enriched functional biological pathways, characterize the gene-RD associations across behavioral deficits in multiple reading abilities while accounting for demographic differences, and validate the findings using existing population-based datasets. These goals will be achieved by applying a combination of advanced machine learning algorithms and pathway analyses to existing population-based RD data sources. There are two specific aims:
Aim 1 focuses on identification of significant genes and their interactions in relation to RD using sparse machine learning models and pathway analysis;
Aim 2 focuses on validation of the models and findings from Aim 1 using the Avon Longitudinal Study of Parents and Children (ALSPAC) dataset and another independent dataset.
The long-term goal of this research is to contribute to the development of personalized early identification methods and early intervention for children at risk of RD.
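As a rough illustration of the sparse-modeling idea behind Aim 1, the sketch below fits an L1-penalized linear model (lasso) by coordinate descent to a tiny synthetic design containing two markers and their interaction. The proposal does not specify which sparse model will be used, so the lasso, the function names, the penalty value, and the toy data are all assumptions made for illustration.

```python
def soft_threshold(value, penalty):
    """Shrink value toward zero by penalty; exactly 0.0 inside [-penalty, penalty]."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def lasso_coordinate_descent(X, y, penalty, n_sweeps=50):
    """Minimize (1/2n)*||y - X*beta||^2 + penalty*||beta||_1 (illustrative sketch)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # equals y - X*beta, since beta starts at zero
    for _ in range(n_sweeps):
        for j in range(p):
            column = [row[j] for row in X]
            scale = sum(v * v for v in column) / n
            if scale == 0.0:
                continue
            # Correlation of feature j with the residual that excludes its own fit.
            rho = sum(column[i] * (residual[i] + column[i] * beta[j])
                      for i in range(n)) / n
            updated = soft_threshold(rho, penalty) / scale
            shift = updated - beta[j]
            for i in range(n):
                residual[i] -= column[i] * shift
            beta[j] = updated
    return beta


# Synthetic toy data: two markers coded +1/-1 plus their product (interaction).
marker_a = [1, 1, -1, -1]
marker_b = [1, -1, 1, -1]
X = [[a, b, a * b] for a, b in zip(marker_a, marker_b)]
# The outcome depends on marker_a and the interaction, not on marker_b alone.
y = [2.0 * a + 1.5 * a * b for a, b in zip(marker_a, marker_b)]

coefficients = lasso_coordinate_descent(X, y, penalty=0.5)
print(coefficients)  # [1.5, 0.0, 1.0]
```

In this toy case the penalty sets the coefficient of the irrelevant marker to exactly zero while retaining, in shrunken form, the main effect and the interaction. This is the property that lets a sparse model consider many genes and interactions jointly rather than one at a time.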
Public Health Relevance: Children with reading disorders account for up to 17 percent of the school-age population, and are at greater risk for underachievement in school, underemployment, and related mental health problems. The proposed interdisciplinary research has practical implications for understanding the biological causes of reading disorders and improving early identification and intervention methods.