Longitudinal genetic studies provide a valuable resource for exploring key genetic and environmental factors that affect complex traits over time. Genetic analysis of longitudinal data that incorporates trait variation over time is critical to understanding the genetic influences on, and biological variation of, complex diseases. In recent years, many genetic studies have been conducted in cohorts in which, in addition to genome sequence data, multiple measures of a trait of interest are collected on each subject over a period of time. These studies not only provide a more accurate assessment of disease condition but also enable researchers to investigate the influence of genes on the trajectory of a trait and on disease progression. This project focuses on the development of novel association testing methods to analyze genomic sequencing data at the gene level. The research will help provide insights into the underlying biology and progression of complex diseases.

In longitudinal genetic studies and in data from the Electronic Medical Records and Genomics (eMERGE) network, phenotypic traits and genetic variants may be viewed as functional data. Functional data analysis (FDA) can serve as a valuable tool for exploring key genetic and environmental factors that affect complex traits over time. In the presence of a large number of rare variants, gene-based analysis is a more powerful tool for gene mapping than testing individual genetic variants. This project seeks to develop stochastic functional regression models and longitudinal sequence kernel association tests (LSKAT) to analyze longitudinal traits in population samples and in pedigree or cryptically related samples, and to analyze pleiotropic traits. FDA techniques and kernel-based approaches are used to reduce the high dimensionality of sequencing data and to extract useful information. A variance-covariance structure, based on the theory of stochastic processes, is constructed to model the measurement variation and correlations of an individual's trait, and novel penalized spline models are used to estimate the trajectory mean function. The proposed methods will be tested and refined using real data sets and simulation studies. User-friendly software will be developed to implement the proposed methods and will be made publicly available.
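
As a rough illustration of the kernel-based, gene-level testing idea mentioned above, the following is a minimal sketch of a standard cross-sectional SKAT-type score statistic with a weighted linear kernel. It is not the project's LSKAT: it handles a single trait measurement per subject, ignores relatedness, and stops short of computing a p-value. The function names, the intercept-only default null model, and the Beta(1, 25) weighting of minor allele frequencies are assumptions chosen for the example.

```python
import numpy as np


def beta_1_25_weights(maf):
    """Beta(1, 25) density evaluated at each minor allele frequency.

    The density is 25 * (1 - maf)**24, so rarer variants get larger weights.
    (Assumed weighting scheme for this sketch.)
    """
    return 25.0 * (1.0 - np.asarray(maf, dtype=float)) ** 24


def skat_type_statistic(y, G, X=None):
    """Variance-component score statistic Q = r' K r for one gene.

    y : (n,) trait values, one measurement per subject
    G : (n, p) genotype matrix coded as 0/1/2 minor allele counts
    X : (n, q) covariates for the null model (intercept only if None)

    K is the weighted linear kernel (G * w)(G * w)', and r is the vector of
    residuals from an ordinary least-squares fit of y on X.
    """
    y = np.asarray(y, dtype=float)
    G = np.asarray(G, dtype=float)
    n = y.shape[0]
    if X is None:
        X = np.ones((n, 1))

    # Residuals under the null model (no genetic effect).
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef

    # Per-variant weights from the sample minor allele frequencies.
    maf = G.mean(axis=0) / 2.0
    w = beta_1_25_weights(maf)

    # Q = sum_j (w_j * S_j)^2, where S_j is the score for variant j.
    scores = G.T @ resid
    return float(np.sum((w * scores) ** 2))
```

In practice, a p-value for such a statistic is usually obtained from a mixture of chi-square distributions under the null; that step, and the extension to repeated measures, are deliberately left out of this sketch.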

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

National Science Foundation (NSF)
Division of Mathematical Sciences (DMS)
Standard Grant (Standard)
Program Officer: Gabor Szekely
Yale University
New Haven
United States