Pattern analyses are central to applications that seek general guidelines, and cluster analysis is one type of pattern analysis. This application aims to develop novel model-based clustering methods and apply them to a longitudinal data set from a birth cohort established in 1989 to 1990 on the Isle of Wight (IOW), UK. The proposed methods jointly cluster subjects and interdependent variables in order to improve cluster homogeneity. The word "joint" refers to clustering subjects and clustering variables simultaneously while incorporating the dependence between the two clustering processes. At the same time, we allow for the existence of non-clustered subjects and variables. We will apply the methods to identify clusters of allergic sensitizations to different allergens (ASDA), and the subjects belonging to each ASDA cluster, by searching for consistent temporal trends in subsets of ASDA. Through the inferred cluster profiles, we will evaluate the association between two temporal patterns, asthma/wheeze status and allergic sensitizations over time, with co-morbidities considered. Existing clustering methods (parametric or non-parametric) cannot achieve this goal. They either cannot explain the contribution of external variables, such as the effect of time on allergic sensitizations (the variables of interest), or overlook the interdependence between different variables (e.g., allergic sensitizations to different allergens). Recent findings support dynamic allergic patterns. However, it is largely unknown (1) whether there exists a group (or groups) of allergens to which sensitizations share a similar temporal trend (natural history), such as periods of high or inert system responsiveness, and (2) whether dynamic allergic patterns are associated with asthma/wheeze persistence, remission, or new onset (phenomic association).
This application attempts to fill these gaps, which will potentially bring us closer to understanding the natural history of asthma and has strong potential to advance the asthma prevention agenda. The birth cohort on the IOW, UK, comprises 1,456 children examined at birth and at ages 1, 2, 4, 10, and 18 years, with retention >90%. The cohort has extensive phenotype data at different ages and records of environmental factors such as allergen and pollutant levels. The main variables in our study include longitudinal allergic sensitization measures and asthma/wheeze status. The proposed methods are not limited to this data set and can be applied to any data with continuous measures on a set of variables, e.g., high-throughput gene expression data or methylation data. Our team has a long track record of successful collaboration, combining biostatistical (Zhang) and epidemiological (Karmaus) expertise at the University of South Carolina with clinical expertise (Arshad and Roberts) at the University of Southampton and the David Hide Asthma & Allergy Research Center on the IOW. Dr. Zhang has rich experience in statistical modeling [1R03HL095429, Zhang (MPI)]. Several projects by this group are supported by NIH, including 1R01AI091905 [Principal Investigator: Karmaus] and 1R01HL082925 [Principal Investigator: Arshad]; Dr. Zhang is a key investigator on both projects.

Public Health Relevance

Early prevention of asthma is essential to reduce the burden of this high-impact and avoidable disease. The novel statistical methods are able to identify clusters of individuals with a consistent temporal trend of allergic sensitization against a group of allergens, which in turn enables us to assess phenomic associations (associations between collections of phenotypes) between asthma and allergic sensitization. The findings will improve our understanding of the natural history of allergy and asthma, and, by pointing to interventions in the temporal trend of allergic sensitization, the project has strong potential to critically impact our ability to prevent new onset of asthma.

National Institutes of Health (NIH)
National Institute of Allergy and Infectious Diseases (NIAID)
Exploratory/Developmental Grants (R21)
Study Section: Infectious Diseases, Reproductive Health, Asthma and Pulmonary Conditions Study Section (IRAP)
Program Officer: Minnicozzi, Michael
University of South Carolina at Columbia
Public Health & Prev Medicine
Schools of Public Health
United States
Terry, William; Zhang, Hongmei; Maity, Arnab et al. (2017) Unified variable selection in semi-parametric models. Stat Methods Med Res 26:2821-2831
Han, Shengtong; Zhang, Hongmei; Karmaus, Wilfried et al. (2017) Adjusting background noise in cluster analyses of longitudinal data. Comput Stat Data Anal 109:93-104
Ray, Meredith A; Tong, Xin; Lockett, Gabrielle A et al. (2016) An Efficient Approach to Screening Epigenome-Wide Data. Biomed Res Int 2016:2615348
Lockett, G A; Soto-Ramírez, N; Ray, M A et al. (2016) Association of season of birth with DNA methylation and allergic disease. Allergy 71:1314-24