This project aims to develop efficient statistical algorithms for the estimation of complex dynamics, nonlinear components and disturbances, with applications to threat detection in engineering and biological systems. Four interconnected research tasks will be addressed: (1) partial state estimation of dynamical systems; (2) adaptive smoothing spline estimation of functions with varying roughness; (3) P-spline estimation of shape-restricted functions; and (4) estimation and threat detection for power systems, genetic networks, and engineering systems. By integrating novel techniques from asymptotic statistics, optimization and control theory, theoretically sound and efficient detection algorithms will be developed and applied to potentially transformative systems.
Many critical national infrastructure systems and important engineering or biological systems consist of numerous components and are constantly subject to disturbances. The failure of these components and/or hazardous disturbances pose threats to national security, the economy and health. The success of this project will allow practitioners to better predict system dynamics and imminent threats, and therefore to avoid potentially damaging consequences. In particular, it will help detect adverse disturbances in power systems, deepen the understanding of the dynamical behavior of epidemiological diseases, and improve the precision and reliability of aerospace and other engineering systems. The investigators will also actively pursue educational and outreach activities, engaging students at all levels to strengthen and broaden awareness of science, technology, engineering, and mathematics.
The following research tasks were carried out during 2013-2014: (1) modeling and inference in functional data analysis, with an application to CD4 count data; and (2) functional Cox models, with an application to Mexican fruit fly data.
As one of the most important prognostic markers of human immunodeficiency virus (HIV) infection, the CD4 count has been studied extensively from many angles. Biologists use a wide range of well-established statistical methods, including but not limited to univariate and multivariate regression, categorical analysis and longitudinal data analysis, to model CD4 count data and draw practical conclusions. However, no statistical analysis strategy has been proposed that is motivated by and designed specifically for the unique characteristics of CD4 count data; these characteristics are discussed comprehensively in this project. Statistical modeling is, in general, an iterative process, and this study reflects that fully. The CD4 data form a longitudinal data set in which each subject has a random number of sparse repeated measurements of CD4 counts taken at unequal time intervals.
First, tools of functional data analysis (FDA) are used to explore the features of the CD4 data, both numerically and visually. In particular, we apply the nonparametric repeated Hanning method (Tukey, 1977) to smooth each patient's curve of CD4 percentage (the CD4 count relative to the total number of lymphocytes) over visits. The amount of smoothing is fixed at the point where the estimated autocorrelation coefficients become stable or negligible (Lambert and Liu, 2006). Because the smoothed CD4 percentage trends turn out to be linear, we fit a simple linear model of CD4 percentage on visit for each subject with at least 3 visits and obtain estimates of both the intercept and the slope. Since the smoothed curves are linear, they offer no landmarks to refer to.
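The smoothing and per-subject fitting steps described above can be sketched roughly as follows. This is a minimal illustration, not the investigators' code: it assumes one Hanning pass is a running weighted average with weights 1/4, 1/2, 1/4 that leaves the two endpoints unchanged (a common convention), it takes the number of passes as a fixed input rather than reproducing the autocorrelation-based stopping rule, and the function names are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def hanning(y: Sequence[float]) -> List[float]:
    """One Hanning pass: weights 1/4, 1/2, 1/4 on interior points.

    Assumption: the first and last values are copied unchanged.
    """
    if len(y) < 3:
        return [float(v) for v in y]
    out = [float(y[0])]
    for i in range(1, len(y) - 1):
        out.append(0.25 * y[i - 1] + 0.5 * y[i] + 0.25 * y[i + 1])
    out.append(float(y[-1]))
    return out


def repeated_hanning(y: Sequence[float], passes: int) -> List[float]:
    """Apply the Hanning pass a fixed number of times.

    Assumption: `passes` is supplied by the caller; the stopping rule based
    on autocorrelation coefficients is not implemented here.
    """
    smoothed = [float(v) for v in y]
    for _ in range(passes):
        smoothed = hanning(smoothed)
    return smoothed


def fit_line(visits: Sequence[float],
             values: Sequence[float]) -> Optional[Tuple[float, float]]:
    """Ordinary least squares fit of values on visits.

    Returns (intercept, slope), or None for subjects with fewer than
    3 visits. Assumes the visit numbers are not all identical.
    """
    n = len(visits)
    if n < 3 or n != len(values):
        return None
    mean_x = sum(visits) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in visits)
    sxy = sum((x - mean_x) * (v - mean_y) for x, v in zip(visits, values))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical subject: visit numbers and CD4 percentages (made-up values).
visits = [1, 2, 3, 4, 5]
cd4_pct = [30.0, 34.0, 29.0, 31.0, 27.0]
fit = fit_line(visits, repeated_hanning(cd4_pct, passes=2))
```

Note that a Hanning pass leaves an exactly linear sequence unchanged, which is consistent with fitting a straight line once the smoothed curves look linear.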
Beyond exploratory analysis, another important application of FDA is to answer scientific questions of interest. In this project, we formulate a simultaneous multiple-hypothesis testing problem, namely whether each CD4 slope is positive, with the intention of locating HIV-resistant subjects with non-decreasing CD4 curves, i.e., the HIV/AIDS nonprogressors. The goal is simultaneous inference on n=219 assertions about the slopes. We formulated the simultaneous testing problem for multiple slopes and derived the inferential model (IM; Martin and Liu, 2013) for inference. The 8 nonprogressors identified in this way, together with 2 additional nonprogressors, were also identified by the Benjamini-Hochberg procedure controlling the false discovery rate (FDR). Simulations under a variety of settings were also performed. They show that the FDR procedure identifies at least as many true positives as the IM procedure, whereas the IM procedure always identifies at least as many true negatives as the FDR procedure. Most importantly, the IM procedure measures the uncertainty of each assertion directly through belief functions and plausibility functions, which stand for lower and upper probabilities, respectively. The FDR procedure, being a post hoc adjustment of multiple p-values, provides no such measure of uncertainty. This project forms the main part of the dissertation of a student, Shuang He, who has successfully defended and now works at Eli Lilly and Company.
The second project is motivated by a study of Mexican fruit flies (Carey et al., 2005). That study involved 1152 flies from four cohorts; we use the data from cohort 1 only, which consist of the lifetime and daily reproduction (the number of eggs laid each day) of 288 female flies. We are interested in how daily reproduction affects the survival time of Mexican fruit flies.
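Returning briefly to the multiple-testing step of the CD4 analysis, the Benjamini-Hochberg step-up rule can be sketched as below. This is only an illustration under stated assumptions: the p-values are taken as given inputs (the one-sided slope tests that would produce them are not shown), the level `alpha=0.05` and the example p-values are hypothetical, and the function name is not from the original work.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float],
                       alpha: float = 0.05) -> List[bool]:
    """Benjamini-Hochberg step-up procedure.

    Returns a list of booleans, in the original order of `p_values`,
    marking which hypotheses are rejected at FDR level `alpha`.
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k with p_(k) <= k * alpha / m.
    k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / m:
            k = rank
    # Reject the k hypotheses with the smallest p-values.
    rejected = [False] * m
    for idx in order[:k]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values for eight slope tests (made-up numbers).
example_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
example_rejections = benjamini_hochberg(example_p, alpha=0.05)
```

Because the rule is step-up, a hypothesis can be rejected even if its own p-value exceeds its rank threshold, as long as some larger-ranked p-value falls below its threshold.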
The egg-laying curve serves as the longitudinal covariate, and the 28 infertile flies are excluded from further analysis. The functional Cox model is fitted to the early reproduction trajectory from day 6 to day 30, where day 6 is the first day on which any of the 260 fertile flies lays eggs and day 30 is chosen as the peak of reproduction based on the average reproduction curve. To guarantee a fully observed trajectory, only the flies that survived past day 30 are used, which leaves 224 observations. The analysis indicates that heavy early reproduction before day 13 results in a higher mortality rate, while heavy reproduction after day 20 leads to a lower mortality rate. Reproduction between day 13 and day 20 has no major effect on the mortality rate. In other words, flies that lay many eggs early (before day 13) and then reduce their reproduction after day 20 tend to die earlier, while those that do the opposite tend to have a longer life span. This project is part of the dissertation of another student, Simeng Qu.
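To make the interpretation above concrete, the sketch below shows how a reproduction trajectory could enter a functional Cox model, assuming the usual functional linear form in which the hazard is a baseline hazard multiplied by exp of the integral of X(s) β(s) over days 6 to 30. The coefficient function `beta_illustrative` is entirely hypothetical: it only mimics the reported sign pattern (positive before day 13, zero from day 13 to day 20, negative after day 20) and is not an estimate from the study. The egg counts are made up as well.

```python
import math
from typing import Callable, List, Sequence

# Daily grid for the early reproduction window used in the analysis.
DAYS: List[int] = list(range(6, 31))


def beta_illustrative(day: int) -> float:
    """Hypothetical coefficient function matching only the reported signs."""
    if day < 13:
        return 0.01
    if day > 20:
        return -0.01
    return 0.0


def linear_predictor(eggs: Sequence[float],
                     beta: Callable[[int], float] = beta_illustrative,
                     days: Sequence[int] = DAYS) -> float:
    """Trapezoid-rule approximation of the integral of X(s) * beta(s)."""
    if len(eggs) != len(days):
        raise ValueError("eggs and days must have the same length")
    g = [x * beta(d) for x, d in zip(eggs, days)]
    return sum(0.5 * (g[i] + g[i + 1]) * (days[i + 1] - days[i])
               for i in range(len(days) - 1))


def relative_hazard(eggs: Sequence[float]) -> float:
    """Hazard relative to a fly with zero reproduction on days 6 to 30."""
    return math.exp(linear_predictor(eggs))


# Two hypothetical flies: one lays heavily early, the other heavily late.
early_layer = [40 if d < 13 else 5 for d in DAYS]
late_layer = [5 if d < 13 else 40 for d in DAYS]
```

Under these assumed coefficients, `relative_hazard(early_layer)` exceeds 1 and `relative_hazard(late_layer)` falls below 1, matching the direction of the reported effects; the magnitudes carry no meaning.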