Missing data, censored data, and surrogate markers are common incomplete-data problems in biomedical data analysis. In this project, we are interested in statistical methods for experimental, observational, and genetic studies in which missing data, measurement errors, and surrogate markers arise. Examples include health surveys with non-responders or missing items, and surrogate marker data with measurement errors. Applications include longitudinal clinical trials, multilevel community studies, genetic marker studies, and health surveys. The incomplete data may be non-ignorably missing and may affect either the response or the predictors of a model, i.e., missing responses, missing covariates, and covariate measurement errors. The most complicated scenarios combine these difficulties, e.g., missing responses with covariate measurement errors, or censored data with surrogate markers and measurement errors.

The ultimate results of this project will be two statistical packages for longitudinal and survival responses: 1) MiMe, statistical methods for missing data and measurement errors; and 2) Laso, joint modeling methods for longitudinal and survival outcomes in the study of surrogate markers for clinical event times. Functional and structural approaches will be developed, and they are applicable to many other areas, e.g., genetic marker association studies. The results from this project will include innovative statistical methods, sensitivity analyses, graphical methods, case studies, software tools, and publications. An R version will be available; advanced users may use it for comparison studies against other approaches or customize it for further extensions. A second version will incorporate the tools from this research into our online data analysis platform, the Longit Informatics Center, where subscribers can access many statistical packages, modules, and dynamic graphics for data analysis.
For commercialization, we will deliver online and offline versions, i.e., internet, intraweb, and desktop versions. We will also license our API version for integration with other analytic systems in business and other non-biomedical fields. One example is integrating Longit with Alteryx, a commercial data mining tool for big data analysis.

Public Health Relevance

This project aims to develop statistical methods and user-friendly software tools for analyzing incomplete data involving missing data, surrogate markers, and measurement errors. The outcome response may be time-independent, longitudinal, or survival data in behavioral studies, cancer studies, AIDS studies, health surveys, and similar settings. We will deliver desktop, intraweb, and web versions, and users may integrate and customize our software API within their own analytic systems for biomedical studies or business data mining.

Agency: National Institutes of Health (NIH)
Institute: National Institute of General Medical Sciences (NIGMS)
Type: Small Business Innovation Research Grants (SBIR) - Phase II (R44)
Project #: 5R44GM100573-03
Application #: 9060357
Study Section: Special Emphasis Panel (ZRG1)
Program Officer: Marcus, Stephen
Project Start: 2012-06-01
Project End: 2017-04-30
Budget Start: 2016-05-01
Budget End: 2017-04-30
Support Year: 3
Fiscal Year: 2016

Organization: Data Numerica Institute, Inc.
DUNS #: 003849838
City: Bellevue
State: WA
Country: United States
Zip Code: 98006

Publications

Huang, Yijian; Wang, Ching-Yun (2018) Cox regression with dependent error in covariates. Biometrics 74:118-126
Yu, Hsiang; Cheng, Yu-Jen; Wang, Ching-Yun (2018) Methods for multivariate recurrent event data with measurement error and informative censoring. Biometrics 74:966-976
Wang, Ching-Yun; Cullings, Harry; Song, Xiao et al. (2017) Joint nonparametric correction estimator for excess relative risk regression in survival analysis with exposure measurement error. J R Stat Soc Series B Stat Methodol 79:1583-1599
Wang, Ching-Yun; Song, Xiao (2016) Robust best linear estimator for Cox regression with instrumental variables in whole cohort and surrogates with additive measurement error in calibration sample. Biom J 58:1465-1484