Accurate risk assessment and prediction of treatment responses are essential in health care. The potential clinical and financial consequences of assigning patients to the wrong prognostic group underscore the need for reliable prognostic indices and rigorous evaluation of their accuracy. For complex diseases, any single marker is often inadequate for precise prediction. With the dramatically increased availability of new prognostic markers, it is now possible to improve prognostic accuracy by combining information from several markers. This gives rise to the need for statistical approaches that make optimal use of information from multiple sources to improve disease management. Our proposal aims to develop procedures to address this need. In studies designed to develop prognostic classifiers, markers are often measured at baseline and patients are followed over time for the occurrence of clinical conditions. Since the risk of disease occurrence may change over time, the time domain must be incorporated when developing prognostic classifiers. Another challenge is that event times are not always observable due to censoring. The current statistical literature on analyzing event time data focuses primarily on model-based methods, whose validity relies on the model assumptions. Such assumptions may not hold in practice, which may lead to biased or invalid predictions. In this proposal, we consider robust approaches to the development and evaluation of prognostic classifiers. We will focus on the following three aims.
In Aim 1, we will develop robust methods for constructing an optimal composite score based on several markers.
In Aim 2, we will evaluate and compare the prognostic potential of estimated prognostic scores and develop optimal decision rules for assigning prognostic groups.
In Aim 3, we will provide procedures for identifying subjects who would benefit from a potentially expensive or invasive prognostic evaluation given an initial assessment.
This project has access to a wide variety of real datasets, which will guide the methodological research. Examples include 1) data from a study of patients diagnosed with pulmonary embolism; 2) data from the Cardiovascular Health Study; 3) gene expression data from a breast cancer study; and 4) data from an AIDS clinical trial.
Our aims will require the development of large-sample distribution theory, small-sample simulation studies, and application to real data. Software to implement the analyses will use standard statistical packages such as S-PLUS or SAS and will be fully documented.