We propose research to improve methods for analyzing dietary data collected in epidemiologic studies by refining the methodology needed to define the role of diet as a risk factor for, and in the prevention of, disease. Diet is increasingly accepted to play an important role in the etiology of many diseases, including cancer, cardiovascular disease, and diabetes mellitus, yet no universally accepted strategy exists for analyzing dietary data. Traditional dietary data analyses have focused on single foods or nutrients. This univariate analysis does not account for the specific structure of diet and the interrelatedness of foods and nutrients. Diet is a universal exposure; within- and between-person variation in total caloric intake is limited. High consumption frequencies of particular foods, or a preference for one food group, imply low intake of other foods or food groups. Regressing a disease outcome on an individual food or nutrient does not account for these complexities. We propose exploring the problem of model specification using several approaches. With repeated measures over time, long-term diet can be assessed more precisely, but modeling diet-disease relationships correctly becomes more difficult. We will explore how such repeated measures can be used most efficiently to define dietary intake over an extended period while accounting for the time window most relevant to the specific disease. Numerous epidemiologic studies use food frequency questionnaires to assess dietary intake. Study participants often omit responses to numerous food items. Blanks in diet questionnaires represent an atypical kind of missing value: they may reflect difficulty remembering intake, may be oversights or the result of fatigue, or may represent zero intake. Through interviews with study participants, we will explore the most important reasons for nonresponse. 
This will give us the opportunity to improve questionnaire design and reduce the number of blanks in the future; furthermore, insight into the underlying missingness structure will allow us to use appropriate analytic methods to account for missing values. Results from regression models may depend crucially on the appropriate handling of missing values, particularly when the number of omitted food items is substantial. We have the unique opportunity to address the above issues using three large cohorts: the Nurses' Health Study, the Nurses' Health Study II, and the Health Professionals Follow-up Study.
We aim to provide guidance to researchers who wish to explore diet-disease relations on how best to use their data to obtain the most valid results and to avoid pitfalls that might produce misleading associations.