With high adoption rates of electronic health record systems across the country, an increasing volume of electronic data is available in the medical field. These data are a rich source for generating data-driven hypotheses for further testing. Such hypotheses may guide subsequent study design more accurately and efficiently than non-data-driven methods. Generating hypotheses is a critical step in the life cycle of a clinical research project; however, the process can be challenging and very time-consuming. To improve the efficiency of hypothesis generation, it would be very helpful to the clinical research community to have an explicit illustration of the hypothesis-generation process. Although research methodologies for hypothesis generation exist, none focuses on hypothesis generation using secondary data in clinical research. Our overall project goal is to explore the process of hypothesis generation through the use of secondary data by both experienced and inexperienced clinical researchers. These clinical researchers will be asked to generate hypotheses with or without the use of the Web-based data analytic tool VIADS: a Visual Interactive Analysis tool for filtering and summarizing large Data Sets coded with hierarchical terminologies. Each researcher's process will be captured via observations, think-aloud videos, surveys, and follow-up questions. The results will be compared among the four groups (experienced or inexperienced researchers, with or without VIADS). Once differences are detected, a follow-up study will be conducted to examine whether the identified differences can facilitate the hypothesis-generation process. The same secondary data set will be utilized across all the groups.
The hypotheses generated by the clinical researchers will be assessed and compared by an expert panel with regard to their validity, significance, clinical relevance, feasibility, and clarity, as well as the number of hypotheses, the average time taken to generate each hypothesis, and so forth. We hypothesize that there will be differences between experienced and inexperienced clinical researchers, whether or not they use a secondary data analytic tool in generating hypotheses; the experience of the group of clinical researchers who perform best may be used to help the other group generate hypotheses more efficiently. The R15 grant will support the research team in exploring the hypothesis-generation process more closely and accurately, with and without the use of VIADS, and in evaluating whether the results help clinical researchers generate hypotheses and improve clinical research efficiency in the long term. We anticipate that the results of this study will provide an illustration of the hypothesis-generation process, especially for testable hypotheses, through the use of secondary data in clinical research. The evidence can be used to (1) provide guidance for possible automation of hypothesis generation in clinical research and (2) increase the efficiency of clinical research in general in the long term.

Public Health Relevance

We explore the hypothesis-generation process in clinical research, particularly when the process is guided by secondary data analysis. A detailed and explicit illustration may provide the foundation for understanding the process, which is currently unknown. The results will contribute to the development of more intelligent tools to facilitate hypothesis generation, to the more efficient formulation of research questions, and, eventually, to the improvement of clinical research productivity in general. A better understanding of the cognitive process of hypothesis generation in a clinical research context will contribute to knowledge beyond clinical research.

National Institutes of Health (NIH)
National Library of Medicine (NLM)
Academic Research Enhancement Awards (AREA) (R15)
Study Section: Biomedical Computing and Health Informatics Study Section (BCHI)
Program Officer: Vanbiervliet, Alan
Ohio University Athens
Other Health Professions
Sch Allied Health Professions
United States