The support provided under Core D reflects a growing shift in studies of environmental exposure, from more traditional epidemiological studies and simple experimental designs to high-dimensional biology, with its emphasis on 'omic' technologies and complicated questions addressing the possible interaction of environmental exposures with high-dimensional measures of the genome, proteome, etc. These high-dimensional data sets are characterized by many (thousands of) measurements made on only a few independent units (e.g., people). Thus, Core D reflects a parallel evolution in the field of biostatistics towards developing methodologies that can both find patterns in high-dimensional data sets and provide proper statistical inference for these patterns. Besides offering consulting on traditional epidemiological experimental design and analysis questions, Core D will focus its efforts on providing the most relevant and rigorous statistical techniques to the Program's projects. With new 'omic' technologies, biology has entered a new, more empirical phase in which the goals of the research are ambitious (e.g., discovery of regulatory gene networks affected by particular environmental toxicants) but the sample sizes are relatively small (biological replicates numbering in the tens). With these technologies has also come a proliferation of proposed methods for finding biologically meaningful patterns, typically with little theory provided to guide assessment of their relative worth. The goal of this Core is to provide the project researchers with the best techniques available, software to help implement them, a computational environment that can handle computer-intensive methods on large data sets and, most importantly, rigorous statistical inference for the parameters estimated by these procedures.
Developments related to the proliferation of high-dimensional biological/epidemiological data that are particularly relevant to this proposal include 1) multiple testing, 2) machine learning and loss-based estimation, 3) grouping algorithms, 4) causal inference and 5) biological metadata and systems biology. In addition, the Core will provide access to a computational environment suited to the computationally intensive methods developed for data mining and resampling-based inference.