Even though multiple imaging and genetic modalities can easily be collected on the same set of individuals, methods for effectively combining these different types of information are still in their infancy. All of these modalities typically involve thousands of data points per subject; simple correlative approaches are therefore of very limited use for uncovering hidden patterns and associations in these data and can easily become computationally overwhelming. This problem is only growing as the technologies improve (e.g., we can currently derive information on over 1 million single nucleotide polymorphisms (SNPs), and with the recent advent of epigenetic assays even more genetic information is available). We propose to develop a class of multivariate methods to enable research on the healthy versus diseased brain by identifying associations among these different high-dimensional data types. Our development of methods for the effective fusion of behavioral, fMRI, EEG, and genetic array data involves a two-level approach. In the first level, we start from a framework that makes strong assumptions about the associations and the underlying generative model across data types, and then extend this framework to allow for more flexible types of associations and underlying assumptions. In the second level, we consider ways to incorporate reliable prior information into a particular fusion framework and develop methods that improve upon those developed in the first level by effectively using prior information or meaningful constraints. Thus we provide a set of effectively "informed" data-driven tools for the task. Complementary data-driven approaches we will develop include methods based upon canonical correlation analysis (CCA), which utilizes second-order statistical information. We will also continue to develop methods based upon independent component analysis (ICA), which utilizes higher-order statistical information.
Joint ICA (jICA) is an approach which assumes a common linear relationship among modalities. Though jICA has proven quite useful, we will also investigate a number of ways to relax its assumptions for increased flexibility, as well as ways to incorporate prior information. For example, we can relax the assumption of common profiles while still emphasizing the interrelationship among a subset of components using parallel ICA. We can emphasize group differences using constrained coefficient ICA. We will also investigate the utility of nonlinear ICA (in this case relaxing the assumption of linear interrelationships between modalities) as well as approaches which do not assume stationarity. The methods we develop will provide a framework that allows investigators to ask more realistic questions about high-dimensional data and will provide a much-needed set of tools to the community. We will also integrate data spanning from genetics to behavior, focusing on two important applications where integrating such data matters: schizophrenia and addiction. This will help us to further generalize the algorithms developed. We will work with data collected from two studies, one on 720 schizophrenia patients and controls, and another with 310 heavy drinkers. With access to these unique, large data sets, combined with our work on the development of computational approaches for fusing high-dimensional data, applied to the conceptual models we have developed for disease, we are poised to fill an important gap in the field and produce new tools which are applicable to a wide variety of diseases.
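To make the jICA assumption described above concrete, the following is a minimal sketch on synthetic data, not the implementation used in this project. It assumes two modalities measured on the same subjects, concatenates their features, and estimates one mixing matrix shared by both modalities. The function names `fast_ica` and `joint_ica`, the tanh contrast function, and the simulated Laplace-distributed sources are all illustrative assumptions.

```python
import numpy as np


def fast_ica(Z, n_iter=200, tol=1e-8, seed=0):
    """Symmetric FastICA with a tanh contrast on whitened data Z (k x n)."""
    k, n = Z.shape
    rng = np.random.default_rng(seed)
    # Start from a random orthogonal unmixing matrix.
    W = np.linalg.qr(rng.standard_normal((k, k)))[0]
    for _ in range(n_iter):
        G = np.tanh(W @ Z)
        # Fixed-point update: E[z g(w'z)] - E[g'(w'z)] w, one row per component.
        W_new = (G @ Z.T) / n - np.diag((1.0 - G**2).mean(axis=1)) @ W
        # Symmetric decorrelation: W <- (W W')^(-1/2) W, computed via the SVD.
        U, _, Vt = np.linalg.svd(W_new)
        W_new = U @ Vt
        converged = np.max(np.abs(np.abs(np.sum(W_new * W, axis=1)) - 1.0)) < tol
        W = W_new
        if converged:
            break
    return W


def joint_ica(X1, X2, k):
    """Joint ICA sketch: X1 (subjects x p1) and X2 (subjects x p2) share one mixing matrix.

    Returns the shared mixing matrix A (subjects x k) and the joint sources
    split back into their modality-specific parts S1 (k x p1) and S2 (k x p2).
    """
    p1 = X1.shape[1]
    X = np.hstack([X1, X2])                 # subjects x (p1 + p2)
    X = X - X.mean(axis=1, keepdims=True)   # center each subject across features
    n = X.shape[1]
    # PCA reduction and whitening so that Z @ Z.T / n is the identity.
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    Z = np.sqrt(n) * Vt[:k]
    W = fast_ica(Z)
    S = W @ Z                               # joint independent sources
    A = X @ S.T / n                         # shared subject loadings
    return A, S[:, :p1], S[:, p1:]


# Synthetic example: 20 subjects, 3 joint sources, two modalities.
rng = np.random.default_rng(1)
A_true = rng.standard_normal((20, 3))
S1_true = rng.laplace(size=(3, 500))        # stand-in for one modality's features
S2_true = rng.laplace(size=(3, 300))        # stand-in for a second modality
A_est, S1_est, S2_est = joint_ica(A_true @ S1_true, A_true @ S2_true, k=3)
print(A_est.shape, S1_est.shape, S2_est.shape)
```

Because each column of the estimated mixing matrix loads on both modality parts of a joint source, a group difference in one column implicates both modalities at once. That coupling is exactly the strong assumption that approaches such as parallel ICA are meant to relax.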
In this proposal we will develop a family of data-driven approaches to effectively integrate fMRI, ERP, genetic, and behavioral data and to enable the incorporation of available prior information. We will develop, validate, and apply our methods to schizophrenia and addiction, both of which are extremely complex, involve mixtures of genetic and environmental factors, affect a large number of individuals, and are comorbid with one another. Our methods will be implemented in a user-friendly software toolbox with anonymized data provided.