Discovering interactions between the attributes in a data set provides insight into the underlying structure of the data and explains the relationships between the attributes. This project develops multidisciplinary approaches that integrate techniques from computer science, statistics, and epidemiology to mine interaction relationships among attributes and phenotypes (traits or class labels) in biological data sets. Specifically, the project develops innovative and statistically sound methodologies for mining novel interactions among attributes, or between attributes and phenotypes, to help identify critical factors in biological applications. In particular, these analysis methods can help delineate the genetic and environmental interactions underlying a range of complex diseases. The research activities of this project can also promote the integration of biology, computer science, and statistics, which is significant to many applications.
The project will formulate metrics that enable efficient pruning and searching of the multi-dimensional combinatorial space to identify significant interaction relationships. These metrics support effective approaches that build search-based trees or identify highly correlated subspaces in order to detect meaningful local interactions, that is, interactions that may not be significant over the whole data set but are strongly associated with traits on a subset of the data. The approaches also enable comparison of data across multiple groups, such as groups defined by age, race, or other properties. It is important to find both the common and the differing interactions across groups so that effective methods can be developed for targeted groups. The methods developed will detect complex interactions between attributes in multiple groups simultaneously by capturing both their commonalities and their differences in joint matrix factorization or deep learning models. These approaches are powerful for biological applications, such as detecting the gene-gene and gene-environment interactions that lead to breast cancer. The concept of interaction is also ubiquitous and important in many scientific disciplines, ranging from economics, sociology, and physics to the pharmaceutical sciences. The approaches and analysis tools developed in this project can find interaction relationships between attributes, with or without associated phenotype labels. They are general and applicable to a variety of applications.
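The notion of a local interaction described above can be made concrete with a small toy example. The sketch below is not the project's method; it is a minimal illustration under stated assumptions: the data are entirely synthetic, the interaction between two binary attributes is represented by their XOR, and association with the phenotype is measured with a plain Pearson chi-square statistic on a 2x2 table. All names (`make_toy_data`, `interaction_stat`, the "small" and "large" groups) are hypothetical.

```python
from itertools import product

# 5% critical value of the chi-square distribution with 1 degree of freedom.
CHI2_CRIT_1DF = 3.841


def chi_square_2x2(pairs):
    """Pearson chi-square statistic for (x, y) pairs with x, y in {0, 1}."""
    counts = [[0, 0], [0, 0]]
    for x, y in pairs:
        counts[x][y] += 1
    (a, b), (c, d) = counts
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # an empty margin means no association can be measured
    return n * (a * d - b * c) ** 2 / denom


def make_toy_data():
    """Synthetic rows of (group, attr1, attr2, phenotype); assumption only."""
    rows = []
    for a1, a2 in product((0, 1), repeat=2):
        # Small group: the phenotype is exactly attr1 XOR attr2.
        rows += [("small", a1, a2, a1 ^ a2)] * 5
        # Large group: the phenotype is unrelated to both attributes.
        for y in (0, 1):
            rows += [("large", a1, a2, y)] * 50
    return rows


def interaction_stat(rows):
    """Association between the joint feature attr1 XOR attr2 and the phenotype."""
    return chi_square_2x2([(a1 ^ a2, y) for _, a1, a2, y in rows])


rows = make_toy_data()
small = [r for r in rows if r[0] == "small"]

whole_stat = interaction_stat(rows)    # interaction tested on all rows
local_stat = interaction_stat(small)   # interaction tested on the subset
marginal_stat = chi_square_2x2([(a1, y) for _, a1, _, y in small])

print(f"whole data:               {whole_stat:.3f}")
print(f"small group:              {local_stat:.3f}")
print(f"attr1 alone, small group: {marginal_stat:.3f}")
```

In this toy setting the interaction statistic on the whole data stays below the 5% threshold, while the same statistic on the small group exceeds it, and a single attribute on its own shows no association with the phenotype even inside that group. A realistic analysis would not know the relevant subset in advance, which is why the search over subsets and attribute combinations needs pruning.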