Structural equation modeling unifies regression, factor analysis, directed graphs, and other (non)linear models into a powerful and flexible toolbox for statistical inference. It has well-documented merits in areas as diverse as biology, ecology, economics, psychology, and the social sciences. Despite the flexibility of structural equation models (SEMs), their ability to cope with the high-dimensional problems encountered in contemporary fields is limited by the lack of efficient and effective inference methods. A focused effort is required to achieve the necessary breakthroughs in high-dimensional SEMs and to demonstrate their suitability for emerging research areas. The objective of this project is to develop efficient inference methods for high-dimensional SEMs, tailored to the inference of gene networks and to optimized strategies for chemical genomics. A key enabler to this end is exploiting the sparsity present in high-dimensional data. The proposed research is organized around two thrusts: (T1) Inference for sparse SEMs: a set of efficient and robust inference methods, using novel algorithmic techniques and parallel computing, will be developed for both linear and nonlinear high-dimensional SEMs; and (T2) SEM-based inference of gene regulatory networks and application to optimized chemical genomics: S. cerevisiae and human gene networks will be inferred by integrating multiple types of data under the SEM framework, and the inferred networks will also be validated experimentally. A set of natural compounds will be profiled using SEM-based computational strategies to drive chemical genetic screens in S. cerevisiae and S. pombe. The proposed modeling framework will explicitly incorporate genetic variation across individuals in a population and can thus directly use the wealth of sequencing data currently being generated to tackle the genotype-to-phenotype challenge.
Furthermore, the proposed work will markedly increase the throughput with which new bioactive compounds are characterized using chemical genomics-based approaches in yeast and in other model systems. It will also enable the application of high-dimensional SEMs in additional areas, including economics, psychology, ecology, biobehavioral research, and other social sciences.
Successful completion of the proposed project could have a broad impact on human health, as it would help elucidate the role of genes and their interactions in various diseases and enable the construction of more comprehensive small-molecule libraries with well-defined molecular targets for use in new therapeutics.