My research in bioinformatics and computational biology ranges from abstractive modeling for integrated omics studies to mechanistic modeling for systems biology. In abstractive modeling, we develop or implement clustering, decomposition, and machine learning algorithms to uncover or predict significant biological themes in diverse omics data. In mechanistic modeling, we use differential equations to explore the dynamic and systems-level properties of genetic networks and to propose strategies for synthetic biology.
To discover transcriptional modules, infer regulatory networks, and examine the evolutionary conservation and variation of regulatory programs, we developed a novel algorithm, ModulePro. The algorithm first applies nonlinear independent component analysis to reduce nonlinear distortion in gene expression data and to represent the data as independent components corresponding to hidden biological sources. It then performs sparse matrix analysis, modeling the expression of each gene as a weighted linear combination of a small number of prototypes that represent predominant transcriptional regulators such as transcription factors or microRNAs. This sparsity reflects the nature of regulator-target relationships: at the heart of a regulatory network are a few regulatory genes whose activities govern the behavior of many other genes. ModulePro can capture transcriptional modularity arising from highly nonlinear interactions among genes, and it can cluster together genes that share related functions or regulatory programs even when their expression patterns differ. It can also assign a gene to multiple, non-mutually-exclusive modules, reflecting the fact that a gene may have several functions and participate in different biological processes.
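The sparse step of ModulePro can be illustrated with a minimal sketch. It assumes the prototype profiles are already available (the nonlinear ICA step that would produce them is not reproduced here), and it uses an L1-penalized least-squares (lasso) fit solved by cyclic coordinate descent as a stand-in for the sparse matrix analysis. The function names, the penalty `lam`, the threshold `tol`, and the toy data are hypothetical; this is not ModulePro's implementation.

```python
from typing import Dict, List, Sequence, Set


def soft_threshold(z: float, lam: float) -> float:
    """Shrink z toward zero by lam (proximal operator of the L1 penalty)."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def sparse_weights(
    gene: Sequence[float],
    prototypes: List[Sequence[float]],
    lam: float = 0.1,
    n_iter: int = 200,
) -> List[float]:
    """Lasso fit of one expression profile on the prototypes by coordinate descent:

        minimize 0.5 * ||gene - sum_j w_j * p_j||^2 + lam * sum_j |w_j|
    """
    w = [0.0] * len(prototypes)
    norms = [dot(p, p) for p in prototypes]
    residual = list(gene)  # gene - sum_j w_j * p_j, with all w_j = 0 initially
    for _ in range(n_iter):
        for j, p in enumerate(prototypes):
            if norms[j] == 0.0:
                continue
            # Correlation of p_j with the residual that excludes p_j's own term.
            rho = dot(p, residual) + w[j] * norms[j]
            new_wj = soft_threshold(rho, lam) / norms[j]
            delta = new_wj - w[j]
            if delta != 0.0:
                residual = [r - delta * pi for r, pi in zip(residual, p)]
                w[j] = new_wj
    return w


def assign_modules(
    expr: Dict[str, Sequence[float]],
    prototypes: List[Sequence[float]],
    lam: float = 0.1,
    tol: float = 1e-6,
) -> Dict[int, Set[str]]:
    """Put each gene in every module whose prototype gets a non-zero weight,
    so a gene can belong to several modules at once."""
    modules: Dict[int, Set[str]] = {j: set() for j in range(len(prototypes))}
    for name, profile in expr.items():
        for j, wj in enumerate(sparse_weights(profile, prototypes, lam)):
            if abs(wj) > tol:
                modules[j].add(name)
    return modules


# Hypothetical toy data: three prototype profiles over four conditions.
PROTOTYPES = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]]
EXPR = {"A": [2.0, 0.0, 1.0, 1.0], "B": [0.0, 1.5, 0.0, 0.0], "C": [0.0, 0.0, 0.8, 0.8]}

if __name__ == "__main__":
    print(assign_modules(EXPR, PROTOTYPES))
```

With these toy profiles, gene A receives non-zero weights on prototypes 0 and 2 and is therefore placed in both modules, while its weight on prototype 1 is exactly zero. The L1 penalty drives small weights to zero, which is how this sketch encodes the sparsity of regulator-target relationships.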
Through real data analysis, we show that ModulePro leads to a significant improvement in uncovering biologically coherent transcriptional modules and regulatory networks compared with many other methods.
To identify RNA motifs and the target genes of RNA-binding proteins, we developed a computational program, RNAmotifPro. It first identifies sequence motifs that are conserved in both primary and secondary structure across the training sequences and models them with stochastic context-free grammars. Motifs that are significantly over-represented are then reported as the final RNA motifs for the RNA-binding protein. The program outputs graphical representations of the primary and secondary structures of the identified motifs, together with a list of putative target genes of the RNA-binding protein found by a genome-wide search. Implemented on UNIX computer clusters and run in parallel, RNAmotifPro has served as a useful tool for analyzing data generated by RIP-chip technology.
To explore the dynamic behavior of regulatory networks, we developed a novel computational algorithm, PathwayPro, based on a finite-state Markov chain model constructed from gene expression and network topology data. The analysis begins with an in silico transcriptional intervention on each gene or gene combination to alter its expression level, and then estimates, under each intervention, the probabilities that the network transitions between different cell states or fates. The algorithm thus provides quantitative assessments of behavioral transitions of a genetic network in settings such as cancer development or recovery, aging, or cell differentiation, and it can evaluate a wide range of cellular responses, including susceptibility to disease, the potential usefulness of a drug, and the consequences of external stimuli such as pharmacological interventions or caloric restriction.
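The intervention idea can be sketched on a toy network. This minimal example assumes, hypothetically, that cell states are Boolean gene-activity vectors, that the Markov chain comes from a Boolean network in which each gene deviates from its rule with a small perturbation probability `p` (rather than being fitted to expression and topology data as PathwayPro does), and that an intervention clamps a gene at a fixed level. The two-gene mutual-repression rules, the "fate" definition, and all function names are illustrative only.

```python
from itertools import product
from typing import Callable, Dict, List, Optional, Tuple

State = Tuple[int, ...]
Rule = Callable[[State], int]


def transition_matrix(
    rules: List[Rule], p: float, clamp: Optional[Dict[int, int]] = None
) -> Tuple[List[State], List[List[float]]]:
    """Transition matrix of a Boolean network with random perturbations.

    matrix[i][j] is the probability of moving from states[i] to states[j].
    Each gene takes the value given by its rule with probability 1 - p and the
    opposite value with probability p. Genes listed in `clamp` are held at the
    given value, modeling a hypothetical in silico intervention.
    """
    clamp = clamp or {}
    n = len(rules)
    states = list(product((0, 1), repeat=n))
    matrix = []
    for s in states:
        target = [rule(s) for rule in rules]
        row = []
        for t in states:
            prob = 1.0
            for i in range(n):
                if i in clamp:
                    prob *= 1.0 if t[i] == clamp[i] else 0.0
                else:
                    prob *= (1.0 - p) if t[i] == target[i] else p
            row.append(prob)
        matrix.append(row)
    return states, matrix


def fate_probability(
    rules: List[Rule],
    p: float,
    start: State,
    fate: Callable[[State], bool],
    steps: int = 50,
    clamp: Optional[Dict[int, int]] = None,
) -> float:
    """Probability that the chain is in a `fate` state after `steps` transitions."""
    states, matrix = transition_matrix(rules, p, clamp)
    m = len(states)
    dist = [1.0 if s == start else 0.0 for s in states]
    for _ in range(steps):
        # Propagate the state distribution one step: dist <- dist @ matrix.
        dist = [sum(dist[i] * matrix[i][j] for i in range(m)) for j in range(m)]
    return sum(pr for s, pr in zip(states, dist) if fate(s))


# Hypothetical mutual-repression network: A' = NOT B, B' = NOT A.
TOGGLE: List[Rule] = [lambda s: 1 - s[1], lambda s: 1 - s[0]]


def a_high_fate(s: State) -> bool:
    """Hypothetical fate of interest: A on and B off."""
    return s == (1, 0)


if __name__ == "__main__":
    base = fate_probability(TOGGLE, 0.05, (0, 0), a_high_fate)
    b_knockout = fate_probability(TOGGLE, 0.05, (0, 0), a_high_fate, clamp={1: 0})
    print(f"baseline: {base:.3f}, B knocked out: {b_knockout:.3f}")
```

Starting from (0, 0), the network is symmetric in A and B, so the baseline probability of the A-high state is at most one half; clamping B off raises it to 0.95 (that is, 1 - p). Enumerating all 2^n states keeps the sketch exact but limits it to small networks.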
Such intervention analysis has considerable potential clinical impact: it can open a window on disease progression and could translate into accurate diagnosis, target identification, drug development, and treatment.
To further explore the dynamic behavior of regulatory networks, we use ordinary or stochastic differential equations, for deterministic or stochastic modeling respectively, to reveal genetic and non-genetic mechanisms of cell fate control. Through simulation, bifurcation, and phase plane analyses of the models, we determine the multistability of the systems and the switch-like transitions between attractors, or cell fates. Through sensitivity analysis, we examine how sensitive the steady-state levels of genes or proteins are to variation in parameters such as translation and degradation (decay) rates of genes or proteins, or the binding strength of transcription factors. We also examine the effects of transcriptional randomness, or noise, on the non-genetic mechanisms of cell fate control. These studies enable a more quantitative and predictive description of cells, their differentiation, and lineage selection.
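The simulation and sensitivity analyses can be sketched on the classic two-gene mutual-repression toggle switch, used here only as a stand-in for the specific models described above, which are not given. The synthesis rate `a`, Hill coefficient `n`, degradation rate `d`, step size, and function names are illustrative assumptions. Integrating from two different initial conditions reaches two distinct stable steady states (bistability), and a central finite difference estimates how the steady state of `u` responds to the degradation rate.

```python
from typing import Tuple

State = Tuple[float, float]


def toggle_rhs(state: State, a: float = 4.0, n: float = 2.0, d: float = 1.0) -> State:
    """Mutual repression: each protein is produced at a Hill-repressed rate
    and degraded at first-order rate d."""
    u, v = state
    du = a / (1.0 + v ** n) - d * u
    dv = a / (1.0 + u ** n) - d * v
    return du, dv


def integrate(state: State, t_end: float = 50.0, dt: float = 0.01, **params) -> State:
    """Classical fourth-order Runge-Kutta integration toward a steady state."""
    u, v = state
    for _ in range(int(round(t_end / dt))):
        k1 = toggle_rhs((u, v), **params)
        k2 = toggle_rhs((u + 0.5 * dt * k1[0], v + 0.5 * dt * k1[1]), **params)
        k3 = toggle_rhs((u + 0.5 * dt * k2[0], v + 0.5 * dt * k2[1]), **params)
        k4 = toggle_rhs((u + dt * k3[0], v + dt * k3[1]), **params)
        u += dt / 6.0 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        v += dt / 6.0 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
    return u, v


def steady_state_sensitivity(start: State, d: float = 1.0, h: float = 1e-4, **params) -> float:
    """Central finite-difference estimate of d(u*)/dd for the attractor reached from `start`."""
    u_plus, _ = integrate(start, d=d + h, **params)
    u_minus, _ = integrate(start, d=d - h, **params)
    return (u_plus - u_minus) / (2.0 * h)


if __name__ == "__main__":
    print(integrate((2.0, 0.1)))  # settles on the u-high attractor
    print(integrate((0.1, 2.0)))  # settles on the v-high attractor
    print(steady_state_sensitivity((2.0, 0.1)))
```

Because the model is symmetric, the diagonal u = v separates the two basins of attraction, so any start with u > v ends in the u-high state. A stochastic variant would add a noise term to each equation and integrate it with a scheme such as Euler-Maruyama.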