Developed an application for biological theme extraction of microarray patterns based on ontology related annotation (TEMPORA). The application takes gene expression patterns that have been grouped by a clustering algorithm and tests for over-representation (abundance) of Gene Ontology (GO) biological processes among the genes within each cluster. The application then uses latent semantic indexing (a form of natural language processing) to create concepts from the relationship between Medical Subject Heading (MeSH) terms for diseases and the PubMed documents (scientific articles) in which the genes from a cluster's GO biological processes appear. Finally, a similarity matrix is generated from the scientific articles and used to group the documents by hierarchical clustering. The scientific articles are labeled by gene IDs, GO biological processes and PubMed IDs, so that clusters of documents with associated biological processes can be investigated based on the expression pattern of the genes in a given cluster.

--------------------------------------------------------------------------------------------------

Developed an application for generation of phenotypic prototypes (Modk-prototypes). The application uses k-modes style clustering of categorical histopathology observations and k-means style clustering of numeric gene expression and clinical chemistry data to cluster biological samples into groups that share phenotypic responses to stimuli. Clustering the samples with Modk-prototypes on all the data together performs better than clustering on any single data domain or on pairwise combinations of the data domains.

--------------------------------------------------------------------------------------------------

Developed software for extracting gene expression patterns and identifying co-expressed genes (EPIG). By evaluating the similarity among profiles, the magnitude of variation in gene expression profiles, and profile signal-to-noise ratios, EPIG extracts a set of patterns representing co-expressed genes without a pre-defined seeding of the patterns. To extract gene expression patterns, EPIG uses a filtering process in which all profiles are initially considered as pattern candidates. EPIG then assigns each gene to the pattern with which its profile has the highest similarity. A gene is considered an orphan, and is not assigned to any extracted pattern, if its highest similarity value is lower than a given threshold corresponding to a p-value of 0.0001; the threshold assures the significance of the co-expression.

--------------------------------------------------------------------------------------------------

Developed an application for gene selection and multiclass prediction of biological samples. The application uses a multiclass kernel to select the most informative genes for predicting samples with a high degree of accuracy.

--------------------------------------------------------------------------------------------------

Scanned the mouse and human genomes to search for DNA sequence patterns, motifs and restriction enzyme locations. Analyzed ChIP-on-chip data for detection of hypermethylation of DNA sites.

--------------------------------------------------------------------------------------------------

Developed the MicroArray Project System (MAPS) database for more customized management of experimental information and data from microarray studies. Developed customized analytical applications for implementation into the Chemical Effects on Biological Systems (CEBS) database.

--------------------------------------------------------------------------------------------------

Developed software for phase-shift analysis of gene expression (PAGE) data. The PAGE software clusters gene expression profiles from multiple biological conditions across dose and time series experiments. Gene expression patterns are grouped in intervals of the measurements using phase-shifts to find clusters of genes that share trends of expression profiles within the dataset. The PAGE method has three phases:

Phase 1: Transform the gene expression pattern matrix into values of -1, 0 and 1 to indicate the direction of expression change for each biological condition at fixed time and dose points. Biological replicates, if provided, are averaged.
Phase 2: Generate clusters that have similar patterns of expression over consecutive conditions.
Phase 3: Assign a significance score to each bicluster in all clusters and identify the inhibition patterns of each cluster.
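
As an illustration of PAGE Phase 1, the following is a minimal Python sketch of averaging biological replicates and converting the result into a -1/0/1 direction matrix. It is not the PAGE implementation. It assumes the input values are log ratios relative to a control, and that a change counts as up or down when the averaged value reaches a cutoff; the function name `direction_matrix`, the `threshold` parameter and the example data are all hypothetical.

```python
import numpy as np

def direction_matrix(log_ratios, conditions, threshold=1.0):
    """Average replicates per condition, then code each value as -1, 0 or 1.

    log_ratios: genes x samples array (assumed: log ratios versus control).
    conditions: one condition label per sample column; equal labels are replicates.
    threshold:  hypothetical cutoff on the averaged log ratio.
    """
    log_ratios = np.asarray(log_ratios, dtype=float)
    groups = np.asarray(conditions)
    # Unique condition labels, kept in order of first appearance.
    labels = list(dict.fromkeys(conditions))
    # Average the replicate columns belonging to each condition.
    averaged = np.column_stack(
        [log_ratios[:, groups == label].mean(axis=1) for label in labels]
    )
    # 1 = increased, -1 = decreased, 0 = no change beyond the cutoff.
    directions = np.zeros(averaged.shape, dtype=int)
    directions[averaged >= threshold] = 1
    directions[averaged <= -threshold] = -1
    return labels, directions

# Made-up example: 2 genes, 3 conditions with 2 replicates each.
example = [
    [1.2, 1.6, -0.1, 0.3, -2.0, -1.4],
    [0.2, -0.2, 1.5, 1.1, 0.4, 0.8],
]
sample_conditions = ["low_6h", "low_6h", "high_6h", "high_6h", "high_24h", "high_24h"]
labels, directions = direction_matrix(example, sample_conditions)
print(labels)
print(directions)
```

Each row of the resulting matrix is a string of direction codes across the ordered conditions, which is the kind of representation that the pattern grouping in Phase 2 could then compare over consecutive conditions.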

Agency
National Institutes of Health (NIH)
Institute
National Institute of Environmental Health Sciences (NIEHS)
Type
Intramural Research (Z01)
Project #
1Z01ES102345-01
Application #
7594035
Study Section
Project Start
Project End
Budget Start
Budget End
Support Year
1
Fiscal Year
2007
Total Cost
$1,347,190
Indirect Cost
City
State
Country
United States
Zip Code