Understanding the processes that regulate gene transcription is central to understanding evolution, the development of multicellular organisms, and the response to pathological changes, including cancer and heart disease. This proposal aims to make substantial progress in developing and testing computational methods and then applying them to experimental systems. We develop and deploy a battery of computational methods aimed at associating regulators with their targets and at inferring sequences that are targets of currently unidentified regulators. Testing and validation are carried out both retrospectively, against well-curated databases, and prospectively, using a variety of experimental methods on a selected set of predictions.
The specific aims include the following.
1. Develop and test innovative approaches for discovering new binding sites for well-studied regulators, as well as sites for currently unidentified regulators. The former requires integrating numerous, often very large, datasets and then pruning the features to identify those that are most biologically relevant. Our preliminary results suggest that doing so substantially improves performance over existing methods.
2. Implement all algorithms on IBM Blue Gene/L. This is one of the fastest machines available, though implementing algorithms on it requires considerable technical sophistication. Our current implementation increases compute power approximately 20-fold over standard 2 GHz processors. The use of Blue Gene/L in combination with Aim 1 will put the research community in a position to make discoveries that are substantially more numerous and more reliable than is currently possible.
3. Apply and test the methods on (i) the full S. cerevisiae genome and (ii) the mammalian GABA-A receptor family. The former offers the advantages of being well studied, of providing a large data set for testing, and of being relatively simple compared with mammalian genomes. GABA is the major inhibitory neurotransmitter in the central nervous system (CNS) and plays a key role in CNS development and disease.

Agency: National Institutes of Health (NIH)
Institute: National Institute of General Medical Sciences (NIGMS)
Type: Research Project (R01)
Project #: 5R01GM080625-02
Application #: 7485709
Study Section: Biodata Management and Analysis Study Section (BDMA)
Program Officer: Lyster, Peter
Project Start: 2007-08-15
Project End: 2010-07-31
Budget Start: 2008-08-01
Budget End: 2009-07-31
Support Year: 2
Fiscal Year: 2008
Total Cost: $598,658

Name: Boston University
Department: Engineering (All Types)
Type: Schools of Engineering
DUNS #: 049435266
City: Boston
State: MA
Country: United States
Zip Code: 02215
Lu, Junjie; Li, Hu; Hu, Ming et al. (2014) The distribution of genomic variations in human iPSCs is related to replication-timing reorganization during reprogramming. Cell Rep 7:70-8
Kim, Shinuk; Park, Taesung; Kon, Mark (2014) Cancer survival classification using integrated data sets and intermediate information. Artif Intell Med 62:23-31
Hu, Ming; Deng, Ke; Qin, Zhaohui et al. (2013) Bayesian inference of spatial organizations of chromosomes. PLoS Comput Biol 9:e1002893
Hu, Ming; Deng, Ke; Selvaraj, Siddarth et al. (2012) HiCNorm: removing biases in Hi-C data via Poisson regression. Bioinformatics 28:3131-3
Kim, Shinuk; Kon, Mark; DeLisi, Charles (2012) Pathway-based classification of cancer subtypes. Biol Direct 7:21
Hu, Ming; Zhu, Yu; Taylor, Jeremy M G et al. (2012) Using Poisson mixed-effects model to quantify transcript-level gene expression in RNA-Seq. Bioinformatics 28:63-8
Shi, Ping; Ray, Surajit; Zhu, Qifu et al. (2011) Top scoring pairs for feature selection in machine learning and applications to cancer outcome prediction. BMC Bioinformatics 12:375
Hung, Jui-Hung; Whitfield, Troy W; Yang, Tun-Hsiang et al. (2010) Identification of functional modules that correlate with phenotypic difference: the influence of network topology. Genome Biol 11:R23
Zhou, Qing; Liu, Jun S (2008) Extracting sequence features to predict protein-DNA interactions: a comparative study. Nucleic Acids Res 36:4137-48
Lee, Soohyun; Kasif, Simon; Weng, Zhiping et al. (2008) Quantitative analysis of single nucleotide polymorphisms within copy number variation. PLoS One 3:e3906
