The availability of massive amounts of genomic data, in the form of genome sequence, mRNA gene expression, and DNA structural information, has opened up a huge opportunity to develop investigative tools to infer the molecular basis of biological function. However, existing statistical methods insufficiently address the complexities that arise in analyzing such large data sets, including complex dependence structures, many missing observations, and varying resolutions of different data types, and this leads to biased inference. The goals of this project are to provide robust and efficient analysis tools for data in the presence of these complexities through development of: (i) novel unified Bayesian statistical methodology for detecting transcription factor binding sites using genomic sequence and data generated through high-throughput technologies; (ii) innovative statistical classification methods for precise detection of elements of chromatin structure using data from high-resolution genome tiling arrays, and identification of the underlying sequence characteristics that determine chromatin structure and function; and (iii) publicly available software addressing the above goals for use by the scientific research community. The methods will be applied and validated on data from genomes at three levels of complexity (yeast, C. elegans, and human), leading, in the long term, to major scientific advances in characterizing the distinguishing features of chromatin regulation in complex genomes and to a better understanding of the causes of various cellular processes, including a variety of disease states.

Public Health Relevance

Elucidation of the factors underlying chromatin structure and the binding of transcription factors to genomic DNA has enormous potential implications for human health. Most fundamental cellular processes involve protein-DNA interactions that are influenced by chromatin structure. Achieving the goals of this project would provide a detailed understanding of how biological function is encoded in genomic sequence and structure, and could have large implications for the understanding of disease, potentially leading to new breakthroughs in genomic medicine.

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Small Research Grants (R03)
Project #: 5R03HG004946-02
Application #: 7666274
Study Section: Biostatistical Methods and Research Design Study Section (BMRD)
Program Officer: Good, Peter J
Project Start: 2008-08-01
Project End: 2010-11-30
Budget Start: 2009-06-01
Budget End: 2010-11-30
Support Year: 2
Fiscal Year: 2009
Total Cost: $81,250
Indirect Cost:

Name: Boston University
Department: Biostatistics & Other Math Sci
Type: Schools of Public Health
DUNS #: 604483045
City: Boston
State: MA
Country: United States
Zip Code: 02118

Moser, Carlee; Gupta, Mayetri (2012) A generalized hidden Markov model for determining sequence-based predictors of nucleosome positioning. Stat Appl Genet Mol Biol 11:
Karasik, David; Cheung, Ching Lung; Zhou, Yanhua et al. (2012) Genome-wide association of an integrated osteoporosis-related phenotype: is there evidence for pleiotropic genes? J Bone Miner Res 27:319-30
Gupta, Mayetri; Cheung, Ching-Lung; Hsu, Yi-Hsiang et al. (2011) Identification of homogeneous genetic architecture of multiple genetically correlated traits by block clustering of genome-wide associations. J Bone Miner Res 26:1261-71
Mitra, Ritendranath; Gupta, Mayetri (2011) A continuous-index Bayesian hidden Markov model for prediction of nucleosome positioning in genomic DNA. Biostatistics 12:462-77
Gupta, Mayetri (2009) Model selection and sensitivity analysis for sequence pattern models. Inst Math Stat Collect 1:390-407
Gupta, Mayetri; Ibrahim, Joseph G (2009) An information matrix prior for Bayesian analysis in generalized linear models with high dimensional data. Stat Sin 19:1641-1663