Knowledge of haplotype structure in human and mouse has important implications for strategies of disease gene mapping, quantitative trait locus (QTL) mapping, and the utility of the mouse as a model for human cancer. We have developed a software analysis package, HapScope, which includes a comprehensive analysis pipeline (including a novel SNP tagging algorithm) and a visualization tool for analyzing functionally annotated haplotypes. An LPG PI used HapScope to analyze the haplotype structure of two BRCA1-interacting genes in breast-ovarian cancer families. Over 20 research institutes in the US and abroad have downloaded the HapScope package to analyze their clinical genotype data. Using HapScope, we observed highly divergent haplotype patterns (referred to as yin yang haplotypes) in the human genome. Genome-wide analysis of common haplotypes in 62 random genomic loci and 85 gene-coding regions in humans shows that the proportion of the genome spanned by yin yang haplotypes is 75%-85%. The abundance of yin yang haplotypes in the human genome suggests that susceptibility will appear to be influenced more by environment than by genes. In mouse models, the lack of genetic diversity has been considered a major drawback of laboratory inbred mice. Our analysis of a high-resolution, multiple-strain haplotype structure of mouse chromosome 16 reveals that the genetic diversity in laboratory inbred mice is similar to that in humans and that its controlled complexity provides great utility for studying human complex diseases.

The laboratory has also focused on developing analytical methods, computational processes, and visualization tools to evaluate mRNA expression data. It is recognized that pathway analysis makes significantly greater demands on observed microarray data than cluster or classification analysis does. Existing tools do not differentiate probes of good quality from those that have either excess expression or null expression values. 
It is speculated that this may contribute to the lack of consistency in expression measurements for duplicate probe sets that assay the same gene. To improve the quality of expression data, we analyzed non-specific and non-functional probe pairs on the Affymetrix chips using the probe sequence context. We discovered that 18% of probes might be problematic and implemented methods to filter this noise. The lack of internal consistency within a single experiment has a severe adverse impact on the interpretation of expression data, and it is hoped that new analytic tools will improve the quality of expression measurements prior to the modeling and analysis of pathway relationships.

Three complementary approaches are being utilized to create pathway models: 1) statistical modeling, 2) logical modeling, and 3) computational modeling. The statistical methodology known as path analysis is being used to model gene expression data. These efforts will be extended to include a collection of pathway models of interest to cancer research, derived from cancer (and normal tissue) data sets. The laboratory is also collaborating with the NCICB and CGAP to develop logical models of pathway data. This effort will utilize databases of biomolecular interactions in human and mouse based on KEGG and BIOCARTA pathway data. The last strategy being explored within the laboratory is computational modeling. Each element in the pathway is annotated with a set of incoming and outgoing connections, which link the gene or complex to other nodes in the system. Setting the state of a node to "on" or "off" triggers the propagation of the effects of the change throughout the system via the node's dependent connections. The utility of this approach is currently being assessed using expression data. Because there is no single best way to model processes as complex as biologic pathways, all three complementary approaches are being employed and evaluated. 
The instantiation of pathways as code represents the first step in the development of more complex computational models.
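
The computational modeling approach described above (nodes annotated with incoming and outgoing connections, where switching a node "on" or "off" propagates through its dependent connections) can be illustrated with a minimal Python sketch. This is not the laboratory's implementation. The class names, the activating/inhibiting edge types, the update rule (a node is on when at least one activator is on and no inhibitor is on), and the example node names are all assumptions made for illustration.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Node:
    """A gene or complex in the pathway (hypothetical structure)."""
    name: str
    on: bool = False
    # Each connection is a (other_node_name, activates) pair.
    incoming: list = field(default_factory=list)
    outgoing: list = field(default_factory=list)


class Pathway:
    def __init__(self):
        self.nodes = {}

    def add_node(self, name, on=False):
        self.nodes[name] = Node(name, on)

    def connect(self, source, target, activates=True):
        # Record the edge on both ends so each node knows its
        # incoming and outgoing connections.
        self.nodes[source].outgoing.append((target, activates))
        self.nodes[target].incoming.append((source, activates))

    def _evaluate(self, node):
        # Assumed rule: on if any activator is on and no inhibitor is on.
        activated = any(self.nodes[s].on for s, act in node.incoming if act)
        inhibited = any(self.nodes[s].on for s, act in node.incoming if not act)
        return activated and not inhibited

    def set_state(self, name, on, max_evaluations=1000):
        """Set one node and propagate the change downstream.

        Returns the sorted names of nodes whose state differs afterwards.
        max_evaluations caps the work so feedback loops cannot run forever.
        """
        before = {n: node.on for n, node in self.nodes.items()}
        self.nodes[name].on = on
        queue = deque(target for target, _ in self.nodes[name].outgoing)
        evaluations = 0
        while queue and evaluations < max_evaluations:
            current = queue.popleft()
            if current == name:
                continue  # the node set by the caller stays as set
            evaluations += 1
            node = self.nodes[current]
            new_state = self._evaluate(node)
            if new_state != node.on:
                node.on = new_state
                # Only a changed node can affect its dependents.
                queue.extend(target for target, _ in node.outgoing)
        return sorted(n for n, node in self.nodes.items() if node.on != before[n])


def build_example_pathway():
    # Hypothetical toy pathway; names are illustrative only.
    p = Pathway()
    for name in ("ligand", "receptor", "kinase", "inhibitor", "target_gene"):
        p.add_node(name)
    p.connect("ligand", "receptor")
    p.connect("receptor", "kinase")
    p.connect("inhibitor", "kinase", activates=False)
    p.connect("kinase", "target_gene")
    return p


if __name__ == "__main__":
    pathway = build_example_pathway()
    print(pathway.set_state("ligand", True))
    print(pathway.set_state("inhibitor", True))
```

In this toy pathway, switching `ligand` on turns on `receptor`, `kinase`, and `target_gene` in turn. Switching `inhibitor` on afterwards turns `kinase` and `target_gene` back off while `receptor` stays on. A model assessed against expression data would need update rules derived from the curated pathway sources rather than the single assumed rule used here.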

Agency: National Institutes of Health (NIH)
Institute: Division of Basic Sciences - NCI (NCI)
Type: Intramural Research (Z01)
Project #: 1Z01BC010470-01
Application #: 6952052
Study Section: (LPG)
Support Year: 1
Fiscal Year: 2003
Name: Basic Sciences
Country: United States

Radtke, Ina; Mullighan, Charles G; Ishii, Masami et al. (2009) Genomic analysis reveals few genetic alterations in pediatric acute myeloid leukemia. Proc Natl Acad Sci U S A 106:12944-9
Zhang, Jinghui; Finney, Richard P; Clifford, Robert J et al. (2005) Detecting false expression signals in high-density oligonucleotide arrays by an in silico approach. Genomics 85:297-308
Zhang, Jinghui; Hunter, Kent W; Gandolph, Michael et al. (2005) A high-resolution multistrain haplotype analysis of laboratory mouse genome reveals three distinctive genetic variation patterns. Genome Res 15:241-9
Zhang, Jinghui; Rowe, William L; Clark, Andrew G et al. (2003) Genomewide distribution of high-frequency, completely mismatching SNP haplotype pairs observed to be common across human populations. Am J Hum Genet 73:1073-81