Knowledge of the genetic architecture of the human and mouse genomes has important implications for strategies of disease gene mapping, quantitative trait loci (QTL) mapping, and the utility of mouse models for human cancer. We developed a software tool, SNPdetector, for automated identification of mutations and single nucleotide polymorphisms (SNPs) by fluorescence-based resequencing. SNPdetector was designed to minimize the requirement for manual SNP analysis, currently the bottleneck in large-scale investigations. The error rate of SNPdetector is 50% lower than that of the current "state-of-the-art" software tool, PolyPhred 5. Two large human genome sequencing centers, Washington University and Baylor College of Medicine, have installed and applied SNPdetector for large-scale SNP analysis in the ENCODE project and for tumor mutation detection. SNPdetector has also been applied to discover ENU-induced mutations in a zebrafish mutagenesis study at NHGRI. We analyzed the patterns of linkage disequilibrium (LD) in 166 human genes sequenced by the SeattleSNPs project. We found that the block-like LD structure reported in many recent studies covers only 15-20% of the genomic regions. The remaining regions either have overlapping LD or contain singleton SNPs that are not in LD with the other markers. We analyzed the origin of the overlapping LD patterns and evaluated how well the current haplotype blocking methods capture the various LD patterns in these human genes. We anticipate that the discovery of the overlapping LD pattern will have an impact on the design of association studies. In mouse models, lack of genetic diversity has been considered a major drawback of laboratory inbred mice.
Our high-resolution, multiple-strain haplotype analysis of mouse chromosome 16 reveals a complex haplotype structure, indicating that the controlled complexity of laboratory mouse strains provides great utility for studying human complex diseases.
The laboratory has also focused efforts on developing tools for functional analysis. These include analytical methods, computational processes, and visualization tools to evaluate mRNA expression data, as well as tools to identify candidate genes. It is recognized that pathway analysis makes significantly greater demands on observed microarray data than cluster or classification analysis. Existing tools do not differentiate probes of good quality from those that have either excess expression or null expression values. In addition, one gene may have multiple probe sets that give conflicting expression signals. To resolve these issues, we developed a method to build a unified probe definition using gene mapping information. Our collaborators at NCICB are currently using this gene-based probe definition to analyze expression data in the Rembrandt project for the clinical research community. To facilitate the identification of candidate genes in disease association studies and drug target discovery, we have developed a dynamic and robust search engine, the Gene Functional Similarity Search Tool (GFSST). For a given gene or a given set of gene functions defined in Gene Ontology (GO) terms, this tool can identify genes within a user-defined similarity threshold. To facilitate this search, we have defined a statistical model to measure functional similarity of genes based on the GO directed acyclic graph (DAG). An implementation of GFSST on UniProt (Universal Protein Resource) for the human and mouse genomes is available at http://gfsst.nci.nih.gov.
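The idea of a threshold search over GO-based functional similarity can be illustrated with a minimal sketch. It does not reproduce the GFSST statistical model, which is not described here. As a stand-in, it scores two sets of terms by the Jaccard overlap of their ancestor closures in the DAG. The term names, gene names, and function names below are all hypothetical and chosen only for illustration; they are not real GO identifiers or part of any GFSST interface.

```python
from typing import Dict, List, Set, Tuple

# Toy DAG: each term maps to its direct parents.
# These are made-up labels, not real GO identifiers.
PARENTS: Dict[str, Set[str]] = {
    "root": set(),
    "binding": {"root"},
    "catalysis": {"root"},
    "dna_binding": {"binding"},
    "kinase": {"catalysis"},
    "tf_activity": {"dna_binding"},
}

# Hypothetical gene annotations: gene -> set of annotated terms.
ANNOTATIONS: Dict[str, Set[str]] = {
    "geneA": {"tf_activity"},
    "geneB": {"dna_binding"},
    "geneC": {"kinase"},
}


def ancestors(term: str, parents: Dict[str, Set[str]]) -> Set[str]:
    """Return the term together with all of its ancestors in the DAG."""
    seen: Set[str] = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def closure(terms: Set[str], parents: Dict[str, Set[str]]) -> Set[str]:
    """Union of the ancestor sets of every term in the input set."""
    result: Set[str] = set()
    for term in terms:
        result |= ancestors(term, parents)
    return result


def similarity(a: Set[str], b: Set[str], parents: Dict[str, Set[str]]) -> float:
    """Jaccard overlap of ancestor closures (a stand-in similarity measure)."""
    ca, cb = closure(a, parents), closure(b, parents)
    union = ca | cb
    return len(ca & cb) / len(union) if union else 0.0


def search(
    query: Set[str],
    annotations: Dict[str, Set[str]],
    parents: Dict[str, Set[str]],
    threshold: float,
) -> List[Tuple[str, float]]:
    """Return genes scoring at or above the threshold, best match first."""
    scored = [(g, similarity(query, t, parents)) for g, t in annotations.items()]
    hits = [(g, s) for g, s in scored if s >= threshold]
    return sorted(hits, key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    for gene, score in search({"tf_activity"}, ANNOTATIONS, PARENTS, 0.5):
        print(gene, round(score, 3))
```

Because the closure includes every ancestor, two genes annotated to nearby terms share most of their closure and score high, while genes that meet only at the root score low. A production measure would likely also weight terms by how specific or how frequently annotated they are, which this sketch ignores.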
We have developed tools for synchronizing Ciphergen MassSpec profiles to reduce experimental variation that gives false-positive signals. We are also developing a new algorithm for biomarker discovery.
Radtke, Ina; Mullighan, Charles G; Ishii, Masami et al. (2009) Genomic analysis reveals few genetic alterations in pediatric acute myeloid leukemia. Proc Natl Acad Sci U S A 106:12944-9
Zhang, Jinghui; Finney, Richard P; Clifford, Robert J et al. (2005) Detecting false expression signals in high-density oligonucleotide arrays by an in silico approach. Genomics 85:297-308
Zhang, Jinghui; Hunter, Kent W; Gandolph, Michael et al. (2005) A high-resolution multistrain haplotype analysis of laboratory mouse genome reveals three distinctive genetic variation patterns. Genome Res 15:241-9
Zhang, Jinghui; Rowe, William L; Clark, Andrew G et al. (2003) Genomewide distribution of high-frequency, completely mismatching SNP haplotype pairs observed to be common across human populations. Am J Hum Genet 73:1073-81