The primary focus of the Comparative Genomics Unit is to use bioinformatics tools to investigate the many research opportunities afforded by rapidly expanding genomic sequence and genotype resources. This is accomplished largely through collaborations with researchers within the National Institutes of Health (NIH) Intramural Sequencing Center (NISC), the Genome Technology Branch (GTB), the National Human Genome Research Institute (NHGRI), other institutes within NIH and the Department of Health and Human Services (DHHS), and researchers around the world.

With the NISC group we have looked at ways to augment their targeted sequencing efforts with sequence from similar species that are being sequenced with a whole-genome approach. The number of mammalian species with whole genome sequence (WGS) available has grown to 23, and most of these are or will be sequenced at NISC as part of the ENCyclopedia Of DNA Elements (ENCODE) project; thus NISC can benefit from the WGS data, and the sequencing centers can benefit from the NISC data.

We are also involved in helping set up a large-scale medical sequencing (LSMS) program, integrating software from other sources and collaborations, e.g., Peter Chines's (GTB/Collins Lab) primerTile package and Dr. Debbie Nickerson's (University of Washington, Seattle, Washington) PolyPhred 5.04 package. We now have four LSMS pilot projects actively generating data, with the collaborating researchers accessing the results. These test projects will prepare NISC's LSMS system for a much larger clinical sequencing project (ClinSeq), headed by Dr. Les Biesecker (NHGRI/Genetic Disease Research Branch), involving hundreds of genes and 1000 individuals. The pilot projects also show that even at small scale, a few tens of thousands of sequence traces, many investigators are interested in using the LSMS system we are developing; thus we anticipate many more projects of this sort in the future.

Together with Patricia Porter-Gill (GTB/Brody Lab) we have expanded the methylation sequencing project to new genomic targets and additional DNA samples.

Phase I of the International Haplotype Map (HapMap) Project was completed and published, providing a valuable resource for researchers around the world. In addition, the Phase II dataset was publicly released, greatly increasing the resolution of the human haplotype structure; Phase II includes genotypes for nearly 4 million single nucleotide polymorphisms (SNPs) across 270 individuals representing 4 population groups.

In collaboration with Dr. David Reich and his group at Harvard Medical School, we are using the HapMap genotype data to study population genetics: mapping out the timing of the out-of-Africa events for East Asian and North Western European populations, as well as the severity of the population bottlenecks for these events.

In collaboration with Dr. Steve O'Brien (NIH/National Cancer Institute) we assembled the low-redundancy sequence of the cat genome and, by combining it with a radiation hybrid map of the cat chromosomes and similarity to the dog genome, mapped most of the assembled sequence to locations along the cat chromosomes. This has allowed the mapping and analysis of many features of the cat genome. One important outcome of this effort is the finding that cat breeds show a pattern of long segments of homozygosity, which can make disease mapping efficient, at nearly the same level as in the dog genome. Since cats and dogs exhibit many diseases with phenotypes similar to those seen in humans, efficient disease mapping in cat or dog breeds may accelerate the mapping of the corresponding diseases in humans.

In collaboration with Dr. Evan Eichler (University of Washington), the fosmid-based human genome structural variation discovery effort initially involved the sequencing of nine individuals.
So far, seven individuals have been sequenced, and NHGRI's council has approved increasing the number of individuals to 48. We will continue to select additional individuals from the HapMap project, leveraging the HapMap genotype data to select individuals that are most different from each other, so that this expanded discovery phase will maximize the yield of this important new class of human variation.

New sequencing technologies are being developed and translated into commercial products. We have evaluated two of these technologies for feasibility of use within GTB/NISC. We sent 4 bacterial artificial chromosome (BAC) clones for sequencing on one of these new platforms and compared the results to our traditional Sanger sequencing results generated at NISC. At present, this platform does not appear to be any more cost-effective than our current method; however, as the technology develops, the cost/benefit should improve, possibly making it an attractive sequencing platform in the future. For a second platform, which is designed to generate millions of 25mer sequences, we have only run simulations, but this platform could be useful for certain classes of medical sequencing projects. We will keep a close watch on new technologies in the coming year.