This subproject is one of many research subprojects utilizing the resources provided by a Center grant funded by NIH/NCRR. The subproject and principal investigator (PI) may have received primary funding from another NIH source, and thus could be represented in other CRISP entries. The institution listed is for the Center, which is not necessarily the institution for the investigator.

In whole-genome shotgun assembly, experimentally obtained random pieces of the genome (typically about 500 bases long), called sequence reads, are put back together, thereby reconstructing the sequence of an organism's genome. Because of the scale of the data (70 million reads at full coverage for a 3-gigabase mammalian genome) and the repetitive nature of the genome, this problem is extremely difficult. Even smaller genomes sequenced at 3- to 5-fold coverage can pose serious computational challenges in terms of memory and processing power.

The Whitehead Institute/MIT Center for Genome Research is part of an NIH-funded consortium to sequence the mouse genome. By the end of 2001, this consortium will have produced about 35 million sequence reads, about half of them at the Whitehead. The Whitehead group has also developed a software system (Arachne) for genome assembly and tested it on the existing data (about 17 million reads). Memory demand scales linearly with the number of reads, which is proportional to the product of the genome size and the depth of coverage. As more reads become available, the group expects to continue running incremental assemblies that incorporate the additional data.

Importance of the problem. The mouse genome sequence will facilitate the discovery of features in the human genome. It will also facilitate research on the mouse itself. Together, these two purposes make the mouse sequence of fundamental importance to the biological and biomedical communities.

Computational requirements. An assembly of 17 million reads required about 5 days and used up to 29 GB of memory on a Compaq ES40 (667 MHz) machine. Prior experience suggests that the problem scales approximately linearly, so 35 million reads are expected to require about 10 days of running time (perhaps reduced to 5 days on faster processors) and about 60 GB of memory.

In addition to the mouse genome, the Whitehead/MIT Center for Genome Research has an active program in sequencing and assembling other organisms. It currently produces about 45 million lanes (reads) of sequence a year. Organisms recently sequenced or currently in the sequencing pipeline include Methanosarcina, Neurospora, Tetraodon, and Ciona. The proposed resource will increase the speed with which these genomes can be assembled and released to the community. Assembling these other genomes requires less computation than assembling the mouse genome: for Tetraodon and Ciona (whose genomes are substantially larger than those of Methanosarcina and Neurospora), running times of two to three days and memory usage of 10 to 15 GB are expected. The group plans to repeat each assembly many times, experimenting with the algorithms each time; in general, these experiments lead to code improvements that apply to all genomes.

PSC (the Pittsburgh Supercomputing Center) will offer this Whitehead software as a service to other groups doing linkage analysis or whole-genome assembly. Besides making the software and computer time available, the Research Resource at PSC will develop a biomedical training workshop focused on these codes, to make the techniques more widely known throughout the genomic community.
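The resource estimates in this abstract rest on two simple relations: the number of reads is roughly genome size times coverage divided by read length, and running time and memory grow linearly with the number of reads. The following is a minimal back-of-envelope sketch under exactly those assumptions (a fixed 500-base read length and strict proportionality with no fixed overhead); the function names are hypothetical and are not part of Arachne.

```python
"""Back-of-envelope sizing for whole-genome shotgun assembly.

Assumes (hypothetically) a fixed read length and that time and memory scale
strictly linearly with read count. Function names are illustrative only.
"""


def reads_needed(genome_bases: float, coverage: float, read_length: float = 500) -> float:
    """Reads required to reach `coverage`-fold depth: G * c / L."""
    return genome_bases * coverage / read_length


def implied_coverage(n_reads: float, genome_bases: float, read_length: float = 500) -> float:
    """Coverage depth implied by a read count (inverse of reads_needed)."""
    return n_reads * read_length / genome_bases


def extrapolate_linear(measured_value: float, measured_reads: float, target_reads: float) -> float:
    """Scale a measured cost (days, GB) proportionally to a new read count."""
    return measured_value * target_reads / measured_reads


if __name__ == "__main__":
    # Benchmark figures from the abstract: 17 million reads, ~5 days, up to 29 GB.
    base_reads, base_days, base_gb = 17e6, 5.0, 29.0
    target_reads = 35e6
    print(f"days: {extrapolate_linear(base_days, base_reads, target_reads):.1f}")
    print(f"memory (GB): {extrapolate_linear(base_gb, base_reads, target_reads):.1f}")
    # 70 million reads of ~500 bases over a 3-gigabase genome:
    print(f"implied coverage: {implied_coverage(70e6, 3e9):.1f}x")
```

With the abstract's figures this prints about 10.3 days and 59.7 GB for 35 million reads, consistent with the 10-day and 60 GB estimates, and suggests that "full coverage" for the mouse corresponds to roughly 11.7-fold depth if every read were 500 bases. A real estimate would also have to account for fixed overheads and faster hardware, which this sketch ignores.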

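The core idea of shotgun assembly, putting overlapping reads back together into a longer sequence, can be illustrated with a toy greedy overlap-merge assembler. This is a hypothetical sketch for illustration only and is not Arachne's algorithm: it assumes error-free reads from a single strand and a minimum overlap of 3 bases, and it ignores reverse complements, mate pairs, and repeats, which are precisely what make real assemblies hard.

```python
"""Toy greedy shotgun assembler (illustration only, not Arachne's method).

Repeatedly merges the pair of contigs with the longest suffix/prefix overlap.
Assumes error-free, single-strand reads; repeats are not handled.
"""
from itertools import permutations


def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` equal to a prefix of `b` (>= min_len), else 0."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Merge reads greedily by maximal overlap and return the resulting contigs."""
    # Drop duplicate reads and reads contained in another read; sort for determinism.
    contigs = sorted({r for r in reads if not any(r != o and r in o for o in reads)})
    while len(contigs) > 1:
        best_k, best_pair = 0, None
        for a, b in permutations(contigs, 2):
            k = overlap(a, b, min_len)
            if k > best_k:
                best_k, best_pair = k, (a, b)
        if best_pair is None:  # no overlaps left: remaining contigs stay separate
            break
        a, b = best_pair
        contigs.remove(a)
        contigs.remove(b)
        contigs.append(a + b[best_k:])  # join, writing the shared overlap once
    return contigs


if __name__ == "__main__":
    print(greedy_assemble(["ATGCGTA", "CGTACGTT", "ACGTTAGC"]))
```

For the three reads shown, the sketch first joins the pair sharing a 5-base overlap, then the remaining pair sharing 4 bases, and returns the single contig `ATGCGTACGTTAGC`. On a genome with repeats longer than the reads, this greedy strategy can join reads from different copies of a repeat and produce a misassembly.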
Agency: National Institutes of Health (NIH)
Institute: National Center for Research Resources (NCRR)
Type: Biotechnology Resource Grants (P41)
Project #: 2P41RR006009-16A1
Application #: 7358380
Study Section: Special Emphasis Panel (ZRG1-BCMB-Q (40))
Project Start: 2006-09-30
Project End: 2007-07-31
Budget Start: 2006-09-30
Budget End: 2007-07-31
Support Year: 16
Fiscal Year: 2006
Total Cost: $1,012
Indirect Cost:
Name: Carnegie-Mellon University
Department: Biostatistics & Other Math Sci
Type: Schools of Arts and Sciences
DUNS #: 052184116
City: Pittsburgh
State: PA
Country: United States
Zip Code: 15213