DNA is commonly viewed as an evenly spaced double helix and often depicted simply as a string of letters or even as two parallel lines. Genomes are typically analyzed by treating DNA sequences as if they actually were composed of strings of letters. In addition, both evolutionary biologists and computational biologists often assume that all genetic variation is generated in a random manner. In reality, however, the structure of DNA is not monotonous, but rather varies along its sequence, sometimes dramatically so. Such variation in structure leads to sequence-dependent variations in the fidelity of DNA copying and repair. That the probability of distinct classes of mutations varies along a DNA sequence has implications for evolutionary theory, because selection acts on heritable variation when this variation affects fitness. Highly mutable sequences have, in fact, evolved in genome regions such as those encoding pathogen coats, where increased diversity in a population favors survival. In addition, the fidelity of DNA replication and repair is affected by the activities of multiple enzymes (which can be induced by environmental or cell-type-specific factors). Furthermore, it is becoming increasingly obvious that some of the information in DNA is carried in forms that can be obscured by treating DNA as if it actually were composed only of a sequence of letters. Often it is the conformation of DNA (or RNA) or the relationship among sequences that carries the information.
Factors such as these will be the focus of the Genome Structure and Variation Conference. A broad interdisciplinary group of researchers will gather to explore the impact of our increasing understanding of DNA structure, repair, replication, and organization on subjects ranging from evolution, to the dependence of the effect of mutagens on environmental and sequence context, to non-canonical forms of information representation in genomes. Incorporating our knowledge of the sequence-dependent effects of DNA structure and context into the analysis of genome sequence and variation is a computational challenge. However, in order for new methods to be developed, the appropriate communities must be made aware of the existence of novel biochemical and genetic observations on the one hand, and the potential for creation of novel computational tools on the other. By bringing together experts in these different areas to share ideas and begin a dialog, this workshop should serve as a catalyst for new collaborations and insights, along with the development of algorithms that will enable us to discover novel ways in which information affecting genetic variation and regulation is represented within genomes. In addition, this meeting is designed to facilitate productive interactions between early-career scientists, including graduate students, and the leading researchers at the conference.
The Conference on Effects of Genome Structure and Sequence on the Generation of Variation and Evolution was held August 9-11, 2011, at the Center for Discrete Mathematics and Theoretical Computer Science (DIMACS) at Rutgers University in Piscataway, New Jersey. The classic description of DNA is as a sequence of letters (A, G, C, T) representing "nucleotide" bases (called adenine, guanine, cytosine, thymine). The conference was based on the observation that the structure of DNA is not the same everywhere, but rather varies along its sequence, sometimes dramatically. Such variation in structure means that the accuracy with which DNA is copied (replicated) and repaired depends on where in the sequence these operations take place. Thus, the probability of distinct classes of mutations varies along a DNA sequence, and this has implications for evolution, because natural selection acts on variation that is inherited. Sequences with a high likelihood of mutation have, in fact, evolved in certain genome regions where increased diversity in a population favors survival, such as in genes affecting pathogen coats and in our own immune system. The fidelity of DNA replication and repair is both dependent on the sequence and affected by the activities of multiple enzymes (which can be induced by environmental or cell-type-specific factors). Furthermore, it is becoming increasingly clear that information is represented in DNA in forms that are not obvious when DNA is analyzed in the traditional way, as if it were composed of a sequence of actual letters, A, T, G and C. Often it is the structural arrangement (conformation) of DNA (or RNA), its physical-chemical properties, or the relationship among sequences that carries the information.
This conference brought together a broad interdisciplinary group of researchers to explore the impact of increasing understanding of DNA structure, repair, replication, and organization on interrelated subjects ranging from evolution, to the dependence of the probability of different types of mutations on environmental and sequence context, to non-standard forms of information representation in genomes. One of the goals of the conference was to create an environment in which these researchers could explore the issues through discussion and creative speculation. Although not enough time has passed for there to be specific scientific findings resulting from the conference, several collaborations and new research projects have begun. For example, a conference attendee from the National Institute on Aging, NIH, has re-established a collaboration with a colleague from the University of Pennsylvania on a "small molecule inhibitor of activation-induced deaminase," which could be of use against tumors. Conference attendees from the University of Michigan and the University of California, Davis have initiated a collaboration on the analysis of regions of DNA that may melt easily, which are possible initiation sites for gene diversification. Several participants have new plans that affect student training and development as a result of attending the conference. A participant from George Washington University is incorporating into an immunology course syllabus new material regarding DNA sequence and structure learned at the conference. This conference was meant to enable people to access information from other fields, an important goal for the many mathematicians and computer scientists who do not have training in biochemistry and thus would find it challenging to gather this perspective from the literature. There was a strong presence of mathematicians and computer scientists, especially early-stage investigators, who have an interest in collaborating with biologists.
The recruitment of scientists with the computational skills to analyze DNA sequences as something other than "letters" is anticipated to provide deep insights into genome structure, organization, function and evolution. There was a concerted outreach effort to early-career investigators, and, in particular, scholarships were given to a number of them. Nearly half of the speakers were women, and a significant number were postdoctoral fellows and assistant professors. There were a great many graduate students at the conference, both from local institutions and elsewhere, as well as some of the undergraduate students from DIMACS' large summer Research Experiences for Undergraduates program. A graduate student (Kathy Xie) was among the invited speakers, and proved to be a superstar. Conference organizers are coordinating preparation of two invited conference reports/perspectives articles, one for the journal DNA Repair, the other for the Journal of Experimental Zoology. We are also editing a conference volume that will be published by the New York Academy of Sciences, and developing a website, with videos of the conference, which will be at adaptivegenome.net. A few years ago, the Cancer Institute of New Jersey (CINJ) became a full partner of DIMACS. Working closely together to plan and organize this conference has helped to cement this relationship among computer scientists and mathematicians, biologists, and medical practitioners.