Introns are ubiquitous elements of eukaryotic genomes that are especially widespread in complex organisms. The use of introns is a double-edged sword for multicellular organisms. While introns allow complex regulation, enable the generation of a large number of products from a limited number of genes, and facilitate the emergence of new activities via exon shuffling, they also require complex processing that can lead to serious problems when it goes awry. The origins and functions of introns have been a subject of controversy for over a quarter century. In this CAREER project, the PI and his students will create or improve several databases representing gene structures, and then begin to exploit them in collaboration with three laboratories that experimentally study non-coding RNA genes, gene expression, and alternative splicing. This project will thus integrate bioinformatics, bench research, and teaching.

The databases will include diverse information (location, sequence, signals, nucleotide composition, mutations, etc.) about millions of intron and exon sequences from species including unicellular eukaryotes, plants, invertebrates, non-mammalian vertebrates, and mammals. A user-friendly interface will allow a wide spectrum of queries concerning exon/intron gene structures. Computer programs that extract and process data from the databases will be developed. One set of programs, based on a comparative genomics approach, will predict, characterize, classify, and catalog non-coding RNA genes and various functional elements inside introns. Another set of programs will detect signals inside introns and exons that are important for precise splicing of primary RNA transcripts (pre-mRNA, etc.). Computer-predicted non-coding RNA genes of mammals will be experimentally verified using methods such as quantitative RT-PCR, northern blots, and large-scale hybridization with custom-generated spotted microarrays. Verification of predicted splicing motifs will be performed using two vector systems. One is the tau exon 9-11 model substrate from which a purine-rich exonic enhancer has been deleted. The second one is the caspase-2 exon 8-10 cassette from which an intronic suppressor has been removed.

A novel curriculum will be created for students having two different backgrounds, computer science and biology, training them in database generation and mining, advanced programming skills, and interpretation of complex biological networks. All software developed and its documentation will be freely available via the internet and will promote remote bioinformatic education.

Project Report

INTELLECTUAL MERIT

Eleven articles in peer-reviewed journals and one book chapter, focused on the grant's specific aims and related scientific questions, have been published. Three more papers with results of great importance to the grant's propositions have been submitted or are in preparation. The most important outcomes of these manuscripts are the following.

We presented computational evidence for a critical association of different types of non-coding RNA with introns (NAR 2011). A new concept ("Symbiotic Introns") was proposed for the role of introns in gene functioning and evolution. We conclude that there is a natural symbiosis between genes, introns, and ncRNAs, one that is only just beginning to be discovered and properly appreciated.

We developed novel algorithms (Binary Abstracted Markov Models) for sequence discrimination between exons and introns (NAR 2012). High-quality abstraction schemes for exon/intron discrimination were selected using optimization algorithms on supercomputers. With this approach, over 95% classification accuracy is achieved without taking reading frame into account.

A large-scale bioinformatics analysis was performed to understand the possible splicing mechanisms for extra-long introns in various vertebrate species. We predicted the number of stem structures within large introns, hypothesizing that periodic hairpins with stable stems and large loops may be a mechanism for pre-mRNA folding that could aid splicing efficiency (PLoS ONE, 2009).

The Alternative Splicing Mutation Database (ASMD) was created. It serves as a repository for all exonic mutations not associated with splicing junctions that measurably change the pattern of alternative splicing (BMC Res Notes 2008a and BMC Res Notes 2008b). An algorithm was developed to derive a Splicing Potential (SP) table from the ASMD information.
This SP table characterizes the influence of each oligonucleotide on the splicing effectiveness of the exon containing it. Our approach allows computation of the cumulative SP value of any sequence segment and thus gives researchers the ability to measure the possible contribution of any sequence to the pattern of splicing.

We developed approaches and computational tools for investigating mid-range inhomogeneity in genomic sequences and the distribution of these sequence patterns inside introns and exons (BMC Genomics 2008). We revealed a very strong fixation bias for mutations, which helps to preserve mid-range inhomogeneity regions throughout the human genome during evolution (BMC Genomics 2009).

To facilitate investigation of the possible involvement of snoRNAs in the regulation of pre-mRNA processing, we developed a new computational web resource, snoTARGET, which searches for possible guiding sites for snoRNAs among the entire set of human and rodent exonic and intronic sequences (Gene 2008).

BROADER IMPACTS

a) Mentoring students

Graduated two PhD students, Samuel Shepard (2010) and Ashwin Prakash (2011), who have continued their scientific careers as post-doctoral fellows at the Centers for Disease Control and Prevention (CDC), Atlanta, GA, and Johns Hopkins Medical School, Department of Bioengineering, respectively. Graduated four MS students: Jason Bechtel (2008), Theodor Rais (2009), Andrew McSweeny (2010), and Mariam Nabiyouni (2011). All of them have continued their careers in various areas of science and technology. Trained eight summer and volunteer students during the summer breaks of 2007-2011 [Mark McCreary (2005 and 2007), Tiara Heisey (2008), Aaron Walsh (2008), Ramya Yarlagadda (2009), Sam Choulet (2011), Lorraine Walters (2011 and 2012), Eugene Akkuratov (2011), Mark Pavlyukovskyy (2011)]. For this activity the PI received an "Undergraduate Research Recognition Award" in May 2012.
b) Achievements of the students

The PI, in co-authorship with his students, published nine peer-reviewed papers in international journals. One of them (Bechtel et al. 2008) has been cited in the latest edition of Lewin’s "Genes X" textbook (page 96). Another (Rearick et al. 2011) has been cited on the main Wikipedia page about introns (http://en.wikipedia.org/wiki/Introns) among 26 principal manuscripts about exon-intron gene structure. The PI’s students received the following awards: Jason Bechtel, Outstanding MSBS student in 2008 at HSC UT; Theodor Rais, Second/Third Poster award by Ohio Bioinformatics Consortium, 2009; Samuel Shepard, Outstanding PhD student in 2010 at HSC UT; Lorraine Walters, Undergraduate Research Recognition Award, UT, May 2012.

c) Courses, reviews, and on-line research and educational resources

The PI developed two new courses: "Introduction to Modern Biomedical Databases" in 2009 and "Biomarker Discovery, Validation and Implementation" in 2012. Created an online lecture for the general public, "Genomic Entropy, an alternative view", available from the PI’s laboratory web page (http://bpg.utoledo.edu/~afedorov/lab/). This is a scientific response to a very popular creationist book, "Genetic Entropy" by Dr. J. Sanford. The PI has created and maintains five public Internet resources. Published an interview, "Reexamining introns", in the International Innovation journal (May 2012, pp. 60-62). Published a journal review and a book chapter about the structure and evolution of mammalian genomes. Established a fruitful collaboration between HSC UT and the Ohio Supercomputer Center for implementing sophisticated computational algorithms in biomedical research and for teaching local students advanced computational technologies.
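
The cumulative Splicing Potential (SP) computation described under Intellectual Merit can be illustrated with a minimal sketch. It assumes, without confirmation from the report, that the cumulative value is an additive sum over all overlapping oligonucleotides of a fixed length and that oligonucleotides missing from the table contribute zero. The function name `cumulative_sp`, the table `toy_table`, and its values are hypothetical and are not taken from ASMD.

```python
from typing import Dict


def cumulative_sp(sequence: str, sp_table: Dict[str, float], k: int) -> float:
    """Sum the SP values of every overlapping k-mer in `sequence`.

    Assumptions (not from the report): scores are additive, and a k-mer
    absent from `sp_table` contributes 0.0.
    """
    seq = sequence.upper()
    # range() is empty when the sequence is shorter than k, so the sum is 0.
    return sum(sp_table.get(seq[i:i + k], 0.0) for i in range(len(seq) - k + 1))


# Hypothetical toy table: positive values favour exon inclusion,
# negative values disfavour it. These numbers are invented for illustration.
toy_table = {"GAA": 0.5, "AAG": 0.25, "TAG": -1.0}

score = cumulative_sp("GAAGTAG", toy_table, k=3)
print(score)
```

Comparing the score of a wild-type segment with that of a mutated segment would, under the same assumptions, give a rough estimate of how a mutation shifts the splicing potential of the exon containing it.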

Agency: National Science Foundation (NSF)
Institute: Division of Molecular and Cellular Biosciences (MCB)
Application #: 0643542
Program Officer: Martha Peterson
Project Start:
Project End:
Budget Start: 2007-06-15
Budget End: 2013-05-31
Support Year:
Fiscal Year: 2006
Total Cost: $672,056
Indirect Cost:
Name: University of Toledo Health Science Campus
Department:
Type:
DUNS #:
City: Toledo
State: OH
Country: United States
Zip Code: 43614