As we accumulate knowledge, our understanding of the genome continues to evolve. We now realize that the 99% of the genome that does not code for proteins, once thought of as 'junk DNA', has important functional roles. This understanding has important implications for the analysis and interpretation of high-dimensional genomic data. It is rare that a study is lucky enough to find a significantly associated variant that lies within an exon of a protein-coding gene that is biologically related to the phenotype. It is more common to identify a region of interest that is intergenic, or even in a gene desert. The implications of these associations are not immediately obvious, and extensive bioinformatics analysis is often required to even begin to understand the possible underlying biological mechanisms. To effectively capture our greater understanding of the relationship between coding and non-coding variants and complex disease, we must be able to accurately connect those variants with their biological annotations. This application proposes to build on my current K01 research on mapping SNPs to protein-coding genes to capture these other features by accomplishing the following specific aims:
Aim 1 - Capturing non-protein coding 'genes'. In this aim we will identify non-protein coding genes and define their boundaries.
Aim 2 - Mapping variants in gene-associated regions to the corresponding genes. Phenotypes are not controlled by genic sequences alone. In this aim we will identify and map the non-genic portions of the chromosome that can influence the expression of coding genes.
Aim 3 - Expanding beyond physical boundaries. Expanding feature boundaries to account for LD regions will allow researchers to capture genomic features that would not otherwise be identified. It is vitally important that any bioinformatics workflow follows the principles of reproducible research, particularly when utilizing database-driven resources.
These aims will be accomplished by drawing on various database repositories and presenting the compiled information in a user-friendly interface.
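
As a rough illustration of the mapping idea behind the aims, the following minimal sketch assigns a variant to every annotated feature whose interval contains it, optionally widening each interval by a fixed flank. The feature names and coordinates are hypothetical, and the fixed `flank` is only a stand-in assumption for an LD-derived boundary; it is not the method proposed in this application.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    """An annotated genomic feature (hypothetical example record)."""
    name: str
    chrom: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def map_variant(chrom, pos, features, flank=0):
    """Return names of features whose interval, widened by `flank` bp
    on each side, contains the variant position."""
    return [
        f.name
        for f in features
        if f.chrom == chrom and f.start - flank <= pos <= f.end + flank
    ]


# Hypothetical coordinates, for illustration only.
FEATURES = [
    Feature("GENE_A", "chr1", 1000, 2000),  # a protein-coding gene
    Feature("LNC_B", "chr1", 5000, 5600),   # a non-protein-coding 'gene'
]

# Strict physical boundaries: the variant falls between features.
print(map_variant("chr1", 2500, FEATURES))

# Boundaries widened by 1 kb: the same variant now maps to GENE_A.
print(map_variant("chr1", 2500, FEATURES, flank=1000))
```

With strict boundaries the variant at position 2500 maps to nothing; once the boundaries are widened it is assigned to GENE_A, which is the kind of feature that would otherwise be missed.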

Public Health Relevance

Currently, the interpretation of whole-genome analyses tends to focus on protein-coding genes. As our knowledge expands, we are beginning to understand that the rest of the genome has important functional roles. This proposal will provide a tool that will enable researchers to easily access, organize, and use information located in numerous ever-expanding databases.

National Institutes of Health (NIH)
National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK)
Small Research Grants (R03)
Study Section
Diabetes, Endocrinology and Metabolic Diseases B Subcommittee (DDK)
Program Officer
Podskalny, Judith M
University of Alabama at Birmingham
Biostatistics & Other Math Sci
Schools of Public Health
United States
Vaughan, Laura K; Srinivasasainagendra, Vinodh (2013) Where in the genome are we? A cautionary tale of database use in genomics research. Front Genet 4:38