There is increasing evidence that variations in non-coding sequences that regulate gene expression play an important role in human disease. However, the identification of non-coding regulatory polymorphisms contributing to disease is limited by our inability to identify which variants reside within gene regulatory sequences. Multiple recent studies indicate that highly conserved non-coding regions identified by comparative sequence analysis often possess gene regulatory activity. Thus, sequence conservation alone can be used to prioritize non-coding sequences for gene regulatory function. While it is reasonable to expect that variations in highly conserved non-coding regions are more likely to be deleterious, very little research has been directed at validating this hypothesis in clinical populations. Accordingly, the goal of this proposal is to investigate the utility of comparative genomics for the prioritization of functional non-coding sequence variations in several disease models and to build computational resources that enable clinical investigators to employ this strategy. To accomplish this, we will first classify non-coding sequence variations based on a range of comparative genomic criteria to estimate their likelihood of deleteriousness. Collaboration with i2b2 clinical investigators will enable us to test the validity of this classification in a clinical study of asthma. Building on the success of VISTA, our suite of comparative genomic tools already widely used by the biomedical community, we will develop a user-friendly "clinical comparative genome portal" to automate this process. Finally, integration of the portal into i2b2's Hive architecture and clinical research chart will enable clinical investigators to exploit the synergies of computational and clinical sequence-based data in their analysis of i2b2's rich clinical databases and to accelerate the translation of clinical genetic data into functional insights.
To facilitate the dissemination and adoption of the comparative portal, in coordination with i2b2 we will organize online and offline training activities specifically targeted at clinical users.

Project Narrative

Genetic studies of common human diseases will become increasingly frequent in the near future, thanks to advances in genomic science and sequencing technologies. Common diseases, such as diabetes and cardiovascular disease, are among the major causes of human morbidity and mortality. Novel computational approaches are needed to help translate the genetic information obtained in such studies into functional information that can be used to understand the mechanisms of disease and develop new diagnostic and therapeutic approaches. In this proposal, we will use our expertise with comparative sequence analysis to develop a computational platform to help integrate computational and clinical data based on the sequence of the human genome and accelerate the translation of genetic findings into functional and medical insights. This platform will be widely available to clinical investigators through the NIH-supported i2b2 clinical research chart.

National Institutes of Health (NIH)
National Heart, Lung, and Blood Institute (NHLBI)
Research Project (R01)
Study Section
Special Emphasis Panel (ZRG1-BST-E (50))
Program Officer
Larkin, Jennie E
Lawrence Berkeley National Laboratory
Organized Research Units
United States
Yang, Song; Oksenberg, Nir; Takayama, Sachiko et al. (2015) Functionally conserved enhancers with divergent sequences in distant vertebrates. BMC Genomics 16:882
Sulakhe, Dinanath; Balasubramanian, Sandhya; Xie, Bingqing et al. (2014) High-throughput translational medicine: challenges and solutions. Adv Exp Med Biol 799:39-67
Dubchak, Inna; Munoz, Matthew; Poliakov, Alexandre et al. (2013) Whole-Genome rVISTA: a tool to determine enrichment of transcription factor binding sites in gene promoters from transcriptomic data. Bioinformatics 29:2059-61
Lukashin, Igor; Novichkov, Pavel; Boffelli, Dario et al. (2011) VISTA Region Viewer (RViewer)--a computational system for prioritizing genomic intervals for biomedical studies. Bioinformatics 27:2595-7
Liao, Katherine P; Cai, Tianxi; Gainer, Vivian et al. (2010) Electronic medical records for discovery research in rheumatoid arthritis. Arthritis Care Res (Hoboken) 62:1120-7