Genome-wide association studies (GWAS) are powerful tools for correlating human genotype with phenotype. These studies are designed to identify the polymorphisms in the genome that are most predictive of a phenotype. Rapid advances in genotyping technologies enable comprehensive coverage of the genome, including a majority of intergenic polymorphisms. Interestingly, when non-coding polymorphisms are included in the association analysis, they are often the most highly predictive of the phenotype. Furthermore, neighboring Single Nucleotide Polymorphisms (SNPs) are inherited together in Linkage Disequilibrium (LD) blocks, so an association signal implicates an entire block rather than a single SNP. As a result, identifying the causative SNP in an LD block that maps to a non-coding region of the genome remains an open computational and experimental challenge in genomics. Although non-coding regions of the genome are not translated into protein, in the majority of cases they are transcribed into ribonucleic acid (RNA). Because RNA is a single-stranded polymer, it folds, and the higher-order structures it adopts are integral to numerous RNA-mediated post-transcriptional regulatory functions in the cell. In detailed, focused studies of individual transcripts, our team has discovered that disruption of RNA structural features in non-coding regions of transcribed RNAs is causative in at least three human disease states (hyperferritinemia cataract syndrome, retinoblastoma, and cartilage hair hypoplasia) and that altered RNA structure determines the efficiency of hepatitis C virus clearance. The vision of this proposal is to improve our computational ability to predict RiboSNitches (structural features in RNA that are disrupted by a SNP) by improving the accuracy of ensemble suboptimal structure sampling and pseudoknot prediction, and by using chemical structure probing data to characterize allele-specific RNA conformations both in vitro and in vivo in healthy living cells.
Ultimately, this work will substantially improve our ability to predict the causative disease-associated SNP in an LD block mapping to non-coding, intergenic regions of the human genome.
Approximately two percent of the human genome encodes proteins, which are the building blocks of our cells. Studies that associate phenotype (e.g., risk of developing a disease) with individual genetic variation often identify mutations in the 98% of the genome that does not code for proteins. However, much of our genome is transcribed into ribonucleic acid (RNA), and this proposal aims to determine how mutations affect the structures of this messenger of genetic information in order to predict which of those mutations cause human disease.