Next-generation genome-scale sequencing of patients is now becoming routine for two classes of disease: rare Mendelian traits and cancer. In favorable cases, these data allow identification of relevant mutations and thus aid diagnosis and therapy. In both classes of disease, the most common type of mutation is missense: single-base changes that result in an amino acid substitution in a protein. Uncertainty as to the impact of these mutations on in vivo protein activity has resulted in a very conservative approach to their interpretation in the clinic, causing many missed opportunities for targeted treatment. The goal of this project is to use a combination of three strategies to make the interpretation of these mutations much more applicable in the clinic. There are already a large number of computational methods that attempt to determine the impact of missense mutations on function, and there is substantial evidence that these have useful accuracy. The primary difficulty is that the accuracy in any particular case is not reliably calibrated. Therefore, our first aim is to use a combination of these methods to develop an approach focused on more reliable estimates for the probability of high impact on protein function (i.e. more confident P values).
The second aim is to maximize the utilization of three-dimensional structural information, largely ignored by most computational methods. A large fraction of missense mutations in these classes of disease act by destabilizing protein structure, and knowledge of structure allows these to be identified with much higher reliability. Also, structure provides a framework for detailed annotation and comprehension of function. To facilitate the utilization of structure, we will implement a modeling platform that leverages available experimental information to maximize the structural data available for analyzing mutation impact. An important aspect of the platform is incorporation of methods for evaluating the reliability of the structural features relevant to analysis of each mutation.
In the third aim we will build specific functional models for each protein of interest, integrating information from current databases, the literature, and community input, so as to provide the richest possible background against which to judge the impact of mutations. Proteopedia, a well-established mediawiki for proteins, will be used to provide an integrated view of text, data, and structure. A key component of the information resource will be contributions from curators, who will provide annotation and also solicit input from other experts. This aspect of the project builds on experience with other crowdsourcing endeavors, including CASP, CAGI and Proteopedia. There will be three primary outcomes from the project: First, improved reliability for the interpretation of missense mutations. Second, a prototype mutation annotation procedure suitable for use in a clinical setting. Third, the resource will provide information of benefit to a range of other scientists, thus facilitating the analysis of disease-related mutations.
Genome-scale DNA sequencing is now contributing to diagnosis and therapy in cases of rare human disease and cancer. Full exploitation of these data is currently hampered by inadequate understanding of which DNA changes affect protein function so as to contribute to disease. This project aims to develop the methods and tools needed to remove that obstacle.
Yin, Yizhou; Kundu, Kunal; Pal, Lipika R et al. (2017) Ensemble variant interpretation methods to predict enzyme activity and assign pathogenicity in the CAGI4 NAGLU (Human N-acetyl-glucosaminidase) and UBE2I (Human SUMO-ligase) challenges. Hum Mutat 38:1109-1122
Kundu, Kunal; Pal, Lipika R; Yin, Yizhou et al. (2017) Determination of disease phenotypes and pathogenic variants from exome sequence data in the CAGI 4 gene panel challenge. Hum Mutat 38:1201-1216
Pal, Lipika R; Kundu, Kunal; Yin, Yizhou et al. (2017) CAGI4 SickKids clinical genomes challenge: A pipeline for identifying pathogenic variants. Hum Mutat 38:1169-1181