Cleft lip with or without cleft palate (CL/P) is among the most common congenital birth defects, with a worldwide prevalence of approximately 1 in 700 live births. Mouse and human genetic studies, as well as epidemiological studies, have identified a wide array of genetic and environmental risk factors for CL/P. However, the etiology of CL/P is not yet fully understood because of the complexity of these genetic and environmental risk factors and of gene-environment interactions. The mouse is one of the most frequently used and well-established animal models for studying the mechanisms of craniofacial development. Since 2009, the FaceBase Consortium, funded by NIDCR, has generated a large number of datasets, including many genomic datasets (mRNA, microRNA (miRNA), and enhancer data) that cover mouse lip and palate development (e.g., embryonic days E10.5 to E14.5). While these data provide an unprecedented opportunity to uncover the molecular cascades that control craniofacial development and to link them to various diseases and phenotypes, major challenges and technical issues strongly limit interrogation of the data across domains, mouse strains, or multiple FaceBase datasets. Specifically, the currently available FaceBase data were generated from different mouse strains (C57BL/6J, 129S6, and CD-1) using different platforms (Illumina HiSeq 2000 and 2500, Affymetrix Mouse 430 2.0, and Affymetrix Mouse Exon 1.0 ST) for different tissues at different embryonic days. To address these challenges and improve our understanding of the regulatory mechanisms involved in CL/P, we propose three specific aims.
In Aim 1, we will compare mRNA and miRNA expression between the C57BL/6J and 129S6 mouse strains using a modified Nonnegative Matrix Factorization (NMF) algorithm that can effectively adjust for biases across the various FaceBase datasets. The information we learn will serve as important guidance for integrating data across genomic platforms and mouse strains, supporting reproducible research.
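The proposal does not specify how its NMF algorithm is modified, so the following is only a minimal sketch of standard NMF (Lee and Seung's multiplicative updates for the Frobenius-norm objective), not the modified method itself. It assumes, hypothetically, that expression profiles from both strains are stacked column-wise into one nonnegative genes-by-samples matrix `V`, so that the shared basis `W` captures expression programs common to both strains while the per-sample coefficients in `H` can be compared between strain groups. The bias-adjustment step is not shown, and the function name `nmf` and the toy data are illustrative assumptions.

```python
import numpy as np


def nmf(V, k, n_iter=500, eps=1e-10, seed=0):
    """Approximate a nonnegative matrix V (genes x samples) as W @ H.

    Standard Lee-Seung multiplicative updates minimizing ||V - W H||_F.
    This is NOT the proposal's modified NMF; it is a plain baseline.
    """
    rng = np.random.default_rng(seed)
    n_genes, n_samples = V.shape
    # Random positive initialization; eps keeps every entry strictly > 0.
    W = rng.random((n_genes, k)) + eps
    H = rng.random((k, n_samples)) + eps
    for _ in range(n_iter):
        # Each factor is multiplied by a ratio of nonnegative terms,
        # so W and H stay nonnegative; eps guards against division by zero.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


# Hypothetical toy data: 6 genes x 8 samples built from 2 latent programs.
# Columns 0-3 stand in for one strain and columns 4-7 for the other.
rng = np.random.default_rng(1)
W_true = rng.random((6, 2))
H_true = rng.random((2, 8))
V = W_true @ H_true

W, H = nmf(V, k=2, n_iter=2000)
rel_err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
print(f"relative reconstruction error: {rel_err:.4f}")
```

Because the basis `W` is fitted jointly, comparing columns 0-3 of `H` against columns 4-7 is one simple way to contrast the two hypothetical strain groups on a common set of expression programs.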
In Aim 2, we will profile long non-coding RNA (lncRNA) expression by reusing FaceBase microarray and RNA-sequencing data, adding an important category of non-coding RNA information to the FaceBase Consortium. We will then conduct bioinformatics curation of the regulatory relationships among transcription factors, miRNAs, lncRNAs, and protein-coding genes for CL/P genes in mice.
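The proposal does not state how curated regulatory relationships will be stored. A minimal sketch, assuming each curated relationship is recorded as a typed, directed edge from regulator to target, is a simple graph index that can answer questions such as "which regulators act on this gene?" and can list transcription factor (TF)-miRNA feed-forward loops, in which a TF regulates both a miRNA and a target gene that the miRNA also regulates. All node names, edges, and helper functions below are hypothetical placeholders, not curated CL/P data.

```python
from collections import defaultdict

# Hypothetical curated edges: (regulator, regulator_type, target, target_type).
EDGES = [
    ("TF_A", "TF", "miR_X", "miRNA"),
    ("TF_A", "TF", "Gene_1", "mRNA"),
    ("miR_X", "miRNA", "Gene_1", "mRNA"),
    ("lnc_Y", "lncRNA", "miR_X", "miRNA"),
    ("TF_B", "TF", "Gene_2", "mRNA"),
]


def build_network(edges):
    """Index directed regulator -> target edges and record each node's molecule type."""
    targets = defaultdict(set)
    node_type = {}
    for reg, reg_type, tgt, tgt_type in edges:
        targets[reg].add(tgt)
        node_type[reg] = reg_type
        node_type[tgt] = tgt_type
    return targets, node_type


def regulators_of(gene, targets):
    """Return all regulators with a direct edge into `gene`, sorted by name."""
    return sorted(reg for reg, tgts in targets.items() if gene in tgts)


def tf_mirna_ffls(targets, node_type):
    """Find feed-forward loops: TF -> miRNA, TF -> gene, and miRNA -> gene."""
    loops = []
    for tf, tf_targets in targets.items():
        if node_type[tf] != "TF":
            continue
        for mir in tf_targets:
            if node_type[mir] != "miRNA":
                continue
            # Genes regulated by both the TF and the miRNA close the loop.
            for gene in tf_targets & targets.get(mir, set()):
                if node_type[gene] == "mRNA":
                    loops.append((tf, mir, gene))
    return sorted(loops)


targets, node_type = build_network(EDGES)
print(regulators_of("Gene_1", targets))   # ['TF_A', 'miR_X']
print(tf_mirna_ffls(targets, node_type))  # [('TF_A', 'miR_X', 'Gene_1')]
```

Keeping the molecule type on each node lets the same index hold TF, miRNA, lncRNA, and protein-coding relationships while still allowing motif queries restricted to particular node types.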
In Aim 3, we will distribute the gene expression data (mRNA, miRNA, and lncRNA) and their regulatory annotations to the research community through our CleftGeneDB database and the FaceBase Portal. The successful completion of this project will provide investigators with important guidance on secondary analysis of FaceBase genomic data across platforms and mouse strains, as well as a comprehensive web resource for genes related to CL/P.
The FaceBase Consortium has recently generated many genomic datasets in mice to accelerate the understanding of craniofacial developmental biology. In this proposal, to maximize the use of FaceBase genomic data, we will first perform a comprehensive evaluation of various FaceBase genomic datasets and functional annotations in mice and then distribute these processed and annotated data related to cleft genes to the research community through the CleftGeneDB database and the FaceBase Portal. The successful completion of this project will provide investigators with important guidance on reanalysis of FaceBase genomic data across platforms and mouse strains, as well as a user-friendly, content-driven online resource for cleft lip and palate.