The overarching goal of this project is to address several major challenges to the biological interpretation, functional validation, and clinical translation of genetic association findings for quantitative red blood cell traits and non-malignant blood cell disorders in the post-genomic era.
In Aim 1, we will apply state-of-the-art statistical genomic and computational tools to extremely large, multi-ethnic, population-based human datasets comprising hundreds of thousands of individuals with red blood cell traits (hemoglobin, hematocrit, RBC count, MCV, MCH, MCHC, and red cell distribution width, or RDW) and either whole genome sequence (WGS) data (the NHLBI TOPMed WGS project) or GWAS data (the Blood Cell Consortium, or BCX, and UK Biobank). These analyses will provide updated discovery and interpretation of common, low-frequency, and rare genetic variants associated with red blood cell counts and indices.
In Aim 2, we will validate new red blood cell phenotype-associated genomic loci and genetic variants through a combination of imputation and replication in independent datasets (using TOPMed WGS as the imputation reference panel) and/or de novo genotyping or sequence analysis of selected phenotypic samples or pedigrees. We will also provide functional annotation, fine-mapping, and prioritization for new and existing red blood cell trait-associated variants and genes, with an emphasis on new blood cell lineage-specific epigenomic, transcriptomic, and 3D genomic resources, including those becoming available through the TOPMed and BLUEPRINT projects.
In Aim 3, we will perform functional, cell-based analyses of selected non-coding genomic loci/variants (~50 per year) identified in Aims 1 and 2, particularly those that alter canonical transcription factor motifs and demonstrate clinical impact through PheWAS or co-segregation with phenotypic extremes in pedigrees. We will use a combination of massively parallel reporter assays (MPRA) and CRISPR/Cas9 genomic perturbation to provide comprehensive and predictive assessments of regulatory non-coding variation and function. We will disseminate all genomic, annotation, and functional information derived from Aims 1, 2, and 3 to the clinical and scientific community to support discovery, fine-mapping, and investigation of the causal genes that underlie red blood cell traits and hematological disorders.
This project will improve insight into the genetic basis of hematologic traits and red blood cell disorders. Identifying the risk factors and causes of these traits and disorders will clarify why they occur and, potentially, how they can be treated. Our project will create a renewable resource for the scientific community for research into human red blood cell production and how it goes awry in disease.