One of the paradoxes of modern genetics is the contrast between the tremendous technological advances in sequencing and genotyping during the past decade and the slow progress in identifying genes for complex diseases. These diseases involve subtle disruptions of biochemical and developmental pathways and display substantial genetic heterogeneity and gene-by-gene and gene-by-environment interactions. In response to these challenges, geneticists are collecting much larger samples and genotyping enormous numbers of SNPs (single nucleotide polymorphisms). To handle the massive increases in data flow and extract the maximum amount of information from available data, better statistical analysis tools must be made available to the human genetics community. The current grant supports construction of new statistical methods and their translation into user-friendly software via the widely distributed program Mendel. Under the auspices of the grant, we will tackle a series of related projects on computational statistics, association mapping, estimation of DNA copy numbers, population genetics, and software for managing and displaying human pedigree data. Our research in computational statistics revolves around three classes of optimization algorithms: MM and EM algorithms, block relaxation methods, and lasso penalized estimation. We will apply these methods to estimation in random graphs, nonnegative matrix factorization, and multicategory discriminant analysis. These methods are also pertinent to fast logistic regression with case-control data and fast mapping of QTLs (quantitative trait loci). We further plan to develop fast tests of association based on contingency tables, robust testing procedures for multivariate traits, and algorithms for modeling gene-by-gene and gene-by-environment interactions.
Our efforts on copy number variation will focus on penalized estimation of DNA copy number by signal intensity and hidden Markov modeling of copy numbers from the Illumina genotyping platform. In population genetics, we will develop methods and software for testing Hardy-Weinberg equilibrium in pedigree data, penalized estimation of haplotype frequencies, and estimation of ethnic admixture. Finally, our software development efforts will concentrate on making Mendel more conducive to dense, genome-wide SNP data, including: parallelization of the existing Mendel code; restructuring of the data structures in Mendel; making it easier to run complete analysis routines within Mendel; and perfection of MendelPro, the graphical user interface to Mendel. This ambitious agenda is all part of our coherent effort to provide a single platform for managing, displaying, and analyzing genetic data. This kind of software infrastructure is necessary if genetic epidemiology is to move rapidly forward in the twenty-first century.