This genome-wide association study (GWAS) was initiated to identify new genes, called AIDS restriction genes (ARGs), that influence the likelihood of HIV infection and the rate of progression to AIDS. Because the approach scans the whole genome, it does not require prior biological knowledge to identify new ARGs. Over the last two decades, the Laboratory of Genomic Diversity (LGD) has obtained samples from several patient cohorts selected and characterized from particular at-risk groups (men who have sex with men, hemophiliacs, intravenous drug users, and others). In many cases, lymphoblastoid cell lines have been established as a renewable source of DNA. We have identified over 6,000 subjects from these cohorts who are highly informative for hypotheses about HIV infection or progression. To date, 1,808 of these subjects have been genotyped for 906,600 single nucleotide polymorphisms (SNPs) on the Affymetrix 6.0 genotyping platform. In addition, 744,000 probes distributed across the genome are being used to detect copy number variation (CNV). We have developed bioinformatics software to analyze the very large data sets this study produces. Before performing genetic association analyses, we evaluate genotype quality by determining: 1) the call rate per person and per SNP, 2) accuracy, based on an internal replication control design, 3) consistency with Hardy-Weinberg equilibrium, and 4) consistency with Mendelian inheritance. We test for genetic association with infection or rate of progression using either categorical analysis or survival analysis with the Cox proportional hazards model. Change in log viral load and decline in square-root-transformed CD4 count are estimated with a linear mixed-effects model that includes a random effect for each individual to account for the longitudinal nature of the data. Dominant, additive, and recessive genetic models are explored for each SNP.
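The quality-control checks and genetic models listed above can be illustrated with a minimal Python sketch. The abstract does not describe LGD's actual software, thresholds, or test variants, so everything here is an assumption: genotypes are coded as minor-allele counts (0, 1, 2) with `None` for a failed call, Hardy-Weinberg equilibrium is checked with a 1-degree-of-freedom chi-square goodness-of-fit test (exact tests are a common alternative), replicate accuracy is measured as simple concordance between two genotypings of the same sample, and all function names are hypothetical.

```python
import math
from typing import List, Optional

# Assumed encoding: each genotype is the count of minor alleles (0, 1, 2);
# None marks a failed (missing) call.
Genotype = Optional[int]


def call_rate(genotypes: List[Genotype]) -> float:
    """Fraction of non-missing calls; apply to one SNP across people or one person across SNPs."""
    if not genotypes:
        return 0.0
    return sum(g is not None for g in genotypes) / len(genotypes)


def concordance(calls_a: List[Genotype], calls_b: List[Genotype]) -> float:
    """Agreement between two genotypings of the same DNA (replicate control), over jointly called sites."""
    pairs = [(a, b) for a, b in zip(calls_a, calls_b) if a is not None and b is not None]
    if not pairs:
        return float("nan")
    return sum(a == b for a, b in pairs) / len(pairs)


def hwe_pvalue(genotypes: List[Genotype]) -> float:
    """1-df chi-square goodness-of-fit test for Hardy-Weinberg equilibrium."""
    counts = [0, 0, 0]
    for g in genotypes:
        if g is not None:
            counts[g] += 1
    n = sum(counts)
    if n == 0:
        return 1.0
    p = (2 * counts[0] + counts[1]) / (2 * n)  # frequency of the allele coded 0 (major)
    q = 1.0 - p
    expected = [n * p * p, 2 * n * p * q, n * q * q]
    stat = sum((o - e) ** 2 / e for o, e in zip(counts, expected) if e > 0)
    # Upper tail of chi-square with 1 df: P(X > stat) = erfc(sqrt(stat / 2)).
    return math.erfc(math.sqrt(stat / 2.0))


def mendel_consistent(child: Genotype, father: Genotype, mother: Genotype) -> bool:
    """True if the child's genotype can arise from one allele of each parent."""
    if child is None or father is None or mother is None:
        return True  # cannot be evaluated with a missing call
    transmissible = {0: {0}, 1: {0, 1}, 2: {1}}  # minor alleles a parent can pass on
    return child in {a + b for a in transmissible[father] for b in transmissible[mother]}


def encode(g: Genotype, model: str) -> Optional[int]:
    """Recode a genotype under a dominant, additive, or recessive model (minor allele = risk allele)."""
    if g is None:
        return None
    if model == "additive":
        return g
    if model == "dominant":
        return int(g >= 1)
    if model == "recessive":
        return int(g == 2)
    raise ValueError(f"unknown genetic model: {model}")
```

Filtering thresholds (for example, a minimum call rate or a Hardy-Weinberg P-value cutoff) are not stated in the abstract and would have to be chosen by the analyst.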
Genetic association analysis of AIDS and other complex diseases is both strengthened and complicated by the existence of multiple outcomes. In particular, multiple outcomes complicate correction for multiple testing in large-scale association studies, because associations with different outcomes represent multiple correlated tests. We have developed a method that applies principal component analysis to permuted association data (PCPD, Principal Components of the Permuted Data) and replaces the correlated outcomes with a set of independent outcome principal components. This method also generates a single P-value estimate for the overall genetic association. To reduce false-positive associations due to population substructure, we test our study populations for genetic homogeneity with the STRUCTURE algorithm and correct our genetic association analyses for population stratification with the EIGENSTRAT method. We have also developed software tools to visualize the large number of genetic association results produced by a GWAS. These include ARG-ARRAY, which displays statistical significance in a heat map format; ARG-BROWSER, which shows candidate ARGs in their genomic context; and ARG-HIGHWAY, which allows inspection of several million gene association tests spanning large genome segments dispersed across each of the 24 human chromosomes as well as mitochondrial DNA.
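The abstract does not give the details of PCPD, so the following Python sketch shows only one plausible way that principal components of permuted data could yield an effective number of independent outcomes and a single overall P-value; it is not presented as the published method. Assumptions: `association_z` (a scaled Pearson correlation) stands in for the study's Cox and mixed-model statistics, the 99.5% variance threshold and the Sidak-style combination of the smallest per-outcome P-value are our own choices, and all function names are hypothetical. Requires NumPy.

```python
import math

import numpy as np


def association_z(genotype: np.ndarray, outcomes: np.ndarray) -> np.ndarray:
    """Stand-in statistic: sqrt(n) * Pearson r between genotype dosage and each outcome column."""
    n = len(genotype)
    g = (genotype - genotype.mean()) / genotype.std()
    y = (outcomes - outcomes.mean(axis=0)) / outcomes.std(axis=0)
    return (g @ y / n) * np.sqrt(n)


def permuted_statistics(genotype, outcomes, n_perm, rng):
    """Null statistics: shuffle whole outcome rows, keeping outcome correlations but breaking any genotype link."""
    stats = np.empty((n_perm, outcomes.shape[1]))
    for i in range(n_perm):
        stats[i] = association_z(genotype, outcomes[rng.permutation(len(genotype))])
    return stats


def effective_tests(null_stats: np.ndarray, var_explained: float = 0.995) -> int:
    """Number of principal components of the null statistics needed to reach var_explained."""
    corr = np.corrcoef(null_stats, rowvar=False)
    eigvals = np.linalg.eigvalsh(corr)[::-1]  # eigvalsh returns ascending order; flip to descending
    cumulative = np.cumsum(eigvals) / eigvals.sum()
    k = int(np.searchsorted(cumulative, var_explained)) + 1
    return min(k, len(eigvals))


def overall_pvalue(outcome_pvalues, m_eff: int) -> float:
    """Sidak-style correction of the smallest per-outcome P-value for m_eff independent tests."""
    p_min = min(outcome_pvalues)
    return 1.0 - (1.0 - p_min) ** m_eff


def two_sided_p(z: float) -> float:
    """Two-sided normal P-value for a z statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# Toy example: outcomes 0 and 1 are identical (perfectly correlated); outcome 2 is independent.
rng = np.random.default_rng(0)
n = 500
genotype = rng.integers(0, 3, size=n).astype(float)
base = rng.normal(size=n)
outcomes = np.column_stack([base, base, rng.normal(size=n)])

null = permuted_statistics(genotype, outcomes, 1000, rng)
m_eff = effective_tests(null)
observed = association_z(genotype, outcomes)
p_overall = overall_pvalue([two_sided_p(z) for z in observed], m_eff)
```

Because whole rows of the outcome matrix are shuffled together, the permuted statistics inherit the correlation among outcomes, which is what the principal components summarize; in this toy example, two duplicated outcomes plus one independent outcome collapse to two effective tests rather than three.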

Agency: National Institutes of Health (NIH)
Institute: National Cancer Institute (NCI)
Type: Intramural Research (Z01)
Project #: 1Z01BC010317-10
Application #: 7732997
Support Year: 10
Fiscal Year: 2008
Total Cost: $1,297,476
Name: National Cancer Institute Division of Basic Sciences
Country: United States