Genome-wide association (GWA) analysis promises to allow the discovery, without prior biological knowledge, of new genes that influence susceptibility to HIV infection as well as the rate of progression to AIDS in infected patients. Such AIDS restriction genes (ARGs) have previously been identified by a candidate gene approach. However, a causal or "operative" SNP may not be included in a set of GWA markers, so a major challenge for GWA studies is whether linkage disequilibrium (LD) with adjacent "proxy" SNPs is strong enough to allow detection of "operative" SNPs. To investigate this issue, we examined how well adjacent SNPs and multi-SNP haplotypes in the region would perform in identifying known ARGs in a well-characterized study population. We designed a pilot study in which 306 SNPs, spaced at a density of 15-18 kb across the regions of eight previously validated ARGs, were genotyped and tested for association with different stages of HIV/AIDS disease. SNP genotypes were assessed among 2,139 subjects at risk for AIDS from the epidemiological study cohorts originally used to discover the ARGs.
A set of computational tools was developed to allow the processing of data sets containing millions of genotypes. ARGANALYSIS performs a large number of statistical tests by invoking macros that call Statistical Analysis Software (SAS) procedures and store results in SAS data sets. Both categorical data analysis and survival analysis can be performed. Common odds ratios for tables with more than two rows or columns are computed using logistic regression. The current version of ARGANALYSIS is very flexible, since the actual tests to be performed can be specified in an "include" file without modifying the program code itself. It has also been used successfully to analyze haploid genotypes from mitochondrial and Y chromosome markers. BLOCKHEAD infers multi-SNP haplotypes using the EM algorithm.
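As an illustration of EM-based haplotype inference, the following is a minimal sketch for the simplest case of two biallelic SNPs. It is not BLOCKHEAD's code; the function name `em_haplotype_freqs`, the 0/1/2 genotype coding, and the iteration settings are assumptions made for this example. Only double heterozygotes have ambiguous phase, so the E-step splits them between the two possible haplotype pairs in proportion to the current frequency estimates.

```python
from collections import Counter

def em_haplotype_freqs(genotypes, n_iter=100, tol=1e-10):
    """Estimate two-SNP haplotype frequencies from unphased genotypes.

    genotypes: list of (a, b) pairs, each the count (0, 1 or 2) of the
    minor allele at SNP A and SNP B. Hypothetical coding for this sketch.
    Returns a dict mapping haplotype (allele_A, allele_B) to frequency.
    """
    counts = Counter(genotypes)
    n_chrom = 2 * len(genotypes)
    haps = [(0, 0), (0, 1), (1, 0), (1, 1)]

    # Haplotype counts from genotypes whose phase is unambiguous.
    base = {h: 0.0 for h in haps}
    for (a, b), n in counts.items():
        if (a, b) == (1, 1):
            continue  # double heterozygote: phase unknown
        first = (a // 2, b // 2)
        second = ((a + 1) // 2, (b + 1) // 2)
        base[first] += n
        base[second] += n

    n_double_het = counts[(1, 1)]
    freqs = {h: 0.25 for h in haps}  # uniform starting point

    for _ in range(n_iter):
        # E-step: probability that a double heterozygote is 00/11
        # rather than 01/10, given the current frequencies.
        p_cis = freqs[(0, 0)] * freqs[(1, 1)]
        p_trans = freqs[(0, 1)] * freqs[(1, 0)]
        total = p_cis + p_trans
        w = p_cis / total if total > 0 else 0.5

        # M-step: re-estimate frequencies from expected counts.
        new = dict(base)
        new[(0, 0)] += n_double_het * w
        new[(1, 1)] += n_double_het * w
        new[(0, 1)] += n_double_het * (1 - w)
        new[(1, 0)] += n_double_het * (1 - w)
        new = {h: c / n_chrom for h, c in new.items()}

        change = max(abs(new[h] - freqs[h]) for h in haps)
        freqs = new
        if change < tol:
            break
    return freqs

if __name__ == "__main__":
    sample = [(0, 0)] * 4 + [(2, 2)] * 4 + [(1, 1)] * 2
    print(em_haplotype_freqs(sample))
```

Haplotypes spanning more than two SNPs require enumerating all phase configurations compatible with each genotype, which this two-locus sketch does not attempt.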
ARGARRAY displays P-values in arrays of color-coded squares, where rows correspond to markers and columns correspond to tests, while ARGHIGHWAY displays P-values as the height of vertical bars and odds ratios or relative hazards as the color of the bars. ARGRANK graphically compares the significance and strength of a marker's associations with those of the other markers tested. ARGTRACKS produces a six-page report with detailed analysis results. ARGBROWSER displays analysis results in their genomic context.
Using these software tools, we found that the proxy SNP approach works remarkably well in revealing operative SNPs by capturing the intrinsic LD around them. We estimate a discovery success rate of 50-75% for operative SNPs in a blind genome scan with the marker density and number of subjects used in this pilot study.
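How well a proxy SNP can stand in for an operative SNP depends on the LD between them. One widely used LD measure is the squared correlation r² between alleles at the two sites. The abstract does not say which LD measure the study used, so the choice of r², the function name `ld_r_squared`, and the haplotype coding below are assumptions made for this sketch.

```python
def ld_r_squared(hap_freqs):
    """Compute r^2 between two biallelic SNPs from haplotype frequencies.

    hap_freqs: dict mapping (allele_A, allele_B), with alleles coded 0/1,
    to the frequency of that haplotype. Hypothetical coding for this sketch.
    Returns NaN when either SNP is monomorphic.
    """
    f10 = hap_freqs.get((1, 0), 0.0)
    f01 = hap_freqs.get((0, 1), 0.0)
    f11 = hap_freqs.get((1, 1), 0.0)

    p_a = f10 + f11          # frequency of allele 1 at SNP A
    p_b = f01 + f11          # frequency of allele 1 at SNP B
    d = f11 - p_a * p_b      # LD coefficient D

    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return float("nan")
    return d * d / denom

if __name__ == "__main__":
    perfect = {(0, 0): 0.5, (1, 1): 0.5}
    independent = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}
    print(ld_r_squared(perfect), ld_r_squared(independent))
```

An r² near 1 means the proxy carries almost the same association signal as the operative SNP, while an r² near 0 means the proxy is uninformative about it.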

Agency
National Institutes of Health (NIH)
Institute
National Cancer Institute (NCI)
Type
Intramural Research (Z01)
Project #
1Z01BC010317-09
Application #
7592661
Support Year
9
Fiscal Year
2007
Total Cost
$1,079,674
Name
National Cancer Institute Division of Basic Sciences
Country
United States