An extraordinary explosion of new information on the human genome is becoming available from scientific research. One of the many facets of this enterprise is the characterization of the variation in gene frequencies that exists among human populations. Gene frequency variation among the world's populations has arisen from the net effect of all the genetically relevant chance and systematic events in the history of those populations: past and present population sizes, the relative degree of endogamous and exogamous mating practices and how they varied over time, natural selection dynamics as the disease environment and climate changed, and the numerous subdivisions and mergers of ancient populations as humans spread around the world. The knowledge scientists are accumulating about the extant patterns of genetic similarity among human populations is useful in the many areas where researchers are working to understand the roles those diverse factors have played in human evolution. These areas include human relationships with other species, from microbes to the other great apes, as well as the more recent relationships among the human populations of the world as reflected in written histories.

The Allele Frequency Database (ALFRED), created at Yale University in 2000 with support from NSF, makes data on human genetic variation broadly available for research and education. This award will support three specific aims that serve the overall goal of integrating into ALFRED the data just now becoming available from various research teams studying enormous numbers of human single nucleotide polymorphisms (SNPs) in anywhere from a few to several dozen populations from around the world. The number of SNPs studied in these 'very high throughput' (VHT) projects ranges from several hundred thousand to about 2 million, distributed across the human chromosomes. The first specific aim focuses on greatly expanding the contents of ALFRED by automating the curation process sufficiently to facilitate the addition of these very large data sets. Other data from the scientific literature will continue to be incorporated, with priority given to SNP results from populations not already represented in the database. The second aim enhances the educational and research value of ALFRED by improving the Geographic Information System (GIS) interface, especially by allowing greater flexibility in the graphical display of data. The third aim is to make the user interface friendlier overall, to help users access the vastly expanded contents. New and more flexible methods will be created to facilitate assembling and downloading useful subsets of the huge multi-population and multi-marker datasets. These improvements will also include expanding the explanatory text and static graphics that educate users about how to use the database and that provide background on the nature of the information it contains.

Currently, ALFRED is a fairly large database with almost 280,000 gene frequency tables based on more than 650 human populations studied very unevenly on more than 14,700 polymorphisms, along with detailed population and marker descriptions and links to other informative databases. The addition of the data from the VHT projects will vastly increase the scale of the database on the polymorphism dimension, requiring a variety of adjustments, as outlined in specific aims 2 and 3, to help users take advantage of the wealth of new information. These improvements and expansions to ALFRED will magnify the value of this resource for research and education in anthropological genetics and in the many other interdisciplinary sciences (such as archaeology, demography, linguistics, forensics, ethnography, and medical research) that already make use of the database. ALFRED will thus be strengthened in the service of a variety of functions for education and for interdisciplinary research. The enhanced GIS interface will help summarize information for emerging disciplines such as geographical genetics and existing disciplines such as genetic epidemiology. ALFRED will be able to provide even more reference gene frequencies for comparison with new data sets that researchers develop. The database can also assist in the planning of future studies by helping researchers focus on combinations of genetic markers and population samples that can test various research questions, or identify gaps in our knowledge that need to be filled by collecting new datasets.

In the broadest sense, the expansion of ALFRED supports a wide array of disciplines and educational efforts aimed at providing a better understanding of our biological history as a species. Additionally, one of the most effective means to combat the misuse of genetic information is to make data regarding genetic variation in our species widely available through facilities like ALFRED.

Agency: National Science Foundation (NSF)
Institute: Division of Behavioral and Cognitive Sciences (BCS)
Type: Standard Grant (Standard)
Application #: 0840570
Program Officer: Carolyn Ehardt
Project Start:
Project End:
Budget Start: 2008-09-15
Budget End: 2010-08-31
Support Year:
Fiscal Year: 2008
Total Cost: $200,000
Indirect Cost:

Name: Yale University
Department:
Type:
DUNS #:
City: New Haven
State: CT
Country: United States
Zip Code: 06520