This application describes a project to maintain and develop a computer program. The IMa computer program is widely used by researchers for the study of divergence, including divergence as a normal part of population structure within species and the divergence that happens during speciation. The program implements a statistical analysis of a rich model of the divergence process that includes parameters for times of population separation, gene exchange, and population size; and it does so for models and data sets with multiple populations. However, the program runs on a single processor and is too slow for newer data sets with very large numbers of loci. The program is also difficult for investigators to learn, and as new features have been added, its source code has become difficult to modify and difficult for multiple researchers to work on together. Finally, the current input and output file formats are unwieldy and difficult to work with when data sets are large and analyses are complex. Under the proposed project the IMa program will be rewritten to include new methods for large data sets and for running on multiple processors. The source code will also be redesigned to be object oriented and then rewritten in C++. A new graphical front-end program will be written to simplify the tasks of data checking, input file assembly, model specification, starting jobs, and viewing runtime status. To facilitate assembly and checking of data files, and to simplify the process of starting jobs, a new format for input files and model specification will be defined that follows a simple XML (Extensible Markup Language) specification. To make program output more accessible, the program will be revised to generate all charts and tables in a file that follows the SVG (Scalable Vector Graphics) XML standard.
This project will support the development of the IMa computer program, which is used by many scientists to study how populations, including human populations as well as populations of other species, have evolved and diverged. Knowledge of how and why human populations are differentiated is used to identify genes that are associated with adaptation and that carry alleles that contribute to disease. The current program runs slowly and is unwieldy for the large data sets that are being generated using the latest DNA sequencing technologies. Under the proposed project, new methods will be implemented for handling very large data sets and for using multiple computer processors (parallelization). A new user interface and new formats for data files and results will also be implemented.