Since its first release in 1998, the software program EUGene has become a useful tool for scholars who analyze international conflict and crisis data, providing simple-to-use tools for dataset construction. The software development laid out in this proposal will further expand the EUGene program, broadening it into a tool for developing international relations, comparative politics, and comparative political economy data. Continued program development will focus on a complete redesign of the data management engine to allow for the development of new, more flexible datasets and research designs that better fit contemporary research needs.
The program makes routine a set of data preparation tasks that are cumbersome and difficult, and keeps track of critical research design choices made by users. This facilitates more advanced research and theorizing by freeing scholars from the need to perform technical data manipulations. As in coding previous versions, the PIs will integrate graduate and undergraduate student programmers into the project, as well as supervise contracted professional software engineers. EUGene will benefit both experienced researchers and undergraduate and graduate students in training.
In this expansion, the focus is on five key new additions that will advance future scientific understanding and discovery, while building on existing work. The software will be freely distributed with a point-and-click user interface. The development team will add five new program attributes to broaden the impacts of the existing software.
1. Develop an observational format that relaxes the current restriction to monadic or dyadic data, allowing for k-adic research designs. Using dyadic data to analyze what are, in reality, k-adic events leads to model misspecification and, inevitably, substantial statistical bias.
2. Allow users to specify the unit of observation. Users will be able to choose units including the country, the IGO, the terrorist group, the MID, or any user-specified non-state actor. Users will also be able to select the temporal element, e.g. daily, weekly, monthly, quarterly, or annual data.
3. Develop an expanded sampling engine to accommodate the greatly increased population of observations the new data structures will generate.
4. Build in an expanded scaling and measurement engine that allows users to apply principal component analysis and factor analysis toward the building of a wide range of scales and indices, such as NOMINATE and S scores.
5. Develop the data handling routines needed for spatial regression analysis, to control for spatial interdependence in the user's data.
In addition, EUGene 4.0 will include the most recent releases of an expanding variety of pre-existing datasets.
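To illustrate why k-adic research designs call for an expanded sampling engine, the following minimal Python sketch counts the unordered k-ads that a set of actors generates in a single time period and draws a random sample of them without enumerating the full population. It is not EUGene code: the function names `count_kads` and `sample_kads` and the choice of simple uniform sampling are assumptions made for illustration only.

```python
import math
import random


def count_kads(n_actors: int, k: int) -> int:
    """Number of unordered k-ads among n_actors in one time period."""
    return math.comb(n_actors, k)


def sample_kads(actors, k, n_samples, seed=0):
    """Draw n_samples distinct unordered k-ads uniformly at random.

    Hypothetical helper: uses rejection sampling so the full
    population of k-ads never has to be built in memory.
    """
    actors = list(actors)
    if n_samples > math.comb(len(actors), k):
        raise ValueError("n_samples exceeds the number of possible k-ads")
    rng = random.Random(seed)
    chosen = set()
    while len(chosen) < n_samples:
        # Sort each draw so that (A, B, C) and (C, B, A) count as one k-ad.
        chosen.add(tuple(sorted(rng.sample(actors, k))))
    return sorted(chosen)


if __name__ == "__main__":
    # The population of observations grows rapidly with k.
    for k in (1, 2, 3, 4):
        print(k, count_kads(190, k))

    # Sample five triads from ten placeholder actors.
    placeholder_actors = [f"actor{i}" for i in range(10)]
    print(sample_kads(placeholder_actors, k=3, n_samples=5))
```

With 190 actors, the count rises from 17,955 dyads to 1,125,180 triads per time period, which is the kind of growth that makes sampling attractive.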
The study of international relations requires comprehensive and easily manipulable data sets. Data must be available on the variables of interest, and those data must be manipulated into a suitable format to allow the inclusion of appropriate control variables as well as variables of central theoretical interest. NewGene is software designed to eliminate many of the difficulties commonly involved in constructing large international relations data sets. NewGene is a stand-alone program for Microsoft Windows and Apple OS X that constructs annual, monthly, and daily data sets on countries, leaders, organizations, etc. It also lets users create datasets composed of "k-ads", where k can equal 1 (e.g. country-year), 2 (e.g. country1-country2-year), or some number greater than 2 (e.g. country1-country2-country3…-country k-year). NewGene provides a highly flexible platform on which users can construct international relations data sets using pre-loaded data or by incorporating their own data. It accomplishes this by automating a variety of tasks necessary to integrate several data building blocks commonly used in tests of international relations theories. NewGene users specify the type of data set they would like to create by selecting from a series of menus. The program assembles the data according to these user specifications and outputs it for analysis in other statistical packages. The output is in comma-delimited .csv format, which can be imported into a wide range of statistical software programs. Users of NewGene do not need to write a single line of computer code in order to merge data, read data from input files of varying formats, or convert data into common units of analysis.
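The kind of manual work that NewGene automates can be sketched in a few lines of Python. The example below is an illustration under stated assumptions, not NewGene's implementation: it expands a toy country-year table into k-ad-year rows and serializes them as comma-delimited text. The actor names, the `score` variable and its values, and the function names `build_kad_rows` and `to_csv` are all made up for this sketch.

```python
import csv
import io
import itertools

# Hypothetical country-year input: (actor, year) -> attributes.
# Actor names and values are placeholders, not real data.
monadic = {
    ("A", 1990): {"score": 1.0},
    ("B", 1990): {"score": 2.0},
    ("C", 1990): {"score": 3.0},
    ("A", 1991): {"score": 1.5},
    ("B", 1991): {"score": 2.5},
    ("C", 1991): {"score": 3.5},
}


def build_kad_rows(monadic, k):
    """Expand country-year records into one row per unordered k-ad per year."""
    years = sorted({year for _, year in monadic})
    rows = []
    for year in years:
        actors = sorted(a for a, y in monadic if y == year)
        for kad in itertools.combinations(actors, k):
            row = {"year": year}
            for i, actor in enumerate(kad, start=1):
                row[f"country{i}"] = actor
                # Suffix each variable with the actor's position in the k-ad.
                for var, value in monadic[(actor, year)].items():
                    row[f"{var}{i}"] = value
            rows.append(row)
    return rows


def to_csv(rows):
    """Serialize rows as comma-delimited text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    print(to_csv(build_kad_rows(monadic, k=2)))
```

Setting `k=1` yields country-year rows, `k=2` yields dyad-year rows, and `k=3` yields triad-year rows, mirroring the k-ad structures described above.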
By reducing the time necessary to carry out routine data set construction tasks, NewGene allows users to proceed more rapidly to the analysis stage and to spend more time on theory development and on asking new research questions than on data management. NewGene is available as freeware at www.newgenesoftware.org.