We address the problem of inferring the evolutionary history of a set of natural languages. The methodology we propose can be used when languages are described by qualitative characters: functions, based upon linguistic properties such as sounds and lexical items, which partition the set of natural languages into distinct equivalence classes. Evolutionary trees are also inferred for biological taxa described by qualitative characters, and perhaps the most popular method for constructing evolutionary trees in biology is based upon the parsimony criterion. We have developed a methodology for encoding linguistic information as qualitative characters, together with efficient algorithms for constructing the most parsimonious trees for linguistic data. These promise to provide robust trees that are significantly more informative than those obtained using any previous method. A direct consequence of the methodology we are developing is the ability to test hypotheses; in particular, it enables the linguist to determine whether particular linguistic information is evolutionarily relevant, and whether the interpretation of that information is correct. We will produce algorithms and software which will quickly generate the most parsimonious trees for a given data set. These methods have the promise of greatly improving the ability of historical linguists to derive evolutionary trees because of the following:
Character data has not been fully used by historical linguists, who have instead had to rely upon distance data or only those characters which are completely directed.
Character-based methods, since they are based upon primary data, provide information about evolutionary history that cannot reliably be obtained through the use of distance data (distances are compressions of multi-dimensional information into a single dimension).
The methods we will develop will probably return optimal trees in polynomial time, and will also determine those aspects of phylogenetic trees which hold for all solutions to the input.
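
To make the terms above concrete, here is a minimal sketch of how a qualitative character partitions a set of languages and how a candidate tree is scored under the parsimony criterion. It is not the algorithm proposed in this project, which the abstract does not specify. It uses the standard Fitch counting procedure for a single rooted binary tree, and the language labels `A` to `D`, the character states, and the function names are all hypothetical, invented for illustration.

```python
from collections import defaultdict


def partition(character):
    """Group languages into equivalence classes by their state for one character."""
    classes = defaultdict(set)
    for language, state in character.items():
        classes[state].add(language)
    return {frozenset(members) for members in classes.values()}


def fitch_score(tree, character):
    """Minimum number of state changes one character needs on a rooted binary tree.

    A tree is either a leaf (a language name) or a pair (left_subtree, right_subtree).
    """
    changes = 0

    def visit(node):
        nonlocal changes
        if isinstance(node, str):            # leaf: its observed state
            return {character[node]}
        left, right = node
        a, b = visit(left), visit(right)
        common = a & b
        if common:                           # children can agree: no change needed
            return common
        changes += 1                         # children disagree: one change
        return a | b

    visit(tree)
    return changes


def parsimony_score(tree, characters):
    """Total number of changes over all characters; lower is more parsimonious."""
    return sum(fitch_score(tree, ch) for ch in characters)


# Hypothetical toy data: three characters over four languages.
characters = [
    {"A": 0, "B": 0, "C": 1, "D": 1},
    {"A": 1, "B": 1, "C": 0, "D": 0},
    {"A": 0, "B": 1, "C": 0, "D": 1},
]
tree_1 = (("A", "B"), ("C", "D"))
tree_2 = (("A", "C"), ("B", "D"))

if __name__ == "__main__":
    print(partition(characters[0]))
    print(parsimony_score(tree_1, characters), parsimony_score(tree_2, characters))
```

On this toy input, `tree_1` needs 4 changes and `tree_2` needs 5, so `tree_1` is the more parsimonious of the two. Scoring a given tree is the easy part; the search over all possible trees is where efficient algorithms matter.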

Agency: National Science Foundation (NSF)
Institute: Division of Behavioral and Cognitive Sciences (BCS)
Application #: 9512092
Program Officer: Catherine N. Ball
Project Start:
Project End:
Budget Start: 1995-08-01
Budget End: 1999-07-31
Support Year:
Fiscal Year: 1995
Total Cost: $161,000
Indirect Cost:
Name: University of Pennsylvania
Department:
Type:
DUNS #:
City: Philadelphia
State: PA
Country: United States
Zip Code: 19104