The principal object of study for a core area of Linguistics is the individual speaker's knowledge of his or her native language, or "grammar." Given that a speaker's knowledge of grammar is unconscious, a challenge for the discipline is to develop reliable methodologies that uncover the right data and enhance replicability. "Parsed corpora" projects are an important component in an emerging methodology being used to uncover the syntactic patterns underlying speakers' use of language. Parsed corpora are texts, both written and spoken, that are annotated with detailed grammatical information and then used as tools to test hypotheses about statistical tendencies in syntactic patterning. While there is a rapidly growing body of Germanic parsed corpora in the discipline, there have been equally important developments in parsed corpora in Romance.

With support from the National Science Foundation, a "Special Session on Parsed Corpora of Romance languages" will be held at the 43rd Linguistic Symposium on Romance Languages (LSRL43), April 17-19, 2013, at The Graduate Center of The City University of New York. As the largest annual gathering of linguists working on Romance languages, the LSRL affords the perfect occasion to make the Romance linguistics community aware of some of the most exciting recent advances in syntax, which are based on these innovative tools. The objective is to provide a focused discussion of how these parsed corpora can be used as tools by anyone in the discipline, and to thereby foster scientific activity. The Session will include three one-hour talks on both historical and synchronic parsed corpora in Romance languages: (1) the "Modelling Change: The Paths of French" corpus, presented by Anthony Kroch and Beatrice Santorini (University of Pennsylvania), (2) the "Syntax-oriented corpus of Portuguese dialects," presented by Ana Maria Martins (University of Lisbon), and (3) the "Tycho Brahe Parsed Corpus of Historical Portuguese," presented by Charlotte Galves (University of Campinas).

Project Report

A "parsed corpus" is a body of text that has been annotated with grammatical information. The building of parsed corpora has become a rapidly growing area of research activity in linguistics. The best known are the Penn Parsed Corpora of Historical English (Kroch & Taylor 2000; Kroch, Santorini, & Diertani 2004; Kroch, Santorini, & Diertani 2010), and there are other well-known Germanic parsed corpora. However, there have been equally important developments in parsed corpora in Romance. Most notable in this regard are the following: (i) the MCVF corpus (Modéliser le changement: les voies du français / Modelling Change: The Paths of French), in development by Anthony Kroch, Beatrice Santorini, France Martineau, Paul Hirschbühler, and Marie Labelle; (ii) the Corpus Histórico do Português Tycho Brahe / Tycho Brahe Parsed Corpus of Historical Portuguese, in development by Charlotte Galves and Pablo Faria, of the University of Campinas, Brazil; and (iii) the Syntax-oriented corpus of Portuguese dialects (CORDIAL-SIN), in development by Ana Maria Martins and Ernestina Carrilho, of the University of Lisbon. To understand, explore, and advance research in this area, this grant supported a Special Session on Romance Parsed Corpora, held on April 18, 2013 (see the Special Session website at lsrl43.commons.gc.cuny.edu/special-session-on-parsed-corpora-april-18/). The Special Session was part of the 43rd Linguistic Symposium on Romance Languages (LSRL43), a three-day conference held at The Graduate Center of The City University of New York, April 17-19, 2013. The session consisted of three consecutive one-hour talks on parsed corpora projects based on Romance languages. The talks were given by (1) Anthony Kroch and Beatrice Santorini (University of Pennsylvania), (2) Ana Maria Martins (University of Lisbon), and (3) Charlotte Galves (University of Campinas).
Three features made the Special Session both needed and unique. First, it was the first concentrated set of talks on parsed corpora to focus specifically on Romance. Second, it included a mix of both historical and synchronic parsed corpora. Third, the venue (LSRL43) brought together the largest concentration of Romance linguists at a single event devoted to Romance linguistics. The Special Session met the objectives of the grant: it heightened awareness among Romance scholars of these important Romance parsed corpora projects, and also of the research resulting from the use of these innovative tools. Furthermore, because Romance corpora were highlighted, many of the abstracts submitted to the general session of LSRL43 were also on corpus projects, which in turn spread the theme of the Special Session throughout the entire conference. LSRL43's focused discussion of parsed corpora as research tools fostered and enhanced scientific activities in the discipline. For example, the conference resulted in the conceptualization of a special issue of the journal Linguistic Variation (in progress), which focuses on Romance parsed corpora by highlighting papers from the Special Session, as well as papers on the topic from the general session. The Special Session also enhanced graduate student participation in research activities by involving students in (i) the abstract reading process, (ii) the reviewer selection process, (iii) the review-reading process, and (iv) the abstract selection process. In addition, one graduate student organizer is currently co-editing the above-referenced special issue of Linguistic Variation, while two others are currently co-editing a volume of selected proceedings from the general session.

Project Start:
Project End:
Budget Start: 2013-03-01
Budget End: 2014-02-28
Support Year:
Fiscal Year: 2012
Total Cost: $13,796
Indirect Cost:
Name: CUNY College of Staten Island
Department:
Type:
DUNS #:
City: Staten Island
State: NY
Country: United States
Zip Code: 10314