Children use syntax to understand sentences and to learn verbs; this is syntactic bootstrapping. We proposed a structure-mapping account of the origins of syntactic bootstrapping. On this account, children begin with an unlearned bias toward one-to-one mapping between nouns in sentences and participant-roles in events. Given this bias, children find the number of nouns in a sentence inherently meaningful. In the previous funding period, we tested key predictions of this account and found strong evidence for structure mapping. Identifying the set of nouns in sentences yields a partial representation of syntactic structure that allows toddlers to identify verbs and to interpret novel transitive and intransitive verbs in simple sentences. The proposed research asks how syntactic bootstrapping moves beyond 'counting the nouns', scaling up to the true complexity of verbs and sentences. We focus on two data-sources: distributional learning and discourse structure. First, we propose that distributional learning creates probabilistic syntactic-semantic combinatorial knowledge about verbs. This combinatorial knowledge, also known as verb bias, permits syntactic bootstrapping and, from early in development, is used online to help identify the structure and lexical content of sentences, guiding syntactic analysis. Second, we propose that a bias toward discourse continuity increases linguistic support for verb learning by allowing learners to collect evidence for arguments across nearby sentences. Verb bias guides this process by cuing children to seek referents for missing arguments in the discourse context. To investigate these proposals we combine experiments involving toddlers and preschoolers with a computational model based on systems for automatic semantic role labeling. Experiments with children assess comprehension of familiar and invented verbs in sentences by measuring children's visual fixations to relevant scenes or objects.
Project 1 explores toddlers' encoding of syntactic-semantic combinatorial facts about verbs from listening experience. Project 2 explores toddlers' use of discourse context to guide sentence interpretation, constrained by verb bias. Project 3 asks to what extent verb bias in preschoolers changes with new distributional learning. In Project 4 we develop our computational model to investigate the same processes. Results from experiments with children constrain the features we equip the model to detect; we use the model to test the consequences of our claims for learning from corpora of natural child-directed speech. This combination of experimental and computational studies will advance scientific knowledge about how children learn their native languages, and guide the development of new, robust learning protocols that will be of use in automatic natural language processing. The proposed research will help us to understand how children learn the words and syntax of their native languages; such research will contribute to the detection and remediation of language delays, and to language pedagogy.
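As a rough illustration of two ideas named above, the sketch below treats the number of nouns in a sentence as the number of participant-roles (the one-to-one mapping bias) and estimates verb bias as the relative frequency with which each verb occurs with each noun count. It is not the computational model described in this proposal, which is based on semantic role labeling systems; the toy observations and the names `OBSERVATIONS`, `expected_roles`, and `verb_bias` are hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical toy input: (verb, number of nouns in the sentence) pairs.
# A real learner would extract such pairs from child-directed speech.
OBSERVATIONS = [
    ("push", 2), ("push", 2), ("push", 1),
    ("sleep", 1), ("sleep", 1),
]


def expected_roles(noun_count):
    """One-to-one mapping bias: each noun is assumed to fill one participant-role."""
    return noun_count


def verb_bias(observations):
    """Return, per verb, the relative frequency of each role-count frame."""
    counts = defaultdict(Counter)
    for verb, n_nouns in observations:
        counts[verb][expected_roles(n_nouns)] += 1
    return {
        verb: {frame: c / sum(frames.values()) for frame, c in frames.items()}
        for verb, frames in counts.items()
    }


# Probabilistic frame preferences learned from the toy observations.
BIAS = verb_bias(OBSERVATIONS)
```

Under these assumptions, a verb heard mostly in two-noun sentences acquires a bias toward a two-participant interpretation, which a learner could then use to expect a second argument when only one noun is heard.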
The proposed research will examine verb learning and the development of sentence comprehension, combining experiments involving toddlers and preschoolers with a computational model of early sentence comprehension. Our findings should have considerable impact, for two main reasons. First, our work will shed light on how learners begin to find meaning in syntax, addressing long-standing and fundamental scientific questions about language development. Second, the findings should help us predict and understand the consequences of individual variation in the early language environment for language development: by studying how young children collect and use linguistic-distributional data about words, we can predict what kinds of data they need to make typical progress, and thus what kinds of early experiences might lead to risks for language difficulties.