Issue of Monday, February 17th, 2025
Colloquium 2/21: Idan Landau (Tel Aviv University)
Speaker: Idan Landau (Tel Aviv University)
Title: Detecting, Constraining and Interpreting Silent Structure: Insights from Argument Ellipsis in Hebrew
Time: Friday, February 21st, 3:30-5pm
Location: 32-141
Abstract: In this talk I examine classical and current issues in the theory of ellipsis through the prism of Argument Ellipsis (AE) in Hebrew, a productive process that offers a rich empirical testing ground. I start with standard diagnostics for surface anaphora, distinguishing AE from pro-drop and from Null Complement Anaphora, leading to the strongest type of argument for AE, based on subextraction. Then I turn to the question of identity in ellipsis – whether it is syntactic or semantic, what mismatches are tolerated between the antecedent and the elliptical constituent, and whether identity applies to the entire ellipsis domain or just to a subdomain within it. Evidence bearing on this question comes from force mismatch under CP ellipsis, confirming and expanding on similar results obtained in studies of sluicing. Next, I discuss a curious semantic condition on AE – only arguments denoting individuals (type <e>) can be elided. The restriction holds across a number of unrelated languages, and is helpful in pinning down the size of the elided category (DP or VP). I will also discuss very recent results from a study of resumption inside ellipsis sites and how they overcome the severe limitations of the subextraction diagnostic (insofar as resumption is not derived by movement). Finally, I go over recent proposals as to what AE is, and sketch an approach based on the “Big DP” hypothesis.
Breakstone Speaker Series in Language, Computation and Cognition: Roni Katzir 2/20
Speaker: Roni Katzir (Tel Aviv University)
Title: Large language models and human linguistic cognition
Time: Thursday, February 20, 2025: 12:30 - 2pm
Location: 32-461
Abstract: Several recent publications in cognitive science have suggested that the performance of current Large Language Models (LLMs) challenges arguments that linguists use to support their theories (in particular, arguments from the poverty of the stimulus and from cross-linguistic variation). I will review this line of work, starting from proposals that take LLMs themselves to be good theories of human linguistic cognition. I will note that the architectures behind current LLMs lack the distinction between competence and performance and between correctness and probability, two fundamental distinctions of human cognition. Moreover, these architectures fail to acquire key aspects of human linguistic knowledge — in fact, they make inductive leaps that are not just non-human-like but would be surprising in any kind of rational learner. These observations make current LLMs inadequate theories of human linguistic cognition. Still, LLMs can in principle inform cognitive science by serving as proxies for theories of cognition whose representations and learning are more linguistically neutral than those of most theories within generative linguistics. I will illustrate this proxy use of LLMs in evaluating learnability and typological arguments and show that, at present, these models provide little support for linguistically neutral theories of cognition.
Phonology Circle 2/18 - Roni Katzir (Tel Aviv)
Speaker: Roni Katzir (Tel Aviv University)
Title: Gaps, doublets, and rational learning
Time: Tuesday, February 18th, 5pm - 6:30pm
Location: 32-D831
Abstract: Inflectional gaps (??forgoed/??forwent) and doublets (✓dived/✓dove) can seem surprising in light of common assumptions about morphology and learning. Perhaps understandably, morphologists have troubled themselves with such cases (especially with gaps) and have offered different ways in which they can be accommodated within the morphological component of the grammar, often by making major departures from common assumptions about morphology and learning. I will suggest that these worries and the proposed remedies are premature. The worries arise from the unmotivated assumption that all gaps and doublets are necessarily derived within individual grammars. Once this assumption is abandoned, the observed properties of gaps and doublets become much less puzzling. The only new assumption that is needed is that speakers only use forms that they know (and not just believe) to be correct.
Roni Katzir mini-course: Can artificial neural networks become more rational?
Our Breakstone Speaker Series visitor (and last week’s colloquium speaker), Roni Katzir, will be giving a mini-course this week as well:
Since the mid-1980s, artificial neural networks (ANNs) have been trained almost exclusively using a particular approach that has proven to be very useful for improving how the ANN fits its training data and, in turn, has been instrumental in the impressive engineering successes of ANNs on linguistic tasks over the past decades. ANNs trained using this method are typically extremely large, require huge training corpora, and have opaque inner workings. They also generalize in ways that seem inconsistent with common assumptions about rational inference. In these classes we will look at what happens when we replace the standard training approach for neural networks with Minimum Description Length (MDL), a simplicity principle that helps explain what makes some generalizations better than others. MDL has a long history in cognitive science, and among other things it provides a possible answer to how humans learn abstract grammars from unanalyzed surface data.
MDL also provides a way for machines to do the same: with MDL as the training method, we obtain small, transparent networks that learn complex recursive patterns perfectly and from very little data. These MDL networks help illustrate just how far standard ANNs (even the most successful of them) are from what we would expect from an intelligent system that attempts to extract regularities from the input data: given hundreds of billions of parameters and huge training corpora, the performance of standard ANNs is sufficiently good to fool us on many common examples, but even then what the networks offer is just a superficial approximation of the regularities, one that reveals a complete lack of understanding of what these regularities actually are. The MDL networks show us that it is possible for ANNs to learn intelligently and acquire systematic regularities perfectly from small training corpora, but that this requires a very different learning approach from what current networks are based on.
Fox, D. and Katzir, R. (2024). Large language models and theoretical linguistics. Theoretical Linguistics, 50(1–2):71–76. https://doi.org/10.1515/tl-2024-2005
Lan, N., Geyer, M., Chemla, E., and Katzir, R. (2022). Minimum description length recurrent neural networks. Transactions of the Association for Computational Linguistics, 10:785–799. https://doi.org/10.1162/tacl_a_00489
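To give a concrete sense of the two-part MDL objective that the mini-course description refers to, here is a minimal, purely illustrative Python sketch (not material from the course or the readings above): it scores two made-up hypotheses about a toy corpus of aⁿbⁿ strings, a compact grammar and a memorizing lookup table, where all bit counts and probabilities are invented for illustration.

```python
import math

def description_length_bits(model_bits, data, model_log2prob):
    """Two-part MDL score: bits to encode the model itself, plus bits to
    encode the data given the model (its negative log2-probability)."""
    data_bits = sum(-model_log2prob(s) for s in data)
    return model_bits + data_bits

# Toy corpus of a^n b^n strings (the kind of recursive pattern mentioned
# in the course description). Everything below is invented for illustration.
corpus = ["ab", "aabb", "aaabbb", "aabb", "ab"]

# Hypothesis A: a compact "grammar" that knows the a^n b^n pattern and
# only pays for the choice of n, with P(n) = 2^-n.
def grammar_log2prob(s):
    n = len(s) // 2
    return n * math.log2(0.5)

# Hypothesis B: a lookup table that memorizes each distinct string and
# assigns the memorized strings equal probability.
table = sorted(set(corpus))
def table_log2prob(s):
    return math.log2(1.0 / len(table))

# Assumed encoding costs (in bits): the grammar is cheap to write down,
# while the table pays per memorized character.
grammar_score = description_length_bits(50, corpus, grammar_log2prob)
table_score = description_length_bits(16 * sum(len(s) for s in table), corpus, table_log2prob)

print(f"compact grammar: {grammar_score:.1f} bits")
print(f"lookup table:    {table_score:.1f} bits")
# The hypothesis with the smaller total description length wins.
```

In this toy comparison the lookup table fits the corpus marginally better, but its encoding cost grows with every string it memorizes, so MDL favors the compact grammar; this is the sense in which MDL trades fit against simplicity.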
Newman @ Nice locality workshop
Our colleague Elise Newman was an invited speaker at a workshop on “Locality across the board” in the pun-inviting locality of Nice. Her talk was entitled “The locality of subcategorization: a case for underspecified category”, and here is the abstract for it:
This talk is concerned with the selectional mechanism that underlies verb-argument pairs like (1).
(1) a. depend [PP on …]
    b. say [CP that …]
Following Pesetsky (1982), I’ll refer to the relationship between the verb and its arguments in (1) as l-selection, where the verb requires a particular lexical item to head its argument. This distinguishes the relationship in (1) from other kinds of selection based on syntactic (c-selection) or semantic (s-selection) properties.
Such a distinction among selectional rules echoes earlier work, such as Chomsky (1965), which claimed that relationships like (1) were governed by strict subcategorization rules, in contrast with selection of a subject, for example, which was governed by selectional rules. Some differences between the two kinds of rules pertained to their locality conditions and the kinds of properties they were sensitive to: 1) strict subcategorization was limited to head-complement pairs, while selectional rules could create other kinds of branching structures, and 2) strict subcategorization rules only applied when there was no more general syntactic property to appeal to.
Puzzle: the modern view of selection (which only makes use of selectional rules) does not capture the original locality distinction between subcategorization and selection: l-selectional relationships like (1) seem to only arise in head-complement pairs. In other words, we do not find cases where a head l-selects for the head of its specifier.
This is a puzzle in current frameworks. Since Merge underlies both complementation and specifier-formation, it is not obvious what syntactic tools we have for enforcing only complementation in these cases.
I offer a proposal that captures the locality profile of strict subcategorization within a current feature-driven framework. Building on Newman (2024), the proposal makes use of an underspecified categorial feature X, which can be checked by any kind of element. The interaction between X-checking and the checking of other c-selectional features imposes restrictions on the order of operations: elements that are not c-selected can only check X, and therefore must merge first, or they will be bled by Merge of another element (which can check X in addition to whatever feature selected it). The requirement to merge first is what restricts l-selected elements to being complements.
While this may seem like just a solution to a technicality, the account makes predictions for the locality of A-movement as well, when we consider how l-selection interacts with the functional hierarchy more generally. I will argue that the functional hierarchy stems from a mixture of l-selectional and c-selectional requirements, where the interaction between the two can produce smuggling configurations (Collins 2005). When smuggling happens, arguments can obviate certain locality conditions on A-movement to derive phenomena such as symmetric passivization.