Steven T. Piantadosi
University of Rochester
Featured research published by Steven T. Piantadosi.
Proceedings of the National Academy of Sciences of the United States of America | 2011
Steven T. Piantadosi; Harry Tily; Edward Gibson
We demonstrate a substantial improvement on one of the most celebrated empirical laws in the study of language, Zipf’s 75-y-old theory that word length is primarily determined by frequency of use. In accord with rational theories of communication, we show across 10 languages that average information content is a much better predictor of word length than frequency. This indicates that human lexicons are efficiently structured for communication by taking into account interword statistical dependencies. Lexical systems result from an optimization of communicative pressures, coding meanings efficiently given the complex statistics of natural language use.
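The contrast in this abstract can be made concrete. A frequency-based account scores a word by its unigram surprisal, -log2 P(w); the information-content account instead averages the word's surprisal given its context over all N of its occurrences, -(1/N) * sum_i log2 P(w | c_i). The Python sketch below computes both on a toy corpus. It assumes, purely for illustration, that the context is only the preceding word (a maximum-likelihood bigram model with no smoothing); the models and corpora used in the paper are not reproduced here, and the function names are hypothetical.

```python
import math
from collections import Counter


def unigram_surprisal(tokens):
    """Frequency-based score: -log2 P(w), with P estimated from raw counts."""
    counts = Counter(tokens)
    total = len(tokens)
    return {word: -math.log2(n / total) for word, n in counts.items()}


def average_information_content(tokens):
    """Mean in-context surprisal, -(1/N) * sum_i log2 P(w | c_i), per word type.

    Assumption for illustration: the context c_i is the single preceding token
    (maximum-likelihood bigram model, no smoothing), so the first token of the
    corpus, which has no context, is skipped.
    """
    bigrams = Counter(zip(tokens, tokens[1:]))
    context_counts = Counter(tokens[:-1])  # how often each word precedes another
    total_bits = Counter()
    occurrences = Counter()
    for (prev, word), n in bigrams.items():
        p = n / context_counts[prev]  # P(word | prev)
        total_bits[word] += n * -math.log2(p)
        occurrences[word] += n
    return {word: total_bits[word] / occurrences[word] for word in occurrences}
```

On the toy input "the cat sat on the mat the cat ran", "the" is the most frequent type (unigram surprisal log2 3, about 1.58 bits), yet its average in-context surprisal is 0, because every occurrence that has a context follows a word ("on" or "mat") that is always followed by "the". The two measures can therefore rank words quite differently.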
PLOS ONE | 2012
Celeste Kidd; Steven T. Piantadosi; Richard N. Aslin
Human infants, like immature members of any species, must be highly selective in sampling information from their environment to learn efficiently. Failure to be selective would waste precious computational resources on material that is already known (too simple) or unknowable (too complex). In two experiments with 7- and 8-month-olds, we measure infants’ visual attention to sequences of events varying in complexity, as determined by an ideal learner model. Infants’ probability of looking away was greatest on stimulus items whose complexity (negative log probability) according to the model was either very low or very high. These results suggest a principle of infant attention that may have broad applicability: infants implicitly seek to maintain intermediate rates of information absorption and avoid wasting cognitive resources on overly simple or overly complex events.
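Complexity as "negative log probability" under a learner can be illustrated with a sequential model that predicts each event from the events seen so far. The sketch below assumes a hypothetical ideal learner with a symmetric Dirichlet(alpha) prior over a fixed set of possible events, whose predictive probability for an event is (count of that event + alpha) / (events seen + alpha * number of outcomes); the model used in the paper may differ in its details. Each event's complexity is its surprisal, -log2 of that probability.

```python
import math


def sequential_surprisals(events, num_outcomes, alpha=1.0):
    """Surprisal (-log2 P) of each event given all earlier events.

    Hypothetical ideal learner: a symmetric Dirichlet(alpha) prior over
    num_outcomes event types, updated by counting, so the predictive
    probability of event e is (count[e] + alpha) / (seen + alpha * num_outcomes).
    """
    counts = [0] * num_outcomes
    surprisals = []
    for seen, event in enumerate(events):
        p = (counts[event] + alpha) / (seen + alpha * num_outcomes)
        surprisals.append(-math.log2(p))
        counts[event] += 1  # update the learner after observing the event
    return surprisals
```

For the sequence [0, 0, 0, 1] over three possible events, surprisal falls from log2 3 (about 1.58 bits) to 1 bit to about 0.74 bits as the repeated event becomes predictable, then jumps to log2 6 (about 2.58 bits) for the novel event. According to the abstract, infants were most likely to look away at the low and high ends of such a scale.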
Cognition | 2012
Steven T. Piantadosi; Harry Tily; Edward Gibson
We present a general information-theoretic argument that all efficient communication systems will be ambiguous, assuming that context is informative about meaning. We also argue that ambiguity allows for greater ease of processing by permitting efficient linguistic units to be re-used. We test predictions of this theory in English, German, and Dutch. Our results and theoretical analysis suggest that ambiguity is a functional property of language that allows for greater communicative efficiency. This provides theoretical and empirical arguments against recent suggestions that core features of linguistic systems are not designed for communication.
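The core inequality behind this argument can be checked numerically: when context is informative, the conditional entropy H(M | C) of meanings given context is lower than H(M), so a code that relies on context needs fewer bits per meaning and can reuse the same short forms across contexts. The Python sketch below computes both entropies for a made-up joint distribution over meanings and contexts; the probabilities are hypothetical, not data from the paper.

```python
import math
from collections import defaultdict


def entropy(probs):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def meaning_entropies(joint):
    """Return (H(M), H(M | C)) for a joint distribution {(meaning, context): prob}."""
    p_meaning = defaultdict(float)
    p_context = defaultdict(float)
    for (meaning, context), p in joint.items():
        p_meaning[meaning] += p
        p_context[context] += p
    h_m = entropy(p_meaning.values())
    # H(M | C) = sum over contexts c of P(c) * H(M | C = c)
    h_m_given_c = 0.0
    for context, p_c in p_context.items():
        conditional = [p / p_c for (_, c), p in joint.items() if c == context]
        h_m_given_c += p_c * entropy(conditional)
    return h_m, h_m_given_c
```

With four equally likely meanings split evenly across two contexts, H(M) = 2 bits but H(M | C) = 1 bit: two one-bit forms suffice if each form is used for one meaning in the first context and a different meaning in the second, which is an ambiguous code. If every context allows every meaning equally, the two entropies coincide and context offers no savings.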
Psychonomic Bulletin & Review | 2014
Steven T. Piantadosi
The frequency distribution of words has been a key object of study in statistical linguistics for the past 70 years. This distribution approximately follows a simple mathematical form known as Zipf’s law. This article first shows that human language has a highly complex, reliable structure in the frequency distribution over and above this classic law, although prior data visualization methods have obscured this fact. A number of empirical phenomena related to word frequencies are then reviewed. These facts are chosen to be informative about the mechanisms giving rise to Zipf’s law and are then used to evaluate many of the theoretical explanations of Zipf’s law in language. No prior account straightforwardly explains all the basic facts or is supported with independent evaluation of its underlying assumptions. To make progress at understanding why language obeys Zipf’s law, studies must seek evidence beyond the law itself, testing assumptions and evaluating novel predictions with new, independent data.
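Zipf’s law states that a word’s frequency f falls off roughly as a power of its frequency rank r, f(r) proportional to 1 / r^alpha with alpha near 1. A known pitfall when plotting it is estimating rank and frequency from the same sample, so that sampling noise affects both; one simple remedy is to estimate them on independent halves of the data. The Python sketch below assumes a random token-level split, ranks words on one half, counts them on the other, and estimates the log-log slope by ordinary least squares. The helper names are hypothetical, and least squares is used only to keep the sketch short, not as the article's estimation method.

```python
import math
import random
from collections import Counter


def split_half_rank_frequency(tokens, seed=0):
    """Rank word types on one random half of the tokens, count them on the other.

    Hypothetical helper: each token goes to half A or half B with probability
    0.5, so rank (from A) and frequency (from B) come from independent samples.
    """
    rng = random.Random(seed)
    half_a, half_b = Counter(), Counter()
    for token in tokens:
        if rng.random() < 0.5:
            half_a[token] += 1
        else:
            half_b[token] += 1
    ranked = [word for word, _ in half_a.most_common()]
    # Drop words never seen in half B, since log(0) is undefined.
    return [(rank, half_b[word])
            for rank, word in enumerate(ranked, start=1)
            if half_b[word] > 0]


def fit_log_log_slope(rank_freq):
    """Ordinary least-squares slope of log(frequency) against log(rank).

    Under f(r) proportional to 1 / r**alpha the slope is -alpha, so Zipf's
    law predicts a value near -1.
    """
    xs = [math.log(rank) for rank, _ in rank_freq]
    ys = [math.log(freq) for _, freq in rank_freq]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx
```

Least squares on log-log axes is convenient but statistically crude; the point of the sketch is only the separation of the rank estimate from the frequency estimate.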
Language and Linguistics Compass | 2011
Edward Gibson; Steven T. Piantadosi; Kristina Fedorenko
The prevalent method in theoretical syntax and semantics research involves obtaining a judgment of the acceptability of a sentence/meaning pair, typically by just the author of the paper, sometimes with feedback from colleagues. The weakness of the traditional non-quantitative single-sentence/single-participant methodology, along with the existence of cognitive and social biases, has the unwanted effect that claims in the syntax and semantics literature cannot be trusted. Even if most of the judgments in an arbitrary syntax/semantics paper can be substantiated with rigorous quantitative experiments, the existence of a small set of judgments that do not conform to the authors’ intuitions can have a large effect on the potential theories. Whereas it is clearly desirable to quantitatively evaluate all syntactic and semantic hypotheses, it has been time-consuming in the past to find a large pool of naïve experimental participants for behavioral experiments. The advent of Amazon.com’s Mechanical Turk now makes this process very simple. Mechanical Turk is a marketplace interface that can be used for collecting behavioral data over the internet quickly and inexpensively. The cost of using an interface like Mechanical Turk is minimal, and the time that it takes for the results to be returned is very short. Many linguistic surveys can be completed within a day, at a cost of less than $50. In this paper, we provide detailed instructions for how to use our freely available software in order to (a) post linguistic acceptability surveys to Mechanical Turk; and (b) extract and analyze the resulting data.
Cognition | 2012
Steven T. Piantadosi; Joshua B. Tenenbaum; Noah D. Goodman
In acquiring number words, children exhibit a qualitative leap in which they transition from understanding a few number words, to possessing a rich system of interrelated numerical concepts. We present a computational framework for understanding this inductive leap as the consequence of statistical inference over a sufficiently powerful representational system. We provide an implemented model that is powerful enough to learn number word meanings and other related conceptual systems from naturalistic data. The model shows that bootstrapping can be made computationally and philosophically well-founded as a theory of number learning. Our approach demonstrates how learners may combine core cognitive operations to build sophisticated representations during the course of development, and how this process explains observed developmental patterns in number word learning.
Proceedings of the National Academy of Sciences of the United States of America | 2013
Edward Gibson; Leon Bergen; Steven T. Piantadosi
Sentence processing theories typically assume that the input to our language processing mechanisms is an error-free sequence of words. However, this assumption is an oversimplification because noise is present in typical language use (for instance, due to a noisy environment, producer errors, or perceiver errors). A complete theory of human sentence comprehension therefore needs to explain how humans understand language given imperfect input. Indeed, like many cognitive systems, language processing mechanisms may even be “well designed,” in this case for the task of recovering intended meaning from noisy utterances. In particular, comprehension mechanisms may be sensitive to the types of information that an idealized statistical comprehender would be sensitive to. Here, we evaluate four predictions about such a rational (Bayesian) noisy-channel language comprehender in a sentence comprehension task: (i) semantic cues should pull sentence interpretation towards plausible meanings, especially if the wording of the more plausible meaning is close to the observed utterance in terms of the number of edits; (ii) this process should asymmetrically treat insertions and deletions due to the Bayesian “size principle”; such nonliteral interpretation of sentences should (iii) increase with the perceived noise rate of the communicative situation and (iv) decrease if semantically anomalous meanings are more likely to be communicated. These predictions are borne out, strongly suggesting that human language relies on rational statistical inference over a noisy channel.
Psychological Science | 2013
Edward Gibson; Steven T. Piantadosi; Kimberly Brink; Leon Bergen; Eunice Lim; Rebecca Saxe
The distribution of word orders across languages is highly nonuniform, with subject-verb-object (SVO) and subject-object-verb (SOV) orders being prevalent. Recent work suggests that the SOV order may be the default in human language. Why, then, is SVO order so common? We hypothesize that SOV/SVO variation can be explained by language users’ sensitivity to the possibility of noise corrupting the linguistic signal. In particular, the noisy-channel hypothesis predicts a shift from the default SOV order to SVO order for semantically reversible events, for which potential ambiguity arises in SOV order because two plausible agents appear on the same side of the verb. We found support for this prediction in three languages (English, Japanese, and Korean) by using a gesture-production task, which reflects word-order preferences largely independent of native language. Other patterns of crosslinguistic variation (e.g., the prevalence of case marking in SOV languages and its relative absence in SVO languages) also straightforwardly follow from the noisy-channel hypothesis.
Cognition | 2013
Kyle Mahowald; Evelina Fedorenko; Steven T. Piantadosi; Edward Gibson
A major open question in natural language research is the role of communicative efficiency in the origin and on-line processing of language structures. Here, we use word pairs like chimp/chimpanzee, which differ in length but have nearly identical meanings, to investigate the communicative properties of lexical systems and the communicative pressures on language users. If language is designed to be information-theoretically optimal, then shorter words should convey less information than their longer counterparts, when controlling for meaning. Consistent with this prediction, a corpus analysis revealed that the short form of our meaning-matched pairs occurs in more predictive contexts than the longer form. Second, a behavioral study showed that language users choose the short form more often in predictive contexts, suggesting that tendencies to be information-theoretically efficient manifest in explicit behavioral choices. Our findings, which demonstrate the prominent role of communicative efficiency in the structure of the lexicon, complement and extend the results of Piantadosi, Tily, and Gibson (2011), who showed that word length is better correlated with Shannon information content than with frequency. Crucially, we show that this effect arises at least in part from active speaker choice.
Child Development | 2014
Celeste Kidd; Steven T. Piantadosi; Richard N. Aslin