Brandon Cain Roy
Massachusetts Institute of Technology
Network
Latest external collaboration on country level. Dive into details by clicking on the dots.
Publication
Featured researches published by Brandon Cain Roy.
EELC'06 Proceedings of the Third international conference on Emergence and Evolution of Linguistic Communication: symbol Grounding and Beyond | 2006
Deb Roy; Rupal Patel; Philip DeCamp; Rony Kubat; Michael Fleischman; Brandon Cain Roy; Nikolaos Mavridis; Stefanie Tellex; Alexia Salata; Jethran Guinness; Michael Levit; Peter Gorniak
The Human Speechome Project is an effort to observe and computationally model the longitudinal course of language development for a single child at an unprecedented scale. We are collecting audio and video recordings for the first three years of one childs life, in its near entirety, as it unfolds in the childs home. A network of ceiling-mounted video cameras and microphones are generating approximately 300 gigabytes of observational data each day from the home. One of the worlds largest single-volume disk arrays is under construction to house approximately 400,000 hours of audio and video recordings that will accumulate over the three year study. To analyze the massive data set, we are developing new data mining technologies to help human analysts rapidly annotate and transcribe recordings using semi-automatic methods, and to detect and visualize salient patterns of behavior and interaction. To make sense of large-scale patterns that span across months or even years of observations, we are developing computational models of language acquisition that are able to learn from the childs experiential record. By creating and evaluating machine learning systems that step into the shoes of the child and sequentially process long stretches of perceptual experience, we will investigate possible language learning strategies used by children with an emphasis on early word learning.
Proceedings of the National Academy of Sciences of the United States of America | 2015
Brandon Cain Roy; Michael C. Frank; Philip DeCamp; Matthew Miller; Deb Roy
Significance The emergence of productive language is a critical milestone in a child’s life. Laboratory studies have identified many individual factors that contribute to word learning, and larger scale studies show correlations between aspects of the home environment and language outcomes. To date, no study has compared across many factors involved in word learning. We introduce a new ultradense set of recordings that capture a single child’s daily experience during the emergence of language. We show that words used in distinctive spatial, temporal, and linguistic contexts are produced earlier, suggesting they are easier to learn. These findings support the importance of multimodal context in word learning for one child and provide new methods for quantifying the quality of children’s language input. Children learn words through an accumulation of interactions grounded in context. Although many factors in the learning environment have been shown to contribute to word learning in individual studies, no empirical synthesis connects across factors. We introduce a new ultradense corpus of audio and video recordings of a single child’s life that allows us to measure the child’s experience of each word in his vocabulary. This corpus provides the first direct comparison, to our knowledge, between different predictors of the child’s production of individual words. We develop a series of new measures of the distinctiveness of the spatial, temporal, and linguistic contexts in which a word appears, and show that these measures are stronger predictors of learning than frequency of use and that, unlike frequency, they play a consistent role across different syntactic categories. Our findings provide a concrete instantiation of classic ideas about the role of coherent activities in word learning and demonstrate the value of multimodal data in understanding children’s language acquisition.
international conference on multimodal interfaces | 2007
Rony Kubat; Philip DeCamp; Brandon Cain Roy
We introduce a system for visualizing, annotating, and analyzing very large collections of longitudinal audio and video recordings. The system, TotalRecall, is designed to address the requirements of projects like the Human Speechome Project, for which more than 100,000 hours of multitrack audio and video have been collected over a twentytwo month period. Our goal in this project is to transcribe speech in over 10,000 hours of audio recordings, and to annotate the position and head orientation of multiple people in the 10,000 hours of corresponding video. Higher level behavioral analysis of the corpus will be based on these and other annotations. To efficiently cope with this huge corpus, we are developing semi-automatic data coding methods that are integrated into TotalRecall. Ultimately, this system and the underlying methodology may enable new forms of multimodal behavioral analysis grounded in ultradense longitudinal data.
acm multimedia | 2007
Michael Fleischman; Brandon Cain Roy; Deb Roy
Most approaches to highlight classification in the sports domain exploit only limited temporal information. This paper presents a method, called temporal feature induction, which automatically mines complex temporal information from raw video for use in highlight classification. The method exploits techniques from temporal data mining to discover a codebook of temporal patterns that encode long distance dependencies and duration information. Preliminary experiments show that using such induced temporal features significantly improves performance of a baseball highlight classification system.
Psychological Science | 2017
Stephan C. Meylan; Michael C. Frank; Brandon Cain Roy; Roger Levy
How do children begin to use language to say things they have never heard before? The origins of linguistic productivity have been a subject of heated debate: Whereas generativist accounts posit that children’s early language reflects the presence of syntactic abstractions, constructivist approaches instead emphasize gradual generalization derived from frequently heard forms. In the present research, we developed a Bayesian statistical model that measures the degree of abstraction implicit in children’s early use of the determiners “a” and “the.” Our work revealed that many previously used corpora are too small to allow researchers to judge between these theoretical positions. However, several data sets, including the Speechome corpus—a new ultra-dense data set for one child—showed evidence of low initial levels of productivity and higher levels later in development. These findings are consistent with the hypothesis that children lack rich grammatical knowledge at the outset of language learning but rapidly begin to generalize on the basis of structural regularities in their input.
Proceedings of the Annual Meeting of the Cognitive Science Society | 2009
Brandon Cain Roy; Michael C. Frank; Deb Roy
conference of the international speech communication association | 2009
Brandon Cain Roy; Deb Roy
Archive | 2008
Brandon Cain Roy; Deb Roy
Soroush Vosoughi | 2010
Brandon Cain Roy; Deb Roy; Soroush Vosoughi
Cognitive Science | 2012
Brandon Cain Roy; Michael C. Frank; Deb Roy