Peter Gorniak
Massachusetts Institute of Technology
Publications
Featured research published by Peter Gorniak.
Journal of Artificial Intelligence Research | 2004
Peter Gorniak; Deb Roy
We present a visually-grounded language understanding model based on a study of how people verbally describe objects in scenes. The emphasis of the model is on the combination of individual word meanings to produce meanings for complex referring expressions. The model has been implemented, and it is able to understand a broad range of spatial referring expressions. We describe our implementation of word-level visually-grounded semantics and their embedding in a compositional parsing framework. The implemented system selects the correct referents in response to natural language expressions for a large percentage of test cases. In an analysis of the system's successes and failures we reveal how visual context influences the semantics of utterances and propose future extensions to the model that take such context into account.
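The compositional idea described above can be illustrated with a minimal sketch (not the authors' implementation): each word denotes a filter over the objects in a visual scene, and filters compose to resolve a complex referring expression. All names and values below are hypothetical.

```python
# Minimal sketch of word-level grounded semantics composed to pick a referent.
from dataclasses import dataclass

@dataclass
class Obj:
    name: str
    x: float       # horizontal position in the scene, 0 = far left
    size: float    # relative size

# Each word's grounded meaning is a filter over candidate objects.
def ball(objs):  return [o for o in objs if o.name == "ball"]
def large(objs): return [o for o in objs if o.size > 0.5]
def left(objs):  return [o for o in objs if o.x < 0.5]

scene = [Obj("ball", 0.2, 0.8), Obj("ball", 0.9, 0.3), Obj("cone", 0.4, 0.6)]

# Compositional interpretation of "the large ball on the left":
# apply the word meanings innermost-first.
referents = large(left(ball(scene)))
print([o.name for o in referents])  # the single ball at x=0.2
```

The sketch shows only the set-filtering view of composition; the paper's parsing framework handles a far broader range of spatial expressions.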
EELC'06 Proceedings of the Third international conference on Emergence and Evolution of Linguistic Communication: symbol Grounding and Beyond | 2006
Deb Roy; Rupal Patel; Philip DeCamp; Rony Kubat; Michael Fleischman; Brandon Cain Roy; Nikolaos Mavridis; Stefanie Tellex; Alexia Salata; Jethran Guinness; Michael Levit; Peter Gorniak
The Human Speechome Project is an effort to observe and computationally model the longitudinal course of language development for a single child at an unprecedented scale. We are collecting audio and video recordings for the first three years of one child's life, in its near entirety, as it unfolds in the child's home. A network of ceiling-mounted video cameras and microphones is generating approximately 300 gigabytes of observational data each day from the home. One of the world's largest single-volume disk arrays is under construction to house approximately 400,000 hours of audio and video recordings that will accumulate over the three-year study. To analyze the massive data set, we are developing new data mining technologies to help human analysts rapidly annotate and transcribe recordings using semi-automatic methods, and to detect and visualize salient patterns of behavior and interaction. To make sense of large-scale patterns that span across months or even years of observations, we are developing computational models of language acquisition that are able to learn from the child's experiential record. By creating and evaluating machine learning systems that step into the shoes of the child and sequentially process long stretches of perceptual experience, we will investigate possible language learning strategies used by children with an emphasis on early word learning.
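A quick back-of-the-envelope check of the scale implied by the figures in the abstract (300 GB/day over a three-year study):

```python
# Rough total recording volume for the Speechome corpus, using only the
# abstract's stated rate of ~300 GB per day over three years.
days = 3 * 365
total_gb = 300 * days
total_tb = total_gb / 1000
print(total_tb)  # roughly 328 TB of raw audio and video
```

This order of magnitude is why the project needed one of the largest single-volume disk arrays of its time.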
international conference on multimodal interfaces | 2005
Peter Gorniak; Deb Roy
Situated, spontaneous speech may be ambiguous along acoustic, lexical, grammatical and semantic dimensions. To understand such a seemingly difficult signal, we propose to model the ambiguity inherent in acoustic signals and in lexical and grammatical choices using compact, probabilistic representations of multiple hypotheses. To resolve semantic ambiguities we propose a situation model that captures aspects of the physical context of an utterance as well as the speaker's intentions, in our case represented by recognized plans. In a single, coherent Framework for Understanding Situated Speech (FUSS) we show how these two influences, acting on an ambiguous representation of the speech signal, complement each other to disambiguate form and content of situated speech. This method produces promising results in a game playing environment and leaves room for other types of situation models.
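The disambiguation strategy described above can be sketched in a few lines (an assumed illustration, not the FUSS implementation): acoustically plausible hypotheses are rescored by how plausible each interpretation is under a situation model of the current game state. All probabilities below are made-up values.

```python
# Acoustic/lexical hypotheses with assumed recognizer probabilities.
acoustic_hyps = {
    "move the rook": 0.40,
    "move the brook": 0.35,
    "moot the rook": 0.25,
}
# Situation-model plausibility: there are no "brooks" in the game state.
situation_prior = {
    "move the rook": 0.9,
    "move the brook": 0.0,
    "moot the rook": 0.1,
}

# Combine the two influences and renormalize.
scores = {h: p * situation_prior[h] for h, p in acoustic_hyps.items()}
total = sum(scores.values())
posterior = {h: s / total for h, s in scores.items()}
best = max(posterior, key=posterior.get)
print(best)  # "move the rook" wins despite the acoustic ambiguity
```

The point of the sketch is the division of labor: the speech signal supplies a distribution over forms, and the situation model supplies context that collapses it.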
international conference on multimodal interfaces | 2003
Peter Gorniak; Deb Roy
We present a system that augments any unmodified Java application with an adaptive speech interface. The augmented system learns to associate spoken words and utterances with interface actions such as button clicks. Speech learning is constantly active and searches for correlations between what the user says and does. Training the interface is seamlessly integrated with using the interface. As the user performs normal actions, she may optionally verbally describe what she is doing. By using a phoneme recognizer, the interface is able to quickly learn new speech commands. Speech commands are chosen by the user and can be recognized robustly due to accurate phonetic modelling of the user's utterances and the small size of the vocabulary learned for a single application. After only a few examples, speech commands can replace mouse clicks. In effect, selected interface functions migrate from keyboard and mouse to speech. We demonstrate the usefulness of this approach by augmenting jfig, a drawing application, where speech commands save the user from the distraction of having to use a tool palette.
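The learning loop described above can be sketched as simple co-occurrence counting (an assumed illustration, not the paper's system): every time the user speaks while acting, the pair is recorded, and once an utterance has been seen with an action often enough, the spoken command can trigger that action directly. Function names and the threshold are hypothetical.

```python
from collections import defaultdict

# utterance -> action -> co-occurrence count
counts = defaultdict(lambda: defaultdict(int))

def observe(utterance, action):
    """Record that the user said `utterance` while performing `action`."""
    counts[utterance][action] += 1

def predict(utterance, min_count=2):
    """Return the action most associated with `utterance`, if seen enough."""
    actions = counts.get(utterance)
    if not actions:
        return None
    action, n = max(actions.items(), key=lambda kv: kv[1])
    return action if n >= min_count else None

# After only a couple of examples, the spoken command replaces the click.
observe("line tool", "click_line_palette")
observe("line tool", "click_line_palette")
print(predict("line tool"))  # click_line_palette
```

The real system operates on phoneme strings rather than text, which is what lets users invent their own commands without a fixed vocabulary.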
north american chapter of the association for computational linguistics | 2003
Peter Gorniak; Deb Roy
We propose a computational model of visually-grounded spatial language understanding, based on a study of how people verbally describe objects in visual scenes. We describe our implementation of word-level visually-grounded semantics and their embedding in a compositional parsing framework. The implemented system selects the correct referents in response to a broad range of referring expressions for a large percentage of test cases. In an analysis of the system's successes and failures we reveal how visual context influences the semantics of utterances and propose future extensions to the model that take such context into account.
Cognitive Science | 2007
Peter Gorniak; Deb Roy
national conference on artificial intelligence | 2005
Peter Gorniak; Deb Roy
conference of the international speech communication association | 2002
Deb Roy; Peter Gorniak; Niloy Mukherjee; Joshua Juster
international conference on multimodal interfaces | 2003
Peter Gorniak; Deb Roy
Proceedings of the Annual Meeting of the Cognitive Science Society | 2006
Peter Gorniak; Deb Roy