
Publication


Featured research published by Michael R. Siracusa.


International Conference on Acoustics, Speech, and Signal Processing | 2004

Multiple person and speaker activity tracking with a particle filter

Neal Checka; Kevin W. Wilson; Michael R. Siracusa; Trevor Darrell

In this paper, we present a system that combines sound and vision to track multiple people. In a cluttered or noisy scene, multi-person tracking estimates have a distinctly non-Gaussian distribution. We apply a particle filter with audio and video state components, and derive observation likelihood methods based on both audio and video measurements. Our state includes the number of people present, their positions, and whether each person is talking. We show experiments in an environment with sparse microphones and monocular cameras. Our results show that our system can accurately track the locations and speech activity of a varying number of people.
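Below is a minimal, numpy-only sketch of the kind of audio-visual particle filter the abstract describes, assuming a fixed number of people, a random-walk motion model, and simple stand-in likelihoods (a Gaussian match between hypothesized positions and visual detections, and a von Mises-style match between a measured direction of arrival and speaking hypotheses). All function names, parameters, and models here are illustrative, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

N_PARTICLES = 500
N_PEOPLE = 2          # fixed here; the paper also infers this number

def init_particles():
    # Each particle hypothesizes positions (N_PEOPLE, 2) and speaking flags (N_PEOPLE,).
    pos = rng.uniform(0.0, 5.0, size=(N_PARTICLES, N_PEOPLE, 2))
    speaking = rng.integers(0, 2, size=(N_PARTICLES, N_PEOPLE))
    return pos, speaking

def propagate(pos, speaking):
    # Random-walk motion model; speaking state flips with small probability.
    pos = pos + rng.normal(scale=0.1, size=pos.shape)
    flip = rng.random(speaking.shape) < 0.05
    return pos, np.where(flip, 1 - speaking, speaking)

def video_loglik(pos, detections, sigma=0.3):
    # Each visual detection should lie near some hypothesized person.
    ll = np.zeros(pos.shape[0])
    for d in detections:
        dist2 = np.sum((pos - d) ** 2, axis=2)          # (N_PARTICLES, N_PEOPLE)
        ll += np.log(np.exp(-dist2 / (2 * sigma**2)).sum(axis=1) + 1e-12)
    return ll

def audio_loglik(pos, speaking, doa_meas, mic=np.zeros(2), kappa=10.0):
    # The measured direction of arrival should point at someone who is speaking.
    bearings = np.arctan2(pos[..., 1] - mic[1], pos[..., 0] - mic[0])
    match = np.cos(bearings - doa_meas) * speaking      # 0 for silent hypotheses
    return np.log(np.exp(kappa * match).sum(axis=1) + 1e-12)

def step(pos, speaking, detections, doa_meas):
    pos, speaking = propagate(pos, speaking)
    logw = video_loglik(pos, detections) + audio_loglik(pos, speaking, doa_meas)
    w = np.exp(logw - logw.max()); w /= w.sum()
    idx = rng.choice(N_PARTICLES, size=N_PARTICLES, p=w)   # resample
    return pos[idx], speaking[idx]

pos, speaking = init_particles()
for _ in range(10):   # repeated measurements of the same scene configuration
    pos, speaking = step(pos, speaking,
                         detections=[np.array([1.0, 2.0]), np.array([3.5, 1.0])],
                         doa_meas=np.arctan2(2.0, 1.0))
print("posterior mean positions:\n", pos.mean(axis=0))
print("speaking probability per person:", speaking.mean(axis=0))
```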


International Conference on Computer Vision | 2005

Visual speech recognition with loosely synchronized feature streams

Kate Saenko; Karen Livescu; Michael R. Siracusa; Kevin W. Wilson; James R. Glass; Trevor Darrell

We present an approach to detecting and recognizing spoken isolated phrases based solely on visual input. We adopt an architecture that first employs discriminative detection of visual speech and articulatory features, and then performs recognition using a model that accounts for the loose synchronization of the feature streams. Discriminative classifiers detect the subclass of lip appearance corresponding to the presence of speech, and further decompose it into features corresponding to the physical components of articulatory production. These components often evolve in a semi-independent fashion, and conventional viseme-based approaches to recognition fail to capture the resulting co-articulation effects. We present a novel dynamic Bayesian network with a multi-stream structure and observations consisting of articulatory feature classifier scores, which can model varying degrees of co-articulation in a principled way. We evaluate our visual-only recognition system on a command utterance task. We show comparative results on lip detection and speech/non-speech classification, as well as recognition performance against several baseline systems.
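The loose-synchronization idea can be illustrated with a toy dynamic-programming alignment in which two streams of per-frame classifier scores are matched to a phrase model while their positions in the phrase may drift apart by a bounded number of units. This is a simplified stand-in for the paper's multi-stream dynamic Bayesian network; the scores, phrase model, and asynchrony bound below are illustrative assumptions.

```python
import numpy as np

def loosely_sync_score(scores_a, scores_b, phrase_a, phrase_b, max_async=2):
    """Align two classifier-score streams to a phrase (one unit sequence per
    stream) by dynamic programming, allowing the streams' positions in the
    phrase to differ by at most `max_async` units.

    scores_a: (T, Ka) per-frame log-scores for stream A's units, likewise B.
    phrase_a / phrase_b: unit indices the phrase expects in each stream.
    Returns the best joint log-score.
    """
    T = scores_a.shape[0]
    La, Lb = len(phrase_a), len(phrase_b)
    NEG = -1e18
    dp = np.full((La, Lb), NEG)
    dp[0, 0] = scores_a[0, phrase_a[0]] + scores_b[0, phrase_b[0]]
    for t in range(1, T):
        new = np.full_like(dp, NEG)
        for i in range(La):
            for j in range(Lb):
                if abs(i - j) > max_async or dp[i, j] <= NEG / 2:
                    continue
                # Each stream may stay in its current unit or advance by one.
                for di in (0, 1):
                    for dj in (0, 1):
                        ni, nj = i + di, j + dj
                        if ni < La and nj < Lb and abs(ni - nj) <= max_async:
                            cand = (dp[i, j]
                                    + scores_a[t, phrase_a[ni]]
                                    + scores_b[t, phrase_b[nj]])
                            new[ni, nj] = max(new[ni, nj], cand)
        dp = new
    return dp[La - 1, Lb - 1]

# Toy usage: 10 frames, 3 lip-appearance units in one stream, 2 voicing units in the other.
rng = np.random.default_rng(1)
score = loosely_sync_score(rng.normal(size=(10, 3)), rng.normal(size=(10, 2)),
                           phrase_a=[0, 1, 2], phrase_b=[0, 1])
print("phrase log-score:", score)
```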


International Conference on Multimodal Interfaces | 2003

A multi-modal approach for determining speaker location and focus

Michael R. Siracusa; Louis-Philippe Morency; Kevin W. Wilson; John W. Fisher; Trevor Darrell

This paper presents a multi-modal approach to locating a speaker in a scene and determining to whom he or she is speaking. We present a simple probabilistic framework that combines multiple cues derived from both audio and video information. A purely visual cue is obtained using a head tracker to identify possible speakers in a scene and provide both their 3-D positions and orientations. In addition, estimates of the audio signal's direction of arrival are obtained with the help of a two-element microphone array. A third cue measures the association between the audio and the tracked regions in the video. Integrating these cues provides a more robust solution than using any single cue alone. The usefulness of our approach is shown in our results for video sequences with two or more people in a prototype interactive kiosk environment.
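A compact sketch of the cue-combination step, assuming head-tracker candidates, a single direction-of-arrival measurement, and precomputed audio-video association scores; the likelihood forms and parameters are illustrative stand-ins rather than the paper's exact framework.

```python
import numpy as np

def speaker_posterior(candidates, doa_meas, av_assoc, mic=np.zeros(2), doa_kappa=8.0):
    """Combine three cues into a posterior over which tracked head is speaking.

    candidates: list of dicts with 'pos' (x, y) from the head tracker.
    doa_meas:   measured direction of arrival (radians) from the 2-mic array.
    av_assoc:   per-candidate audio-video association score in [0, 1].
    The von Mises-like DOA likelihood and the treatment of association scores
    as likelihoods are simplifications for illustration.
    """
    logp = []
    for cand, assoc in zip(candidates, av_assoc):
        bearing = np.arctan2(cand['pos'][1] - mic[1], cand['pos'][0] - mic[0])
        doa_ll = doa_kappa * np.cos(bearing - doa_meas)   # DOA agreement cue
        assoc_ll = np.log(assoc + 1e-6)                   # audio-video association cue
        logp.append(doa_ll + assoc_ll)                    # uniform prior over candidates
    logp = np.array(logp)
    p = np.exp(logp - logp.max())
    return p / p.sum()

heads = [{'pos': (1.0, 2.0)}, {'pos': (3.0, 0.5)}]
post = speaker_posterior(heads, doa_meas=np.arctan2(2.0, 1.0), av_assoc=[0.7, 0.2])
print("P(candidate is speaker):", post)
```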


International Conference on Digital Signal Processing | 2009

Estimation of Signal Information Content for Classification

John W. Fisher; Michael R. Siracusa; Kinh Tieu

Information measures have long been studied in the context of hypothesis testing, leading to a variety of bounds on performance based on the information content of a signal or the divergence between distributions. Here we consider the problem of estimating the information content of high-dimensional signals for purposes of classification. Direct estimation of information for high-dimensional signals is generally not tractable; we therefore consider an extension to a method first suggested in [1], in which high-dimensional signals are mapped to lower-dimensional feature spaces, yielding lower bounds on information content. We develop an affine-invariant gradient method and examine the utility of the resulting estimates for predicting classification performance empirically.
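The general recipe (project to a low-dimensional feature, estimate information there, improve the projection by gradient steps) can be sketched as follows. This toy version uses a 1-D linear projection, a histogram plug-in estimate, and finite-difference gradients rather than the paper's affine-invariant gradient method; by the data-processing inequality the resulting estimate lower-bounds the information the raw signal carries about the class.

```python
import numpy as np

rng = np.random.default_rng(2)

def plugin_mi(y, c, bins=16):
    """Histogram plug-in estimate of I(Y; C) for a 1-D feature y and labels c."""
    edges = np.linspace(y.min(), y.max() + 1e-9, bins + 1)
    n = len(y)
    p_y, _ = np.histogram(y, edges)
    mi = 0.0
    for k in np.unique(c):
        p_yk, _ = np.histogram(y[c == k], edges)
        p_k = (c == k).mean()
        for b in range(bins):
            if p_yk[b] > 0:
                p_joint = p_yk[b] / n
                mi += p_joint * np.log(p_joint / (p_k * (p_y[b] / n)))
    return mi

def fit_projection(X, c, steps=200, lr=0.5, eps=1e-2):
    """Gradient ascent (finite differences) on the MI of a 1-D projection.
    Since w @ x is a function of x, I(wX; C) <= I(X; C): the learned value
    is a lower bound on the information in the raw signal."""
    d = X.shape[1]
    w = rng.normal(size=d); w /= np.linalg.norm(w)
    for _ in range(steps):
        base = plugin_mi(X @ w, c)
        grad = np.zeros(d)
        for i in range(d):
            w_eps = w.copy(); w_eps[i] += eps
            grad[i] = (plugin_mi(X @ w_eps, c) - base) / eps
        w += lr * grad
        w /= np.linalg.norm(w)            # keep the projection normalized
    return w, plugin_mi(X @ w, c)

# Toy two-class data: only the first coordinate carries class information.
n, d = 1000, 10
c = rng.integers(0, 2, size=n)
X = rng.normal(size=(n, d))
X[:, 0] += 2.0 * c
w, mi = fit_projection(X, c)
print("learned projection (first entry should dominate):", np.round(w, 2))
print("estimated MI lower bound (nats):", round(mi, 3))
```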


International Conference on Acoustics, Speech, and Signal Processing | 2005

Estimating dependency and significance for high-dimensional data

Michael R. Siracusa; Kinh Tieu; Alexander T. Ihler; John W. Fisher; Alan S. Willsky

Understanding the dependency structure of a set of variables is a key component in various signal processing applications that involve data association. The simple task of detecting whether any dependency exists is particularly difficult when models of the data are unknown or difficult to characterize because of high-dimensional measurements. We review the use of nonparametric tests for characterizing dependency and how to carry out these tests with high-dimensional observations. In addition, we present a method to assess the significance of the tests.
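A minimal sketch of such a significance assessment: each high-dimensional stream is reduced to a 1-D summary (here a random projection standing in for a learned feature), a plug-in mutual-information statistic measures dependence, and a permutation test, which destroys any dependence while preserving the marginals, supplies the significance level. All modeling choices below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)

def mi_statistic(x, y, bins=12):
    """Plug-in mutual-information statistic between two 1-D summaries."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    p = joint / joint.sum()
    px = p.sum(axis=1, keepdims=True)
    py = p.sum(axis=0, keepdims=True)
    nz = p > 0
    return float((p[nz] * np.log(p[nz] / (px @ py)[nz])).sum())

def dependency_test(X, Y, n_perm=500):
    """Permutation test of dependence between two high-dimensional streams."""
    wx = rng.normal(size=X.shape[1]); wy = rng.normal(size=Y.shape[1])
    x, y = X @ wx, Y @ wy                 # low-dimensional summaries
    observed = mi_statistic(x, y)
    null = np.array([mi_statistic(x, rng.permutation(y)) for _ in range(n_perm)])
    p_value = (1 + (null >= observed).sum()) / (1 + n_perm)
    return observed, p_value

# Toy example: Y shares one latent direction with X, so the streams are dependent.
n = 800
X = rng.normal(size=(n, 20))
Y = rng.normal(size=(n, 20))
Y[:, 0] += X[:, 0]
stat, p = dependency_test(X, Y)
print(f"MI statistic = {stat:.3f}, permutation p-value = {p:.4f}")
```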


Asilomar Conference on Signals, Systems and Computers | 2006

Inferring Dynamic Dependency with Applications to Link Analysis

Michael R. Siracusa; John W. Fisher

Statistical approaches to modeling dynamics and clustering data are well-studied research areas. This paper considers a special class of such problems in which one is presented with multiple data streams and wishes to infer their interaction as it evolves over time. This problem is viewed as one of inference on a class of models in which interaction is described by changing dependency structures, i.e., the presence or absence of edges in a graphical model, but for which the full set of parameters is not available. The application domain of dynamic link analysis as applied to tracked object behavior is explored. An approximate inference method is presented along with empirical results demonstrating its performance.
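A toy illustration of inferring evolving interaction: two object tracks are scored under an "independent" and an "interacting" hypothesis, and a two-state forward-backward pass over per-frame log-likelihoods recovers when the coupling switches on. The fixed Gaussian models and transition probability are illustrative assumptions, not the paper's inference method.

```python
import numpy as np

rng = np.random.default_rng(4)

def forward_backward(loglik, p_stay=0.95):
    """Posterior over a two-state chain (0 = independent, 1 = interacting)
    given per-frame log-likelihoods under each hypothesis."""
    T = loglik.shape[0]
    A = np.array([[p_stay, 1 - p_stay], [1 - p_stay, p_stay]])
    lik = np.exp(loglik - loglik.max(axis=1, keepdims=True))
    alpha = np.zeros((T, 2)); beta = np.ones((T, 2))
    alpha[0] = 0.5 * lik[0]; alpha[0] /= alpha[0].sum()
    for t in range(1, T):
        a = (alpha[t - 1] @ A) * lik[t]
        alpha[t] = a / a.sum()
    for t in range(T - 2, -1, -1):
        b = A @ (lik[t + 1] * beta[t + 1])
        beta[t] = b / b.sum()
    post = alpha * beta
    return post / post.sum(axis=1, keepdims=True)

# Two tracks: independent random walks at first, then the second follows the first.
T = 200
x = np.cumsum(rng.normal(size=T))
y = np.cumsum(rng.normal(size=T))
y[T // 2:] = x[T // 2:] + rng.normal(scale=0.3, size=T // 2)

dx, dy = np.diff(x), np.diff(y)
# Fixed illustrative Gaussian models on increments:
#   independent:  dx, dy ~ N(0, 1)
#   interacting:  dx ~ N(0, 1), dy - dx ~ N(0, 0.2)
ll_indep = -0.5 * (dx**2 + dy**2) - np.log(2 * np.pi)
ll_inter = -0.5 * (dx**2 + (dy - dx) ** 2 / 0.2) - 0.5 * np.log((2 * np.pi) ** 2 * 0.2)
post = forward_backward(np.column_stack([ll_indep, ll_inter]))
print("P(interacting), first half vs second half:",
      round(post[:T // 2, 1].mean(), 2), round(post[T // 2:, 1].mean(), 2))
```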


International Conference on Multimodal Interfaces | 2004

Real-time audio-visual tracking for meeting analysis

David Demirdjian; Kevin W. Wilson; Michael R. Siracusa; Trevor Darrell

We demonstrate an audio-visual tracking system for meeting analysis. A stereo camera and a microphone array are used to track multiple people and their speech activity in real time. Our system can estimate the locations of multiple people, detect the current speaker, and build a model of interaction between people in a meeting.


Asilomar Conference on Signals, Systems and Computers | 2008

Interaction analysis using switching structured autoregressive models

Michael R. Siracusa; John W. Fisher

This paper explores modeling the dependency structure among multiple vector time-series. We focus on a large class of structures that yields efficient and tractable exact inference. Specifically, we use directed trees and forests to model causal interactions among time-series. These models are incorporated in a dynamic setting in which a latent variable indexes evolving structures. We demonstrate the utility of the method by analyzing the interaction of multiple moving objects.
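The structure side of the model can be sketched by scoring a multivariate series under candidate directed forests, where each node regresses on its own past and, if it has a parent, on the parent's past. In the paper a latent switching variable indexes such structures over time and Bayesian inference with priors handles model complexity; the static, penalty-free scoring below is only an illustration.

```python
import numpy as np

rng = np.random.default_rng(5)

def forest_ar_loglik(X, parents):
    """Maximized Gaussian log-likelihood of series X (T, d) under a first-order
    AR model whose dependency structure is a directed forest: series i at time
    t regresses on its own past and, when parents[i] is not None, on series
    parents[i] at time t - 1.  Coefficients and noise variances are fit by
    least squares per node.  (No complexity penalty or structure prior is
    applied in this toy score.)"""
    T, d = X.shape
    ll = 0.0
    for i in range(d):
        cols = [X[:-1, i]]
        if parents[i] is not None:
            cols.append(X[:-1, parents[i]])
        A = np.column_stack(cols)
        y = X[1:, i]
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        var = np.mean((y - A @ coef) ** 2) + 1e-12
        ll += -0.5 * (T - 1) * (np.log(2 * np.pi * var) + 1.0)
    return ll

# Three tracked objects: object 1 follows object 0; object 2 moves on its own.
T, d = 300, 3
X = np.zeros((T, d))
for t in range(1, T):
    X[t, 0] = 0.8 * X[t - 1, 0] + rng.normal(scale=0.5)
    X[t, 1] = 0.4 * X[t - 1, 1] + 0.5 * X[t - 1, 0] + rng.normal(scale=0.5)
    X[t, 2] = 0.8 * X[t - 1, 2] + rng.normal(scale=0.5)

# Candidate forests; the latent switching variable in the paper indexes
# structures like these as they change over time.
candidates = {
    "all independent": [None, None, None],
    "1 follows 0":     [None, 0, None],
    "2 follows 1":     [None, None, 1],
}
for name, parents in candidates.items():
    print(f"{name:>16}: log-likelihood = {forest_ar_loglik(X, parents):.1f}")
```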


Archive | 2005

Geometric and Statistical Approaches to Audiovisual Segmentation

Trevor Darrell; John W. Fisher; Kevin W. Wilson; Michael R. Siracusa

Multimodal approaches are proposed for segmenting multiple speakers using geometric or statistical techniques. When multiple microphones and cameras are available, 3-D audiovisual tracking is used for source segmentation and array processing. With just a single camera and microphone, an information-theoretic criterion separates speakers in a video sequence and associates relevant portions of the audio signal. Results are shown for each approach, and an initial integration effort is discussed.
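For the single-camera, single-microphone case, the information-theoretic association can be caricatured as picking the video region whose motion signal shares the most mutual information with the audio energy envelope; the Gaussian plug-in estimator and synthetic signals below are illustrative simplifications.

```python
import numpy as np

rng = np.random.default_rng(6)

def gaussian_mi(a, v):
    """Gaussian plug-in estimate of mutual information (nats) between two 1-D series."""
    r = np.corrcoef(a, v)[0, 1]
    return -0.5 * np.log(1 - r**2 + 1e-12)

def associate_audio(audio_energy, region_motion):
    """Pick the video region whose motion signal shares the most information
    with the audio energy envelope."""
    scores = [gaussian_mi(audio_energy, m) for m in region_motion]
    return int(np.argmax(scores)), scores

# Toy scene: region 0 is the talking face (its motion tracks the audio),
# region 1 is an unrelated moving object.
T = 400
audio = np.abs(rng.normal(size=T)) * (1 + np.sin(np.linspace(0, 20, T)))
region_motion = [0.8 * audio + 0.2 * rng.normal(size=T),   # speaker's mouth region
                 rng.normal(size=T)]                        # background motion
speaker_region, scores = associate_audio(audio, region_motion)
print("MI scores per region:", [round(s, 3) for s in scores])
print("audio associated with region:", speaker_region)
```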


International Conference on Artificial Intelligence and Statistics | 2009

Tractable Bayesian Inference of Time-series Dependence Structure

Michael R. Siracusa; John W. Fisher

Collaboration


Dive into Michael R. Siracusa's collaborations.

Top Co-Authors

John W. Fisher, Massachusetts Institute of Technology
Trevor Darrell, University of California
Kevin W. Wilson, Massachusetts Institute of Technology
Kinh Tieu, Massachusetts Institute of Technology
Alan S. Willsky, Massachusetts Institute of Technology
David Demirdjian, Massachusetts Institute of Technology
James R. Glass, Massachusetts Institute of Technology
Karen Livescu, Toyota Technological Institute at Chicago