
Publications


Featured research published by Panayiotis G. Georgiou.


IEEE Transactions on Multimedia | 1999

Alpha-stable modeling of noise and robust time-delay estimation in the presence of impulsive noise

Panayiotis G. Georgiou; Panagiotis Tsakalides; Chris Kyriakakis

A new representation of audio noise signals is proposed, based on symmetric α-stable (SαS) distributions, in order to better model the outliers that exist in real signals. This representation addresses a shortcoming of the Gaussian model, namely, the fact that it is not well suited for describing signals with impulsive behavior. The α-stable and Gaussian methods are used to model measured noise signals. It is demonstrated that the α-stable distribution, which has heavier tails than the Gaussian distribution, gives a much better approximation to real-world audio signals. The significance of these results is shown by considering the time delay estimation (TDE) problem for source localization in teleimmersion applications. In order to achieve robust sound source localization, a novel time delay estimation approach is proposed. It is based on fractional lower order statistics (FLOS), which mitigate the effects of heavy-tailed noise. An improvement in TDE performance of up to a factor of four over second-order statistics is demonstrated using FLOS.
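A minimal sketch of the core idea, assuming nothing about the authors' implementation: replace the second-order cross-correlation in TDE with a fractional lower-order covariance by passing both signals through a signed fractional power. Cauchy noise (SαS with α = 1) stands in for the impulsive noise; the delay, noise scales, and p are illustrative.

import numpy as np

rng = np.random.default_rng(0)

def frac_power(v, p):
    # Signed fractional power |v|**p * sign(v): the FLOS nonlinearity that
    # tames heavy-tailed samples before they are correlated.
    return np.sign(v) * np.abs(v) ** p

def estimate_delay(x, y, max_lag, p=1.0):
    # Pick the lag maximizing the lagged sample covariance of the
    # fractionally powered signals; p=1.0 reduces to ordinary
    # second-order cross-correlation.
    xp, yp = frac_power(x, p), frac_power(y, p)
    def cov(lag):
        return (np.mean(xp[:len(xp) - lag] * yp[lag:]) if lag >= 0
                else np.mean(xp[-lag:] * yp[:len(yp) + lag]))
    return max(range(-max_lag, max_lag + 1), key=cov)

n, true_delay = 20000, 37
s = rng.standard_normal(n + true_delay)
x = s[true_delay:] + 0.3 * rng.standard_cauchy(n)  # Cauchy = SaS, alpha = 1
y = s[:n] + 0.3 * rng.standard_cauchy(n)           # y[t] = x[t - 37] + noise

print("second-order estimate:", estimate_delay(x, y, 100, p=1.0))
print("FLOS (p=0.5) estimate:", estimate_delay(x, y, 100, p=0.5))

With heavy-tailed noise, a few large spikes typically dominate the p = 1.0 statistic, while the p = 0.5 variant tends to recover the true lag of 37 samples.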


Proceedings of the IEEE | 2013

Behavioral Signal Processing: Deriving Human Behavioral Informatics From Speech and Language

Shrikanth Narayanan; Panayiotis G. Georgiou

The expression and experience of human behavior are complex and multimodal and characterized by individual and contextual heterogeneity and variability. Speech and spoken language communication cues offer an important means for measuring and modeling human behavior. Observational research and practice across a variety of domains from commerce to healthcare rely on speech- and language-based informatics for crucial assessment and diagnostic information and for planning and tracking response to an intervention. In this paper, we describe some of the opportunities as well as emerging methodologies and applications of human behavioral signal processing (BSP) technology and algorithms for quantitatively understanding and modeling typical, atypical, and distressed human behavior with a specific focus on speech- and language-based communicative, affective, and social behavior. We describe the three important BSP components of acquiring behavioral data in an ecologically valid manner across laboratory to real-world settings, extracting and analyzing behavioral cues from measured data, and developing models offering predictive and decision-making support. We highlight both the foundational speech and language processing building blocks and the novel processing and modeling opportunities. Using examples drawn from specific real-world applications ranging from literacy assessment and autism diagnostics to psychotherapy for addiction and marital well-being, we illustrate behavioral informatics applications of these signal processing techniques that contribute to quantifying higher level, often subjectively described, human behavior in a domain-sensitive fashion.
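The three-stage decomposition lends itself to a thin pipeline skeleton. This is only a hedged illustration of the structure the abstract describes; every function name, feature, and data item below is hypothetical, not from the paper.

import numpy as np
from sklearn.linear_model import LogisticRegression

def extract_cues(audio, sample_rate, words):
    # Stage 2 (cue extraction): map measured signals to interpretable,
    # behavior-relevant descriptors. These two are toy examples: a crude
    # vocal-energy proxy and a speaking-rate estimate.
    energy = float(np.mean(np.square(audio)))
    speaking_rate = len(words) / (len(audio) / sample_rate)
    return [energy, speaking_rate]

# Stage 1 (acquisition) would supply ecologically valid recordings and
# transcripts; here, two synthetic one-second "sessions" stand in.
rng = np.random.default_rng(0)
sessions = [(rng.standard_normal(16000), 16000, ["hello"] * k) for k in (20, 60)]
codes = [0, 1]  # e.g., a binarized behavioral rating from human annotators

# Stage 3 (modeling): predictive support over session-level cues.
X = [extract_cues(a, sr, w) for a, sr, w in sessions]
model = LogisticRegression().fit(X, codes)
print(model.predict(X))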


IEEE Workshop on Multimedia Signal Processing | 2007

Real-time Emotion Detection System using Speech: Multi-modal Fusion of Different Timescale Features

Samuel Kim; Panayiotis G. Georgiou; Sungbok Lee; Shrikanth Narayanan

The goal of this work is to build a real-time emotion detection system that utilizes multi-modal fusion of different timescale features of speech. Conventional spectral and prosody features are used for intra-frame and supra-frame features, respectively, and a new information fusion algorithm that accounts for the characteristics of each machine learning algorithm is introduced. In this framework, the proposed system can be associated with additional features, such as lexical or discourse information, in later steps. To verify real-time system performance, binary decision tasks on angry and neutral emotions are performed using concatenated speech signals that simulate real-time conditions.
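A hedged sketch of the late-fusion structure, with synthetic data standing in for real features: one classifier over intra-frame (spectral) statistics, one over supra-frame (prosodic) statistics, and a weighted combination of their posteriors. The feature dimensions, weights, and data are illustrative, not the paper's.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)

# Synthetic utterance-level summaries of the two timescales (illustrative):
# 13 "spectral" statistics and 4 "prosody" statistics per utterance.
n = 400
labels = rng.integers(0, 2, n)                       # 0 = neutral, 1 = angry
spectral = rng.normal(labels[:, None] * 0.8, 1.0, (n, 13))
prosody = rng.normal(labels[:, None] * 1.2, 1.0, (n, 4))

tr, te = slice(0, 300), slice(300, None)
clf_spec = LogisticRegression().fit(spectral[tr], labels[tr])
clf_pros = LogisticRegression().fit(prosody[tr], labels[tr])

# Late fusion of posteriors; the paper tunes the combination to each
# learner's characteristics, here it is a fixed equal weighting.
posterior = 0.5 * clf_spec.predict_proba(spectral[te])[:, 1] \
          + 0.5 * clf_pros.predict_proba(prosody[te])[:, 1]
pred = (posterior > 0.5).astype(int)
print("fused accuracy:", float(np.mean(pred == labels[te])))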


International Conference on Acoustics, Speech, and Signal Processing | 2005

Smart room: participant and speaker localization and identification

Carlos Busso; Sergi Hernanz; Chi Wei Chu; Soon Il Kwon; Sung Lee; Panayiotis G. Georgiou; Isaac Cohen; Shrikanth Narayanan

Our long-term objective is to create smart room technologies that are aware of the users' presence and behavior and that can become an active, but not intrusive, part of the interaction. In this work, we present a multimodal approach for estimating and tracking the location and identity of the participants, including the active speaker. Our smart room design contains three user-monitoring systems: four CCD cameras, an omnidirectional camera, and a 16-channel microphone array. The various sensory modalities are processed both individually and jointly, and it is shown that the multimodal approach results in significantly improved performance in spatial localization, identification, and speech activity detection of the participants.
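The abstract does not spell out the array-processing algorithm, so as a hedged illustration here is GCC-PHAT, a standard time-difference-of-arrival estimator commonly used as a building block in microphone-array speaker localization; the sampling rate and delay are made up.

import numpy as np

def gcc_phat(x, y, fs, max_shift):
    # Generalized cross-correlation with phase transform (GCC-PHAT):
    # whitens the cross-spectrum so only phase (i.e., delay) information
    # drives the peak, which helps robustness in reverberant rooms.
    n = len(x) + len(y)
    R = np.fft.rfft(x, n) * np.conj(np.fft.rfft(y, n))
    cc = np.fft.irfft(R / (np.abs(R) + 1e-12), n)
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    return (np.argmax(np.abs(cc)) - max_shift) / fs  # positive: x lags y

# Demo: a source reaching the second microphone 12 samples later.
rng = np.random.default_rng(2)
fs, d = 16000, 12
s = rng.standard_normal(4096)
mic1, mic2 = s, np.concatenate((np.zeros(d), s[:-d]))
tdoa = gcc_phat(mic2, mic1, fs, max_shift=64)
print("estimated TDOA:", round(tdoa * fs), "samples")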


IEEE Transactions on Audio, Speech, and Language Processing | 2008

Challenging Uncertainty in Query by Humming Systems: A Fingerprinting Approach

Erdem Unal; Elaine Chew; Panayiotis G. Georgiou; Shrikanth Narayanan

Robust data retrieval in the presence of uncertainty is a challenging problem in multimedia information retrieval. In query-by-humming (QBH) systems, uncertainty can arise in query formulation due to user-dependent variability, such as incorrectly hummed notes, and in query transcription due to machine-based errors, such as insertions and deletions. We propose a fingerprinting (FP) algorithm for representing salient melodic information so as to better compare potentially noisy voice queries with target melodies in a database. The FP technique is employed in the QBH system back end; a hidden Markov model (HMM) front end segments and transcribes the hummed audio input into a symbolic representation. The performance of the FP search algorithm is compared to the conventional edit distance (ED) technique. Our retrieval database is built on 1500 MIDI files and evaluated using 400 hummed samples from 80 people with different musical backgrounds. A melody retrieval accuracy of 88% is demonstrated for humming samples from musically trained subjects, and 70% for samples from untrained subjects, for the FP algorithm. In contrast, the widely used ED method achieves 86% and 62% accuracy rates, respectively, for the same samples, thus suggesting that the proposed FP technique is more robust under uncertainty, particularly for queries by musically untrained users.
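To make the ED-versus-fingerprint contrast concrete, here is a hedged toy in which melodies are pitch-interval sequences: a textbook edit distance (the paper's baseline) next to a simple n-gram-set "fingerprint" match. The paper's actual fingerprint design is richer; this stand-in only shows why local, order-tolerant matching can cope with dropped or inserted notes.

def edit_distance(a, b):
    # Classic dynamic-programming edit distance (the ED baseline).
    dp = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, y in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1, prev + (x != y))
    return dp[-1]

def fingerprints(intervals, n=3):
    # Hypothetical fingerprint: the set of n-grams of pitch intervals.
    return {tuple(intervals[i:i + n]) for i in range(len(intervals) - n + 1)}

def fp_score(query, target, n=3):
    q, t = fingerprints(query, n), fingerprints(target, n)
    return len(q & t) / max(1, len(q))

# Melodies as semitone intervals between successive notes (made up).
db = {"tune_a": [2, 2, -4, 5, 0, -2], "tune_b": [1, -1, 3, 3, -5, 2]}
hummed = [2, 2, -4, 5, -2]  # noisy rendition of tune_a: one note dropped
for name, melody in db.items():
    print(name, "ED =", edit_distance(hummed, melody),
          "FP =", round(fp_score(hummed, melody), 2))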


IEEE Automatic Speech Recognition and Understanding Workshop | 2003

Transonics: a speech to speech system for English-Persian interactions

Shrikanth Narayanan; Sankaranarayanan Ananthakrishnan; Robert Belvin; E. Ettelaie; Shadi Ganjavi; Panayiotis G. Georgiou; C. M. Hein; S. Kadambe; Kevin Knight; Daniel Marcu; Howard Neely; Naveen Srinivasamurthy; David R. Traum; Dagen Wang

In this paper, we describe the first phase of development of our speech-to-speech system between English and Modern Persian under the DARPA Babylon program. We give an overview of the various system components: the front-end ASR, the machine translation system, and the speech generation system. We examine challenges such as the sparseness of available spoken-language data, along with the solutions employed to maximize the benefit obtained from these limited resources. Efforts in the creation of the user interface and the underlying dialog management system for mediated communication are described.
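A minimal skeleton of the mediated turn the paper describes (recognize, confirm, translate, synthesize). This is only a structural sketch under stated assumptions; the function names and the stand-in components are hypothetical, not the Transonics API.

from typing import Callable, Optional

def mediated_turn(audio: bytes,
                  asr: Callable[[bytes], str],
                  translate: Callable[[str], str],
                  synthesize: Callable[[str], bytes],
                  user_confirms: Callable[[str], bool]) -> Optional[bytes]:
    # One mediated English-to-Persian turn: recognize, let the speaker
    # confirm the hypothesis (the dialog-management step that guards
    # against ASR errors), then translate and synthesize.
    hypothesis = asr(audio)
    if not user_confirms(hypothesis):
        return None  # speaker rejects; the system would ask to rephrase
    return synthesize(translate(hypothesis))

# Wiring with trivial stand-in components, for illustration only:
out = mediated_turn(b"...",
                    asr=lambda a: "where does it hurt",
                    translate=lambda t: f"[fa] {t}",
                    synthesize=lambda t: t.encode(),
                    user_confirms=lambda h: True)
print(out)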


Affective Computing and Intelligent Interaction | 2011

That's aggravating, very aggravating: is it possible to classify behaviors in couple interactions using automatically derived lexical features?

Panayiotis G. Georgiou; Matthew P. Black; Adam C. Lammert; Brian R. Baucom; Shrikanth Narayanan

Psychology is often grounded in observational studies of human interaction behavior, and hence in human perception and judgment. There are many practical and theoretical challenges in observational practice. Technology holds the promise of mitigating some of these difficulties by assisting in the evaluation of higher-level human behavior. In this work we attempt to address two questions: (1) does the lexical channel contain the information necessary for such an evaluation, and if so, (2) can this information be captured by a noisy automated transcription process? We utilize a large corpus of couple interaction data, collected in the context of a longitudinal study of couple therapy. In the original study, each spouse was manually evaluated with several session-level behavioral codes (e.g., level of acceptance toward the other spouse). Our results show that both research questions can be answered positively and encourage future research into such assistive observational technologies.
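A hedged sketch of the lexical-classification setup: session transcripts mapped to a binarized behavioral code with a bag-of-words pipeline. The study derived its features from noisy ASR output over couple-therapy sessions; the tf-idf pipeline and the toy transcripts below are common stand-ins, not the paper's features or data.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy stand-ins for session transcripts and a binarized behavioral code
# (say, high vs. low acceptance toward the other spouse).
transcripts = [
    "i hear what you are saying and i understand",
    "you never listen this is aggravating very aggravating",
    "that makes sense to me let us try it your way",
    "you always do this you never think about me",
]
codes = [1, 0, 1, 0]  # 1 = high acceptance, 0 = low acceptance

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(transcripts, codes)
print(clf.predict(["i understand what you are saying"]))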


IEEE Transactions on Audio, Speech, and Language Processing | 2011

Enhanced Sparse Imputation Techniques for a Robust Speech Recognition Front-End

Qun Feng Tan; Panayiotis G. Georgiou; Shrikanth Narayanan

Missing data techniques (MDTs) have been widely employed and shown to improve speech recognition results under noisy conditions. This paper presents a new technique which improves upon previously proposed sparse imputation techniques relying on the least absolute shrinkage and selection operator (LASSO). LASSO is widely employed in compressive sensing problems. However, the problem with LASSO is that it does not satisfy oracle properties in the event of a highly collinear dictionary, which happens with features extracted from most speech corpora. When we say that a variable selection procedure satisfies the oracle properties, we mean that it enjoys the same performance as though the true underlying model were known. Through experiments on the Aurora 2.0 noisy spoken digits database, we demonstrate that the Least Angle Regression implementation of the Elastic Net (LARS-EN) algorithm is able to better exploit the properties of a collinear dictionary, and thus is significantly more robust in terms of basis selection when compared to LASSO on the continuous digit recognition task with an estimated mask. In addition, we investigate the effects and benefits of a good measure of sparsity on speech recognition rates. In particular, we demonstrate that a good measure of sparsity greatly improves speech recognition rates, and that the LARS modification of LASSO and LARS-EN can be terminated early to achieve improved recognition results, even though the estimation error is increased.
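A hedged sketch of sparse imputation under stated assumptions: represent the reliable feature dimensions of a frame as a sparse combination of exemplar "atoms", then read the missing dimensions off the reconstruction. scikit-learn's coordinate-descent Lasso and ElasticNet stand in for the paper's LASSO and LARS-EN solvers; the dictionary, sparse code, and mask are synthetic.

import numpy as np
from sklearn.linear_model import Lasso, ElasticNet

rng = np.random.default_rng(3)

# Dictionary of exemplar feature vectors (columns), e.g. clean spectra.
dim, n_atoms = 40, 200
A = np.abs(rng.standard_normal((dim, n_atoms)))

# A frame that truly is a sparse combination of three atoms.
w_true = np.zeros(n_atoms)
w_true[[3, 50, 120]] = [1.0, 0.6, 0.8]
x = A @ w_true

# Missing-data mask: only about 60% of dimensions are deemed reliable.
reliable = rng.random(dim) < 0.6

for est in (Lasso(alpha=0.01, fit_intercept=False, max_iter=50000),
            ElasticNet(alpha=0.01, l1_ratio=0.7, fit_intercept=False,
                       max_iter=50000)):
    est.fit(A[reliable], x[reliable])  # solve using reliable dims only
    x_hat = A @ est.coef_              # reconstruct all dims, incl. masked
    err = float(np.linalg.norm(x_hat[~reliable] - x[~reliable]))
    print(type(est).__name__, "imputation error on masked dims:", round(err, 3))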


IEEE Transactions on Audio, Speech, and Language Processing | 2009

An Iterative Relative Entropy Minimization-Based Data Selection Approach for n-Gram Model Adaptation

Abhinav Sethy; Panayiotis G. Georgiou; Bhuvana Ramabhadran; Shrikanth Narayanan

Performance of statistical n-gram language models depends heavily on the amount of training text material and the degree to which the training text matches the domain of interest. The language modeling community is showing a growing interest in using large collections of text (obtainable, for example, from a diverse set of resources on the Internet) to supplement sparse in-domain resources. However, in most cases the style and content of the text harvested from the web differ significantly from the specific nature of these domains. In this paper, we present a relative entropy-based method to select subsets of sentences whose n-gram distribution matches the domain of interest. We present results on language model adaptation using two speech recognition tasks: a medium-vocabulary medical-domain doctor-patient dialog system and a large-vocabulary transcription system for European parliamentary plenary speeches (EPPS). We show that the proposed subset selection scheme leads to performance improvements over state-of-the-art speech recognition systems in terms of both speech recognition word error rate (WER) and language model perplexity (PPL).
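The selection criterion can be sketched in a few lines, with heavy hedging: the published algorithm operates on full n-gram models with randomized multiple passes, whereas this toy greedy pass uses unigrams and add-eps smoothing purely to convey the idea of keeping a sentence only when it moves the selected-set distribution closer (in KL divergence) to the in-domain distribution.

import math
from collections import Counter

def smoothed(counts, vocab, eps=0.5):
    # Unigram distribution over a fixed vocabulary with add-eps smoothing.
    total = sum(counts.values()) + eps * len(vocab)
    return {w: (counts[w] + eps) / total for w in vocab}

def kl(p, q):
    return sum(pw * math.log(pw / q[w]) for w, pw in p.items())

def select(in_domain, pool, eps=0.5):
    # Greedy sketch of relative-entropy data selection.
    vocab = {w for s in in_domain + pool for w in s.split()}
    p_in = smoothed(Counter(w for s in in_domain for w in s.split()), vocab, eps)
    selected, sel_counts = [], Counter()
    best = kl(p_in, smoothed(sel_counts, vocab, eps))
    for sent in pool:
        trial = sel_counts + Counter(sent.split())
        d = kl(p_in, smoothed(trial, vocab, eps))
        if d < best:  # keep the sentence only if it reduces the divergence
            best, sel_counts = d, trial
            selected.append(sent)
    return selected

in_domain = ["the patient has a fever", "the patient has chest pain"]
pool = ["the patient has a cough", "the market has a rally"]
print(select(in_domain, pool))  # keeps the medical-style sentence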


International Conference on Acoustics, Speech, and Signal Processing | 2007

Real-Time Monitoring of Participants' Interaction in a Meeting using Audio-Visual Sensors

Carlos Busso; Panayiotis G. Georgiou; Shrikanth Narayanan

Intelligent environments equipped with audio-visual sensors provide suitable means for automatically monitoring and tracking the behavior, strategies, and engagement of the participants in multi-person meetings. In this paper, high-level features are calculated from active-speaker segmentations, automatically annotated by our smart room system, to infer the interaction dynamics between the participants. These features include the number and average duration of each participant's turns, turn-taking statistics such as time as active speaker, and turn-taking transition patterns between participants. The results show that it is possible to accurately estimate in real time not only the flow of the interaction, but also how dominant and engaged each participant was during the discussion. These high-level features, which cannot be inferred from any of the individual modalities by themselves, can be useful for summarization, classification, retrieval, and (after-action) analysis of meetings.
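These segment-level statistics are straightforward to compute once an active-speaker segmentation exists. A minimal sketch, assuming a hypothetical (speaker, start_seconds, end_seconds) segment format rather than the smart room system's actual output:

from collections import Counter

# Hypothetical active-speaker segmentation of a short meeting.
segments = [("A", 0.0, 12.5), ("B", 12.5, 14.0), ("A", 14.0, 30.0),
            ("C", 30.0, 41.0), ("B", 41.0, 55.5), ("A", 55.5, 60.0)]

talk_time, turns, transitions = Counter(), Counter(), Counter()
prev = None
for spk, start, end in segments:
    talk_time[spk] += end - start        # total floor time per speaker
    turns[spk] += 1                      # number of turns per speaker
    if prev is not None and prev != spk:
        transitions[(prev, spk)] += 1    # turn-taking transition patterns
    prev = spk

total = sum(talk_time.values())
for spk in sorted(turns):
    print(f"{spk}: {turns[spk]} turns, "
          f"avg {talk_time[spk] / turns[spk]:.1f}s, "
          f"{100 * talk_time[spk] / total:.0f}% of floor time")
print("transitions:", dict(transitions))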

Collaboration


Dive into Panayiotis G. Georgiou's collaborations.

Top Co-Authors

Shrikanth Narayanan (University of Southern California)
Bo Xiao (University of Southern California)
Matthew P. Black (University of Southern California)
Chris Kyriakakis (University of Southern California)
Chi-Chun Lee (National Tsing Hua University)
Andreas Tsiartas (University of Southern California)