

Publications


Featured research published by Alexandros Potamianos.


Journal of the Acoustical Society of America | 1999

Acoustics of children’s speech: Developmental changes of temporal and spectral parameters

Sungbok Lee; Alexandros Potamianos; Shrikanth Narayanan

Changes in magnitude and variability of duration, fundamental frequency, formant frequencies, and spectral envelope of children's speech are investigated as a function of age and gender using data obtained from 436 children, ages 5 to 17 years, and 56 adults. The results confirm that the reduction in magnitude and within-subject variability of both temporal and spectral acoustic parameters with age is a major trend associated with speech development in normal children. Between ages 9 and 12, both magnitude and variability of segmental durations decrease significantly and rapidly, converging to adult levels around age 12. Within-subject fundamental frequency and formant-frequency variability, however, may reach the adult range about 2 or 3 years later. Differentiation of male and female fundamental frequency and formant frequency patterns begins at around age 11, becoming fully established around age 15. During that time period, changes in vowel formant frequencies of male speakers are approximately linear with age, while such a linear trend is less obvious for female speakers. These results support the hypothesis of uniform axial growth of the vocal tract for male speakers. The study also shows evidence for an apparent overshoot in acoustic parameter values, somewhere between ages 13 and 15, before converging to the canonical levels for adults. For instance, teenagers around age 14 differ from adults in that, on average, they show shorter segmental durations and exhibit less within-subject variability in durations, fundamental frequency, and spectral envelope measures.


Signal Processing | 1994

A comparison of the energy operator and the Hilbert transform approach to signal and speech demodulation

Alexandros Potamianos; Petros Maragos

The Hilbert transform together with Gabor's analytic signal provides a standard linear integral approach to estimate the amplitude envelope and instantaneous frequency of signals with a combined amplitude modulation (AM) and frequency modulation (FM) structure. A recent alternative approach uses a nonlinear differential ‘energy’ operator to track the energy required to generate an AM-FM signal and separate it into amplitude and frequency components. In this paper, we compare these two fundamentally different approaches for demodulation of arbitrary signals and of speech resonances modeled by AM-FM signals. The comparison is done from several viewpoints: magnitude of estimation errors, computational complexity, and adaptability to instantaneous signal changes. We also propose a refinement of the energy operator approach that uses simple binomial convolutions to smooth the energy signals. This smoothed energy operator is compared to the Hilbert transform on tracking modulations in speech vowel signals, band-pass filtered around their formants. The effects of pitch periodicity and band-pass filtering on both demodulation approaches are examined and an application to formant tracking is presented. The results provide strong evidence that the estimation errors of the smoothed energy operator approach are similar to those of the Hilbert transform approach for speech applications, but smaller for communication applications. In addition, the smoothed energy operator approach has smaller computational complexity and faster adaptation due to its instantaneous nature.
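The energy operator in question is the Teager-Kaiser operator, Ψ(x) = (ẋ)² − x·ẍ, and the energy separation algorithm recovers the envelope as Ψ(x)/√Ψ(ẋ) and the instantaneous frequency as √(Ψ(ẋ)/Ψ(x)). A minimal numpy sketch (an illustration using central-difference derivatives, not the paper's smoothed implementation) demodulating a pure tone:

```python
import numpy as np

def teager_energy(x, dt):
    """Continuous-form Teager-Kaiser energy psi(x) = x'**2 - x*x'',
    approximated with central differences."""
    dx = np.gradient(x, dt)
    ddx = np.gradient(dx, dt)
    return dx ** 2 - x * ddx

def esa_demodulate(x, dt):
    """Energy separation: amplitude |a| ~ psi(x)/sqrt(psi(x')),
    instantaneous frequency omega ~ sqrt(psi(x')/psi(x))."""
    dx = np.gradient(x, dt)
    psi_x = teager_energy(x, dt)
    psi_dx = teager_energy(dx, dt)
    freq = np.sqrt(np.abs(psi_dx / psi_x))        # rad/s
    amp = psi_x / np.sqrt(np.abs(psi_dx))
    return amp, freq

# Demodulate a pure tone x(t) = 2*cos(2*pi*20*t) sampled at 1 kHz
fs, f0, A = 1000.0, 20.0, 2.0
t = np.arange(0, 1.0, 1.0 / fs)
x = A * np.cos(2.0 * np.pi * f0 * t)
amp, freq = esa_demodulate(x, 1.0 / fs)
amp_hat = float(np.median(amp[10:-10]))           # skip one-sided edges
f_hat = float(np.median(freq[10:-10])) / (2.0 * np.pi)
```

For a pure tone the recovered envelope and frequency are constant in the interior, up to a small discretization bias in the frequency estimate; the binomial smoothing proposed in the paper addresses the operator's noise sensitivity on real speech.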


Journal of the Acoustical Society of America | 1996

Speech formant frequency and bandwidth tracking using multiband energy demodulation

Alexandros Potamianos; Petros Maragos

In this paper, the amplitude and frequency (AM–FM) modulation model and a multiband demodulation analysis scheme are applied to formant frequency and bandwidth tracking of speech signals. Filtering by a bank of Gabor bandpass filters is performed to isolate each speech resonance in the signal. Next, the amplitude envelope (AM) and instantaneous frequency (FM) are estimated for each band using the energy separation algorithm (ESA). Short‐time formant frequency and bandwidth estimates are obtained from the instantaneous amplitude and frequency signals; two frequency estimates are proposed and their relative merits are discussed. The short‐time estimates are used to compute the formant locations and bandwidths. Performance and computational issues of the algorithm are discussed. Overall, multiband demodulation analysis (MDA) is shown to be a useful tool for extracting information from the speech resonances in the time–frequency plane.
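The front end above starts from a Gabor bandpass filter, i.e. a Gaussian-windowed cosine centered on a speech resonance. A small sketch (illustrative parameter choices, not the paper's filterbank design) showing such a filter isolating one of two tones:

```python
import numpy as np

def gabor_bandpass(center_hz, alpha, fs, dur=0.05):
    """Real Gabor bandpass impulse response: a Gaussian envelope
    exp(-(alpha*t)**2) modulated by a cosine at the center frequency.
    Larger alpha widens the passband; gain is not normalized here."""
    t = np.arange(-dur / 2, dur / 2, 1.0 / fs)
    return np.exp(-(alpha * t) ** 2) * np.cos(2.0 * np.pi * center_hz * t)

def tone_amplitude(y, f, fs):
    """Amplitude of the f-Hz component of y via complex projection."""
    t = np.arange(len(y)) / fs
    return 2.0 * np.abs(np.mean(y * np.exp(-2j * np.pi * f * t)))

fs = 8000.0
t = np.arange(0, 0.4, 1.0 / fs)
# Two unit-amplitude "resonances"; only the 500 Hz one is in-band
x = np.cos(2 * np.pi * 500 * t) + np.cos(2 * np.pi * 2000 * t)
y = np.convolve(x, gabor_bandpass(500.0, 300.0, fs), mode="same")
mid = y[len(y) // 4: -len(y) // 4]                # steady-state segment
rejection = tone_amplitude(mid, 2000.0, fs) / tone_amplitude(mid, 500.0, fs)
```

The Gaussian frequency response makes the out-of-band tone's residual negligible, so the demodulation stage sees an essentially single-resonance signal.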


IEEE Transactions on Speech and Audio Processing | 2003

Robust recognition of children's speech

Alexandros Potamianos; Shrikanth Narayanan

Developmental changes in speech production introduce age-dependent spectral and temporal variability in the speech signal produced by children. Such variabilities pose challenges for robust automatic recognition of children's speech. Through an analysis of age-related acoustic characteristics of children's speech in the context of automatic speech recognition (ASR), effects such as frequency scaling of spectral envelope parameters are demonstrated. Recognition experiments using acoustic models trained from adult speech and tested against speech from children of various ages clearly show performance degradation with decreasing age. On average, the word error rates are two to five times worse for children's speech than for adult speech. Various techniques for improving ASR performance on children's speech are reported. A speaker normalization algorithm that combines frequency warping and model transformation is shown to reduce acoustic variability and significantly improve ASR performance for child speakers (by 25-45% under various model training and testing conditions). The use of age-dependent acoustic models further reduces word error rate by 10%. The potential of using piece-wise linear and phoneme-dependent frequency warping algorithms for reducing the variability in the acoustic feature space of children is also investigated.
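Frequency warping of the kind used for speaker normalization can be sketched as a piecewise-linear remapping of the spectral axis; the parameterization below is a hypothetical illustration, not the algorithm evaluated in the paper:

```python
import numpy as np

def warp_spectrum(spec, alpha, f_cut=0.7):
    """Piecewise-linear frequency warping of a magnitude spectrum on a
    normalized [0, 1] frequency axis: below the knee, frequencies are
    scaled by alpha; above it, a second linear piece maps the Nyquist
    frequency to itself. (Illustrative parameterization.)"""
    n = len(spec)
    f = np.linspace(0.0, 1.0, n)
    knee = alpha * f_cut
    src = np.where(f < knee,
                   f / alpha,
                   f_cut + (f - knee) * (1.0 - f_cut) / (1.0 - knee))
    return np.interp(src, f, spec)

# A single "formant" peak at normalized frequency 0.4; warping with
# alpha = 1.25 moves it to 0.5 (a child-to-adult-style rescaling of
# the spectral envelope, with made-up numbers).
f = np.linspace(0.0, 1.0, 512)
spec = np.exp(-((f - 0.4) / 0.01) ** 2)
peak = float(f[np.argmax(warp_spectrum(spec, 1.25))])
```

The knee keeps the warp invertible and endpoint-preserving, which is why piecewise-linear warps are a common choice over a plain global scaling.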


IEEE Transactions on Speech and Audio Processing | 2002

Creating conversational interfaces for children

Shrikanth Narayanan; Alexandros Potamianos

Creating conversational interfaces for children is challenging in several respects. These include acoustic modeling for automatic speech recognition (ASR), language and dialog modeling, and multimodal-multimedia user interface design. First, issues in ASR of children's speech are introduced by an analysis of developmental changes in the spectral and temporal characteristics of the speech signal using data obtained from 456 children, ages five to 18 years. Acoustic modeling adaptation and vocal tract normalization algorithms that yielded state-of-the-art ASR performance on children's speech are described. Second, an experiment designed to better understand how children interact with machines using spoken language is described. Realistic conversational multimedia interaction data were obtained from 160 children who played a voice-activated computer game in a Wizard of Oz (WoZ) scenario. Results of using these data in developing novel language and dialog models as well as in a unified maximum likelihood framework for acoustic decoding in ASR and semantic classification for spoken language understanding are described. Leveraging the lessons learned from the WoZ study and a concurrent user experience evaluation, a multimedia personal agent prototype for children was designed. Details of the architecture and application are described. Informal evaluation by children was positive, especially for the animated agent and the speech interface.


Journal of the Acoustical Society of America | 1999

Fractal dimensions of speech sounds: Computation and application to automatic speech recognition

Petros Maragos; Alexandros Potamianos

The dynamics of airflow during speech production may often result in some small or large degree of turbulence. In this paper, the geometry of speech turbulence as reflected in the fragmentation of the time signal is quantified by using fractal models. An efficient algorithm for estimating the short-time fractal dimension of speech signals based on multiscale morphological filtering is described, and its potential for speech segmentation and phonetic classification discussed. Also reported are experimental results on using the short-time fractal dimension of speech signals at multiple scales as additional features in an automatic speech-recognition system using hidden Markov models, which provide a modest improvement in speech-recognition performance.
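The morphological cover idea behind the algorithm can be sketched with flat structuring elements, i.e. moving max/min filters: the area between the dilation and erosion of the signal graph at scale s behaves like s^(2−D). This is an illustrative reimplementation, not the paper's optimized multiscale algorithm:

```python
import numpy as np

def fractal_dimension(x, scales=(1, 2, 4, 8)):
    """Morphological-cover estimate of the fractal dimension of a
    signal graph: dilate/erode with flat windows of half-width s
    (moving max/min); the mean cover thickness scales as s**(2 - D),
    so D is read off a log-log regression."""
    areas = []
    for s in scales:
        win = np.lib.stride_tricks.sliding_window_view(x, 2 * s + 1)
        areas.append((win.max(axis=1) - win.min(axis=1)).mean())
    slope = np.polyfit(np.log(scales), np.log(areas), 1)[0]
    return 2.0 - slope

# Sanity check: the graph of a straight line has dimension 1
d_line = fractal_dimension(np.linspace(0.0, 1.0, 1001))
```

Rougher, more turbulent signals fill the cover more slowly as the scale grows, pushing the estimate above 1; computing this over short windows gives the short-time feature used in the recognition experiments.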


IEEE Transactions on Multimedia | 2013

Multimodal Saliency and Fusion for Movie Summarization Based on Aural, Visual, and Textual Attention

Georgios Evangelopoulos; Athanasia Zlatintsi; Alexandros Potamianos; Petros Maragos; Konstantinos Rapantzikos; Georgios Skoumas; Yannis S. Avrithis

Multimodal streams of sensory information are naturally parsed and integrated by humans using signal-level feature extraction and higher level cognitive processes. Detection of attention-invoking audiovisual segments is formulated in this work on the basis of saliency models for the audio, visual, and textual information conveyed in a video stream. Aural or auditory saliency is assessed by cues that quantify multifrequency waveform modulations, extracted through nonlinear operators and energy tracking. Visual saliency is measured through a spatiotemporal attention model driven by intensity, color, and orientation. Textual or linguistic saliency is extracted from part-of-speech tagging on the subtitle information available with most movie distributions. The individual saliency streams, obtained from modality-dependent cues, are integrated in a multimodal saliency curve, modeling the time-varying perceptual importance of the composite video stream and signifying prevailing sensory events. The multimodal saliency representation forms the basis of a generic, bottom-up video summarization algorithm. Different fusion schemes are evaluated on a movie database of multimodal saliency annotations with comparative results provided across modalities. The produced summaries, based on low-level features and content-independent fusion and selection, are of subjectively high aesthetic and informative quality.
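A weighted linear combination of normalized per-modality curves is one simple instance of the fusion schemes compared in work like this; the sketch below (toy data, illustrative weights) shows fusion into a single saliency curve followed by top-segment selection for a summary:

```python
import numpy as np

def fuse_saliency(streams, weights):
    """Fuse per-modality saliency curves (rows of `streams`): min-max
    normalize each modality, then take a weighted linear combination,
    one simple linear fusion scheme."""
    s = np.asarray(streams, dtype=float)
    lo = s.min(axis=1, keepdims=True)
    rng = s.max(axis=1, keepdims=True) - lo
    normed = (s - lo) / np.where(rng > 0, rng, 1.0)
    return np.asarray(weights, dtype=float) @ normed

def top_segments(curve, k):
    """Indices of the k most salient frames (summary selection)."""
    return np.argsort(curve)[::-1][:k]

# Toy aural / visual / textual saliency over four frames, with
# made-up modality weights
fused = fuse_saliency([[0, 2, 0, 1],
                       [0, 0, 4, 0],
                       [1, 0, 0, 0]], [0.5, 0.3, 0.2])
```

Per-modality normalization before fusion matters because the raw cues live on incomparable scales; nonlinear and adaptive fusion variants replace the fixed weights with time-varying ones.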


IEEE Transactions on Knowledge and Data Engineering | 2010

Unsupervised Semantic Similarity Computation between Terms Using Web Documents

Elias Iosif; Alexandros Potamianos

In this work, Web-based metrics that compute the semantic similarity between words or terms are presented and compared with the state of the art. Starting from the fundamental assumption that similarity of context implies similarity of meaning, relevant Web documents are downloaded via a Web search engine and the contextual information of words of interest is compared (context-based similarity metrics). The proposed algorithms work automatically, do not require any human-annotated knowledge resources, e.g., ontologies, and can be generalized and applied to different languages. Context-based metrics are evaluated both on the Miller-Charles data set and on a medical term data set. It is shown that context-based similarity metrics significantly outperform co-occurrence-based metrics, in terms of correlation with human judgment, for both tasks. In addition, the proposed unsupervised context-based similarity computation algorithms are shown to be competitive with the state-of-the-art supervised semantic similarity algorithms that employ language-specific knowledge resources. Specifically, context-based metrics achieve correlation scores of up to 0.88 and 0.74 for the Miller-Charles and medical data sets, respectively. The effect of stop word filtering is also investigated for word and term similarity computation. Finally, the performance of context-based term similarity metrics is evaluated as a function of the number of Web documents used and for various feature weighting schemes.
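The context-based idea can be sketched by comparing bag-of-words context vectors with cosine similarity. This toy version (tiny hand-written corpus, no stop-word filtering or feature weighting) is an illustration of the principle, not the paper's metrics:

```python
import numpy as np
from collections import Counter

def context_vector(word, documents, window=2):
    """Bag-of-words context vector: counts of words appearing within
    +/- `window` positions of `word` across the documents."""
    ctx = Counter()
    for doc in documents:
        toks = doc.lower().split()
        for i, t in enumerate(toks):
            if t == word:
                lo, hi = max(0, i - window), i + window + 1
                ctx.update(toks[lo:i] + toks[i + 1:hi])
    return ctx

def context_similarity(w1, w2, documents, window=2):
    """Cosine similarity of the two words' context vectors."""
    c1 = context_vector(w1, documents, window)
    c2 = context_vector(w2, documents, window)
    vocab = sorted(set(c1) | set(c2))
    v1 = np.array([c1[t] for t in vocab], dtype=float)
    v2 = np.array([c2[t] for t in vocab], dtype=float)
    denom = np.linalg.norm(v1) * np.linalg.norm(v2)
    return float(v1 @ v2 / denom) if denom else 0.0

docs = ["the car drives on the road",
        "the automobile drives on the road",
        "the cat sleeps on the mat"]
sim_related = context_similarity("car", "automobile", docs)
sim_unrelated = context_similarity("car", "cat", docs)
```

"car" and "automobile" share identical contexts here and score higher than "car" and "cat"; in the paper the contexts come from downloaded Web documents rather than a fixed corpus.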


IEEE Signal Processing Letters | 2005

Robust AM-FM features for speech recognition

Dimitrios Dimitriadis; Petros Maragos; Alexandros Potamianos

In this letter, a nonlinear AM-FM speech model is used to extract robust features for speech recognition. The proposed features measure the amount of amplitude and frequency modulation that exists in speech resonances and attempt to model aspects of the speech acoustic information that the commonly used linear source-filter model fails to capture. The robustness and discriminability of the AM-FM features are investigated in combination with mel cepstrum coefficients (MFCCs). It is shown that these hybrid features perform well in the presence of noise, both in terms of phoneme discrimination (J-measure) and in terms of speech recognition performance in several different tasks. Average relative error rate reduction up to 11% for clean and 46% for mismatched noisy conditions is achieved when AM-FM features are combined with MFCCs.


International Conference on Acoustics, Speech, and Signal Processing | 1997

On combining frequency warping and spectral shaping in HMM based speech recognition

Alexandros Potamianos; Richard C. Rose

Frequency warping approaches to speaker normalization have been proposed and evaluated on various speech recognition tasks. These techniques have been found to significantly improve performance even for speaker independent recognition from short utterances over the telephone network. In maximum likelihood (ML) based model adaptation a linear transformation is estimated and applied to the model parameters in order to increase the likelihood of the input utterance. The purpose of this paper is to demonstrate that significant advantage can be gained by performing frequency warping and ML speaker adaptation in a unified framework. A procedure is described which compensates utterances by simultaneously scaling the frequency axis and reshaping the spectral energy contour. This procedure is shown to reduce the error rate in a telephone based connected digit recognition task by 30-40%.
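The unified idea, choosing a warp factor by maximum likelihood while also compensating the overall spectral energy, can be caricatured as a grid search. Everything below (the toy feature-axis "warp", the scalar offset, the single diagonal-Gaussian score) is a schematic stand-in for the paper's HMM-based procedure, not its actual algorithm:

```python
import numpy as np

def warp_features(frame, alpha):
    """Warp the feature (frequency) axis: the value at warped bin j is
    read from original bin j/alpha via linear interpolation."""
    n = len(frame)
    src = np.clip(np.arange(n) / alpha, 0, n - 1)
    return np.interp(src, np.arange(n), frame)

def best_warp(frames, model_mean, model_var, alphas):
    """For each candidate warp factor: warp every frame, apply a scalar
    ML-style energy offset, and score the total diagonal-Gaussian
    log-likelihood; return the factor with the highest score."""
    def ll(x):
        return float(np.sum(-0.5 * ((x - model_mean) ** 2 / model_var
                                    + np.log(2 * np.pi * model_var))))
    best_a, best_ll = None, -np.inf
    for a in alphas:
        w = np.array([warp_features(f, a) for f in frames])
        w = w + (model_mean.mean() - w.mean())     # scalar energy offset
        score = ll(w)
        if score > best_ll:
            best_a, best_ll = a, score
    return best_a

bins = np.arange(64, dtype=float)
model_mean = np.exp(-((bins - 20) / 4.0) ** 2)     # reference peak, bin 20
model_var = np.full(64, 0.05)
# A hypothetical speaker whose envelope peak sits higher, at bin 25;
# alpha = 0.8 maps it back onto the reference
frames = np.tile(np.exp(-((bins - 25) / 4.0) ** 2), (5, 1))
alpha_hat = best_warp(frames, model_mean, model_var,
                      [0.7, 0.8, 0.9, 1.0])
```

The point of the unified framework is exactly this coupling: the warp factor and the spectral-shaping transform are chosen together to maximize the likelihood of the same utterance, rather than estimated independently.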

Collaboration


Dive into Alexandros Potamianos's collaborations.

Top Co-Authors

Elias Iosif (Technical University of Crete)
Shrikanth Narayanan (University of Southern California)
Petros Maragos (National Technical University of Athens)
Manolis Perakakis (Technical University of Crete)
Athanasia Zlatintsi (National Technical University of Athens)
Nikolaos Malandrakis (University of Southern California)
Sungbok Lee (University of Southern California)
Elisavet Palogiannidi (Technical University of Crete)