Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Panikos Heracleous is active.

Publication


Featured research published by Panikos Heracleous.


IEEE Automatic Speech Recognition and Understanding Workshop | 2003

Accurate hidden Markov models for non-audible murmur (NAM) recognition based on iterative supervised adaptation

Panikos Heracleous; Yoshitaka Nakajima; Akinobu Lee; Hiroshi Saruwatari; Kiyohiro Shikano

In previous work, we introduced a special device, the Non-Audible Murmur (NAM) microphone, able to detect very quietly uttered speech (murmur) that cannot be heard by listeners near the talker. Experimental results showed the effectiveness of the device for NAM recognition: using normal-speech monophone hidden Markov models (HMMs) retrained with NAM data from a specific speaker, we could recognize NAM with high accuracy. Although the results were very promising, a serious problem is the HMM retraining, which requires a large amount of training data. In this paper, we introduce a new method for NAM recognition that requires only a small amount of NAM data for training. The proposed method is based on supervised adaptation; the main difference from other adaptation approaches is that instead of single-iteration adaptation, we use iterative adaptation (iterative supervised MLLR). Experiments demonstrate the effectiveness of the proposed method. Using clean normal-speech initial models and only 350 adaptation NAM utterances, we achieved a recognition accuracy of 88.62%, which is a very promising result. Thus, with a small amount of adaptation data, we were able to create accurate speaker-specific HMMs. We also report experiments showing the effects of the number of iterations, the amount of adaptation data, and the regression tree classes.
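
The core of the method is iterative supervised MLLR: with the transcriptions known, the adaptation data are force-aligned to the current models, a linear transform of the Gaussian means is estimated, and the two steps are repeated. Below is a minimal sketch of that loop, not the authors' implementation: it assumes diagonal-covariance Gaussians, a single global transform, and a hypothetical forced_align() helper, and it simplifies the MLLR statistics to an ordinary least-squares fit.

```python
import numpy as np

def estimate_mean_transform(frames, align, means, variances):
    """Least-squares estimate of a global mean transform W (d x (d+1)).

    frames: (T, d) adaptation observations
    align:  length-T list of Gaussian indices from forced alignment
    """
    d = means.shape[1]
    G = np.zeros((d + 1, d + 1))
    K = np.zeros((d + 1, d))
    for x, g in zip(frames, align):
        xi = np.concatenate(([1.0], means[g]))       # extended mean [1, mu_g]
        w = float(np.mean(1.0 / variances[g]))       # crude shared precision weight
        G += w * np.outer(xi, xi)
        K += w * np.outer(xi, x)
    return np.linalg.lstsq(G, K, rcond=None)[0].T    # W: d x (d+1)

def iterative_supervised_mllr(frames, transcript, means, variances,
                              forced_align, n_iter=5):
    """Repeat forced alignment and transform estimation (iterative MLLR)."""
    adapted = means.copy()
    for _ in range(n_iter):
        # forced_align() is a hypothetical helper: it aligns the adaptation
        # frames to the known transcript using the current (adapted) models.
        align = forced_align(frames, transcript, adapted, variances)
        W = estimate_mean_transform(frames, align, adapted, variances)
        ext = np.hstack([np.ones((len(adapted), 1)), adapted])
        adapted = ext @ W.T                           # apply W to every mean
    return adapted
```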


IEEE Signal Processing Letters | 2009

Lip Shape and Hand Position Fusion for Automatic Vowel Recognition in Cued Speech for French

Panikos Heracleous; Noureddine Aboutabit; Denis Beautemps

Cued Speech is a visual mode of communication that uses hand shapes and placements in combination with the mouth movements of speech to make the phonemes of a spoken language look different from each other and clearly understandable to deaf and hearing-impaired people. The aim of Cued Speech is to overcome the problems of lipreading and thus enable deaf children and adults to fully understand spoken language. Cued Speech recognition requires hand gesture recognition and lip shape recognition, as well as integration of the two components. This article presents hidden Markov model (HMM)-based vowel recognition as used in Cued Speech for French. Based on concatenative feature fusion and multistream HMM decision fusion, the lip shape and hand position components were integrated and automatic vowel recognition was realized. In the case of multistream HMM decision fusion, the vowel classification accuracy obtained using lip shape and hand position information was 87.6%, an absolute improvement of 19.6% compared with using lip parameters alone.
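
The two integration schemes mentioned above can be illustrated compactly. The sketch below is a toy example under stated assumptions, not the authors' code: concatenative feature fusion stacks the per-frame lip-shape and hand-position vectors into one observation, while multistream HMM decision fusion combines per-class log-likelihoods from separately trained streams using stream weights.

```python
import numpy as np

def concatenative_fusion(lip_feats, hand_feats):
    """Feature-level fusion: stack both streams into one observation vector."""
    return np.concatenate([lip_feats, hand_feats], axis=-1)

def multistream_decision_fusion(lip_loglik, hand_loglik, w_lip=0.5, w_hand=0.5):
    """Decision-level fusion: weighted sum of per-class stream log-likelihoods."""
    return w_lip * np.asarray(lip_loglik) + w_hand * np.asarray(hand_loglik)

# Hypothetical per-vowel HMM scores for one test utterance:
lip_loglik  = [-120.4, -118.9, -131.2]
hand_loglik = [-95.1, -88.3, -99.7]
fused = multistream_decision_fusion(lip_loglik, hand_loglik, w_lip=0.6, w_hand=0.4)
print(int(np.argmax(fused)))   # index of the recognized vowel class
```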


EURASIP Journal on Advances in Signal Processing | 2007

Unvoiced speech recognition using tissue-conductive acoustic sensor

Panikos Heracleous; Tomomi Kaino; Hiroshi Saruwatari; Kiyohiro Shikano

We present the use of stethoscope and silicon NAM (nonaudible murmur) microphones in automatic speech recognition. NAM microphones are special acoustic sensors that are attached behind the talker's ear and can capture not only normal (audible) speech but also very quietly uttered speech (nonaudible murmur). As a result, NAM microphones can be applied in automatic speech recognition systems when privacy is desired in human-machine communication. Moreover, NAM microphones show robustness against noise and might be used in special systems (speech recognition, speech transformation, etc.) for sound-impaired people. Using adaptation techniques and a small amount of training data, we report word accuracy results for nonaudible murmur recognition on a 20k dictation task in a clean environment. In this paper, we also investigate nonaudible murmur recognition in noisy environments and the effect of the Lombard reflex on nonaudible murmur recognition. Finally, we propose three methods for integrating audible speech and nonaudible murmur recognition using a stethoscope NAM microphone, with very promising results.
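
The word accuracy reported for dictation tasks like the 20k task above is conventionally derived from a Levenshtein alignment of the hypothesis against the reference, as Accuracy = (N - S - D - I) / N. The snippet below shows this standard computation; it is general practice, not code from the paper.

```python
def word_accuracy(ref, hyp):
    """Word accuracy from the Levenshtein distance between word lists."""
    n, m = len(ref), len(hyp)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    # The minimal edit distance equals S + D + I for the best alignment.
    return 100.0 * (n - d[n][m]) / n

print(word_accuracy("this is a test".split(), "this is test".split()))  # 75.0
```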


Ambient Intelligence | 2005

A tissue-conductive acoustic sensor applied in speech recognition for privacy

Panikos Heracleous; Yoshitaka Nakajima; Hiroshi Saruwatari; Kiyohiro Shikano

In this paper, we present the Non-Audible Murmur (NAM) microphone, focusing on its applications in automatic speech recognition. A NAM microphone is a special acoustic sensor attached behind the talker's ear that is able to capture very quietly uttered speech (non-audible murmur) through body tissue. Previously, we reported experimental results for non-audible murmur recognition using a stethoscope microphone in a clean environment. In this paper, we also present a more advanced NAM microphone, the so-called silicon NAM microphone. Using a small amount of training data and adaptation approaches, we achieved a 93.9% word accuracy for a 20k-vocabulary dictation task. Therefore, in situations where privacy in human-machine communication is preferable, a NAM microphone can be applied very effectively for automatic recognition of speech inaudible to other listeners near the talker. Because of the nature of non-audible murmur (e.g., privacy), investigating the behavior of NAM microphones in noisy environments is of high importance. To this end, we also conducted experiments in real and simulated noisy environments. Although NAM microphones show high robustness against noise on simulated noisy data, in real environments the recognition performance decreases markedly due to the Lombard reflex. In this paper, we report experimental results showing the negative effect of the Lombard reflex on non-audible murmur recognition. In addition to a dictation task, we also report a keyword-spotting system based on non-audible murmur, with very promising results.
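
Simulated noisy data of the kind mentioned above are typically produced by adding recorded noise to clean recordings at a chosen signal-to-noise ratio. The paper does not describe its exact procedure, so the following is only a generic sketch of additive-noise simulation at a target SNR.

```python
import numpy as np

def add_noise_at_snr(speech, noise, snr_db):
    """Mix `noise` into `speech` so the resulting SNR equals snr_db (in dB)."""
    noise = np.resize(noise, speech.shape)          # repeat or trim to length
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2) + 1e-12
    scale = np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10.0)))
    return speech + scale * noise
```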


International Conference on Acoustics, Speech, and Signal Processing | 2011

Automatic recognition of speech without any audio information

Panikos Heracleous; Norihiro Hagita

This article introduces automatic recognition of speech without any audio information. Movements of the tongue, lips, and jaw are tracked by an electromagnetic articulography (EMA) device and are used as features to create hidden Markov models (HMMs) and to conduct automatic speech recognition in a conventional way. The results obtained are promising and confirm that phonetic features characterizing articulation are as discriminating as those characterizing acoustics (except for voicing). The results also show that using tongue parameters results in higher accuracy than using lip parameters.
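
Treating articulatory trajectories as ordinary HMM observations can be sketched with an off-the-shelf toolkit. The example below is not the authors' system: it assumes per-frame feature vectors built from EMA coil coordinates (tongue, lips, jaw), trains one Gaussian HMM per phone with the hmmlearn package, and scores test segments against each model.

```python
import numpy as np
from hmmlearn import hmm

def train_phone_model(segments, n_states=3):
    """segments: list of (T_i, D) arrays of EMA features for one phone."""
    X = np.vstack(segments)
    lengths = [len(s) for s in segments]
    model = hmm.GaussianHMM(n_components=n_states, covariance_type="diag",
                            n_iter=20)
    model.fit(X, lengths)
    return model

def recognize_phone(frames, phone_models):
    """Return the phone whose HMM gives the highest log-likelihood."""
    return max(phone_models, key=lambda p: phone_models[p].score(frames))
```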


Speech Communication | 2010

Cued Speech automatic recognition in normal-hearing and deaf subjects

Panikos Heracleous; Denis Beautemps; Noureddine Aboutabit

This article discusses the automatic recognition of Cued Speech in French based on hidden Markov models (HMMs). Cued Speech is a visual mode which, by using hand shapes in different positions in combination with lip patterns of speech, makes all the sounds of a spoken language clearly understandable to deaf people. The aim of Cued Speech is to overcome the problems of lipreading and thus enable deaf children and adults to understand spoken language completely. In the current study, the authors demonstrate that visible gestures are as discriminant as audible orofacial gestures. Phoneme recognition and isolated word recognition experiments were conducted using data from a normal-hearing cuer. The results obtained were very promising, and the study was extended by applying the proposed methods to a deaf cuer. The results achieved did not show any significant differences compared with automatic Cued Speech recognition in a normal-hearing subject. Automatic recognition of Cued Speech requires both lip shape and gesture recognition, and the integration of the two modalities is of great importance. In this study, the lip shape component is fused with the hand component to realize Cued Speech recognition. Using concatenative feature fusion and multi-stream HMM decision fusion, vowel recognition, consonant recognition, and isolated word recognition experiments were conducted. For vowel recognition, an 87.6% vowel accuracy was obtained, a 61.3% relative improvement compared with the sole use of lip shape parameters. For consonant recognition, a 78.9% accuracy was obtained, a 56% relative improvement compared with the use of lip shape only. In addition, a complete phoneme recognition experiment using concatenated feature vectors and Gaussian mixture model (GMM) discrimination was conducted, obtaining a 74.4% phoneme accuracy. Isolated word recognition experiments with both normal-hearing and deaf subjects were also conducted, yielding word accuracies of 94.9% and 89%, respectively. The obtained results were compared with those obtained using the audio signal, and comparable accuracies were observed.
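
Relative improvement figures such as the 61.3% and 56% quoted above depend on the convention used (relative gain in accuracy versus relative reduction in error rate); the abstract does not restate its definition, so both common conventions are shown below purely for illustration, with arbitrary example values.

```python
def relative_accuracy_gain(acc_base, acc_new):
    """Improvement expressed relative to the baseline accuracy (in %)."""
    return 100.0 * (acc_new - acc_base) / acc_base

def relative_error_reduction(acc_base, acc_new):
    """Improvement expressed as a relative reduction of the error rate (in %)."""
    return 100.0 * ((100.0 - acc_base) - (100.0 - acc_new)) / (100.0 - acc_base)

print(relative_accuracy_gain(70.0, 85.0))    # 21.4... (arbitrary example values)
print(relative_error_reduction(70.0, 85.0))  # 50.0
```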


Intelligent Robots and Systems | 2012

Combining laser range finders and local steered response power for audio monitoring

Jani Even; Carlos Toshinori Ishi; Panikos Heracleous; Takahiro Miyashita; Norihiro Hagita

This paper presents an audio monitoring system for detecting and identifying people engaged in a conversation. The proposed method is hands-free, as it uses a microphone array to acquire the sound. A particular feature of the approach is the use of a human tracker based on laser range finders. The human tracker monitors the locations of people; local steered response power is then used to detect which people are speaking and to precisely localize their mouths. An audio stream is then created for each person and used to perform speaker identification. Experimental results show that the use of the human tracker has several benefits compared to an audio-only approach.
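
The "local" steered response power step can be pictured as evaluating an SRP-PHAT score only at candidate points supplied by the laser-based human tracker, instead of scanning the whole room. The sketch below is an assumption about how such a step could look and is not the authors' code; mic_pos, candidates, and the single-frame interface are illustrative.

```python
import numpy as np

SPEED_OF_SOUND = 343.0   # m/s

def srp_phat_local(frame, mic_pos, candidates, fs):
    """SRP-PHAT score for each candidate point near a tracked person.

    frame:      (M, N) snapshot of M-channel audio
    mic_pos:    (M, 3) microphone coordinates in metres
    candidates: (K, 3) candidate source points from the human tracker
    """
    M, N = frame.shape
    spec = np.fft.rfft(frame, axis=1)
    freqs = np.fft.rfftfreq(N, 1.0 / fs)
    scores = np.zeros(len(candidates))
    for k, p in enumerate(candidates):
        delays = np.linalg.norm(mic_pos - p, axis=1) / SPEED_OF_SOUND
        power = 0.0
        for i in range(M):
            for j in range(i + 1, M):
                cross = spec[i] * np.conj(spec[j])
                cross /= np.abs(cross) + 1e-12          # PHAT weighting
                tau = delays[i] - delays[j]             # expected TDOA for p
                power += np.real(np.sum(cross * np.exp(2j * np.pi * freqs * tau)))
        scores[k] = power
    return scores
```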


International Conference on Telecommunications | 2011

Visual-speech to text conversion applicable to telephone communication for deaf individuals

Panikos Heracleous; Hiroshi Ishiguro; Norihiro Hagita

Access to communication technologies has become essential for handicapped people. This study introduces the initial step of an automatic translation system able to translate the visual speech used by deaf individuals into text or auditory speech. Such a system would enable deaf users to communicate with each other and with normal-hearing people over telephone networks or the Internet using only telephone devices equipped with simple cameras. In particular, this paper introduces automatic recognition and conversion to text of Cued Speech for French. Cued Speech is a visual mode used for communication in the deaf community. Using hand shapes placed in different positions near the face as a complement to lipreading, all the sounds of a spoken language can be visually distinguished and perceived. Experimental results show high recognition rates for both isolated word and continuous phoneme recognition experiments in Cued Speech for French.


International Conference on Acoustics, Speech, and Signal Processing | 2012

Fusion of standard and alternative acoustic sensors for robust automatic speech recognition

Panikos Heracleous; Jani Even; Carlos Toshinori Ishi; Takahiro Miyashita; Norihiro Hagita

This paper focuses on the problem of environmental noise in human-human communication and in automatic speech recognition. To deal with this problem, the use of alternative acoustic sensors, which are attached to the talker and receive the uttered speech through skin or bone, is investigated. In the current study, throat microphones and ear bone microphones are integrated with standard microphones using several fusion methods. The results show that recognition rates in noisy environments increase drastically when these sensors are integrated with standard microphones. Moreover, the system does not show any recognition degradation in clean environments; in fact, recognition rates also increase slightly there. Using late fusion to integrate a throat microphone, an ear bone microphone, and a standard microphone, we achieved a 44% relative improvement in recognition rate in a noisy environment and a 24% relative improvement in a clean environment.
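
Late fusion of the three sensors can be pictured as a weighted combination of per-recognizer scores. The snippet below is only a hedged illustration of that idea; the paper's exact fusion rule and weights are not reproduced here, and the scores and weights shown are hypothetical.

```python
import numpy as np

def late_fusion(loglik_standard, loglik_throat, loglik_earbone,
                weights=(0.4, 0.3, 0.3)):
    """Weighted sum of per-class log-likelihoods from the three recognizers."""
    stacked = np.stack([loglik_standard, loglik_throat, loglik_earbone])
    return np.tensordot(np.asarray(weights), stacked, axes=1)

# Hypothetical scores for a 3-word vocabulary:
scores = late_fusion([-210.0, -190.5, -205.3],
                     [-180.2, -175.9, -183.1],
                     [-195.7, -188.0, -199.4])
print(int(np.argmax(scores)))   # index of the recognized word
```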


Intelligent Robots and Systems | 2011

Multi-modal front-end for speaker activity detection in small meetings

Jani Even; Panikos Heracleous; Carlos Toshinori Ishi; Norihiro Hagita

Small informal meetings of two to four participants are very common in work environments, so a convenient way of recording and archiving these meetings is of great interest. To archive such meetings efficiently, an important task is to keep track of “who talked when” during a meeting. This paper proposes a new multi-modal approach to this speaker activity detection problem. One novelty of the proposed approach is that it uses a human tracker that relies on scanning laser range finders (LRFs) to localize the participants. This choice is especially relevant for robotic applications, as robots are often equipped with LRFs for navigation purposes. In the proposed system, a tabletop microphone array in the center of the meeting room acquires the audio data while the LRF-based human tracker monitors the movements of the participants. Speaker activity detection is then performed using Gaussian mixture models trained beforehand. An experiment reproducing a meeting configuration demonstrates the performance of the system for speaker activity detection. In particular, the proposed hands-free system maintains a good level of performance compared with close-talking microphones while participants are speaking simultaneously.
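
The decision stage described above can be sketched with standard tools: one Gaussian mixture model per hypothesis (for example, a given participant speaking versus silence) trained beforehand, then frame-wise log-likelihood comparison on the per-participant audio streams. The code below uses scikit-learn as a stand-in and is not the authors' implementation; feature extraction is assumed to happen elsewhere.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def train_activity_models(train_feats, n_components=8):
    """train_feats: dict label -> (N, D) array of acoustic features."""
    return {label: GaussianMixture(n_components=n_components,
                                   covariance_type="diag").fit(X)
            for label, X in train_feats.items()}

def detect_activity(frame_feats, models):
    """Return the most likely label for every frame in (T, D) frame_feats."""
    labels = list(models)
    scores = np.stack([models[l].score_samples(frame_feats) for l in labels])
    return [labels[i] for i in np.argmax(scores, axis=0)]
```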

Collaboration


Dive into Panikos Heracleous's collaboration.

Top Co-Authors

Kiyohiro Shikano
Nara Institute of Science and Technology

Satoshi Nakamura
Nara Institute of Science and Technology

Jani Even
Nara Institute of Science and Technology

Yoshitaka Nakajima
Nara Institute of Science and Technology

Keiji Yasuda
National Institute of Information and Communications Technology