Publication


Featured research published by Isidoros Rodomagoulakis.


International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2016

Multimodal human action recognition in assistive human-robot interaction

Isidoros Rodomagoulakis; Nikolaos Kardaris; Vassilis Pitsikalis; E. Mavroudi; Athanasios Katsamanis; Antigoni Tsiami; Petros Maragos

Within the context of assistive robotics, we develop an intelligent interface that provides multimodal sensory processing capabilities for human action recognition. Human action is considered in multimodal terms, with inputs such as audio from microphone arrays and visual streams from high-definition and depth cameras. Building on state-of-the-art approaches from automatic speech recognition and visual action recognition, we recognize actions and commands multimodally. By fusing the unimodal information streams, we obtain the optimum multimodal hypothesis, which is further exploited by the active mobility assistance robot in the framework of the MOBOT EU research project. Recognition experiments show that integrating multiple sensors and modalities increases recognition performance on a newly acquired, challenging dataset of elderly people interacting with the assistive robot.
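As a rough illustration of the fusion step described above, the sketch below combines per-class scores from two unimodal recognizers by weighted late fusion. The labels, scores, and weighting rule are invented for illustration and do not reflect the paper's actual fusion scheme.

```python
# Hypothetical sketch of late (decision-level) fusion of unimodal scores.
# Labels, scores, and the weighting rule are illustrative assumptions.

def fuse_scores(audio_scores, visual_scores, w_audio=0.5):
    """Combine per-class scores from two modalities by a weighted sum
    and return the best-scoring command label."""
    labels = set(audio_scores) & set(visual_scores)
    fused = {c: w_audio * audio_scores[c] + (1.0 - w_audio) * visual_scores[c]
             for c in labels}
    return max(fused, key=fused.get)

# Toy usage: log-likelihood-like scores per command, one dict per modality.
audio = {"come_closer": -4.2, "stop": -6.1}
visual = {"come_closer": -3.8, "stop": -2.9}
print(fuse_scores(audio, visual, w_audio=0.6))  # -> "come_closer"
```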


International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2014

Robust far-field spoken command recognition for home automation combining adaptation and multichannel processing

Athanasios Katsamanis; Isidoros Rodomagoulakis; Gerasimos Potamianos; Petros Maragos; Antigoni Tsiami

The paper presents our approach to speech-controlled home automation. We focus on the detection and recognition of spoken commands preceded by a key-phrase, as recorded in a voice-enabled apartment by multiple microphones installed in its rooms. For both problems we investigate robust modeling, environmental adaptation, and multichannel processing to cope with (a) insufficient training data and (b) the far-field effects and noise in the apartment. The proposed integrated scheme is evaluated on a challenging and highly realistic corpus of simulated audio recordings and achieves an F-measure close to 0.70 for key-phrase spotting and word accuracy close to 98% for the command recognition task.
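To make the key-phrase-gated design concrete, here is a minimal sketch of the detection-then-recognition cascade. The key-phrase, timeout, and event format are assumptions, not the paper's actual implementation.

```python
# Minimal sketch: a command is accepted only if a key-phrase was spotted
# shortly before it. Key-phrase, timeout, and event format are assumed.

def gated_commands(events, key_phrase="hi_system", timeout_s=3.0):
    """events: iterable of (timestamp_seconds, hypothesis) pairs emitted
    by a spotter/recognizer. Yields commands following the key-phrase."""
    armed_until = float("-inf")
    for t, hyp in events:
        if hyp == key_phrase:
            armed_until = t + timeout_s   # arm the command recognizer
        elif t <= armed_until:
            yield hyp
            armed_until = float("-inf")   # one command per activation

stream = [(0.5, "hi_system"), (1.2, "lights_on"), (9.0, "lights_off")]
print(list(gated_commands(stream)))  # -> ['lights_on']
```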


International Conference on Digital Signal Processing (DSP) | 2013

Experiments on far-field multichannel speech processing in smart homes

Isidoros Rodomagoulakis; Panagiotis Giannoulis; Z.-I. Skordilis; Petros Maragos; Gerasimos Potamianos

In this paper, we examine three problems that arise in the modern, challenging area of far-field speech processing. The methods developed for each problem, namely (a) multi-channel speech enhancement, (b) voice activity detection, and (c) speech recognition, are potentially applicable to a distant speech recognition system for voice-enabled smart-home environments. The results obtained on real and simulated data for these smart-home speech applications are promising, owing to the improvements made in the employed signal processing methods.
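As one concrete instance of multi-channel enhancement, the sketch below implements plain delay-and-sum beamforming with known per-channel delays. This is a textbook baseline under assumed delays, not necessarily the enhancement method used in the paper.

```python
# Textbook delay-and-sum beamforming sketch; the per-channel delays are
# assumed known (e.g. from TDOA estimation). Not the paper's exact method.
import numpy as np

def delay_and_sum(channels, delays_s, fs):
    """channels: list of equal-length 1-D arrays, one per microphone.
    delays_s: per-channel delays in seconds relative to the source.
    Returns the time-aligned average of all channels."""
    n = len(channels[0])
    out = np.zeros(n)
    for x, d in zip(channels, delays_s):
        k = int(round(d * fs))            # integer-sample alignment
        aligned = np.zeros(n)
        if k >= 0:
            aligned[: n - k] = x[k:]      # advance lagging channels
        else:
            aligned[-k:] = x[: n + k]
        out += aligned
    return out / len(channels)

# Toy usage: two copies of a tone, the second delayed by 5 samples.
fs = 16000
t = np.arange(1024) / fs
clean = np.sin(2 * np.pi * 440 * t)
delayed = np.concatenate([np.zeros(5), clean[:-5]])
enhanced = delay_and_sum([clean, delayed], [0.0, 5 / fs], fs)
```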


4th Joint Workshop on Hands-free Speech Communication and Microphone Arrays (HSCMA) | 2014

The Athena-RC system for speech activity detection and speaker localization in the DIRHA smart home

Panagiotis Giannoulis; Antigoni Tsiami; Isidoros Rodomagoulakis; Athanasios Katsamanis; Gerasimos Potamianos; Petros Maragos

We present our system for speech activity detection and speaker localization inside a smart home with multiple rooms equipped with microphone arrays of known geometry and placement. The smart home is developed as part of the DIRHA European funded project, providing both simulated and real data for system development and evaluation, under extremely challenging conditions of noise, reverberation, and speech overlap. Our proposed approach performs speech activity detection first, by employing multi-microphone decision fusion on traditional statistical models and acoustic features, within a Viterbi decoding framework, further assisted by signal energy- and model log-likelihood threshold-based heuristics. Then it performs speaker localization using traditional time-difference of arrival estimation between properly selected microphone pairs, further assisted by a dereverberation component. The system achieves very low detection errors, namely less than 4% (5%) for speech activity detection in the simulated (real) DIRHA corpus, and less than 10% (12%) for joint speech detection and speaker localization.
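For the localization step, a standard way to estimate the time difference of arrival between a microphone pair is generalized cross-correlation with phase transform (GCC-PHAT). The sketch below follows that textbook formulation, which may differ in detail from the estimator and dereverberation front-end used in the paper.

```python
# Textbook GCC-PHAT TDOA estimator for one microphone pair; the paper's
# exact estimator and dereverberation component may differ.
import numpy as np

def gcc_phat_tdoa(x, y, fs, max_tau=None):
    """Return the estimated delay (in seconds) of x relative to y."""
    n = len(x) + len(y)
    X = np.fft.rfft(x, n=n)
    Y = np.fft.rfft(y, n=n)
    R = X * np.conj(Y)
    R /= np.abs(R) + 1e-12               # PHAT weighting: keep phase only
    cc = np.fft.irfft(R, n=n)
    max_shift = n // 2 if max_tau is None else min(int(fs * max_tau), n // 2)
    cc = np.concatenate((cc[-max_shift:], cc[: max_shift + 1]))
    return (np.argmax(np.abs(cc)) - max_shift) / fs
```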


International Conference on Image Processing (ICIP) | 2012

Unsupervised classification of extreme facial events using active appearance models tracking for sign language videos

Epameinondas Antonakos; Vassilis Pitsikalis; Isidoros Rodomagoulakis; Petros Maragos

We propose an unsupervised method for Extreme States Classification (UnESC) on feature spaces of facial cues of interest. The method is built upon Active Appearance Model (AAM) face tracking and on feature extraction from global and local AAMs. UnESC is applied primarily to facial pose, but is shown to extend to local models on the eyes and mouth. Given the importance of facial events in sign languages, we apply UnESC to videos from two sign language corpora, American (ASL) and Greek (GSL), yielding promising qualitative and quantitative results. Apart from detecting extreme facial states, the proposed UnESC also has impact for SL corpora lacking any facial annotations.
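A hypothetical sketch of the unsupervised idea: cluster a one-dimensional pose feature (say, head yaw from AAM tracking) and treat the outermost clusters as extreme states. The feature choice, cluster count, and labeling rule are assumptions for illustration; the actual UnESC operates on richer AAM feature spaces.

```python
# Hypothetical sketch: tiny 1-D k-means over a pose feature, labeling the
# outermost clusters as "extreme" states. Feature and k=3 are assumptions.
import numpy as np

def extreme_state_labels(yaw, k=3, iters=50):
    """yaw: per-frame head-yaw values. Returns per-frame labels in
    {'extreme_low', 'neutral', 'extreme_high'} for k=3."""
    yaw = np.asarray(yaw, dtype=float)
    centers = np.quantile(yaw, np.linspace(0.1, 0.9, k))  # spread init
    for _ in range(iters):
        assign = np.argmin(np.abs(yaw[:, None] - centers[None, :]), axis=1)
        centers = np.array([yaw[assign == j].mean() if np.any(assign == j)
                            else centers[j] for j in range(k)])
    order = np.argsort(centers)
    names = {order[0]: "extreme_low", order[-1]: "extreme_high"}
    return [names.get(a, "neutral") for a in assign]

# Toy usage: frames around -30, 0 and +30 degrees of yaw.
frames = np.concatenate([np.random.normal(m, 2.0, 50) for m in (-30, 0, 30)])
print(extreme_state_labels(frames)[:5])
```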


ACM Multimedia | 2016

A Platform for Building New Human-Computer Interface Systems that Support Online Automatic Recognition of Audio-Gestural Commands

Nikolaos Kardaris; Isidoros Rodomagoulakis; Vassilis Pitsikalis; Antonis Arvanitakis; Petros Maragos

We introduce a new framework for building human-computer interfaces that provide online automatic audio-gestural command recognition. The overall system allows the construction of a multimodal interface that recognizes user input expressed naturally as audio commands and manual gestures, captured by sensors such as Kinect. It includes a component for acquiring multimodal user data, which is used as input to a module responsible for training audio-gestural models. These models are employed by the automatic recognition component, which supports online recognition of audio-visual modalities. The overall framework is exemplified by a working system use case, demonstrating the potential of the software platform, which can be employed to build other new human-computer interaction systems. Moreover, users may populate libraries of models and/or data that can be shared over the network, allowing them to reuse or extend existing systems.
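The three components named above suggest a simple pipeline shape. The skeleton below is purely schematic: class and method names are invented for illustration and are not the platform's actual API.

```python
# Schematic skeleton of the acquisition -> training -> online-recognition
# pipeline; all names here are invented, not the platform's real API.

class MultimodalAcquisition:
    def record_session(self):
        """Capture synchronized audio and gesture (e.g. Kinect) streams."""
        return {"audio": [], "gestures": []}   # placeholder session data

class AudioGesturalTrainer:
    def fit(self, sessions):
        """Train audio and gesture command models from recorded sessions."""
        return {"audio_model": object(), "gesture_model": object()}

class OnlineRecognizer:
    def __init__(self, models):
        self.models = models

    def recognize(self, frame):
        """Return the fused audio-gestural command for one live frame."""
        return None                            # placeholder decision

sessions = [MultimodalAcquisition().record_session()]
recognizer = OnlineRecognizer(AudioGesturalTrainer().fit(sessions))
```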


International Conference on Image Processing (ICIP) | 2016

A multimedia gesture dataset for human robot communication: Acquisition, tools and recognition results

Isidoros Rodomagoulakis; Nikolaos Kardaris; Vassilis Pitsikalis; Antonis Arvanitakis; Petros Maragos

Motivated by recent advances in human-robot interaction, we present a new dataset, a suite of tools to handle it, and state-of-the-art work on visual gesture and audio command recognition. The dataset has been collected with an integrated annotation and acquisition web interface that facilitates temporal ground-truthing during acquisition. The dataset includes gesture instances in which the subjects are not in strict setup positions, and contains multiple scenarios, not restricted to a single static configuration. We accompany it with a suite of tools: a practical interface to acquire audio-visual data in the Robot Operating System (ROS), a state-of-the-art learning pipeline to train visual gesture and audio command models, and an online gesture recognition system. Finally, we include a thorough evaluation of the dataset, providing rich and insightful experimental recognition results.


IEEE Symposium Series on Computational Intelligence (SSCI) | 2016

The MOBOT rollator human-robot interaction model and user evaluation process

Eleni Efthimiou; Stavroula-Evita Fotinea; Theodore Goulas; Maria Koutsombogera; Panagiotis Karioris; Anna Vacalopoulou; Isidoros Rodomagoulakis; Petros Maragos; Costas S. Tzafestas; Vassilis Pitsikalis; Yiannis Koumpouros; Alexandra Karavasili; Panagiotis Siavelis; Foteini Koureta; Despoina Alexopoulou

In this paper we discuss the integration of a communication model into the MOBOT assistive robotic platform and its evaluation by target users. The MOBOT platform envisions the development of cognitive robotic assistant prototypes that act proactively, adaptively, and interactively with respect to elderly humans with mild walking and cognitive impairments. The respective multimodal action recognition system has been developed to monitor, analyze, and predict user actions with a high level of accuracy and detail. The robotic platform incorporates a human-robot communication model defined in terms of the semantics of human actions in interaction, their capture, and their representation as behavioral patterns, so as to achieve effective, natural interaction in support of elderly users with mild walking and cognitive impairments. The platform has been evaluated in a series of validation experiments with end users, whose procedure and results are also presented in this paper.


European Signal Processing Conference (EUSIPCO) | 2017

On the improvement of modulation features using multi-microphone energy tracking for robust distant speech recognition

Isidoros Rodomagoulakis; Petros Maragos

In this work, we investigate robust speech energy estimation and tracking schemes aiming at improved energy-based multiband speech demodulation and feature extraction for multi-microphone distant speech recognition. Based on the spatial diversity of the speech and noise recordings of a multi-microphone setup, the proposed Multichannel, Multiband Demodulation (MMD) scheme includes: 1) energy selection across the microphones that are less affected by noise and 2) cross-signal energy estimation based on the cross-Teager energy operator. Instantaneous modulations of speech resonances are estimated on the denoised energies. Second-order frequency modulation features are measured and combined with MFCCs achieving improved distant speech recognition on simulated and real data recorded in noisy and reverberant domestic environments.
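The Teager-Kaiser energy operator at the heart of such schemes has the standard discrete form Psi[x](n) = x(n)^2 - x(n-1) x(n+1). The cross-signal variant shown below is one common discretization and may differ from the paper's exact cross-Teager formulation.

```python
# Discrete Teager-Kaiser energy operator and one common cross-signal
# variant; the paper's exact cross-Teager formulation may differ.
import numpy as np

def teager_energy(x):
    """Psi[x](n) = x(n)^2 - x(n-1) * x(n+1)."""
    x = np.asarray(x, dtype=float)
    return x[1:-1] ** 2 - x[:-2] * x[2:]

def cross_teager_energy(x, y):
    """One cross-signal form: Psi[x,y](n) = x(n) y(n) - x(n-1) y(n+1)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return x[1:-1] * y[1:-1] - x[:-2] * y[2:]

# For a pure tone A*cos(w*n), Psi equals A^2 * sin(w)^2, so it tracks
# amplitude and frequency jointly -- the property demodulation exploits.
n = np.arange(1000)
tone = 0.5 * np.cos(0.1 * n)
print(teager_energy(tone).mean())  # ~= 0.25 * np.sin(0.1) ** 2
```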


Computer Speech & Language | 2017

Room-localized spoken command recognition in multi-room, multi-microphone environments

Isidoros Rodomagoulakis; Athanasios Katsamanis; Gerasimos Potamianos; Panagiotis Giannoulis; Antigoni Tsiami; Petros Maragos

The paper focuses on the design of a practical system pipeline for always-listening, far-field spoken command recognition in everyday smart indoor environments that consist of multiple rooms equipped with sparsely distributed microphone arrays. Such environments, for example homes and multi-room offices, present challenging acoustic scenes to state-of-the-art speech recognizers, especially under always-listening operation, due to low signal-to-noise ratios, frequent overlaps of target speech, acoustic events, and background noise, as well as inter-room interference and reverberation. In addition, recognition of target commands often needs to be accompanied by their spatial localization, at least at the room level, to account for users in different rooms, providing command disambiguation and room-localized feedback. To address the above requirements, the use of parallel recognition pipelines is proposed, one per room of interest. The approach is enabled by a room-dependent speech activity detection module that employs appropriate multichannel features to determine speech segments and their room of origin, feeding them to the corresponding room-dependent pipelines for further processing. These consist of the traditional cascade of far-field spoken command detection and recognition, the former based on the detection of “activating” key-phrases. Robustness to the challenging environments is pursued by a number of multichannel combination and acoustic modeling techniques, thoroughly investigated in the paper. In particular, channel selection, beamforming, and decision fusion of single-channel results are considered, with the latter performing best. Additional gains are observed when the employed acoustic models are trained on appropriately simulated reverberant and noisy speech data, and are channel-adapted to the target environments. Further issues investigated concern the inter-dependencies of the various system components, demonstrating the superiority of joint optimization of the component tunable parameters over their separate or sequential optimization. The proposed approach is developed for the Greek language, exhibiting promising performance in real recordings in a four-room apartment, as well as a two-room office. For example, in the latter, a 76.6% command recognition accuracy is achieved on a speaker-independent test, employing a 180-sentence decoding grammar. This result represents a 46% relative improvement over conventional beamforming.
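Since decision fusion of single-channel results performs best in the paper's experiments, a minimal sketch of one plausible fusion rule, confidence-weighted voting over per-channel hypotheses, follows. The actual fusion rule and confidence measures in the paper may be more elaborate.

```python
# Minimal sketch of decision fusion across channels by confidence-weighted
# voting; the paper's actual fusion rule and confidences may differ.
from collections import defaultdict

def fuse_channel_decisions(channel_results):
    """channel_results: list of (hypothesis, confidence) pairs, one per
    microphone channel. Returns the hypothesis with the most weight."""
    votes = defaultdict(float)
    for hyp, conf in channel_results:
        votes[hyp] += conf
    return max(votes, key=votes.get)

# Toy usage: three channels, two agreeing despite one noisy channel.
results = [("lights on", 0.8), ("lights on", 0.6), ("light song", 0.9)]
print(fuse_channel_decisions(results))  # -> 'lights on'
```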

Collaboration


Dive into Isidoros Rodomagoulakis's collaboration.

Top Co-Authors

Petros Maragos, National Technical University of Athens
Vassilis Pitsikalis, National Technical University of Athens
Antigoni Tsiami, National Technical University of Athens
Athanasios Katsamanis, National Technical University of Athens
Nikolaos Kardaris, National Technical University of Athens
Panagiotis Giannoulis, National Technical University of Athens
Costas S. Tzafestas, National Technical University of Athens
Antonis Arvanitakis, National Technical University of Athens
Athanasia Zlatintsi, National Technical University of Athens