Network


Latest external collaborations at the country level. Dive into the details by clicking on the dots.

Hotspot


Dive into the research topics where Gökhan Ince is active.

Publication


Featured research published by Gökhan Ince.


Intelligent Robots and Systems | 2008

Using binaural and spectral cues for azimuth and elevation localization

Tobias Rodemann; Gökhan Ince; Frank Joublin; Christian Goerick

It is a common assumption that with just two microphones only the azimuth angle of a sound source can be estimated, and that a third, orthogonal microphone (or set of microphones) is necessary to estimate the elevation of the source. Recently, by using specially designed ears and analyzing spectral cues, several researchers have managed to estimate sound source elevation with a binaural system. In this work, we show that with two bionic ears both the azimuth and the elevation angle can be determined using binaural (e.g., IID and ITD) together with spectral cues. This ability can also be used to disambiguate signals coming from the front or back. We present a detailed comparative analysis of azimuth and elevation localization performance for binaural and spectral cues. We demonstrate that with a small extension of a standard binaural system, a basic elevation estimation capability can be gained.
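
As a rough illustration of the binaural cues involved, the sketch below estimates the azimuth from the interaural time difference (ITD) computed with GCC-PHAT and derives the interaural intensity difference (IID) as a level ratio. The microphone spacing and the far-field, free-field model are assumptions for illustration, not the paper's setup.

```python
import numpy as np

SPEED_OF_SOUND = 343.0   # m/s
MIC_DISTANCE = 0.15      # m, assumed inter-microphone spacing

def gcc_phat(left, right, fs):
    """Estimate the inter-channel time delay (ITD, in seconds) via GCC-PHAT."""
    n = len(left) + len(right)
    L = np.fft.rfft(left, n=n)
    R = np.fft.rfft(right, n=n)
    cross = L * np.conj(R)
    cross /= np.abs(cross) + 1e-12            # PHAT weighting
    cc = np.fft.irfft(cross, n=n)
    max_shift = int(fs * MIC_DISTANCE / SPEED_OF_SOUND)
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    return (np.argmax(np.abs(cc)) - max_shift) / fs

def azimuth_from_itd(itd):
    """Map an ITD to an azimuth angle under the free-field model."""
    s = np.clip(itd * SPEED_OF_SOUND / MIC_DISTANCE, -1.0, 1.0)
    return np.degrees(np.arcsin(s))

def iid_db(left, right):
    """Interaural intensity difference in dB (a level cue)."""
    return 10.0 * np.log10((np.sum(left ** 2) + 1e-12)
                           / (np.sum(right ** 2) + 1e-12))
```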


Intelligent Robots and Systems | 2009

Ego noise suppression of a robot using template subtraction

Gökhan Ince; Kazuhiro Nakadai; Tobias Rodemann; Yuji Hasegawa; Hiroshi Tsujino; Jun-ichi Imura

While a robot is moving, its joints inevitably generate noise due to their motors, i.e., ego-motion noise. This problem is especially severe in humanoid robots, because they tend to have many joints and their motors are located closer to the microphones than the sound sources are. In this work, we investigate methods for the prediction and suppression of ego-motion noise. In the first part, we analyze the performance of different noise subtraction strategies, assuming that the noise prediction problem has been solved. In the second part, we present results for a noise prediction scheme based on the current robot joint status. Performance is evaluated against a number of criteria, including Automatic Speech Recognition (ASR). We demonstrate that our method considerably improves recognition performance during ego-motion.
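
As a minimal sketch of the template subtraction idea, the snippet below subtracts a pre-recorded ego-noise magnitude spectrogram from a noisy one, using an over-subtraction factor and a spectral floor; the parameter values are illustrative assumptions, not the paper's.

```python
import numpy as np

def template_subtract(noisy_spec, noise_template, alpha=1.0, beta=0.05):
    """Subtract a time-aligned ego-noise template from a noisy spectrogram.

    noisy_spec, noise_template: magnitude spectrograms, shape (freq, frames).
    alpha: over-subtraction factor; beta: spectral floor against musical noise.
    """
    cleaned = noisy_spec - alpha * noise_template
    floor = beta * noisy_spec            # keep a small residual floor
    return np.maximum(cleaned, floor)
```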


Intelligent Robots and Systems | 2012

Real-time super-resolution Sound Source Localization for robots

Keisuke Nakamura; Kazuhiro Nakadai; Gökhan Ince

Sound Source Localization (SSL) is an essential function for robot audition and yields the location and number of sound sources, which are utilized for post-processes such as sound source separation. SSL for a robot in a real environment mainly requires noise-robustness, high resolution, and real-time processing. A microphone array processing technique, Multiple Signal Classification based on Standard EigenValue Decomposition (SEVD-MUSIC), is commonly used for localization. We improved its robustness against high-power noise by incorporating Generalized EigenValue Decomposition (GEVD). However, GEVD-based MUSIC (GEVD-MUSIC) has two main issues: 1) the resolution of the pre-measured Transfer Functions (TFs) determines the resolution of SSL, and 2) its computational cost is too expensive for real-time processing. For the first issue, we propose a TF interpolation method integrating time-domain-based and frequency-domain-based interpolation. The interpolation achieves super-resolution SSL, whose resolution is higher than that of the pre-measured TFs. For the second issue, we propose two methods: MUSIC based on Generalized Singular Value Decomposition (GSVD-MUSIC) and Hierarchical SSL (H-SSL). GSVD-MUSIC drastically reduces the computational cost while maintaining noise-robustness in localization. H-SSL also reduces the computational cost by introducing a hierarchical search algorithm instead of a greedy search in localization. These techniques are integrated into an SSL system using a robot-embedded microphone array. The experimental results showed that the proposed interpolation achieved approximately 1-degree resolution even though TFs were pre-measured only at 30-degree intervals, that GSVD-MUSIC required only 46.4% and 40.6% of the computational cost of SEVD-MUSIC and GEVD-MUSIC, respectively, and that H-SSL reduced the computational cost of localizing a single sound source by 59.2%.
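
The following sketch shows the core of a MUSIC pseudospectrum computation for a single frequency bin, where passing a noise correlation matrix turns standard SEVD-MUSIC into the GEVD variant; the steering-vector layout and the treatment of the generalized eigenvectors are simplifying assumptions, not the paper's implementation.

```python
import numpy as np
from scipy.linalg import eigh

def music_spectrum(R, steering, n_sources, K=None):
    """MUSIC pseudospectrum for one frequency bin.

    R: (M, M) signal correlation matrix.
    K: (M, M) noise correlation matrix; K=None reduces GEVD-MUSIC
       to the standard SEVD-MUSIC case (K = identity).
    steering: (D, M) steering vectors for D candidate directions.
    """
    M = R.shape[0]
    if K is None:
        K = np.eye(M, dtype=complex)
    _, V = eigh(R, K)                  # generalized EVD, ascending eigenvalues
    En = V[:, : M - n_sources]         # noise subspace eigenvectors
    proj = steering.conj() @ En        # a^H E_n for every direction
    denom = np.sum(np.abs(proj) ** 2, axis=1)
    norm = np.sum(np.abs(steering) ** 2, axis=1)
    return norm / (denom + 1e-12)      # peaks at the source directions
```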


Intelligent Robots and Systems | 2011

Intelligent Sound Source Localization and its application to multimodal human tracking

Keisuke Nakamura; Kazuhiro Nakadai; Futoshi Asano; Gökhan Ince

We have assessed robust tracking of humans based on intelligent Sound Source Localization (SSL) for a robot in a real environment. SSL is fundamental for robot audition, but it faces three issues in a real environment: robustness against high-power noise, the lack of a general framework for selective listening to sound sources, and the tracking of inactive and/or noisy sound sources. To address the first issue, we extended Multiple Signal Classification by incorporating Generalized EigenValue Decomposition (GEVD-MUSIC) so that it can deal with high-power noise and select target sound sources. To address the second issue, we proposed Sound Source Identification (SSI) based on hierarchical Gaussian mixture models and integrated it with GEVD-MUSIC to realize a selective listening function. To address the third issue, we integrated audio-visual human tracking using particle filtering. Integrating these three techniques into an intelligent human tracking system showed that: 1) GEVD-MUSIC improved the noise-robustness of SSL by a signal-to-noise ratio of 5–6 dB; 2) SSI achieved an F-measure of more than 70% even in a noisy environment; and 3) audio-visual integration reduced the average tracking error by approximately 50%.
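
As a hedged sketch of the sound source identification step, flattened to a single (non-hierarchical) layer for brevity, the snippet below trains one Gaussian mixture model per source class and labels a separated source by the highest average log-likelihood; the class set and feature choice are illustrative assumptions.

```python
from sklearn.mixture import GaussianMixture

def train_ssi_models(features_by_class, n_components=4):
    """features_by_class: dict mapping class name -> (frames, dims) array."""
    models = {}
    for name, feats in features_by_class.items():
        gmm = GaussianMixture(n_components=n_components,
                              covariance_type="diag", random_state=0)
        models[name] = gmm.fit(feats)
    return models

def identify(models, feats):
    """Label a separated source by the best-scoring class GMM."""
    return max(models, key=lambda name: models[name].score(feats))
```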


International Conference on Robotics and Automation | 2010

A hybrid framework for ego noise cancellation of a robot

Gökhan Ince; Kazuhiro Nakadai; Tobias Rodemann; Yuji Hasegawa; Hiroshi Tsujino; Jun-ichi Imura

Noise generated by the motion of a robot is undesirable, because it deteriorates the quality and intelligibility of the sounds recorded by robot-embedded microphones. It must be reduced or cancelled to achieve high-performance automatic speech recognition. In this work, we divide the ego-motion noise problem into three subdomains, arm, leg, and head motion noise, depending on their complexity and intensity levels. We investigate methods that make use of single-channel and multi-channel processing in order to suppress each type of ego noise separately. For this purpose, we use a framework consisting of microphone-array-based geometric source separation, a subsequent post-filtering process, and a parallel module for template subtraction. Furthermore, we propose a control mechanism, based on the signal-to-noise ratio and instantaneously detected motions, that switches to the method best suited to the current type of noise. We evaluate the proposed techniques on a humanoid robot using automatic speech recognition (ASR). Preliminary isolated word recognition results show the effectiveness of our methods, with word correct rates improved by up to 50% over single-channel recognition for arm and leg motion noise and by up to 25% for very strong head motion noise.
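
A minimal sketch of the switching idea follows: choose a suppression strategy from the instantaneous signal-to-noise ratio and the currently detected motions. The thresholds, strategy names, and decision order are illustrative assumptions, not values from the paper.

```python
def select_strategy(snr_db, moving_parts):
    """Pick a noise suppression strategy for the current frame.

    moving_parts: set of currently moving body parts, e.g. {"arm", "head"}.
    """
    if "head" in moving_parts and snr_db < 0.0:
        # very strong, close-to-microphone noise: rely on template subtraction
        return "template_subtraction"
    if moving_parts:
        # moderate ego-motion noise: geometric source separation + post-filter
        return "gss_postfilter"
    return "no_suppression"
```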


International Conference on Signal Processing | 2012

Robot audition for dynamic environments

Kazuhiro Nakadai; Gökhan Ince; Keisuke Nakamura; Hirofumi Nakajima

This paper addresses robot audition for dynamic environments, where the speakers and/or the robot are moving within a dynamically changing acoustic environment. Robot audition studied so far has assumed only stationary human-robot interaction scenes, and thus has difficulty coping with such dynamic environments. We recently developed new techniques that let a robot listen to several things simultaneously using its own ears even in dynamic environments: MUltiple SIgnal Classification based on Generalized Eigen-Value Decomposition (GEVD-MUSIC), Geometrically constrained High-order Decorrelation based Source Separation with Adaptive Step-size control (GHDSS-AS), Histogram-based Recursive Level Estimation (HRLE), and Template-based Ego Noise Suppression (TENS). GEVD-MUSIC provides noise-robust sound source localization. GHDSS-AS is a new sound source separation method that quickly adapts its separation parameters to dynamic changes. HRLE is a practical post-filtering method with a small number of parameters. TENS estimates the motor noise of the robot using templates recorded in advance and eliminates it. These methods are implemented as modules for our open-source robot audition software HARK so that they can easily be integrated. We show that each of these methods, and their combinations, is effective in coping with dynamic environments through offline experiments and online real-time demonstrations.
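
Of the listed modules, HRLE lends itself to a compact sketch: maintain an exponentially decaying histogram of frame levels and read the noise level off a low cumulative percentile. The bin layout, time constant, and percentile below are illustrative assumptions, not HARK's actual parameters.

```python
import numpy as np

class HRLE:
    """Histogram-based recursive level estimation for one frequency bin."""

    def __init__(self, n_bins=100, lo_db=-100.0, hi_db=0.0,
                 time_const=0.999, percentile=0.3):
        self.edges = np.linspace(lo_db, hi_db, n_bins + 1)
        self.hist = np.zeros(n_bins)
        self.alpha = time_const
        self.percentile = percentile

    def update(self, level_db):
        """Feed one frame's level (dB); return the current noise estimate (dB)."""
        idx = np.clip(np.searchsorted(self.edges, level_db) - 1,
                      0, len(self.hist) - 1)
        self.hist *= self.alpha               # recursive (leaky) update
        self.hist[idx] += 1.0 - self.alpha
        cum = np.cumsum(self.hist)
        bin_i = int(np.searchsorted(cum, self.percentile * cum[-1]))
        return 0.5 * (self.edges[bin_i] + self.edges[bin_i + 1])
```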


International Conference on Robotics and Automation | 2011

Assessment of general applicability of ego noise estimation

Gökhan Ince; Keisuke Nakamura; Futoshi Asano; Hirofumi Nakajima; Kazuhiro Nakadai

Noise generated by the motion of a robot deteriorates the quality of the desired sounds recorded by robot-embedded microphones. On top of that, a moving robot is also vulnerable to its loud fan noise, whose orientation changes relative to the moving limbs on which the microphones are mounted. To tackle the non-stationary ego-motion noise and the direction changes of the fan noise, we propose an estimation method based on instantaneous prediction of ego noise using parameterized templates. We verify the ego noise suppression capability of the proposed estimation method on a humanoid robot by evaluating it on two important applications within the robot audition framework: (1) automatic speech recognition and (2) sound source localization. We demonstrate that our method considerably improves recognition and localization performance during both head and arm motions.
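
A minimal sketch of the parameterized-template idea, under the assumption that each template is keyed by a joint feature vector (e.g., positions and velocities): predict the current ego-noise spectrum by nearest-neighbour lookup over the stored keys.

```python
import numpy as np

class TemplateDB:
    """Ego-noise templates keyed by joint-state feature vectors."""

    def __init__(self):
        self.keys = []        # joint feature vectors
        self.templates = []   # corresponding noise magnitude spectra

    def add(self, joint_feats, noise_spec):
        self.keys.append(np.asarray(joint_feats, dtype=float))
        self.templates.append(np.asarray(noise_spec, dtype=float))

    def predict(self, joint_feats):
        """Return the template whose key is closest to the current state."""
        keys = np.stack(self.keys)
        d = np.linalg.norm(keys - np.asarray(joint_feats, dtype=float), axis=1)
        return self.templates[int(np.argmin(d))]
```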


Robot and Human Interactive Communication | 2012

An active audition framework for auditory-driven HRI: Application to interactive robot dancing

João Lobato Oliveira; Gökhan Ince; Keisuke Nakamura; Kazuhiro Nakadai; Hiroshi G. Okuno; Luís Paulo Reis; Fabien Gouyon

In this paper we propose a general active audition framework for auditory-driven Human-Robot Interaction (HRI). The proposed framework simultaneously processes speech and music on the fly, integrates perceptual models for robot audition, and supports verbal and non-verbal interactive communication by means of (pro)active behaviors. To ensure reliable interaction, on top of the framework a behavior decision mechanism based on active audition polices the robot's actions according to the reliability of the acoustic signals for auditory processing. To validate the framework's applicability to general auditory-driven HRI, we propose the implementation of an interactive robot dancing system. This system integrates three preprocessing robot audition modules: sound source localization, sound source separation, and ego noise suppression; two modules for auditory perception: live audio beat tracking and automatic speech recognition; and multi-modal behaviors for verbal and non-verbal interaction: music-driven dancing and speech-driven dialoguing. To fully assess the system, we set up experimental and interactive real-world scenarios with highly dynamic acoustic conditions and defined a set of evaluation criteria. The experimental tests revealed accurate and robust beat tracking and speech recognition, and convincing dance beat-synchrony. The interactive sessions confirmed the fundamental role of the behavior decision mechanism in actively maintaining a robust and natural human-robot interaction.


International Journal of Social Robotics | 2015

The Effect of Embodiment in Sign Language Tutoring with Assistive Humanoid Robots

Hatice Kose; Pinar Uluer; Neziha Akalin; Rabia Yorganci; Ahmet Özkul; Gökhan Ince

This paper presents interactive games for sign language tutoring assisted by humanoid robots. The games are specially designed for children with communication impairments. In this study, different robot platforms, namely the Nao H25 and the Robovie R3 humanoid robots, are used to express a chosen set of signs in Turkish Sign Language using hand and arm movements. Two games involving physically and virtually embodied robots are designed. In the game involving the physically embodied robot, the robot communicates with the participant by recognizing colored flashcards through a camera-based system and, in return, generating a selected subset of signs together with motivational facial gestures. A mobile version of the game is also implemented for use in children's education and therapy for the purpose of teaching signs. The humanoid robot acts as a social peer and assistant in the games to motivate the child, teach a selected set of signs, evaluate the child's effort, and give appropriate feedback to improve the children's learning and recognition rates. The paper presents results from a preliminary study with different test groups, in which children played with the physical robot platform, R3, and with a mobile game incorporating videos of the robot performing the signs; the effect of the assistive robot's embodiment is thus analyzed within these games. The results indicate that physical embodiment plays a significant role in improving the children's performance, engagement, and motivation.


Intelligent Robots and Systems | 2012

Online learning for template-based multi-channel ego noise estimation

Gökhan Ince; Kazuhiro Nakadai; Keisuke Nakamura

This paper presents a system that gives a robot the ability to diminish its own disturbing noise (i.e., ego noise) by utilizing template-based ego noise estimation, an algorithm previously developed by the authors. In pursuit of an autonomous, online, and adaptive template learning system, in this work we specifically focus on eliminating the requirement of an offline training session, performed in advance, to build the essential templates that represent the ego noise. The idea of discriminating ego noise from all other sound sources in the environment enables the robot to learn the templates online without requiring any prior information. Based on the directionality/diffuseness of the sound sources, the robot can easily decide whether a template should be discarded because it is corrupted by external noise, or inserted into the database because it consists of pure ego noise only. Furthermore, we aim to update the template database optimally by introducing an additional time-variant forgetting factor parameter, which automatically balances the adaptivity and stability of the learning process. Moreover, we enhanced the single-channel noise estimation system to be compatible with the multi-channel robot audition framework, so that ego noise can be eliminated from each of the signals stemming from multiple sound sources. We demonstrate that the proposed system gives the robot the ability to learn templates online while achieving high noise estimation and suppression performance for multiple sound sources.
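
A hedged sketch of the online learning loop: accept a new template only when the sound field looks diffuse enough (no dominant directional external source), and age existing entries with a forgetting factor so the database stays adaptive yet bounded. The gate, decay rate, and pruning threshold are illustrative assumptions, not the paper's values.

```python
import numpy as np

def maybe_learn(db_weights, db_keys, db_specs, joint_feats, spec,
                directional_power, diffuse_power, forget=0.99, gate=2.0):
    """db_*: parallel lists forming the template database (weights, keys, spectra)."""
    for i in range(len(db_weights)):
        db_weights[i] *= forget                 # time-variant forgetting
    if directional_power / (diffuse_power + 1e-12) < gate:
        # sound field is diffuse enough: treat the frame as pure ego noise
        db_weights.append(1.0)
        db_keys.append(np.asarray(joint_feats, dtype=float))
        db_specs.append(np.asarray(spec, dtype=float))
    # prune entries whose weight has decayed below a small threshold
    keep = [i for i, w in enumerate(db_weights) if w > 1e-3]
    db_weights[:] = [db_weights[i] for i in keep]
    db_keys[:] = [db_keys[i] for i in keep]
    db_specs[:] = [db_specs[i] for i in keep]
```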

Collaboration


Dive into Gökhan Ince's collaborations.

Top Co-Authors

Jun-ichi Imura (Tokyo Institute of Technology)
Hatice Kose (Istanbul Technical University)
Sanem Sariel (Istanbul Technical University)
Baris Bayram (Istanbul Technical University)
Rabia Yorganci (Istanbul Technical University)
Ahmet Özkul (Istanbul Technical University)
Anil Ozen (Istanbul Technical University)
Mustafa Esengün (Istanbul Technical University)