Frank Joublin
Honda
Publications
Featured research published by Frank Joublin.
International Journal of Social Robotics | 2012
Maha Salem; Stefan Kopp; Ipke Wachsmuth; Katharina J. Rohlfing; Frank Joublin
How is communicative gesture behavior in robots perceived by humans? Although gesture is crucial in social interaction, this research question is still largely unexplored in the field of social robotics. Thus, the main objective of the present work is to investigate how gestural machine behaviors can be used to design more natural communication in social robots. The chosen approach is twofold. Firstly, the technical challenges encountered when implementing a speech-gesture generation model on a robotic platform are tackled. We present a framework that enables the humanoid robot to flexibly produce synthetic speech and co-verbal hand and arm gestures at run-time, while not being limited to a predefined repertoire of motor actions. Secondly, the achieved flexibility in robot gesture is exploited in controlled experiments. To gain a deeper understanding of how communicative robot gesture might impact and shape human perception and evaluation of human-robot interaction, we conducted a between-subjects experimental study using the humanoid robot in a joint task scenario. We manipulated the non-verbal behaviors of the robot in three experimental conditions, so that it would refer to objects by utilizing either (1) unimodal (i.e., speech only) utterances, (2) congruent multimodal (i.e., semantically matching speech and gesture) or (3) incongruent multimodal (i.e., semantically non-matching speech and gesture) utterances. Our findings reveal that the robot is evaluated more positively when non-verbal behaviors such as hand and arm gestures are displayed along with speech, even if they do not semantically match the spoken utterance.
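As a rough illustration of how the manipulated conditions could be encoded in software, the sketch below models the three referential conditions and pairs a spoken utterance with a gesture accordingly. It is a hypothetical sketch, not the authors' framework; all class and function names are invented for illustration.

```python
# Hypothetical sketch, not the authors' framework: representing the three
# referential conditions of the study and pairing speech with a gesture.
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional

class Condition(Enum):
    UNIMODAL = auto()                # speech only
    CONGRUENT_MULTIMODAL = auto()    # speech + semantically matching gesture
    INCONGRUENT_MULTIMODAL = auto()  # speech + semantically non-matching gesture

@dataclass
class MultimodalUtterance:
    speech_text: str
    gesture_id: Optional[str]        # None in the unimodal condition
    condition: Condition

def build_utterance(text: str, matching_gesture: str, mismatching_gesture: str,
                    condition: Condition) -> MultimodalUtterance:
    """Pair a spoken object reference with a gesture according to the condition."""
    if condition is Condition.UNIMODAL:
        return MultimodalUtterance(text, None, condition)
    if condition is Condition.CONGRUENT_MULTIMODAL:
        return MultimodalUtterance(text, matching_gesture, condition)
    return MultimodalUtterance(text, mismatching_gesture, condition)
```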
Intelligent Robots and Systems | 2006
Tobias Rodemann; Martin Heckmann; Frank Joublin; Christian Goerick; Björn Schölling
We present a sound localization system that operates in real time, calculates three binaural cues (IED, IID, and ITD), and integrates them in a biologically inspired fashion into a combined localization estimate. Position information is furthermore integrated over frequency channels and time. The localization system controls a head motor to fovealize on and track the dominant sound source. Due to an integrated noise-reduction module, the system shows robust localization capabilities even in noisy conditions. Real-time performance is achieved by multi-threaded parallel operation across different machines, using a timestamp-based synchronization scheme to compensate for processing delays.
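To make the cue computation concrete, here is a minimal sketch of estimating the ITD cue for one frame by cross-correlation and mapping it to a coarse azimuth with a far-field model. This is not the authors' implementation; the sampling rate, microphone spacing, and lag range are assumed example values, and the paper additionally uses IID and IED cues plus integration over frequency channels and time.

```python
# Minimal sketch (not the authors' code): ITD by cross-correlation for one frame
# and a far-field mapping to azimuth. FS, MIC_DISTANCE and MAX_LAG are assumed.
import numpy as np

FS = 16000           # sampling rate in Hz (assumed)
MIC_DISTANCE = 0.15  # microphone spacing in metres (assumed)
SPEED_OF_SOUND = 343.0
MAX_LAG = 20         # lag search range in samples (assumed)

def estimate_itd(left: np.ndarray, right: np.ndarray) -> float:
    """Inter-aural time difference (seconds) at the best-correlating lag."""
    lags = np.arange(-MAX_LAG, MAX_LAG + 1)
    corr = [np.dot(left[MAX_LAG:-MAX_LAG], np.roll(right, lag)[MAX_LAG:-MAX_LAG])
            for lag in lags]
    return lags[int(np.argmax(corr))] / FS

def itd_to_azimuth(itd: float) -> float:
    """Map an ITD to an azimuth angle in degrees using a simple far-field model."""
    sin_az = np.clip(itd * SPEED_OF_SOUND / MIC_DISTANCE, -1.0, 1.0)
    return float(np.degrees(np.arcsin(sin_az)))
```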
Intelligent Robots and Systems | 2008
Tobias Rodemann; Gökhan Ince; Frank Joublin; Christian Goerick
It is a common assumption that with just two microphones only the azimuth angle of a sound source can be estimated, and that a third, orthogonal microphone (or set of microphones) is necessary to estimate the elevation of the source. Recently, using specially designed ears and analyzing spectral cues, several researchers managed to estimate sound source elevation with a binaural system. In this work, we show that with two bionic ears both azimuth and elevation angle can be determined using binaural (e.g. IID and ITD) as well as spectral cues. This ability can also be used to disambiguate signals coming from the front or back. We present a detailed analysis of azimuth and elevation localization performance, comparing binaural and spectral cues. We demonstrate that with a small extension of a standard binaural system a basic elevation estimation capability can be gained.
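As an illustration of how a spectral cue can complement the binaural cues, the sketch below finds the deepest spectral notch in an assumed frequency band of a monaural frame and reads the elevation from a previously learned notch-to-elevation table. It is a simplified stand-in for the spectral-cue analysis described above, with all parameter values assumed.

```python
# Simplified stand-in (not the paper's method): find a pinna-induced spectral
# notch in an assumed band and look up the elevation in a learned table.
import numpy as np
from typing import Dict

FS = 16000  # sampling rate in Hz (assumed)

def spectral_notch_frequency(frame: np.ndarray,
                             f_lo: float = 4000.0, f_hi: float = 7000.0) -> float:
    """Frequency (Hz) of the deepest spectral minimum in [f_lo, f_hi]."""
    spectrum = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / FS)
    band = (freqs >= f_lo) & (freqs <= f_hi)
    return float(freqs[band][np.argmin(spectrum[band])])

def elevation_from_notch(notch_hz: float, table: Dict[float, float]) -> float:
    """Elevation (degrees) whose trained notch frequency is closest to notch_hz."""
    key = min(table, key=lambda f: abs(f - notch_hz))
    return table[key]
```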
Robot and Human Interactive Communication | 2011
Maha Salem; Katharina J. Rohlfing; Stefan Kopp; Frank Joublin
Gesture is an important feature of social interaction, frequently used by human speakers to illustrate what speech alone cannot provide, e.g. to convey referential, spatial or iconic information. Accordingly, humanoid robots that are intended to engage in natural human-robot interaction should produce speech-accompanying gestures for comprehensible and believable behavior. But how does a robot's non-verbal behavior influence human evaluation of communication quality and of the robot itself? To address this research question we conducted two experimental studies. Using the Honda humanoid robot, we investigated how humans perceive various gestural patterns performed by the robot as they interact in a situational context. Our findings suggest that the robot is evaluated more positively when non-verbal behaviors such as hand and arm gestures are displayed along with speech. This effect was enhanced when participants were explicitly requested to direct their attention towards the robot during the interaction.
Speech Communication | 2011
Martin Heckmann; Xavier Domont; Frank Joublin; Christian Goerick
In this paper we present a hierarchical framework for the extraction of spectro-temporal acoustic features. The design of the features targets higher robustness in dynamic environments. Motivated by the large gap between human and machine performance in such conditions, we take inspiration from the organization of the mammalian auditory cortex in the design of our features. This includes the joint processing of spectral and temporal information, the organization in hierarchical layers, competition between coequal features, the use of high-dimensional sparse feature spaces, and the learning of the underlying receptive fields in a data-driven manner. Due to these properties we term them hierarchical spectro-temporal (HIST) features. For the learning of the features at the first layer we use Independent Component Analysis (ICA). At the second layer of our feature hierarchy we apply Non-Negative Sparse Coding (NNSC) to obtain features spanning a larger frequency and time region. We investigate the contribution of the different subparts of this feature extraction process to the overall performance. This includes an analysis of the benefits of the hierarchical processing, a comparison of different feature extraction methods on the first layer, an evaluation of the feature competition, and an investigation of the influence of different receptive field sizes on the second layer. Additionally, we compare our features to MFCC and RASTA-PLP features in a continuous digit recognition task in noise, both on a wideband dataset we constructed ourselves based on the Aurora-2 task and on the actual Aurora-2 database. We show that a combination of the proposed HIST features and RASTA-PLP features yields significant improvements and that the proposed features carry information complementary to RASTA-PLP and MFCC features.
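The following sketch is a rough, simplified approximation of the two-layer idea and not the published HIST pipeline: it learns first-layer receptive fields from spectro-temporal patches with ICA and uses scikit-learn's NMF as a stand-in for Non-Negative Sparse Coding on the second layer. Patch size and component counts are illustrative assumptions.

```python
# Rough approximation, not the published HIST pipeline: ICA receptive fields on
# spectrogram patches (layer 1) and NMF as a stand-in for NNSC (layer 2).
import numpy as np
from sklearn.decomposition import FastICA, NMF

def extract_patches(spectrogram: np.ndarray, height: int, width: int) -> np.ndarray:
    """Collect all (height x width) spectro-temporal patches as flat row vectors."""
    n_freq, n_time = spectrogram.shape
    patches = [spectrogram[f:f + height, t:t + width].ravel()
               for f in range(n_freq - height + 1)
               for t in range(n_time - width + 1)]
    return np.asarray(patches)

def learn_hierarchy(spectrogram: np.ndarray):
    # Layer 1: data-driven receptive fields on small local patches via ICA.
    layer1_patches = extract_patches(spectrogram, height=8, width=8)
    ica = FastICA(n_components=32, random_state=0, max_iter=500)
    layer1_codes = ica.fit_transform(layer1_patches)
    # Layer 2: sparse non-negative recombination of rectified layer-1 responses
    # (NMF here only approximates Non-Negative Sparse Coding).
    nmf = NMF(n_components=16, init="nndsvda", max_iter=500)
    layer2_codes = nmf.fit_transform(np.maximum(layer1_codes, 0.0))
    return layer1_codes, layer2_codes
```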
IEEE Transactions on Audio, Speech, and Language Processing | 2010
Claudius Gläser; Martin Heckmann; Frank Joublin; Christian Goerick
We present a framework for estimating formant trajectories. Its focus is to achieve high robustness in noisy environments. Our approach combines a preprocessing based on functional principles of the human auditory system with a probabilistic tracking scheme. For enhancing the formant structure in spectrograms we use a Gammatone filterbank, a spectral preemphasis, as well as a spectral filtering using difference-of-Gaussians (DoG) operators. Finally, a contrast enhancement mimicking a competition between filter responses is applied. The probabilistic tracking scheme adopts the mixture modeling technique for estimating the joint distribution of formants. In conjunction with an algorithm for adaptive frequency range segmentation as well as Bayesian smoothing, an efficient framework for estimating formant trajectories is derived. Comprehensive evaluations of our method on the VTR-formant database emphasize its high precision and robustness. We obtained superior performance compared to existing approaches for clean as well as echoic noisy speech. Finally, an implementation of the framework within the scope of an online system using instantaneous feature-based resynthesis demonstrates its applicability to real-world scenarios.
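As a concrete example of the spectral-filtering step, the sketch below applies a difference-of-Gaussians operator along the frequency axis of a spectrogram followed by a crude contrast enhancement. Filter widths and the non-linearity are assumed values, not taken from the paper, and the probabilistic tracking stage is omitted.

```python
# Illustrative sketch of the spectral-filtering step only (not the paper's code):
# a difference-of-Gaussians operator along the frequency axis plus a crude
# contrast enhancement; filter widths and the exponent are assumed.
import numpy as np
from scipy.ndimage import gaussian_filter1d

def dog_enhance(spectrogram: np.ndarray,
                sigma_narrow: float = 1.0, sigma_wide: float = 4.0) -> np.ndarray:
    """Sharpen spectral peaks: narrow minus wide Gaussian smoothing, rectified."""
    narrow = gaussian_filter1d(spectrogram, sigma_narrow, axis=0)
    wide = gaussian_filter1d(spectrogram, sigma_wide, axis=0)
    return np.maximum(narrow - wide, 0.0)

def contrast_enhance(spectrogram: np.ndarray, power: float = 2.0) -> np.ndarray:
    """Crude stand-in for the competition between filter responses:
    per-frame normalisation followed by an expansive non-linearity."""
    frame_max = spectrogram.max(axis=0, keepdims=True) + 1e-12
    return (spectrogram / frame_max) ** power
```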
International Conference on Social Robotics | 2011
Maha Salem; Friederike Anne Eyssel; Katharina J. Rohlfing; Stefan Kopp; Frank Joublin
Previous work has shown that gestural behaviors affect anthropomorphic inferences about artificial communicators such as virtual agents. In an experiment with a humanoid robot, we investigated to what extent gesture would affect anthropomorphic inferences about the robot. In particular, we examined the effects of the robot's hand and arm gestures on the attribution of typically human traits, likability of the robot, shared reality, and future contact intentions after interacting with the robot. For this, we manipulated the non-verbal behaviors of the humanoid robot in three experimental conditions: (1) no gesture, (2) congruent gesture, and (3) incongruent gesture. We hypothesized higher ratings on all dependent measures in the two gesture (vs. no gesture) conditions. The results confirm our predictions: when the robot used gestures during interaction, it was anthropomorphized more, participants perceived it as more likable, reported greater shared reality with it, and expressed stronger future contact intentions than when the robot gave instructions without using gestures. Surprisingly, this effect was particularly pronounced when the robot's gestures were partly incongruent with its speech. These findings show that communicative non-verbal behaviors in robotic systems affect both anthropomorphic perceptions and the mental models humans form of a humanoid robot during interaction.
Intelligent Robots and Systems | 2006
Martin Heckmann; Tobias Rodemann; Frank Joublin; Christian Goerick; Björn Schölling
We propose a new approach for binaural sound source localization in real-world environments, implementing a new model of the precedence effect. This enables the robust measurement of the localization cue values (ITD, IID, and IED) in echoic environments. The system is inspired by the auditory system of mammals. It uses a Gammatone filterbank for preprocessing and extracts the ITD and IED cues via zero crossings (the IID calculation is straightforward). The mapping between the cue values and the different angles is learned offline, which facilitates the adaptation to different head geometries. The performance of the system is demonstrated by localization results for two simultaneous speakers and for the mixture of a speaker, music, and fan noise in a normal meeting room. A real-time demonstrator of the system is presented in Rodemann et al. (2006).
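The sketch below illustrates the zero-crossing idea for a single frequency channel: rising zero crossings of the band-passed left and right signals are matched and their median offset taken as the channel's ITD. This is a deliberately simplified, hedged reconstruction, not the authors' code; in particular, the naive index-based matching ignores the precedence-effect model described above.

```python
# Hedged, simplified reconstruction (not the authors' code): per-channel ITD from
# rising zero crossings of the band-passed left and right signals.
import numpy as np

def rising_zero_crossings(x: np.ndarray) -> np.ndarray:
    """Indices where the signal crosses zero from below."""
    return np.where((x[:-1] < 0) & (x[1:] >= 0))[0]

def itd_from_zero_crossings(left: np.ndarray, right: np.ndarray, fs: float) -> float:
    """Median time offset (seconds) between naively matched rising zero crossings."""
    zl, zr = rising_zero_crossings(left), rising_zero_crossings(right)
    n = min(len(zl), len(zr))
    if n == 0:
        return 0.0
    return float(np.median(zr[:n] - zl[:n])) / fs
```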
International Conference on Acoustics, Speech, and Signal Processing | 2008
Xavier Domont; Martin Heckmann; Frank Joublin; Christian Goerick
Previously we presented an auditory-inspired feed-forward architecture which achieves good performance in noisy conditions on a segmented word recognition task. In this paper we propose to use a modified version of this hierarchical model to generate features for standard hidden Markov models. To obtain these features we first compute spectrograms using a Gammatone filterbank. A filtering over the channels enhances the formant frequencies, which are afterwards detected using Gabor-like receptive fields. The responses of the receptive fields are then combined into complex features which span the whole frequency range and extend over three different time windows. The features have been evaluated on a single digit recognition task. The results show that their combination with MFCCs or RASTA features yields improved recognition scores in noise.
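To illustrate the receptive-field step, the sketch below builds a small bank of Gabor-like spectro-temporal kernels and computes their rectified responses on a spectrogram. Kernel size, wavelength, aspect ratio, and orientations are assumed for illustration and are not the parameters used in the paper.

```python
# Illustrative sketch only (parameters are assumed, not taken from the paper):
# rectified responses of a small bank of Gabor-like spectro-temporal kernels.
import numpy as np
from scipy.signal import convolve2d

def gabor_kernel(size: int, wavelength: float, angle_deg: float, sigma: float) -> np.ndarray:
    """2-D Gabor kernel over the (frequency, time) plane."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    theta = np.deg2rad(angle_deg)
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(xr ** 2 + 0.5 * yr ** 2) / (2 * sigma ** 2))
    return envelope * np.cos(2 * np.pi * xr / wavelength)

def receptive_field_responses(spectrogram: np.ndarray,
                              angles=(0.0, 45.0, 90.0, 135.0)) -> list:
    """Half-wave rectified responses of oriented Gabor filters on the spectrogram."""
    return [np.maximum(convolve2d(spectrogram, gabor_kernel(9, 4.0, a, 2.0),
                                  mode="same"), 0.0)
            for a in angles]
```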
Intelligent Robots and Systems | 2006
Antonello Ceravola; Frank Joublin; Mark Dunn; Julian Eggert; Marcus Stein; Christian Goerick
In the field of intelligent systems, research and design approaches vary from predefined architectures to self-organizing systems. Regardless of the architectural approach, such systems may grow in size and complexity to levels that strongly challenge the capacities of the people building them. Such systems are commonly researched, designed and developed following several methods and with the help of a variety of software tools. In this paper we describe our research and development environment. It is composed of a set of tools that support our research and enable us to develop the large-scale intelligent systems used in our robots and in our test platforms. The main parts of our research and development environment are: the component models BBCM (Brain Bytes Component Model) and BBDM (Brain Bytes Data Model), the middleware RTBOS (Real-Time Brain Operating System), the monitoring system CMBOS (Control-Monitor Brain Operating System) and the design environment DTBOS (Design Tool for Brain Operating System). We compare our research and development environment with others available on the market or still in the research phase, and we describe some of our experiments.
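Since the BBCM/RTBOS internals are not described here, the following is a purely hypothetical sketch of the kind of minimal component interface such a component-model middleware could expose: typed ports, a connect operation, and an execute step that the middleware would schedule periodically.

```python
# Purely hypothetical sketch (BBCM/RTBOS internals are not described here): a
# minimal component interface with ports, connections, and an execute step.
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Optional

@dataclass
class Port:
    name: str
    value: Any = None

@dataclass
class Component:
    name: str
    inputs: Dict[str, Port] = field(default_factory=dict)
    outputs: Dict[str, Port] = field(default_factory=dict)
    step: Optional[Callable[["Component"], None]] = None

    def execute(self) -> None:
        """Run one processing cycle; a real middleware would schedule this periodically."""
        if self.step is not None:
            self.step(self)

def connect(src: Component, out_port: str, dst: Component, in_port: str) -> None:
    """Wire a source output to a sink input by sharing the same Port object."""
    dst.inputs[in_port] = src.outputs[out_port]
```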