Nicolai Bæk Thomsen
Aalborg University
Publications
Featured research published by Nicolai Bæk Thomsen.
Conference of the International Speech Communication Association | 2016
Tomi Kinnunen; Sahidullah; Ivan Kukanov; Héctor Delgado; Massimiliano Todisco; Achintya Kumar Sarkar; Nicolai Bæk Thomsen; Ville Hautamäki; Nicholas W. D. Evans; Zheng-Hua Tan
Text-dependent automatic speaker verification naturally calls for the simultaneous verification of speaker identity and spoken content. These two tasks can be achieved with automatic speaker verification (ASV) and utterance verification (UV) technologies. While both have been addressed previously in the literature, a treatment of simultaneous speaker and utterance verification with a modern, standard database is so far lacking. This is despite the burgeoning demand for voice biometrics in a plethora of practical security applications. With the goal of improving overall verification performance, this paper reports different strategies for simultaneous ASV and UV in the context of short-duration, text-dependent speaker verification. Experiments performed on the recently released RedDots corpus are reported for three different ASV systems and four different UV systems. Results show that the combination of utterance verification with automatic speaker verification is (almost) universally beneficial with significant performance improvements being observed.
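As a rough illustration of the idea of combining the two subsystems, the sketch below fuses per-trial ASV and UV scores with a simple convex weight; the linear fusion rule, the weight value and the function names are assumptions for illustration, not the specific strategies evaluated in the paper.

```python
# Minimal sketch of score-level fusion of automatic speaker verification (ASV)
# and utterance verification (UV) scores. The linear fusion rule and the weight
# value are illustrative assumptions, not the paper's exact strategies.
import numpy as np


def fuse_scores(asv_scores, uv_scores, alpha=0.5):
    """Combine per-trial ASV and UV scores with a convex weight alpha."""
    asv_scores = np.asarray(asv_scores, dtype=float)
    uv_scores = np.asarray(uv_scores, dtype=float)
    return alpha * asv_scores + (1.0 - alpha) * uv_scores


def accept(fused_scores, threshold=0.0):
    """Accept a trial only if the fused score clears the decision threshold."""
    return fused_scores >= threshold


if __name__ == "__main__":
    asv = [1.2, -0.4, 0.9]   # e.g. log-likelihood ratios from an ASV system
    uv = [0.8, -1.1, -0.2]   # e.g. utterance-verification scores, same trials
    print(accept(fuse_scores(asv, uv, alpha=0.6)))
```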
3rd IFToMM Symposium on Mechanism Design for Robotics, MEDER 2015 | 2015
Zheng-Hua Tan; Nicolai Bæk Thomsen; Xiaodong Duan
In this paper we present the design and development of the social robot called iSocioBot, which is designed to achieve long-term interaction between humans and robots in a social context. iSocioBot is 149 cm tall; its mechanical body is built on top of the TurtleBot platform and designed to make people feel comfortable in its presence. All electrical components are standard off-the-shelf commercial products, making replication possible. Furthermore, the software is based on the Robot Operating System (ROS) and is made freely available. We present our experience with the design and discuss possible improvements.
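For readers unfamiliar with ROS, the minimal rospy node skeleton below shows the kind of building block such a ROS-based stack is composed of; the topic names, message type and pass-through behaviour are hypothetical and not taken from the iSocioBot software.

```python
# Minimal ROS (rospy) node skeleton showing how one component of a ROS-based
# robot stack might relay a sensor stream to another node. The topic names and
# the pass-through behaviour are hypothetical, for illustration only.
import rospy
from std_msgs.msg import String


def on_speech(msg):
    # Forward recognized speech to a (hypothetical) dialog-manager topic.
    dialog_pub.publish(msg)


if __name__ == "__main__":
    rospy.init_node("speech_relay")
    dialog_pub = rospy.Publisher("/dialog/input", String, queue_size=10)
    rospy.Subscriber("/speech/recognized", String, on_speech)
    rospy.spin()  # Keep the node alive, processing callbacks as messages arrive.
```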
Conference of the International Speech Communication Association | 2014
Nicolai Bæk Thomsen; Zheng-Hua Tan; Børge Lindberg; Søren Holdt Jensen
This paper presents a multi-modal system for determining where to direct the attention of a social robot in a dialog scenario, which is robust against environmental sounds (door slamming, phone ringing, etc.) and short speech segments. The method is based on combining voice activity detection (VAD) and sound source localization (SSL), and furthermore applies post-processing to the SSL output to filter out short sounds. The system is tested against a baseline system in four different real-world experiments, where different sounds are used as interfering sounds. The results are promising and show a clear improvement.
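A minimal sketch of the general idea, assuming a per-frame VAD decision and a per-frame SSL direction estimate: activity segments shorter than a minimum duration are discarded before an attention direction is picked. The frame rate, threshold and function names are illustrative, not the paper's implementation.

```python
# Minimal sketch: gate sound-source localization (SSL) estimates with voice
# activity detection (VAD) and drop sound events shorter than a minimum
# duration. Frame rate, threshold and data layout are illustrative assumptions.
import numpy as np

FRAME_SEC = 0.01        # assumed frame hop of 10 ms
MIN_EVENT_SEC = 0.3     # assumed minimum duration for a valid speech event


def attention_angle(vad_flags, ssl_angles):
    """Return the median SSL angle of the last long-enough active segment.

    vad_flags  : per-frame booleans from a VAD.
    ssl_angles : per-frame direction-of-arrival estimates in degrees.
    """
    vad_flags = np.asarray(vad_flags, dtype=bool)
    ssl_angles = np.asarray(ssl_angles, dtype=float)

    # Collect contiguous active segments.
    segments, start = [], None
    for i, active in enumerate(vad_flags):
        if active and start is None:
            start = i
        elif not active and start is not None:
            segments.append((start, i))
            start = None
    if start is not None:
        segments.append((start, len(vad_flags)))

    # Keep only segments long enough to be speech rather than a short noise.
    min_frames = int(MIN_EVENT_SEC / FRAME_SEC)
    long_segments = [(s, e) for s, e in segments if e - s >= min_frames]
    if not long_segments:
        return None  # nothing but short, non-speech-like sounds
    s, e = long_segments[-1]
    return float(np.median(ssl_angles[s:e]))
```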
Intelligent Robots and Systems | 2015
Nicolai Bæk Thomsen; Zheng-Hua Tan; Børge Lindberg; Søren Holdt Jensen
The use of social robots for elderly care is becoming ever more relevant, thus introducing new challenges which need to be solved to achieve acceptable performance. One fundamental task for a social robot is to move to the person of interest in order to start interacting or perform a service. In this paper we address the task of a robot having to navigate to a possibly occluded person who needs assistance, based only on audio and range information. Our approach forms a heuristic cost function by combining two audio features, and then moves to the optimum position indicated by this cost function after every interaction with the person. The method is compared to a greedy approach in 20 different tasks using a loudspeaker playing at approximately 60 dB sound pressure level (SPL) to mimic a human speaker, and the proposed method shows superior performance. A second experiment, in which the loudspeaker level is increased by 13 dB SPL, is conducted, and the proposed method is able to handle this as well.
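Since the abstract does not spell out the cost function, the sketch below only illustrates the generic pattern of moving to the minimum of a weighted combination of two audio-derived features over candidate positions; the features, weights and candidate grid are hypothetical, not the paper's actual formulation.

```python
# Generic sketch of choosing the next robot position as the optimum of a
# heuristic cost built from two audio features. The features, weights and
# candidate grid are hypothetical; the paper's actual cost function differs.
import numpy as np


def next_position(candidates, feature_a, feature_b, w_a=1.0, w_b=1.0):
    """Pick the candidate position minimizing a weighted feature combination.

    candidates           : (N, 2) array of candidate (x, y) positions.
    feature_a, feature_b : length-N arrays of audio-derived costs per candidate.
    """
    cost = w_a * np.asarray(feature_a) + w_b * np.asarray(feature_b)
    return np.asarray(candidates)[int(np.argmin(cost))]


if __name__ == "__main__":
    grid = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
    # e.g. a negated signal-energy term and a direction-mismatch term
    print(next_position(grid, [-0.2, -0.9, -0.5], [0.4, 0.1, 0.3]))
```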
Conference of the International Speech Communication Association | 2016
Nicolai Bæk Thomsen; Dennis Alexander Lehmann Thomsen; Zheng-Hua Tan; Børge Lindberg; Søren Holdt Jensen
The problem of text-dependent speaker verification under noisy conditions is becoming ever more relevant, due to its increased use for authentication in real-world applications. Classical methods for noise reduction such as spectral subtraction and Wiener filtering introduce distortion and do not perform well in this setting. In this work we compare the performance of different noise reduction methods under different noise conditions in terms of speaker verification when the text is known and the system is trained on clean data (mismatched conditions). We furthermore propose a new approach based on dictionary-based noise reduction and compare it to the baseline methods.
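As a point of reference for the spectral-subtraction baseline mentioned above, here is a minimal magnitude-domain implementation in which noise is estimated from the leading frames, assumed to be speech-free; all parameters are illustrative, and this is not the paper's proposed dictionary-based method.

```python
# Minimal magnitude-domain spectral subtraction, one of the classical baselines
# mentioned above. Noise is estimated from the first few frames, which are
# assumed to be speech-free; all parameters are illustrative.
import numpy as np


def spectral_subtraction(noisy, n_fft=512, hop=256, noise_frames=10, floor=0.02):
    """Return an enhanced waveform by subtracting an estimated noise magnitude."""
    # Frame the signal with a Hann window and take the STFT.
    window = np.hanning(n_fft)
    n_frames = 1 + (len(noisy) - n_fft) // hop
    frames = np.stack([noisy[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    spec = np.fft.rfft(frames, axis=1)
    mag, phase = np.abs(spec), np.angle(spec)

    # Estimate the noise magnitude from the leading frames and subtract it,
    # keeping a small spectral floor to limit musical-noise artifacts.
    noise_mag = mag[:noise_frames].mean(axis=0)
    clean_mag = np.maximum(mag - noise_mag, floor * mag)

    # Reconstruct with the noisy phase via overlap-add.
    clean_frames = np.fft.irfft(clean_mag * np.exp(1j * phase), n=n_fft, axis=1)
    out = np.zeros(len(noisy))
    for i in range(n_frames):
        out[i * hop:i * hop + n_fft] += clean_frames[i]
    return out
```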
International Journal of Social Robotics | 2018
Zheng-Hua Tan; Nicolai Bæk Thomsen; Xiaodong Duan; Evgenios Vlachos; Sven Ewan Shepstone; Morten Højfeldt Rasmussen; Jesper Lisby Højvang
We present one way of constructing a social robot such that it is able to interact with humans using multiple modalities. The robotic system is able to direct attention towards the dominant speaker using sound source localization and face detection; it is capable of identifying persons using face recognition and speaker identification; and it is able to communicate and engage in a dialog with humans using speech recognition, speech synthesis and different facial expressions. The software is built upon the open-source Robot Operating System (ROS) framework and is made publicly available. Furthermore, the electrical parts (sensors, laptop, base platform, etc.) are standard components, thus allowing the system to be replicated. The design of the robot is unique, and we justify why this design is suitable for our robot and its intended use. By making the software, hardware and design accessible to everyone, we make research in social robotics available to a broader audience. To evaluate the properties and the appearance of the robot, we invited users to interact with it in pairs (active interaction partner/observer) and collected their responses via an extended version of the Godspeed Questionnaire. Results suggest an overall positive impression of the robot and interaction experience, as well as significant differences in responses based on type of interaction and gender.
International Conference on Tools with Artificial Intelligence | 2017
Xiaodong Duan; Nicolai Bæk Thomsen; Zheng-Hua Tan; Børge Lindberg; Søren Holdt Jensen
One potential problem in real classification applications is that the amount of labeled training data is insufficient, since it is usually time-consuming to label data manually. When multiple modalities are available, it is possible to train an initial classifier for each modality using a small amount of labeled data, and then re-train each classifier using unlabeled data associated with the labels generated from the other modalities. This can be achieved by the well-known CO-training algorithm. Assuming that two modalities are available, the original algorithm takes only the information from the other modality, and not that from the modality itself, into account when choosing data, which usually results in slow convergence of classification accuracy. This may make the CO-training procedure time-consuming. To overcome this, we present a novel modification to the original CO-training algorithm, which is concerned with how new samples are chosen at each iteration to re-train the classifiers in order to improve the convergence of classification accuracy. In our method, the new data are chosen based on weighted scores generated from both modalities, instead of only the scores from the other modality as in the original CO-training. We apply both the modified and the original CO-training methods to a multi-modal person identification task using speech and vision. Experiments on a publicly available database show that our method outperforms the original CO-training by a large margin, in terms of convergence of classification accuracy on a separate testing data set.
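A compact sketch of the modified sample-selection step described above: unlabeled samples are ranked by a weighted combination of both modalities' classifier confidences rather than by the other modality alone. The weight, the scikit-learn-style predict_proba interface and the batch size are assumptions for illustration, not the paper's exact setup.

```python
# Compact sketch of the modified sample-selection step: unlabeled samples are
# ranked by a weighted combination of both modalities' confidences rather than
# by the other modality alone. The weight, classifier interface (scikit-learn
# style predict_proba) and batch size are illustrative assumptions.
import numpy as np


def select_samples(clf_self, clf_other, X_self, X_other,
                   weight_other=0.7, batch=10):
    """Return indices and pseudo-labels of the most confidently labeled samples.

    clf_self / clf_other : trained classifiers for the two modalities.
    X_self / X_other     : unlabeled feature matrices, aligned row-by-row.
    """
    p_self = clf_self.predict_proba(X_self)
    p_other = clf_other.predict_proba(X_other)

    # Weighted score from both modalities (original CO-training uses p_other only).
    p_joint = weight_other * p_other + (1.0 - weight_other) * p_self

    labels = p_joint.argmax(axis=1)       # pseudo-labels for re-training
    confidence = p_joint.max(axis=1)      # confidence of each pseudo-label
    chosen = np.argsort(confidence)[::-1][:batch]
    return chosen, labels[chosen]
```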
2016 First International Workshop on Sensing, Processing and Learning for Intelligent Machines (SPLINE) | 2016
Nicolai Bæk Thomsen; Xiaodong Duan; Zheng-Hua Tan; Børge Lindberg; Søren Holdt Jensen
Person identification is a very important task for intelligent devices when communicating or interacting with humans. A potential problem in real applications is that the amount of enrollment data is insufficient. When multiple modalities are available, it is possible to re-train the system online by exploiting the conditional independence between the modalities, thus improving classification accuracy. This can be achieved by the well-known CO-training algorithm [1]. In this work we present a novel modification to the CO-training algorithm, which is concerned with how new observations/samples are chosen at each iteration to re-train the system in order to improve the classification accuracy faster, i.e., to achieve better convergence. In our method, the new data are chosen based not only on the score from the other modality but also on the score from the self modality. We demonstrate our proposed method on a multi-modal person identification task using the MOBIO database, and show that it outperforms the baseline method, in terms of convergence, by a large margin.
SAE 2016 World Congress and Exhibition | 2016
Yang Zheng; Navid Shokouhi; Nicolai Bæk Thomsen; Amardeep Sathyanarayana; John Hansen
Advances in Computer-Human Interaction | 2016
Ibrahim A. Hameed; Zheng-Hua Tan; Nicolai Bæk Thomsen; Xiaodong Duan