Reda Elbarougy
Japan Advanced Institute of Science and Technology
Publications
Featured research published by Reda Elbarougy.
Asia-Pacific Signal and Information Processing Association Annual Summit and Conference | 2013
Reda Elbarougy; Masato Akagi
The purpose of this study is to investigate whether the emotion dimensions valence, activation, and dominance can be estimated cross-lingually. Most previous studies of automatic speech emotion recognition detected the emotional state within a single language. However, to develop a generalized emotion recognition system, performance must be analyzed both mono-lingually and cross-lingually. The ultimate goal of this study is to build a bilingual emotion recognition system that can estimate emotion dimensions in one language using a system trained on another. We first propose a novel acoustic feature selection method based on a human perception model. The proposed model consists of three layers: emotion dimensions in the top layer, semantic primitives in the middle layer, and acoustic features in the bottom layer. The experimental results reveal that the proposed method is effective for selecting acoustic features that represent emotion dimensions, working with two different databases, one in Japanese and the other in German. Finally, the acoustic features common to the two databases are used as input to the cross-lingual emotion recognition system. Moreover, the proposed cross-lingual system based on the three-layer model performs as well as the two separate mono-lingual systems for estimating emotion dimension values.
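As a rough illustration of the three-layer idea, the sketch below selects acoustic features by how strongly they correlate with semantic primitives (the middle layer), rather than with the emotion dimensions directly. This is a minimal sketch, not the authors' implementation; the data, feature counts, and threshold are all hypothetical.

```python
# Minimal sketch (not the paper's code): feature selection through a
# three-layer perception model. Acoustic features are linked to emotion
# dimensions indirectly, via their correlations with semantic primitives.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 100 utterances, 20 acoustic features, 5 semantic
# primitives (e.g. "bright", "heavy"); values here are random stand-ins.
acoustic = rng.normal(size=(100, 20))
primitives = rng.normal(size=(100, 5))

def select_features(acoustic, primitives, threshold=0.3):
    """Keep acoustic features whose strongest absolute correlation with
    any semantic primitive exceeds the threshold."""
    selected = []
    for j in range(acoustic.shape[1]):
        corrs = [abs(np.corrcoef(acoustic[:, j], primitives[:, k])[0, 1])
                 for k in range(primitives.shape[1])]
        if max(corrs) >= threshold:
            selected.append(j)
    return selected

print(select_features(acoustic, primitives))
```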
Asia-Pacific Signal and Information Processing Association Annual Summit and Conference | 2014
Yasuhiro Hamada; Reda Elbarougy; Masato Akagi
Speech-to-speech translation (S2ST) systems take a spoken utterance in one language and produce a spoken output in another language. So far, S2ST techniques have mainly used linguistic information and have ignored para- and non-linguistic information (emotion, individuality, gender, etc.). These systems are therefore limited to synthesizing neutral rather than affective speech, such as emotional speech. Handling affective speech requires a system that can both recognize and synthesize emotional speech. Although most studies have treated emotions categorically, emotional styles are not categorical but spread continuously over an emotion space spanned by two dimensions, Valence and Activation. This paper proposes a method for synthesizing emotional speech based on positions in the Valence-Activation (V-A) space. To model the relationships between acoustic features and the V-A space, Fuzzy Inference Systems (FISs) were constructed, and twenty-one acoustic features were morphed using these FISs. Listening tests were carried out to verify whether the synthesized speech is perceived at the intended position in the V-A space. The results indicate that the synthesized speech gives the same impression in the V-A space as the intended speech.
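The following sketch shows one way a fuzzy inference step could map a target V-A position to a morphing ratio for a single acoustic feature. It is an assumed, simplified Takagi-Sugeno-style formulation for illustration only; the rule centres, outputs, and the choice of feature are hypothetical and not taken from the paper.

```python
# Minimal sketch (assumed, not the paper's FIS): map a target
# Valence-Activation position to a modification ratio for one acoustic
# feature (e.g. mean F0) via Gaussian rule memberships.
import numpy as np

# Hypothetical rule centres in V-A space and the ratio each rule outputs.
rule_centres = np.array([[ 0.8,  0.8],   # happy-like region
                         [-0.8,  0.8],   # angry-like region
                         [-0.8, -0.8],   # sad-like region
                         [ 0.0,  0.0]])  # neutral
rule_outputs = np.array([1.25, 1.15, 0.85, 1.00])  # illustrative F0 scaling

def infer_ratio(target_va, sigma=0.6):
    """Gaussian membership of the target point to each rule, followed by a
    weighted average of the rule outputs."""
    d2 = np.sum((rule_centres - target_va) ** 2, axis=1)
    w = np.exp(-d2 / (2 * sigma ** 2))
    return float(np.dot(w, rule_outputs) / np.sum(w))

print(infer_ratio(np.array([0.5, 0.6])))  # ratio applied when morphing F0
```

In practice one such mapping would be learned per acoustic feature, so that all twenty-one features are morphed consistently toward the intended V-A position.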
2014 17th Oriental Chapter of the International Committee for the Co-ordination and Standardization of Speech Databases and Assessment Techniques (COCOSDA) | 2014
Reda Elbarougy; Han Xiao; Masato Akagi; Junfeng Li
Affective speech-to-speech translation (S2ST) preserves the affective state conveyed in the speaker's message. The ultimate goal of this study is to construct an affective S2ST system that can transfer the emotional state of a spoken utterance from one language to another. Such a system requires a universal automatic speech emotion recognizer that detects the emotional state regardless of language. This study therefore investigates commonalities and differences in emotion perception across languages. Thirty subjects from three countries, Japan, China and Vietnam, evaluated three emotional speech databases, Japanese, Chinese and German, in the valence-activation space. The results reveal that the directions from neutral to other emotions are similar across subject groups. However, the estimated degree of the emotional state depends on the expressed emotional style, and the neutral positions differed significantly among subject groups. Thus, directions and distances from neutral to other emotions could be adopted as features for recognizing emotional states across languages.
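A minimal sketch of the suggested feature, assuming nothing beyond the abstract: representing each rated utterance by its direction and distance from the listener group's own neutral position, so that group-specific neutral offsets cancel out. The ratings and neutral positions below are hypothetical.

```python
# Minimal sketch (assumption, not the study's code): direction and distance
# from a group's neutral position in Valence-Activation space, used as
# language-independent features.
import numpy as np

def direction_distance(point, neutral):
    """Return (angle in radians, Euclidean distance) from neutral to point."""
    v = np.asarray(point, dtype=float) - np.asarray(neutral, dtype=float)
    return np.arctan2(v[1], v[0]), np.linalg.norm(v)

# Hypothetical ratings: the same utterance rated by two listener groups
# whose neutral positions differ.
print(direction_distance([0.6, 0.7], neutral=[0.1, 0.0]))
print(direction_distance([0.4, 0.8], neutral=[-0.1, 0.1]))
```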
International Conference on Advanced Intelligent Systems and Informatics | 2016
Reda Elbarougy; Masato Akagi
Fuzzy Inference Systems (FISs) are used for pattern recognition and classification in many fields, including emotion recognition. However, FIS performance depends strongly on the cluster radius, which largely determines recognition accuracy, yet many researchers initialize this parameter randomly, which does not guarantee the best performance of their systems. The purpose of this paper is to optimize the FIS parameters in order to construct a highly efficient speech emotion recognition system. An optimization algorithm based on particle swarm optimization is therefore proposed to find the best parameters of the FIS classifier. The proposed system was evaluated on two emotional speech databases, the Fujitsu (Japanese) and Berlin (German) databases. The simulation results show that the optimized system achieves high recognition accuracy for both languages: 97% for the Japanese database and 80% for the German database.
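The sketch below shows plain particle swarm optimization over a single radius parameter. It is illustrative only: the objective function is a stand-in, whereas in the paper's setting it would be the (negated) cross-validated recognition accuracy of the FIS classifier built with that radius; the bounds and PSO coefficients are assumptions.

```python
# Minimal sketch (illustrative, not the paper's implementation): PSO over
# one FIS hyperparameter, the cluster radius.
import numpy as np

rng = np.random.default_rng(1)

def objective(radius):
    # Placeholder for "negative recognition accuracy as a function of radius".
    return (radius - 0.45) ** 2

n_particles, n_iters = 10, 50
pos = rng.uniform(0.1, 1.0, n_particles)          # candidate radii
vel = np.zeros(n_particles)
pbest = pos.copy()
pbest_val = np.array([objective(p) for p in pos])
gbest = pbest[np.argmin(pbest_val)]

for _ in range(n_iters):
    r1, r2 = rng.random(n_particles), rng.random(n_particles)
    # Inertia, cognitive, and social terms with assumed coefficients.
    vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
    pos = np.clip(pos + vel, 0.1, 1.0)
    vals = np.array([objective(p) for p in pos])
    improved = vals < pbest_val
    pbest[improved], pbest_val[improved] = pos[improved], vals[improved]
    gbest = pbest[np.argmin(pbest_val)]

print("optimised radius:", round(float(gbest), 3))
```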
2016 Conference of The Oriental Chapter of International Committee for Coordination and Standardization of Speech Databases and Assessment Techniques (O-COCOSDA) | 2016
Yawen Xue; Yasuhiro Hamada; Reda Elbarougy; Masato Akagi
Commonalities and differences in how humans perceive emotions in speech across languages in a dimensional space have been investigated in previous work. The results show that human perception is consistent across languages in the dimensional space: the directions from the neutral voice to other emotional states are common among languages. Based on this result, we assume that, given the same direction in the dimensional space, neutral voices in multiple languages can be converted to emotional ones carrying the same impression of emotion. This would mean that an emotion conversion system trained on a database in one language could work for other languages. We convert neutral speech in two other languages, English and Chinese, using an emotion conversion system trained on a Japanese database; Chinese is a tone language, English is a stress language and Japanese is a pitch-accent language. We find that all converted voices convey the same impression as the Japanese voices. We therefore conclude that, given the same direction in the dimensional space, synthesized speech in multiple languages conveys the same impression of emotion; in other words, the Japanese emotion conversion system is compatible with other languages.
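A small sketch of the cross-language assumption, with entirely hypothetical numbers: the neutral-to-emotion vector learned from the training (Japanese) data is reused, unchanged, to place the conversion target relative to the other language's neutral position.

```python
# Minimal sketch (assumed): reuse the neutral-to-emotion direction learned
# from one language to set the conversion target in another language's
# Valence-Activation space.
import numpy as np

# Hypothetical values from the training (Japanese) database.
neutral_train = np.array([0.05, 0.00])
happy_train = np.array([0.70, 0.60])
direction = happy_train - neutral_train   # shared direction and distance

neutral_target = np.array([-0.10, 0.05])  # neutral position in the other language
target_va = neutral_target + direction    # where the converted voice should land
print(target_va)
```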
Acoustical Science and Technology | 2014
Reda Elbarougy; Masato Akagi
Asia-Pacific Signal and Information Processing Association Annual Summit and Conference | 2014
Masato Akagi; Xiao Han; Reda Elbarougy; Yasuhiro Hamada; Junfeng Li
Asia-Pacific Signal and Information Processing Association Annual Summit and Conference | 2012
Reda Elbarougy; Masato Akagi
2015 RISP International Workshop on Nonlinear Circuits, Communications and Signal Processing (NCSP'15) | 2015
Xiao Han; Reda Elbarougy; Masato Akagi; Junfeng Li; Thi Duyen Ngo
Archive | 2014
Masato Akagi; Reda Elbarougy