Publication


Featured research published by Yassine BenAyed.


Neural Computing and Applications | 2016

Deep multilayer multiple kernel learning

Ilyes Rebai; Yassine BenAyed; Walid Mahdi

The multiple kernel learning (MKL) approach has been proposed for kernel methods and has shown high performance on some real-world applications. It consists in learning the optimal kernel from a single layer of multiple predefined kernels. Unfortunately, this approach is not rich enough to solve relatively complex problems. With the emergence and success of deep learning, multilayer multiple kernel learning (MLMKL) methods were inspired by the idea of deep architectures and introduced to improve conventional MKL methods. Such architectures learn deep kernel machines by exploring combinations of multiple kernels in a multilayer structure. However, existing MLMKL methods often have trouble optimizing the network for two or more layers. Additionally, they do not always outperform the simplest method of combining multiple kernels (i.e., MKL). To improve the effectiveness of MKL approaches, we introduce in this paper a novel backpropagation MLMKL framework. Specifically, we propose to optimize the network with an adaptive backpropagation algorithm, using gradient ascent rather than optimizing the dual objective function or estimating the leave-one-out error. We test the proposed method through a large set of experiments on a variety of benchmark data sets and successfully optimize the system over many layers. Empirical results over an extensive set of experiments show that our algorithm achieves high performance compared to the traditional MKL approach and existing MLMKL methods.
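The layered kernel combination behind MLMKL can be sketched independently of the authors' code: layer one is the classic MKL weighted sum of base kernels, and layer two applies another RBF kernel in the feature space induced by that sum, computed purely via the kernel trick. A minimal numpy illustration (the kernel choices, weights and data below are assumptions, not the paper's configuration):

```python
import numpy as np

def rbf(X, Y, gamma):
    # Gaussian (RBF) kernel matrix between the rows of X and the rows of Y
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def two_layer_kernel(X, mus, gammas, outer_gamma):
    # Layer 1: weighted sum of base RBF kernels (the classic one-layer MKL step)
    K1 = sum(m * rbf(X, X, g) for m, g in zip(mus, gammas))
    # Layer 2: an RBF kernel in the feature space induced by K1, computed from
    # K1 alone via ||phi(x) - phi(y)||^2 = K1(x,x) + K1(y,y) - 2*K1(x,y).
    # In a backpropagation MLMKL framework the weights `mus` would be updated
    # by propagating the SVM objective gradient through this stack.
    d = np.diag(K1)
    return np.exp(-outer_gamma * (d[:, None] + d[None, :] - 2.0 * K1))

X = np.random.RandomState(0).randn(6, 3)
K = two_layer_kernel(X, mus=[0.5, 0.5], gammas=[0.1, 1.0], outer_gamma=1.0)
```

The resulting `K` is a symmetric, unit-diagonal kernel matrix that an SVM solver can consume directly.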


Computer Speech & Language | 2015

Text-to-speech synthesis system with Arabic diacritic recognition system

Ilyes Rebai; Yassine BenAyed

Highlights:
- We developed an Arabic text-to-speech system, including a diacritization system.
- The speech synthesis system is based on statistical parametric synthesis.
- We address the accuracy of the diacritic and acoustic models.
- We propose a diacritization system based on the position of the current letter.
- A per-unit-type neural network synthesis system generates high speech quality.

Text-to-speech synthesis has been widely studied for many languages. However, speech synthesis for Arabic has not made sufficient progress and is still in its early stages. Statistical parametric synthesis based on hidden Markov models has been the most commonly applied approach for Arabic. Recently, synthesized speech based on deep neural networks was found to be as intelligible as the human voice. This paper describes a Text-To-Speech (TTS) synthesis system for Modern Standard Arabic based on the statistical parametric approach and Mel-cepstral coefficients. Deep neural networks have achieved state-of-the-art performance in a wide range of tasks, including speech synthesis. Our TTS system includes a diacritization system, which is very important for Arabic TTS applications and is also based on deep neural networks. In addition to deep learning techniques, different methods are proposed to model the acoustic parameters in order to address the accuracy of the acoustic models. They are based on linguistic and acoustic characteristics (e.g. a letter-position-based diacritization system, a unit-type-based synthesis system, a diacritic-mark-based synthesis system) and on deep learning techniques (stacked generalization). Experimental results show that our diacritization system can generate diacritized text with high accuracy. As for the speech synthesis system, the experimental results and subjective evaluation show that our proposed method can generate intelligible and natural speech.


international conference on advanced technologies for signal and image processing | 2014

Dynamic Bayesian networks for Arabic phonemes recognition

Elyes Zarrouk; Yassine BenAyed; Faiez Gargouri

The majority of current automatic speech recognition systems use a probabilistic model of the speech signal, hidden Markov models (HMM). HMMs are in fact a special case of a broader class of graphical models, dynamic Bayesian networks (DBN). These are more sophisticated modeling tools because they allow several task-specific variables to be included in the automatic speech recognition problem beyond the single one used in HMMs. The use of DBNs in speech recognition has generated much interest in recent years [1] [2] [3] [4] [5]. This paper gives a brief survey of the use of dynamic Bayesian networks (DBN) for automatic speech recognition and presents their application to Arabic phoneme recognition in comparison with HMMs. The primary motivation of this work is to move away from the limitations of HMMs. Performance using DBNs is found to exceed that of HMMs trained on an identical task, giving higher recognition accuracy.
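The claim that HMMs are a special case of DBNs can be made concrete: an HMM is the simplest DBN, with exactly one hidden state variable and one observation per frame. A minimal numpy sketch of the HMM forward recursion (the transition, emission and prior values are illustrative, not from the paper):

```python
import numpy as np

def forward(obs, A, B, pi):
    # Forward algorithm: P(o_1..o_T) for an HMM with transition matrix A,
    # emission matrix B (states x symbols) and initial distribution pi.
    alpha = pi * B[:, obs[0]]                # alpha_1(i) = pi_i * b_i(o_1)
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]        # propagate one frame, then emit
    return alpha.sum()

A = np.array([[0.7, 0.3], [0.4, 0.6]])       # state transitions
B = np.array([[0.9, 0.1], [0.2, 0.8]])       # emission probabilities
pi = np.array([0.5, 0.5])                    # initial state distribution
p = forward([0, 1], A, B, pi)
```

A richer DBN would add extra per-frame variables (e.g. articulatory or contextual features) with their own conditional probability tables; the same recursion generalizes over the larger factored state.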


international multi-conference on systems, signals and devices | 2013

Hybrid SVM/HMM model for the recognition of Arabic triphones-based continuous speech

Elyes Zarrouk; Yassine BenAyed; Faïez Gargouri

Even though the progress of hidden Markov models (HMM) has been huge, those models lack discriminative ability, especially in speech recognition. To improve the results of recognition systems, we apply support vector machines (SVM) as estimators of posterior probabilities, since they are characterized by high predictive power and discrimination. Moreover, they are based on structural risk minimization (SRM), where the aim is to set up a classifier that minimizes a bound on the expected risk rather than the empirical risk. In this paper, we describe the use of the hybrid SVM/HMM model for Arabic triphone-based continuous speech. Furthermore, our work incorporates the stage of preparing language models: a novel approach for automatic labeling with respect to the syntax and grammar rules of the Arabic language. The best results are obtained with the proposed SVM/HMM system, which achieves 76.96% as the best recognition rate for a tested speaker. The speech recognizer was evaluated on the ARABIC_DB corpus and performs at 11.42% WER, compared to 13.32% for the triphone mixture-Gaussian HMM system.
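The core of the hybrid recipe — an SVM supplying the emission scores an HMM normally gets from Gaussian mixtures — can be sketched with scikit-learn: posteriors P(state | frame) from a probability-calibrated SVM are divided by the class priors to obtain scaled likelihoods, the standard conversion in hybrid systems. The toy data below stands in for MFCC frames and is not from the paper:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.RandomState(0)
# Toy 4-dimensional "frames" for two phone classes (stand-ins for MFCCs)
X = np.vstack([rng.randn(50, 4) - 1.5, rng.randn(50, 4) + 1.5])
y = np.array([0] * 50 + [1] * 50)

# SVM with Platt scaling gives posteriors P(state | frame)
svm = SVC(probability=True, random_state=0).fit(X, y)
post = svm.predict_proba(X)

# Hybrid conversion: P(frame | state) is proportional to
# P(state | frame) / P(state); these scaled likelihoods replace the
# Gaussian-mixture emission probabilities inside the HMM decoder.
priors = np.bincount(y) / len(y)
emission_scores = post / priors
```

In a full system the `emission_scores` for each frame would feed the Viterbi decoder in place of GMM likelihoods, leaving the transition model untouched.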


software engineering artificial intelligence networking and parallel distributed computing | 2015

Graphical models for the recognition of Arabic continuous speech based triphones modeling

Elyes Zarrouk; Yassine BenAyed; Faïez Gargouri

Recent developments in inference and learning in dynamic Bayesian networks (DBN) allow their use in real-world applications, including large-scale speech recognition. Even though their progress is huge, those models, like hidden Markov models (HMM), lack discriminative ability, especially in speech recognition. In this paper, we present the performance of hybridizing support vector machines with dynamic Bayesian networks for Arabic triphone-based continuous speech. SVMs are based on structural risk minimization (SRM), where the aim is to set up a classifier that minimizes a bound on the expected risk rather than the empirical risk. The best results are obtained with the proposed SVM/DBN system, which achieves 78.87% as the best recognition rate for a tested speaker. The speech recognizer was evaluated on the ARABIC_DB corpus and performs at 8.04% WER, compared to 10.08% for the triphone mixture-Gaussian DBN system, 10.54% for the hybrid SVM/HMM model and 12.03% for standard HMMs.


Procedia Computer Science | 2015

Graphical Models for Multi-dialect Arabic Isolated Words Recognition

Elyes Zarrouk; Yassine BenAyed; Faiez Gargouri

This paper presents the use of multiple hybrid systems for the recognition of isolated words from a large multi-dialect Arabic vocabulary. Like hidden Markov models (HMM), dynamic Bayesian networks (DBN) lack discriminative ability, especially in speech recognition, even though their progress is huge. Multi-layer perceptrons (MLP) have been applied in the literature as estimators of emission probabilities in HMMs and have proved effective. To improve the results of recognition systems, we apply support vector machines (SVM) as estimators of posterior probabilities, since they are characterized by high predictive power and discrimination. Moreover, they are based on structural risk minimization (SRM), where the aim is to set up a classifier that minimizes a bound on the expected risk rather than the empirical risk. In this work we carry out a comparative study between three hybrid systems, MLP/HMM, SVM/HMM and SVM/DBN, and the standard HMM and DBN models, and we describe the use of the hybrid SVM/DBN model for multi-dialect Arabic isolated-word recognition. Using 67,132 speech files of Arabic isolated words, the comparison is as follows: standard HMMs lead to a recognition rate of 74.18% as the average over 8 domains for each of the 4 dialects. With the hybrid systems MLP/HMM and SVM/HMM we achieve 77.74% and 78.06% respectively. Moreover, our proposed SVM/DBN system achieves the best performance, with a recognition rate of 87.67%, compared to 83.01% obtained by GMM/DBN.


international conference on machine vision | 2018

Hierarchical vs non-hierarchical audio indexation and classification for video genres

Nouha Dammak; Yassine BenAyed

In this paper, Support Vector Machines (SVMs) are used for segmenting and indexing video genres based only on audio features extracted at block level, whose prominent asset is capturing local temporal information. The main contribution of our study is to show the strong effect on classification accuracy of using a hierarchical categorization structure based on the Mel Frequency Cepstral Coefficients (MFCC) audio descriptor. The classification covers three common video genres: sports videos, music clips and news scenes. The sub-classification may divide each genre into several multi-speaker and multi-dialect sub-genres. The validation of this approach was carried out on over 360 minutes of video, yielding a classification accuracy of over 99%.
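The hierarchical scheme amounts to a genre-level SVM followed by one sub-genre SVM per genre. A scikit-learn sketch on random stand-in features (the feature dimensionality, class counts and data are assumptions, not the paper's MFCC blocks):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.RandomState(1)
X = rng.randn(120, 13)                   # stand-in MFCC block features
genre = rng.randint(0, 3, 120)           # 0=sports, 1=music, 2=news
sub = rng.randint(0, 2, 120)             # sub-genre label within each genre

top = SVC().fit(X, genre)                # level 1: genre classifier
# level 2: a dedicated sub-genre classifier trained on each genre's blocks
subs = {g: SVC().fit(X[genre == g], sub[genre == g]) for g in range(3)}

def classify(x):
    # Route a block through the hierarchy: predict the genre first,
    # then hand the block to that genre's specialized sub-genre SVM
    g = int(top.predict(x[None, :])[0])
    return g, int(subs[g].predict(x[None, :])[0])
```

The design choice the paper highlights is exactly this routing: each second-level SVM only ever discriminates within one genre, which is what lifts accuracy over a single flat classifier.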


Procedia Computer Science | 2018

A novel keyword rescoring method for improved spoken keyword spotting

Ilyes Rebai; Yassine BenAyed; Walid Mahdi

In this paper, we present a spoken KeyWord Spotting (KWS) system which creates a search index from word lattices generated by a deep speech recognizer. Basic KWS systems estimate word posteriors from the lattices and use them to make "correct/false alarm" decisions. The main issue with lattice-based posterior probabilities is that a putative detection can have a very low posterior probability, so the decider fails to detect it and considers it a false alarm. Therefore, our goal is to enhance the keyword decision by detecting and boosting the scores of missed detections. Accordingly, inspired by the template matching approach, we propose a new keyword rescoring method. More precisely, detected hits are rescored based on acoustic similarity, and the new scores are then used by the decider to make the final decision. Experiments demonstrate that the proposed method leads to more accurate keyword results than the conventional KWS system.
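The rescoring idea can be sketched in a few lines: blend the lattice posterior of a putative hit with an acoustic similarity to a keyword template, so a true detection with a low posterior can be pushed back over the decision threshold. A minimal numpy version using cosine similarity over averaged frame features (the paper's actual similarity measure and blending rule are not specified here; this is an assumed stand-in):

```python
import numpy as np

def rescore(posterior, hit_frames, template_frames, alpha=0.5):
    # Acoustic similarity between the putative hit and the keyword template:
    # cosine similarity of the mean frame vectors (a DTW alignment would be
    # the heavier, more faithful template-matching choice).
    a = hit_frames.mean(axis=0)
    b = template_frames.mean(axis=0)
    sim = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))
    # Blend: a low lattice posterior gets boosted when the acoustics match
    return alpha * posterior + (1.0 - alpha) * max(sim, 0.0)

frames = np.random.RandomState(2).randn(20, 13)
boosted = rescore(0.1, frames, frames)   # identical acoustics: strong boost
```

With `alpha=0.5` a hit whose posterior is only 0.1 but whose acoustics match the template perfectly is lifted to about 0.55, illustrating how a missed detection can clear the threshold.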


international joint conference on neural network | 2016

Deep kernel-SVM network

Ilyes Rebai; Yassine BenAyed; Walid Mahdi

Deep learning techniques have claimed state-of-the-art results in a wide range of tasks, including classification. Despite the promising results, these large networks have limitations: deep neural networks generalise poorly on small data sets, such as biological data. This paper describes a new machine learning algorithm for classification tasks. We introduce a Multi-Layer Multiple Kernel Learning (ML-MKL) framework. The input data are first transformed through a set of weighted non-linear kernel functions in a multilayer structure; then an SVM classifier makes the final decision. The proposed network is trained to minimize the error function, and we propose to optimize it with an adaptive backpropagation algorithm. The generalization performance of the proposed method is compared with various state-of-the-art multiple kernel algorithms on several benchmark data sets and two real-world applications, object recognition and spoken language recognition. Experimental results show that ML-MKL generally outperforms existing kernel methods.


International Journal of Speech Technology | 2016

Arabic speech synthesis and diacritic recognition

Ilyes Rebai; Yassine BenAyed

The text-to-speech system (TTS), also known as a speech synthesizer, has become an important technology in recent years due to its expanding field of applications. Several works on speech synthesizers target English and French, whereas many other languages, including Arabic, have only recently been taken into consideration. The area of Arabic speech synthesis has not made sufficient progress and is still in its early stages, with low speech quality. In fact, speech synthesis systems face several problems (e.g. speech quality, articulatory effects, etc.). Different methods have been proposed to address these issues, such as the use of large and varied unit sizes. This method is mainly implemented with the concatenative approach to improve speech quality, and several works have proved its effectiveness. This paper presents an efficient Arabic TTS system based on the statistical parametric approach and non-uniform-unit speech synthesis. Our system includes a diacritization engine. Modern Arabic text is written without the vowels, also called diacritic marks. These marks, however, are essential to define the correct pronunciation of the text, which explains the incorporation of the diacritization engine into our system. In this work, we propose a simple approach based on deep neural networks, which are trained to directly predict the diacritic marks and to predict the spectral and prosodic parameters. Furthermore, we propose a new, simple stacked neural network approach to improve the accuracy of the acoustic models. Experimental results show that our diacritization system generates fully diacritized text with high precision and our synthesis system produces high-quality speech.

Collaboration


Dive into Yassine BenAyed's collaboration.
