Publication


Featured research published by Ilyes Rebai.


Neural Computing and Applications | 2016

Deep multilayer multiple kernel learning

Ilyes Rebai; Yassine BenAyed; Walid Mahdi

The multiple kernel learning (MKL) approach has been proposed for kernel methods and has shown high performance on some real-world applications. It consists of learning the optimal kernel from a single layer of multiple predefined kernels. Unfortunately, this approach is not expressive enough to solve relatively complex problems. With the emergence and success of deep learning, multilayer multiple kernel learning (MLMKL) methods were inspired by the idea of deep architectures and were introduced to improve conventional MKL methods. Such architectures learn deep kernel machines by exploring combinations of multiple kernels in a multilayer structure. However, existing MLMKL methods often have trouble optimizing the network for two or more layers, and they do not always outperform the simplest method of combining multiple kernels (i.e., MKL). To improve the effectiveness of MKL approaches, we introduce in this paper a novel backpropagation MLMKL framework. Specifically, we propose to optimize the network with an adaptive backpropagation algorithm, using the gradient ascent method rather than the dual objective function or the estimation of the leave-one-out error. We test the proposed method through a large set of experiments on a variety of benchmark data sets and successfully optimize the system over many layers. Empirical results show that our algorithm achieves high performance compared to the traditional MKL approach and existing MLMKL methods.
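The abstract above describes two ingredients: stacking weighted kernel combinations in layers and tuning the layer weights by gradient ascent. A minimal illustrative sketch, not the paper's code: the base kernels, the kernel-alignment objective, and the numerical gradient below are all assumptions chosen for demonstration.

```python
import numpy as np

def rbf(A, B, gamma=0.5):
    # Gaussian (RBF) kernel between row vectors of A and B.
    d = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d)

def linear(A, B):
    return A @ B.T

rng = np.random.RandomState(0)
X = rng.randn(6, 3)
y = np.sign(rng.randn(6))
T = np.outer(y, y)                      # target kernel for alignment

w = np.array([0.5, 0.5])                # layer-1 combination weights

def alignment(w):
    # Layer 1: weighted sum of base kernels.
    K1 = w[0] * rbf(X, X) + w[1] * linear(X, X)
    # Layer 2: a kernel over the layer-1 representation
    # (rows of K1 treated as feature vectors, one common MLMKL construction).
    K2 = rbf(K1, K1, gamma=0.1)
    # Kernel-target alignment: a stand-in training objective.
    return (K2 * T).sum() / (np.linalg.norm(K2) * np.linalg.norm(T))

# One numerical-gradient ascent step on the layer-1 weights.
eps, lr = 1e-5, 0.1
g = np.array([(alignment(w + eps * e) - alignment(w - eps * e)) / (2 * eps)
              for e in np.eye(2)])
w = w + lr * g
K2 = rbf(w[0] * rbf(X, X) + w[1] * linear(X, X), 
         w[0] * rbf(X, X) + w[1] * linear(X, X), gamma=0.1)
```

The paper's actual framework backpropagates analytic gradients through the layers; the finite-difference step here only illustrates the ascent direction.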


Computer Speech & Language | 2015

Text-to-speech synthesis system with Arabic diacritic recognition system

Ilyes Rebai; Yassine BenAyed

Highlights:
- We developed an Arabic text-to-speech system, including a diacritization system.
- The speech synthesis system is based on statistical parametric synthesis.
- We address the accuracy of the diacritic and acoustic models.
- We propose a diacritization system based on the position of the current letter.
- A per-unit-type neural network synthesis system generates high speech quality.

Text-to-speech synthesis has been widely studied for many languages. However, speech synthesis for Arabic has not made sufficient progress and is still in its early stage. Statistical parametric synthesis based on hidden Markov models has been the most commonly applied approach for Arabic. Recently, synthesized speech based on deep neural networks was found to be as intelligible as human speech. This paper describes a Text-To-Speech (TTS) synthesis system for Modern Standard Arabic based on the statistical parametric approach and Mel-cepstral coefficients. Deep neural networks have achieved state-of-the-art performance in a wide range of tasks, including speech synthesis. Our TTS system includes a diacritization system, which is very important for Arabic TTS applications and is also based on deep neural networks. In addition to the use of deep techniques, we also propose different methods to model the acoustic parameters in order to address the problem of acoustic model accuracy. They are based on linguistic and acoustic characteristics (e.g. a letter-position-based diacritization system, a unit-type-based synthesis system, a diacritic-mark-based synthesis system) and on deep learning techniques (stacked generalization). Experimental results show that our diacritization system can generate diacritized text with high accuracy. As regards the speech synthesis system, experimental results and subjective evaluation show that our proposed method can generate intelligible and natural speech.
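The letter-position-based diacritization idea can be pictured as per-letter classification over a character context window. A toy sketch of the input framing only; the window radius, padding token, and label set are illustrative assumptions, not the paper's configuration.

```python
# Hypothetical label set; real Arabic diacritics include more marks.
DIACRITICS = ["fatha", "damma", "kasra", "sukun"]

def context_windows(letters, radius=2):
    """Pad the letter sequence and yield one fixed-size context window
    per letter; a classifier would map each window to a diacritic."""
    pad = ["<pad>"] * radius
    seq = pad + letters + pad
    return [seq[i:i + 2 * radius + 1] for i in range(len(letters))]

# Example: the three-letter word "كتب" yields three 5-letter windows,
# each centred on the letter to be diacritized.
windows = context_windows(list("كتب"))
```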


Procedia Computer Science | 2017

Improving speech recognition using data augmentation and acoustic model fusion

Ilyes Rebai; Yassine BenAyed; Walid Mahdi; Jean-Pierre Lorré

Deep learning based systems have greatly improved performance in speech recognition tasks, and various deep architectures and learning methods have been developed in the last few years. Along with that, Data Augmentation (DA), a common strategy adopted to increase the quantity of training data, has been shown to be effective for training neural networks to make invariant predictions. On the other hand, Ensemble Method (EM) approaches have received considerable attention in the machine learning community as a way to increase the effectiveness of classifiers. We therefore propose in this work a new Deep Neural Network (DNN) speech recognition architecture which takes advantage of both DA and EM to improve the prediction accuracy of the system. We first explore an existing approach based on vocal tract length perturbation, and we propose a different DA technique based on feature perturbation to create modified training data sets. Finally, EM techniques are used to integrate the posterior probabilities produced by different DNN acoustic models trained on the different data sets. Experimental results demonstrate an increase in the recognition performance of the proposed system.
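The two components described above, feature-perturbation augmentation and posterior fusion across acoustic models, can be sketched as follows. The multiplicative-noise perturbation and the simple averaging rule are stand-in assumptions; the paper's exact DA transform and EM combination may differ.

```python
import numpy as np

rng = np.random.default_rng(0)

def perturb(feats, rng, scale=0.05):
    """Feature-perturbation DA (assumed simple variant): jitter each
    coefficient by a small multiplicative noise factor to create a
    modified copy of the training set."""
    return feats * (1.0 + scale * rng.standard_normal(feats.shape))

def fuse(posteriors):
    """Fuse per-frame state posteriors from several acoustic models by
    averaging, then renormalize (one common ensemble combination rule)."""
    P = np.mean(posteriors, axis=0)
    return P / P.sum(axis=-1, keepdims=True)

# 4 models x 10 frames x 3 states of toy posteriors.
frames = rng.random((4, 10, 3))
fused = fuse(frames)
```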


Procedia Computer Science | 2018

A novel keyword rescoring method for improved spoken keyword spotting

Ilyes Rebai; Yassine BenAyed; Walid Mahdi

In this paper, we present a spoken KeyWord Spotting (KWS) system which creates a search index from word lattices generated by a deep speech recognizer. Basic KWS systems estimate word posteriors from the lattices and use them to make "correct/false alarm" decisions. The main issue with lattice-based posterior probabilities is that a putative detection can have a very low posterior probability, so the decider fails to detect it and considers it a false alarm. Our goal is therefore to enhance the keyword decision by detecting and boosting the scores of missed detections. Accordingly, inspired by the template matching approach, we propose a new keyword rescoring method: detected hits are rescored based on acoustic similarity, and the new scores are then used by the decider to make the final decision. Experiments demonstrate that the proposed method leads to more accurate keyword results than the conventional KWS system.
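Template-matching rescoring of the kind described can be sketched with a dynamic-time-warping (DTW) distance between the acoustic features of two detections of the same keyword; the interpolation rule and its parameters below are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def dtw_distance(A, B):
    """Dynamic time warping distance between two feature sequences
    (rows = frames), length-normalized; used as an acoustic-similarity
    measure between keyword detections."""
    n, m = len(A), len(B)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(A[i - 1] - B[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m] / (n + m)

def rescore(posterior, dist, alpha=0.5, sim_scale=1.0):
    """Boost a low lattice posterior when the detected segment is
    acoustically close (small DTW distance) to a confident template hit."""
    similarity = np.exp(-sim_scale * dist)
    return alpha * posterior + (1 - alpha) * similarity
```

A detection with posterior 0.1 but near-zero DTW distance to a confident hit would, under these assumed parameters, be lifted toward an accept decision.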


Multimedia Tools and Applications | 2018

Spoken keyword search system using improved ASR engine and novel template-based keyword scoring

Ilyes Rebai; Yassine Ben Ayed; Walid Mahdi

Keyword search in spoken documents has become more and more important due to the increasing amount of spoken data. A typical system makes use of an Automatic Speech Recognition (ASR) system and information retrieval methods. While a number of studies have sought optimal system performance, KeyWord Search (KWS) systems still suffer from two main drawbacks. First, system performance depends strongly on the ASR transcripts, which are inherently inexact; owing to speech signal variability, ASR systems are far from perfect. Second, KWS systems make detection decisions based on lattice-based posterior probabilities, which are not comparable across keywords; moreover, the posterior probabilities of true detections usually fall into different ranges, which decreases spotting performance. This paper addresses both problems. More specifically, we propose to improve ASR transcript accuracy by introducing a new ASR architecture that integrates data augmentation and ensemble learning into a single framework. In addition, we propose a novel keyword rescoring method that provides scores from a new perspective: inspired by the template-based KWS approach, similarity scores between detected keywords are computed from the distance between their acoustic features and used as new decision scores. Experiments on French and English data sets show that the proposed KWS system leads to more accurate keyword results than conventional systems.
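The cross-keyword comparability issue mentioned above is commonly handled by per-keyword score normalization. A minimal sketch of sum-to-one normalization, a standard trick in KWS evaluation and an assumption here, not necessarily this paper's method:

```python
def sum_to_one(scores, gamma=1.0):
    """Normalize the detection scores of one keyword so they sum to 1,
    making a single decision threshold comparable across keywords
    (sum-to-one normalization; gamma sharpens or flattens the scores)."""
    powered = [s ** gamma for s in scores]
    total = sum(powered)
    return [p / total for p in powered]

# Raw scores for one keyword's candidate detections.
norm = sum_to_one([0.9, 0.3, 0.3])
```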


International Joint Conference on Neural Networks | 2016

Deep kernel-SVM network

Ilyes Rebai; Yassine BenAyed; Walid Mahdi

Deep learning techniques have achieved state-of-the-art results in a wide range of tasks, including classification. Despite these promising results, such large networks have limitations: deep neural networks generalize poorly on small data sets, such as biological data. This paper describes a new machine learning algorithm for classification tasks. We introduce a Multi-Layer Multiple Kernel Learning (ML-MKL) framework in which the input data are first transformed through a set of weighted non-linear kernel functions arranged in a multilayer structure, and an SVM classifier then makes the final decision. The proposed network is trained to minimize the error function; specifically, we optimize the network with an adaptive backpropagation algorithm. The generalization performance of the proposed method is compared against various state-of-the-art multiple kernel algorithms on several benchmark data sets and two real-world applications, object recognition and spoken language recognition. Experimental results show that ML-MKL generally outperforms existing kernel methods.
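The pipeline described (a multilayer kernel transform feeding an SVM decision) can be sketched with scikit-learn's precomputed-kernel SVM. The fixed layer weights and kernel choices are assumptions, and the weight learning the paper performs is omitted.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.standard_normal((20, 4))
y = (X[:, 0] > 0).astype(int)           # toy binary labels

def rbf(A, B, gamma=0.5):
    d = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d)

# Layer 1: fixed weighted combination of two base kernels (weights assumed).
K1 = 0.5 * rbf(X, X) + 0.5 * (X @ X.T)
# Layer 2: kernel over the layer-1 representation (rows as features).
K2 = rbf(K1, K1, gamma=0.01)

# Final decision: an SVM on the precomputed deep kernel.
clf = SVC(kernel="precomputed").fit(K2, y)
acc = clf.score(K2, y)                  # training accuracy, for illustration
```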


International Journal of Speech Technology | 2016

Arabic speech synthesis and diacritic recognition

Ilyes Rebai; Yassine BenAyed

A text-to-speech (TTS) system, also known as a speech synthesizer, has become one of the most important technologies in recent years due to its expanding field of applications. Several works on speech synthesis have targeted English and French, whereas many other languages, including Arabic, have only recently been taken into consideration. Arabic speech synthesis has not made sufficient progress and is still in its early stage, with low speech quality. In fact, speech synthesis systems face several problems (e.g. speech quality, articulatory effects). Different methods have been proposed to solve these issues, such as the use of large and varied unit sizes; this method is mainly implemented with the concatenative approach to improve speech quality, and several works have proved its effectiveness. This paper presents an efficient Arabic TTS system based on the statistical parametric approach and non-uniform-unit speech synthesis. Our system includes a diacritization engine. Modern Arabic text is written without the vowels, also called diacritic marks, yet these marks are essential to determine the correct pronunciation of the text, which explains the incorporation of the diacritization engine into our system. In this work, we propose a simple approach based on deep neural networks, which are trained to directly predict the diacritic marks and to predict the spectral and prosodic parameters. Furthermore, we propose a new, simple stacked-neural-network approach to improve the accuracy of the acoustic models. Experimental results show that our diacritization system generates fully diacritized text with high precision and our synthesis system produces high-quality speech.
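Stacked generalization, the accuracy-improving trick mentioned for the acoustic models, feeds the predictions of first-stage models into a second-stage model that learns how to combine them. A generic least-squares sketch of the idea, not the paper's neural networks:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 5))
t = X @ rng.standard_normal(5)          # toy regression target

# Stage 1: two simple base predictors (least squares on feature subsets,
# standing in for the paper's first-stage acoustic models).
w_a, *_ = np.linalg.lstsq(X[:, :3], t, rcond=None)
w_b, *_ = np.linalg.lstsq(X[:, 2:], t, rcond=None)
base = np.stack([X[:, :3] @ w_a, X[:, 2:] @ w_b], axis=1)

# Stage 2 (stacking): a meta-model learns to combine the base outputs.
w_meta, *_ = np.linalg.lstsq(base, t, rcond=None)
stacked = base @ w_meta
```

Because the meta-model optimizes over all combinations of the base outputs, the stacked prediction can be no worse (in training error) than either base predictor alone.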


ACS/IEEE International Conference on Computer Systems and Applications | 2015

Deep architecture using Multi-Kernel Learning and multi-classifier methods

Ilyes Rebai; Yassine BenAyed; Walid Mahdi

Kernel methods have been successfully applied to different tasks and used on a variety of data sample sizes. Multiple Kernel Learning (MKL) and Multilayer Multiple Kernel Learning (MLMKL), as new families of kernel methods, consist of learning the optimal kernel from a set of predefined kernels using an optimization algorithm. However, learning this optimal combination is an arduous task. Furthermore, existing algorithms often fail to converge to the optimal solution (i.e., weight distribution) and, for some real-world applications, achieve worse results than the simplest method, which is based on the average combination of base kernels. In this paper, we present a hybrid model that integrates two methods: Support Vector Machines (SVM) and Multiple Classifier (MC) methods. More precisely, we propose a multiple classifier framework of deep SVMs for classification tasks. We adopt the MC approach to train multiple SVMs based on multiple kernels in a multi-layer structure, thereby avoiding the complicated optimization tasks. Since the average combination of kernels gives high performance, we train multiple models, each with a predefined combination of kernels, applying a specific distribution of weights to each model. To evaluate the performance of the proposed method, we conducted an extensive set of classification experiments on a number of benchmark data sets. Experimental results show the effectiveness and efficiency of the proposed method compared to various state-of-the-art MKL and MLMKL algorithms.
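A multiple-classifier combination of the kind described is often realized by majority voting over the individual classifiers' label decisions. A generic stdlib sketch (the labels and predictions are illustrative; the paper's exact fusion rule may differ):

```python
from collections import Counter

def majority_vote(predictions):
    """Combine label predictions from several classifiers (one list per
    classifier) into a single label per sample by majority vote; ties
    resolve to the label encountered first among the most common."""
    return [Counter(col).most_common(1)[0][0] for col in zip(*predictions)]

# Three classifiers, each predicting labels for the same three samples.
preds = [
    ["cat", "dog", "dog"],   # classifier 1 (e.g. one kernel weighting)
    ["cat", "cat", "dog"],   # classifier 2 (another weighting)
    ["dog", "cat", "dog"],   # classifier 3
]
fused = majority_vote(preds)   # → ["cat", "cat", "dog"]
```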


ACS/IEEE International Conference on Computer Systems and Applications | 2017

Improving of Open-Set Language Identification by Using Deep SVM and Thresholding Functions

Ilyes Rebai; Yassine BenAyed; Walid Mahdi


