Network


Latest external collaboration at the country level.

Hotspot


Dive into the research topics where Walid Mahdi is active.

Publication


Featured research published by Walid Mahdi.


intelligent data engineering and automated learning | 2009

Hand localization and fingers features extraction: application to digit recognition in sign language

A. Ben Jmaa; Walid Mahdi; Y. Ben Jemaa; A. Ben Hamadou

We present in this paper an approach to hand gesture analysis that aims at recognizing a digit. The analysis is based on extracting a set of features from a hand image and then combining them using an induction graph. The most important features we extract from each image are the finger locations, their heights and the distance between each pair of fingers. Our approach consists of three steps: (i) hand localization, (ii) finger extraction and (iii) feature identification and combination for digit recognition. Each input image is assumed to contain only one hand against a black background, so we apply a skin-color classifier to identify the skin pixels. In the finger extraction step, we attempt to remove all hand components except the fingers; this process is based on hand anatomy properties. The final step builds a histogram representation of the detected fingers, from which the features are identified and the digit is recognized. The approach is invariant to scale, rotation and translation of the hand. Experiments have been undertaken to show the effectiveness of the proposed approach.
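A minimal sketch of the three-step pipeline described above, assuming OpenCV and NumPy; the HSV skin thresholds and the structuring-element size are illustrative assumptions, not the authors' values.

```python
import cv2
import numpy as np

def detect_fingers(image_bgr, skin_lower=(0, 48, 80), skin_upper=(20, 255, 255)):
    # (i) Hand localization: classify skin pixels in HSV space
    # (the hand is assumed to be the only object on a black background).
    hsv = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2HSV)
    hand_mask = cv2.inRange(hsv, np.array(skin_lower), np.array(skin_upper))

    # (ii) Finger extraction: a morphological opening approximates the palm,
    # subtracting it leaves mostly the fingers.
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (35, 35))
    palm = cv2.morphologyEx(hand_mask, cv2.MORPH_OPEN, kernel)
    fingers = cv2.subtract(hand_mask, palm)

    # (iii) Features: a column histogram of the finger mask; each peak
    # corresponds to one finger, and peak positions/heights give the
    # finger locations, heights and inter-finger distances.
    histogram = fingers.sum(axis=0) / 255
    return fingers, histogram
```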


international conference on information and communication technologies | 2008

AViTExt: Automatic Video Text Extraction; A new Approach for video content indexing Application

Baseem Bouaziz; Tarek Zlitni; Walid Mahdi

In this paper, we propose a spatio-temporal video text detection technique which proceeds in two principal steps: potential text region detection and a filtering process. In the first step, we dynamically divide each pair of consecutive video frames into sub-blocks in order to detect changes. A significant difference between homologous blocks implies the appearance of an important object, which may be a text region. Temporal redundancy is then used to filter these regions and form effective text regions. Experiments conducted on a variety of video sequences show the effectiveness of our approach, which obtains a precision rate of 89.39% and a recall of 90.19%.
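A rough sketch of the two steps (candidate detection by block differencing, then temporal filtering); the block size, difference threshold and persistence criterion are assumptions for illustration, not the paper's parameters.

```python
import numpy as np

def candidate_blocks(prev_frame, curr_frame, block=16, diff_thresh=12.0):
    """Mark blocks whose mean absolute difference between two consecutive
    grayscale frames exceeds a threshold (potential text regions)."""
    h, w = curr_frame.shape
    mask = np.zeros((h // block, w // block), dtype=bool)
    for by in range(h // block):
        for bx in range(w // block):
            ys, xs = by * block, bx * block
            d = np.abs(curr_frame[ys:ys+block, xs:xs+block].astype(float) -
                       prev_frame[ys:ys+block, xs:xs+block].astype(float))
            mask[by, bx] = d.mean() > diff_thresh
    return mask

def temporally_stable(masks, min_persistence=5):
    """Keep only blocks that stay candidates over several frames:
    overlaid text is temporally redundant, transient motion is not."""
    stacked = np.stack(masks, axis=0)
    return stacked.sum(axis=0) >= min_persistence
```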


Procedia Computer Science | 2017

Improving speech recognition using data augmentation and acoustic model fusion

Ilyes Rebai; Yessine BenAyed; Walid Mahdi; Jean-Pierre Lorré

Deep learning based systems have greatly improved performance in speech recognition tasks, and various deep architectures and learning methods have been developed in the last few years. Along with that, Data Augmentation (DA), a common strategy adopted to increase the quantity of training data, has been shown to be effective for training neural networks to make invariant predictions. On the other hand, Ensemble Method (EM) approaches have received considerable attention in the machine learning community as a way to increase the effectiveness of classifiers. Therefore, we propose in this work a new Deep Neural Network (DNN) speech recognition architecture which takes advantage of both DA and EM approaches in order to improve the prediction accuracy of the system. In this paper, we first explore an existing approach based on vocal tract length perturbation, and we propose a different DA technique based on feature perturbation to create modified training data sets. Finally, EM techniques are used to integrate the posterior probabilities produced by different DNN acoustic models trained on different data sets. Experimental results demonstrate an increase in the recognition performance of the proposed system.
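A minimal sketch of the two ingredients described above: a feature-level perturbation used as data augmentation, and a late fusion that combines the frame posteriors of several acoustic models. The Gaussian perturbation and the uniform fusion weights are assumptions for illustration, not the paper's method.

```python
import numpy as np

def perturb_features(features, noise_std=0.1, rng=None):
    """Create an augmented copy of a (frames x dims) feature matrix."""
    rng = rng or np.random.default_rng(0)
    return features + rng.normal(0.0, noise_std, size=features.shape)

def fuse_posteriors(posterior_list, weights=None):
    """Combine per-frame posterior matrices produced by several DNN
    acoustic models into one matrix used for decoding."""
    stacked = np.stack(posterior_list, axis=0)           # (models, frames, states)
    if weights is None:
        weights = np.full(len(posterior_list), 1.0 / len(posterior_list))
    fused = np.tensordot(weights, stacked, axes=1)       # weighted average
    return fused / fused.sum(axis=1, keepdims=True)      # renormalize per frame
```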


acs/ieee international conference on computer systems and applications | 2015

Deep architecture using Multi-Kernel Learning and multi-classifier methods

Ilyes Rebai; Yassine BenAyed; Walid Mahdi

Kernel methods have been successfully applied in different tasks and used on a variety of data sample sizes. Multiple Kernel Learning (MKL) and Multilayer Multiple Kernel Learning (MLMKL), as new families of kernel methods, consist of learning the optimal kernel from a set of predefined kernels by using an optimization algorithm. However, learning this optimal combination is considered an arduous task. Furthermore, existing algorithms often do not converge to the optimal solution (i.e., weight distribution); for some real-world applications they achieve worse results than the simplest method, which is based on the average combination of base kernels. In this paper, we present a hybrid model that integrates two methods: Support Vector Machines (SVM) and Multiple Classifier (MC) methods. More precisely, we propose a multiple classifier framework of deep SVMs for classification tasks. We adopt the MC approach to train multiple SVMs based on multiple kernels in a multi-layer structure in order to avoid solving the complicated optimization tasks. Since the average combination of kernels gives high performance, we train multiple models with a predefined combination of kernels, applying a specific distribution of weights for each model. To evaluate the performance of the proposed method, we conducted an extensive set of classification experiments on a number of benchmark data sets. Experimental results show the effectiveness and efficiency of the proposed method as compared to various state-of-the-art MKL and MLMKL algorithms.
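A small, single-layer simplification of the idea: several SVMs, each trained on a fixed (predefined) weighted combination of base RBF kernels, combined by majority vote; the paper's deeper multi-layer structure is not reproduced here. The gammas, the weight distributions and the voting rule are illustrative assumptions.

```python
import numpy as np
from collections import Counter
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVC

GAMMAS = [0.01, 0.1, 1.0]  # base RBF kernels (assumed)

def combined_kernel(Xa, Xb, weights):
    """Weighted sum of the predefined base RBF kernels."""
    return sum(w * rbf_kernel(Xa, Xb, gamma=g) for w, g in zip(weights, GAMMAS))

def train_committee(X, y, weight_sets):
    """One SVM per predefined weight distribution, e.g.
    weight_sets = [[0.6, 0.3, 0.1], [1/3, 1/3, 1/3], [0.1, 0.3, 0.6]]."""
    models = []
    for weights in weight_sets:
        clf = SVC(kernel="precomputed")
        clf.fit(combined_kernel(X, X, weights), y)
        models.append((clf, weights))
    return models

def predict_committee(models, X_train, X_test):
    votes = np.stack([clf.predict(combined_kernel(X_test, X_train, w))
                      for clf, w in models])
    # simple majority vote over the committee of SVMs
    return np.array([Counter(col).most_common(1)[0][0] for col in votes.T])
```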


international conference on signal processing and multimedia applications | 2014

Lip tracking using particle filter and geometric model for visual speech recognition

Islem Jarraya; Salah Werda; Walid Mahdi

Automatic lip-reading is a technology that helps in understanding messages exchanged in noisy environments or in cases of elderly hearing impairment. To build such a system, three subsystems are needed: a lip locating and tracking system, a labial descriptor extraction system, and a classification and speech recognition system. In this work, we present a spatio-temporal approach to track and characterize lip movements for the automatic recognition of visemes of the French language. First, we segment the lips using color information and a geometric model of the lips. Then, we apply a particle filter to track lip movements. Finally, we propose to extract and classify the visual information to recognize the pronounced viseme. This approach is applied with multiple speakers in natural conditions.
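A compact sketch of one particle-filter update step for tracking the parameters of a geometric lip model (here a centre/width/height state); the diffusion noise and the color-based likelihood callback are placeholders standing in for the paper's lip segmentation score.

```python
import numpy as np

def particle_filter_step(particles, weights, likelihood,
                         motion_std=(2.0, 2.0, 1.0, 1.0), rng=None):
    """particles: (N, 4) array of [cx, cy, width, height] hypotheses.
    likelihood: callable mapping one state to a non-negative score,
    e.g. how many lip-colored pixels fall inside the geometric model."""
    rng = rng or np.random.default_rng(0)

    # 1. Resample proportionally to the previous weights.
    idx = rng.choice(len(particles), size=len(particles), p=weights)
    particles = particles[idx]

    # 2. Predict: diffuse each hypothesis with Gaussian noise.
    particles = particles + rng.normal(0.0, motion_std, size=particles.shape)

    # 3. Update: re-weight by the observation likelihood and normalize.
    weights = np.array([likelihood(p) for p in particles], dtype=float)
    total = weights.sum()
    weights = weights / total if total > 0 else np.full(len(particles), 1.0 / len(particles))

    # The tracked lip state is the weighted mean of the particles.
    estimate = np.average(particles, axis=0, weights=weights)
    return particles, weights, estimate
```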


international conference on image analysis and processing | 2011

A video grammar-based approach for TV news localization and intra-structure identification in TV streams

Tarek Zlitni; Walid Mahdi; Hanêne Ben-Abdallah

The growing number of TV channels has led to an expansion of the mass of video documents produced and broadcast on TV channels according to precise rules (e.g. adherence to a graphic charter, recurring studio sets...). Thus, the use of a priori knowledge deduced from these rules contributes to improving the quality of segmentation and indexing of video documents. However, the effectiveness of automatic video segmentation approaches depends on the video type, so for better segmentation quality it is necessary to consider a priori knowledge about video types. In this context, this paper proposes an approach based on a video grammar to identify programs in TV streams and deduce their internal structure. The approach attempts to automatically extract a priori knowledge to build the grammar descriptors. TV news programs are selected as a case study to validate the approach, since they are one of the most important types of multimedia content.
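A schematic sketch of how grammar descriptors could be represented and matched against a stream: each program grammar lists recurring visual cues (opening jingle, anchor shot, report segment) in their expected order. The descriptor fields, cue names and matching rule are illustrative assumptions, not the paper's formalism.

```python
from dataclasses import dataclass, field

@dataclass
class GrammarDescriptor:
    program: str
    # ordered visual cues, e.g. ["jingle", "anchor_shot", "report", "anchor_shot"]
    cues: list = field(default_factory=list)

def matches(descriptor, detected_cues, min_ratio=0.8):
    """Declare a program present when enough of its cues appear,
    in order, within the cues detected in the stream."""
    it = iter(detected_cues)
    # "cue in it" consumes the iterator, so this counts an in-order subsequence match
    hits = sum(1 for cue in descriptor.cues if cue in it)
    return hits / max(len(descriptor.cues), 1) >= min_ratio
```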


international conference on image analysis and processing | 2007

Colour and Geometric based Model for Lip Localisation: Application for Lip-reading System

Salah Werda; Walid Mahdi; A. Ben Hamadou


signal-image technology and internet-based systems | 2005

Automatic Text Regions Location in Video Frames

Bassem Bouaziz; Walid Mahdi; Abdelmajid Ben Hamadou


Computer-Aided Engineering | 2008

A hybrid approach for automatic lip localization and viseme classification to enhance visual speech recognition

Walid Mahdi; Salah Werda; Abdelmajid Ben Hamadou


international conference on information and communication technologies | 2006

ALiFE: Automatic Lip Feature Extraction: A New Approach for Speech Recognition Application

S. Werda; Walid Mahdi; Mohamed Tmar; A. Ben Hamadou

Collaboration


Dive into Walid Mahdi's collaboration.
