Nadia Essoussi | Researchain

Publication

Featured research published by Nadia Essoussi.

Bioinformation | 2008

A comparison of MSA tools.

Nadia Essoussi; Khaddouja Boujenfa; Mohamed Limam

Multiple sequence alignment (MSA) is essential in phylogenetic, evolutionary and functional analysis. Several MSA tools are available in the literature. Here, we use several of them (ClustalX, Align-m, T-Coffee, SAGA, ProbCons, MAFFT, MUSCLE and DIALIGN) to illustrate a comparative analysis of phylogenetic trees for two datasets. Results show that no single MSA tool consistently outperforms the rest in producing reliable phylogenetic trees.

Biodata Mining | 2009

Partitioning clustering algorithms for protein sequence data sets

Sondes Fayech; Nadia Essoussi; Mohamed Limam

Background: Genome-sequencing projects are currently producing an enormous number of new sequences, causing protein sequence databases to grow rapidly. The unsupervised classification of these data into functional groups or families, known as clustering, has become one of the principal research objectives in structural and functional genomics. Computer programs that automatically and accurately classify sequences into families have become a necessity. A significant number of methods address the clustering of protein sequences, and most of them fall into three major groups: hierarchical, graph-based and partitioning methods. Among the sequence clustering methods in the literature, hierarchical and graph-based approaches have been widely used. Although partitioning clustering techniques are widely used in other fields, few applications have been found in protein sequence clustering. It has not been fully demonstrated whether partitioning methods can be applied to protein sequence data or whether they are efficient compared to the published clustering methods.

Methods: We developed four partitioning clustering approaches that use the Smith-Waterman local-alignment algorithm to determine pair-wise similarities of sequences. Four different sets of protein sequences were used as evaluation data sets for the proposed methods.

Results: We show that these methods outperform several other published clustering methods in terms of correctly predicting a classifier and especially in terms of the correctness of the provided prediction. The software is available to academic users from the authors upon request.
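To make the idea of partitioning sequences from pair-wise similarities concrete, here is a minimal sketch of a k-medoids-style loop over a precomputed similarity matrix. It is not one of the four methods of the paper, whose details the abstract does not give. The function name `cluster_by_medoids`, the toy matrix in the usage note and the assumption that each sequence is most similar to itself are all illustrative.

```python
def cluster_by_medoids(sim, medoids, max_iter=100):
    """Partition items using a precomputed pair-wise similarity matrix.

    sim[i][j] is the similarity between items i and j (for example an
    alignment score); higher means more similar. `medoids` holds the
    indices of the initial cluster representatives.
    """
    n = len(sim)
    medoids = list(medoids)
    labels = [0] * n
    for _ in range(max_iter):
        # Assignment step: each item joins its most similar medoid.
        labels = [
            max(range(len(medoids)), key=lambda c: sim[i][medoids[c]])
            for i in range(n)
        ]
        # Update step: the new medoid of a cluster is the member with the
        # highest total similarity to the other members.
        new_medoids = []
        for c, old in enumerate(medoids):
            members = [i for i in range(n) if labels[i] == c]
            if not members:
                new_medoids.append(old)  # keep the representative of an empty cluster
                continue
            new_medoids.append(
                max(members, key=lambda m: sum(sim[m][j] for j in members))
            )
        if new_medoids == medoids:
            break  # converged: representatives no longer change
        medoids = new_medoids
    return labels, medoids
```

With the toy matrix `[[10, 8, 1, 0], [8, 10, 0, 1], [1, 0, 10, 9], [0, 1, 9, 10]]` and initial medoids `[0, 2]`, items 0 and 1 end up in one cluster and items 2 and 3 in the other. As with any partitioning method of this kind, the result depends on the initial representatives.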

Archive | 2015

Overview of Overlapping Partitional Clustering Methods

Chiheb-Eddine Ben N’Cir; Guillaume Cleuziou; Nadia Essoussi

Identifying non-disjoint clusters is an important issue in clustering, referred to as Overlapping Clustering. Traditional clustering methods ignore the possibility that an observation can be assigned to several groups and produce k exhaustive and exclusive clusters representing the data. Overlapping Clustering methods offer a richer model for fitting existing structures in applications that require a non-disjoint partitioning. The issue of overlapping clustering has been studied for the last four decades, leading to several methods in the literature that adopt common approaches such as hierarchical, generative, graphical and k-means based ones. In this paper we review the fundamental concepts of overlapping clustering and survey the widely known overlapping partitional clustering algorithms and the existing techniques for evaluating the quality of a non-disjoint partitioning. Furthermore, a comparative theoretical and experimental study of the techniques used to model overlaps is given over different multi-labeled benchmarks.

Pattern Recognition Letters | 2014

Generalization of c-means for identifying non-disjoint clusters with overlap regulation

Chiheb-Eddine Ben N’Cir; Guillaume Cleuziou; Nadia Essoussi

Clustering is an unsupervised learning method that makes it possible to fit structures in unlabeled data sets. Detecting overlapping structures is a specific challenge that involves its own theoretical issues but offers relevant solutions for many application domains. This paper presents generalizations of the c-means algorithm that allow the overlap sizes to be parametrized. Two regulation principles are introduced that aim to control the overlap shapes and sizes with regard to the number and the dispersal of the clusters concerned. Experiments performed on real-world datasets show the efficiency of the proposed principles, and especially the ability of the second one to build reliable overlaps with easy tuning, whatever the required number of clusters.

International Conference on Computer Applications Technology | 2013

Identification of non-disjoint clusters with small and parameterizable overlaps

Chiheb-Eddine Ben N'cir; Guillaume Cleuziou; Nadia Essoussi

Identification of non-disjoint groups in unlabeled data sets is an important issue in clustering. Many real-life applications require finding overlapping clusters in order to fit the structure of the data set, such as the clustering of films, where each film can have several genres. This paper presents an overlapping k-means method, referred to as Restricted-OKM (Restricted Overlapping k-means), that generalizes the well-known k-means algorithm to detect overlapping clusters. The proposed method produces restricted overlapping boundaries between clusters and improves clustering accuracy, which makes it suited to clustering data with small overlaps. The method is then extended to control the sizes of overlaps between clusters according to user expectations. Experiments performed on overlapping data sets show that the proposed methods outperform OKM (Overlapping k-means) and fuzzy c-means in terms of clustering accuracy and produce clusters with small overlapping boundaries.
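For readers unfamiliar with the OKM baseline mentioned in this abstract, the sketch below shows the multi-assignment heuristic as it is commonly described for Overlapping k-means: an observation is represented by the mean of the prototypes of the clusters it belongs to, and clusters are added from nearest to farthest while that representation gets closer to the observation. This is not the Restricted-OKM method itself, whose restriction rule the abstract does not detail. The function names are illustrative.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ui - vi) ** 2 for ui, vi in zip(u, v))


def image(prototypes, members):
    """Mean of the prototypes of the clusters listed in `members`."""
    dim = len(prototypes[0])
    return [
        sum(prototypes[k][d] for k in members) / len(members)
        for d in range(dim)
    ]


def assign_overlapping(x, prototypes):
    """Return the sorted indices of the clusters that x is assigned to."""
    # Visit clusters from the nearest prototype to the farthest.
    order = sorted(
        range(len(prototypes)),
        key=lambda k: squared_distance(x, prototypes[k]),
    )
    members = [order[0]]
    best = squared_distance(x, image(prototypes, members))
    for k in order[1:]:
        candidate = members + [k]
        err = squared_distance(x, image(prototypes, candidate))
        if err < best:
            members, best = candidate, err  # adding cluster k helps
        else:
            break  # stop at the first cluster that does not help
    return sorted(members)
```

With prototypes `[[0, 0], [2, 0], [10, 10]]`, the point `[1, 0]` lies midway between the first two prototypes and is assigned to both, while `[0.1, 0]` stays in the first cluster only.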

International Journal of Computer Applications | 2012

Overlapping Patterns Recognition with Linear and Non-Linear Separations using Positive Definite Kernels

Chiheb-Eddine Ben N'cir; Nadia Essoussi

The detection of overlapping patterns in unlabeled data sets, referred to as overlapping clustering, is an important issue in data mining. In real-life applications, an overlapping clustering algorithm should be able to detect clusters with both linear and non-linear separations between them. In this paper we propose an overlapping clustering method based on the k-means algorithm that uses positive definite kernels. The proposed method is well suited to clustering multi-label data with linear and non-linear separations between clusters. Experiments performed on overlapping data sets show the ability of the proposed method to detect clusters with complex, non-linear boundaries. In these experiments the proposed method outperforms existing overlapping methods.
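The role a positive definite kernel plays in a k-means style method can be illustrated with the standard kernel trick for distances: the squared distance between a point and a cluster mean in feature space can be computed from kernel evaluations alone. The sketch below shows that identity as used in ordinary kernel k-means; how the paper combines it with overlapping assignments is not stated in the abstract. The function names and the `gamma` parameter are assumptions.

```python
import math


def linear_kernel(u, v):
    """Plain dot product; feature space equals input space."""
    return sum(a * b for a, b in zip(u, v))


def rbf_kernel(u, v, gamma=1.0):
    """Gaussian (RBF) kernel, a common positive definite kernel."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))


def kernel_distance_to_centre(x, members, kernel):
    """Squared feature-space distance between x and the mean of `members`.

    Uses ||phi(x) - m||^2 = K(x, x) - (2/n) * sum_i K(x, x_i)
                            + (1/n^2) * sum_i sum_j K(x_i, x_j),
    so the feature map phi is never computed explicitly.
    """
    n = len(members)
    self_term = kernel(x, x)
    cross_term = sum(kernel(x, m) for m in members) / n
    within_term = sum(kernel(a, b) for a in members for b in members) / (n * n)
    return self_term - 2.0 * cross_term + within_term
```

With the linear kernel the result reduces to the usual squared Euclidean distance to the cluster mean, which is a convenient sanity check; swapping in `rbf_kernel` is what allows non-linear cluster boundaries.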

International Conference on Computational Collective Intelligence | 2016

A Parallel Implementation of Relief Algorithm Using Mapreduce Paradigm

Jamila Yazidi; Waad Bouaguel; Nadia Essoussi

Feature selection is an important research topic in machine learning and pattern recognition. In recent years, data sets have grown in both number of instances and number of features. The number of features found in Big Data sets is hard to deal with, and the number of features that most classification algorithms can process is considerably smaller. As a result, it is important to develop techniques for selecting features from very large data sets. However, the efficiency of existing feature selection algorithms degrades significantly, or they become inapplicable altogether, when the data size exceeds hundreds of gigabytes. Traditional methods such as Filters, Wrappers and Embedded methods lack the scalability to cope with datasets of millions of instances and to produce successful results in finite time. The main purpose of this paper is therefore to propose a new parallel feature selection framework that enables the use of feature selection methods on large datasets.

International Conference on Mining Intelligence and Knowledge Exploration | 2013

Non-disjoint Cluster Analysis with Non-uniform Density

Chiheb-Eddine Ben N'cir; Nadia Essoussi

Non-disjoint clustering, also referred to as overlapping clustering, is a challenging issue in clustering in which an observation is allowed to belong to more than one cluster. Several overlapping methods have been proposed to solve this issue. Although these methods are effective at building non-disjoint partitionings, they usually fail when clusters have different densities. To detect overlapping clusters with uneven densities, we propose two clustering methods based on a new optimized criterion that incorporates the distance variation within a cluster to regularize the distance between a data point and the cluster representative. Experiments performed on simulated data and real-world benchmarks show that the proposed methods perform better than existing ones when clusters have different densities.

Bioinformation | 2007

A comparison of four pair-wise sequence alignment methods

Nadia Essoussi; Sondes Fayech

Protein sequence alignment has become an essential task in modern molecular biology research. A number of alignment techniques have been documented in the literature, and the corresponding tools are available as freeware and commercial software. Choosing and using these tools, and fully interpreting their alignment results, is often considered non-trivial by end-users with limited skill in Bioinformatics algorithm development. Here, for educational purposes, we compare sequence alignment techniques based on dynamic programming (N-W, S-W) and on heuristics (LFASTA, BL2SEQ) on four sets of sequence data. The analysis suggests that heuristics-based methods align sequences faster than dynamic programming methods.
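The two dynamic programming methods named in this abstract, Needleman-Wunsch (N-W, global alignment) and Smith-Waterman (S-W, local alignment), differ mainly in one detail: S-W clamps every cell at zero and reports the best cell anywhere in the table. The sketch below computes only the alignment scores, with a simple linear gap penalty. The scoring values (match 2, mismatch -1, gap -1) are assumptions chosen for illustration; real protein alignments normally use a substitution matrix such as BLOSUM and affine gap penalties.

```python
def nw_score(a, b, match=2, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment score (linear gap penalty)."""
    # First row: aligning a prefix of b against nothing costs one gap each.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]  # the global score sits in the bottom-right cell


def sw_score(a, b, match=2, mismatch=-1, gap=-1):
    """Smith-Waterman local alignment score (linear gap penalty)."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Clamping at 0 lets a local alignment start anywhere.
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best  # the local score is the best cell anywhere in the table
```

Both functions fill a table of size len(a) by len(b), which is why exact dynamic programming is slower than heuristics such as those compared in the paper. For two unrelated strings such as `"AAAA"` and `"TTTT"`, the global score is negative while the local score is 0, since no local match exists.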

Archive | 2019

Spark-Based Design of Clustering Using Particle Swarm Optimization

Mariem Moslah; Mohamed Aymen Ben HajKacem; Nadia Essoussi

The particle swarm optimization (PSO) algorithm is widely used in cluster analysis. PSO clustering has been adapted to the MapReduce model and has become an effective solution for Big Data. However, MapReduce is unsuitable for iterative algorithms since it requires repeated reads from and writes to disk. In addition, PSO converges slowly as it approaches the region of the global optimum. To deal with these issues, we propose in this chapter a new Spark-based PSO clustering method. We take advantage of Spark's in-memory operations to build clusters from large-scale data. Furthermore, we propose a new version of PSO that runs k-means when approaching the region of the global optimum in order to accelerate convergence. Experiments conducted on real and simulated large data sets show that the proposed method is scalable and improves on the efficiency of existing PSO methods.
