Network


Latest external collaborations at the country level. Dive into the details by clicking on the dots.

Hotspot


Dive into the research topics where Elnaz Bigdeli is active.

Publication


Featured research published by Elnaz Bigdeli.


Conference on Information and Knowledge Management | 2012

French presidential elections: what are the most efficient measures for tweets?

Flavien Bouillot; Pascal Poncelet; Mathieu Roche; Dino Ienco; Elnaz Bigdeli; Stan Matwin

Tweets exchanged over the Internet are an important source of information, even if their characteristics make them difficult to analyze (e.g., a maximum of 140 characters; noisy data). In this paper, we address the problem of extracting relevant topics from tweets coming from different communities. More precisely, we are interested in the following question: what are the most relevant terms for a given community? To answer this question, we define and evaluate new variants of the traditional TF-IDF. Furthermore, we show that our measures are well suited to recommend a community affiliation to a new user. Experiments were conducted on tweets collected during the French presidential and legislative elections of 2012. The results underline the quality and usefulness of our proposal.
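The community-level TF-IDF idea can be sketched as follows. This is a toy illustration, not the paper's actual measures: the community names, tweets, and the particular weighting (term frequency within a community times an inverse community frequency) are all invented stand-ins for the variants the paper defines and evaluates.

```python
import math
from collections import Counter

# Hypothetical toy corpus: tweets grouped by community (names invented).
communities = {
    "left":  ["tax reform now", "reform health care", "tax the rich"],
    "right": ["cut tax cut spending", "strong economy", "cut spending now"],
}

def tf_idf_per_community(communities):
    """Score each term inside each community with a community-level TF-IDF:
    TF is the term's frequency within the community, and the IDF-like factor
    penalizes terms that appear in many communities."""
    n_comm = len(communities)
    term_comm = Counter()            # in how many communities does a term appear?
    comm_counts = {}
    for name, tweets in communities.items():
        counts = Counter(w for t in tweets for w in t.split())
        comm_counts[name] = counts
        for term in counts:
            term_comm[term] += 1
    scores = {}
    for name, counts in comm_counts.items():
        total = sum(counts.values())
        scores[name] = {
            term: (c / total) * math.log(n_comm / term_comm[term] + 1.0)
            for term, c in counts.items()
        }
    return scores

scores = tf_idf_per_community(communities)
top_left = max(scores["left"], key=scores["left"].get)   # most characteristic term
```

Here "reform" outscores "tax" for the first community, because "tax" also occurs in the other community and is therefore down-weighted.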


Pattern Analysis and Applications | 2017

A fast and noise resilient cluster-based anomaly detection

Elnaz Bigdeli; Mahdi Mohammadi; Bijan Raahemi; Stan Matwin

Clustering, while systematically applied in anomaly detection, has a direct impact on the accuracy of the detection methods. Existing cluster-based anomaly detection methods are mainly based on spherical-shape clustering. In this paper, we focus on arbitrary-shape clustering methods to increase the accuracy of anomaly detection. However, since the main drawback of arbitrary-shape clustering is its high memory complexity, we propose to summarize clusters first. For this, we design an algorithm, called Summarization based on Gaussian Mixture Model (SGMM), to summarize clusters and represent them as Gaussian Mixture Models (GMMs). After the GMMs are constructed, incoming new samples are presented to the GMMs and their membership values are calculated, based on which the new samples are labeled as "normal" or "anomaly." Additionally, to address the issue of noise in the data, instead of labeling samples individually, they are clustered first, and then each cluster is labeled collectively. For this, we present a new approach, called Collective Probabilistic Anomaly Detection (CPAD), in which the distance between the incoming new samples and the existing SGMMs is calculated, and the new cluster is then given the same label as the closest cluster. To measure the distance between two GMM-based clusters, we propose a modified version of the Kullback–Leibler measure. We run several experiments to evaluate the performance of the proposed SGMM and CPAD methods and compare them against some well-known algorithms, including ABACUS, local outlier factor (LOF), and one-class support vector machine (SVM). The performance of SGMM is compared with that of ABACUS using the Dunn and DB metrics, and the results indicate that SGMM is superior in terms of summarizing clusters. Moreover, the proposed CPAD method is compared with LOF and one-class SVM considering the performance criteria of (a) false alarm rate, (b) detection rate, and (c) memory efficiency.
The experimental results show that the CPAD method is noise resilient and memory efficient, and its accuracy is higher than that of the other methods.
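The core labeling step can be sketched in miniature. This is a deliberate simplification of SGMM: each cluster is summarized here by a single 1-D Gaussian (a one-component stand-in for the full GMM), and a new sample is labeled "normal" when some cluster assigns it a high enough density; the data and threshold are invented for illustration.

```python
import math
from statistics import mean, pstdev

def summarize(cluster):
    """Summarize a cluster of 1-D points as a single Gaussian (mean, std)."""
    return mean(cluster), pstdev(cluster)

def density(x, mu, sigma):
    """Gaussian probability density of x under N(mu, sigma^2)."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

# Two toy clusters of "normal" behaviour, summarized once, then discarded.
clusters = [[1.0, 1.2, 0.9, 1.1], [5.0, 5.1, 4.9, 5.2]]
models = [summarize(c) for c in clusters]

def label(x, models, threshold=1e-3):
    """A sample is normal if its best membership density clears the threshold."""
    return "normal" if max(density(x, m, s) for m, s in models) >= threshold else "anomaly"
```

A sample near either cluster center is labeled normal, while a far-away sample gets near-zero density under every summary and is flagged as an anomaly, without keeping any original samples in memory.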


Expert Systems With Applications | 2015

An enhanced noise resilient K-associated graph classifier

Mahdi Mohammadi; Bijan Raahemi; Saeed Adel Mehraban; Elnaz Bigdeli; Ahmad Akbari

Highlights:
- We propose a non-parametric, noise resilient, graph-based classification algorithm.
- We employ relational data such as the degree of relevancy.
- We combine smaller components together to build larger ones.
- The algorithm is less noise sensitive than SVM and Decision Tree.
- The algorithm shows superior performance in the presence of different levels of noise.

In this paper, we propose a non-parametric, noise resilient, graph-based classification algorithm. By modifying the training phase of the k-associated optimal graph algorithm, and proposing a new labeling algorithm in the testing phase, we introduce a novel approach that is robust in the presence of different levels of noise. In the proposed classification method, each class of the dataset is represented by a set of sub-graphs (components), and a new extension of the k-associated optimal graph algorithm is introduced in the training phase to combine the smaller components. With this enhancement, we demonstrate that our algorithm distinguishes between noisy and non-noisy sub-graphs. Moreover, in the testing phase, we combine relational data, such as the degree of relevancy, with non-relational attributes, such as distance, for each sample in a graph, making the proposed algorithm less sensitive to noise. The gravity formula is the main concept behind the proposed test-sample labeling, with various modifications to tailor it to the arbitrary shapes and non-uniform sample scattering of the graph structure. We compare the proposed method with a graph-based classifier, as well as two other well-known classifiers, namely Decision Tree and multi-class Support Vector Machine. Confirmed by the t-test score, our proposed method shows superior performance in the presence of different levels of noise on various datasets from the UCI repository. At a noise level of 5% or higher, the proposed algorithm performs, on average, 7% better than the graph-based classification algorithm.
At a noise level of 20%, the proposed method performs, on average, 5% better than Decision Tree and multi-class SVM.
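The gravity-style labeling idea can be sketched as follows. Everything here is a hypothetical simplification: training vertices live in 1-D, a vertex's "mass" is its degree (the relational attribute), and attraction falls off with squared distance (the non-relational attribute); the paper's actual formula and its modifications differ.

```python
def gravity_label(sample, vertices):
    """Label a test sample by the class exerting the strongest total 'pull'.
    vertices: list of (position, degree, class_label) tuples (1-D toy data).
    Each vertex attracts the sample with force ~ degree / distance^2."""
    pull = {}
    for pos, degree, cls in vertices:
        dist2 = (sample - pos) ** 2 + 1e-9   # avoid division by zero
        pull[cls] = pull.get(cls, 0.0) + degree / dist2
    return max(pull, key=pull.get)

# Two toy components: class "A" near 0, class "B" near 5 (invented data).
vertices = [(0.0, 3, "A"), (0.2, 2, "A"), (5.0, 4, "B"), (5.1, 3, "B")]
```

Because degree enters the score, a well-connected (and thus likely non-noisy) component pulls harder than an isolated noisy vertex at the same distance, which is the intuition behind the noise resilience.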


Information Sciences | 2018

Incremental anomaly detection using two-layer cluster-based structure

Elnaz Bigdeli; Mahdi Mohammadi; Bijan Raahemi; Stan Matwin

Anomaly detection algorithms face several challenges, including processing speed, adapting to changes in dynamic environments, and dealing with noise in data. In this paper, a two-layer cluster-based anomaly detection structure is presented which is fast, noise resilient and incremental. The proposed structure comprises three main steps. In the first step, the data are clustered. The second step is to represent each cluster in a way that enables the model to classify new instances. The Summarization based on Gaussian Mixture Model (SGMM) proposed in this paper represents each cluster as a GMM. In the third step, a two-layer structure efficiently updates clusters using the GMM representation, while detecting and ignoring redundant instances. A new approach, called Collective Probabilistic Labeling (CPL), is presented to update clusters incrementally. This approach makes the updating phase noise-resistant and fast. An important step in the updating is the merging of new clusters with existing ones. To this end, a new distance measure is proposed, which is a modified Kullback–Leibler distance between two GMMs. In most real-time anomaly detection applications, incoming instances are often similar to previous ones. In these cases, there is no need to update clusters based on duplicates, since they have already been modeled in the cluster distribution. The two-layer structure is responsible for identifying redundant instances. Ignoring redundant instances, which are typically in the majority, makes the detection phase faster. The proposed method is found to lower the false alarm rate, which is one of the basic problems of the one-class SVM. Experiments show the false alarm rate is decreased by 5% to 15% across different datasets, while the detection rate is increased by 5% to 10%, with the two-layer structure. Memory usage for the two-layer structure is 20 to 50 times less than that of the one-class SVM.
The one-class SVM uses support vectors in labeling new instances, while the labeling of the two-layer structure depends on the number of GMMs. The experiments show that the two-layer structure is 20 to 50 times faster than the one-class SVM in labeling new instances. Moreover, the updating time of the two-layer structure is two to three times less than that of a one-layer structure. This reduction is the result of using the two-layer structure and ignoring redundant instances.
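The redundancy-filtering layer can be sketched as a toy class. The names and the z-score test are invented for illustration: here the first layer drops any sample the current single-Gaussian summary already explains well, and only novel samples are buffered for the (expensive) cluster-updating layer, which stands in for the paper's GMM-based machinery.

```python
class TwoLayerFilter:
    """Toy two-layer gatekeeper (hypothetical simplification):
    layer 1 discards samples the current Gaussian summary already models,
    layer 2 buffers novel samples for a later cluster update."""

    def __init__(self, mu, sigma, redundancy_threshold=2.0):
        self.mu, self.sigma = mu, sigma
        self.threshold = redundancy_threshold  # in standard deviations
        self.buffer = []                       # novel samples awaiting an update

    def offer(self, x):
        z = abs(x - self.mu) / self.sigma
        if z <= self.threshold:
            return "redundant"                 # layer 1: already modeled, ignore
        self.buffer.append(x)                  # layer 2: queue for cluster update
        return "buffered"

flt = TwoLayerFilter(mu=0.0, sigma=1.0)
statuses = [flt.offer(x) for x in [0.5, -1.2, 10.0]]
```

Since most real-time streams are dominated by near-duplicates, the majority of samples stop at layer 1 and the update cost tracks only the novel minority, which is the source of the speedup the abstract reports.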


Canadian Conference on Artificial Intelligence | 2016

Distributed Gaussian Mixture Model Summarization Using the MapReduce Framework

Arina Esmaeilpour; Elnaz Bigdeli; Fatemeh Cheraghchi; Bijan Raahemi; Behrouz H. Far

With an accelerating rate of data generation, sophisticated techniques are essential to meet scalability requirements. One of the promising avenues for handling large datasets is distributed storage and processing. Further, data summarization is a useful concept for managing large datasets, wherein a subset of the data can be used to provide an approximate yet useful representation. Consolidating these tools allows a distributed implementation of data summarization. In this paper, we achieve this by proposing and implementing a distributed Gaussian Mixture Model summarization using the MapReduce framework, called MR-SGMM. In MR-SGMM, we partition the input data, cluster the data within each partition with a density-based clustering algorithm called DBSCAN, and for all clusters discover the SGMM core points and their features. We test the implementation with synthetic and real datasets to demonstrate its validity and efficiency. This paves the way for a scalable implementation of Summarization based on Gaussian Mixture Model (SGMM).
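The MapReduce flow can be sketched in a single process. This is only a shape-of-the-pipeline illustration: the real MR-SGMM runs DBSCAN per partition and extracts SGMM core points, whereas this stand-in mapper merely summarizes its partition as a (sum, count) pair and the reducer merges the partial summaries into a global mean.

```python
def mapper(partition):
    """Per-partition summary (stand-in for DBSCAN + SGMM core-point extraction)."""
    return (sum(partition), len(partition))

def reducer(summaries):
    """Merge the partial (sum, count) summaries into one global statistic."""
    total, count = map(sum, zip(*summaries))
    return total / count

# Toy partitioned dataset (invented); each sublist plays the role of one split.
partitions = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0]]
global_mean = reducer([mapper(p) for p in partitions])
```

The key property the sketch preserves is that mappers touch only their own partition and the reducer sees only small summaries, never raw data, which is what makes the scheme scale.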


Science and Information Conference | 2015

A fast noise resilient anomaly detection using GMM-based collective labelling

Elnaz Bigdeli; Bijan Raahemi; Mahdi Mohammadi; Stan Matwin

Anomaly detection algorithms face several challenges, including computational complexity and resilience to noise in the input data. In this paper, we propose a fast and noise-resilient cluster-based anomaly detection method using a collective labelling approach. In the proposed Collective Probabilistic Anomaly Detection method, first, instead of labelling each new sample (as normal or anomaly) individually, the new samples are clustered, then labelled. This collective labelling mitigates the negative impact of noise by relying on group behaviour rather than the individual characteristics of incoming samples. Second, since grouping and labelling new samples may be time-consuming, we summarize clusters using a Gaussian Mixture Model (GMM). Not only does the GMM offer faster processing speed, it also facilitates summarizing clusters with arbitrary shapes, consequently reducing the memory space requirement. Finally, a modified distance measure, based on the Kullback-Leibler divergence, is proposed to calculate the similarity among clusters represented by GMMs. We evaluate the proposed method on various datasets by measuring its false alarm rate, detection rate and memory requirement. We also add different levels of noise to the input datasets to demonstrate the performance of the proposed collective anomaly detection method in the presence of noise. The experimental results confirm the superior performance of the proposed method compared to individual labelling techniques in terms of memory usage, detection rate and false alarm rate.
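Collective labelling can be shown in miniature. The sketch below is a toy: cluster summaries are reduced to 1-D centroids and the whole incoming batch is treated as one group, whereas the paper clusters the batch and compares GMMs with a modified Kullback-Leibler distance. The centroids and batch values are invented for illustration.

```python
# Toy summaries of two known clusters (1-D centroids, invented values).
cluster_centers = {"normal": 0.0, "anomaly": 10.0}

def collective_label(batch):
    """Label a whole group of incoming samples at once: the group's centroid
    is matched to the nearest known cluster, and every member inherits that
    label, so one noisy member cannot flip its own label."""
    centroid = sum(batch) / len(batch)
    nearest = min(cluster_centers, key=lambda k: abs(cluster_centers[k] - centroid))
    return [nearest] * len(batch)            # one decision for the whole group
```

Note the noise-resilience mechanism: in the batch [0.1, -0.2, 9.0], the outlying 9.0 would look anomalous on its own, but the group centroid sits near 0, so the whole batch, outlier included, is labelled by the group's behaviour.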


International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management | 2014

Cluster Summarization with Dense Region Detection

Elnaz Bigdeli; Mahdi Mohammadi; Bijan Raahemi; Stan Matwin

This paper introduces a new approach to summarizing clusters by finding dense regions and representing each cluster as a Gaussian Mixture Model (GMM). The GMM summarization allows us to summarize a cluster efficiently, then regenerate the original data with high accuracy. Unlike the classical representation of a cluster using a radius and a center, the proposed approach keeps information about the shape, as well as the distribution of the samples in the cluster. Since the GMM is a parametric model (the number of Gaussian mixtures in each GMM), we propose a method to find the number of Gaussian mixtures automatically. Each GMM is able to summarize a cluster generated by any kind of clustering algorithm and regenerate the original data with high accuracy. Moreover, when a new sample is presented to the GMMs of the clusters, a membership value is calculated for each cluster. Then, using the membership values, the new incoming sample is assigned to the closest cluster. Employing GMMs to summarize clusters offers several advantages with regard to accuracy, detection rate, memory efficiency and time complexity. We evaluate the proposed method on a variety of datasets, both synthetic and real datasets from the UCI repository. We examine the quality of the summarized clusters generated by the proposed method in terms of the Dunn, DB, SD and SSD indexes, and compare them with that of the well-known ABACUS method. We also employ the proposed algorithm in anomaly detection applications, and study the performance of the proposed method in terms of false alarm and detection rates, comparing them with Negative Selection, Naive models, and ABACUS. Furthermore, we evaluate the memory usage and processing time of the proposed algorithms against other algorithms. The results illustrate that our algorithm outperforms other well-known anomaly detection algorithms in terms of accuracy, detection rate, as well as memory usage and processing time.


International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management | 2014

A Noise Resilient and Non-parametric Graph-based Classifier

Mahdi Mohammadi; Saeed Adel Mehraban; Elnaz Bigdeli; Bijan Raahemi; Ahmad Akbari

In this paper, we propose a non-parametric and noise resilient graph-based classification algorithm. In the proposed method, we represent each class of the dataset as a set of sub-graphs. The main part of the training phase is building the classification graph based on the non-parametric k-associated optimal graph algorithm, which is an extension of the parametric k-associated graph algorithm. In this paper, we propose a new extension and modification of the training phase of the k-associated optimal graph algorithm. We compare the modified version of the k-associated optimal graph (MKAOG) algorithm with the original k-associated optimal graph algorithm (KAOG). The experimental results demonstrate the superior performance of our proposed method in the presence of different levels of noise on various datasets from the UCI repository.


International Conference on Knowledge Discovery and Information Retrieval | 2014

Arbitrary Shape Cluster Summarization with Gaussian Mixture Model.

Elnaz Bigdeli; Mahdi Mohammadi; Bijan Raahemi; Stan Matwin

One of the main concerns in the area of arbitrary-shape clustering is how to summarize clusters. An accurate representation of a cluster with arbitrary shape is to characterize the cluster by all its members. However, this approach is neither practical nor efficient. In many applications, such as stream data mining, preserving all samples for a long period of time in the presence of thousands of incoming samples is not practical. Moreover, in the absence of labelled data, clusters are representative of each class, and in the case of arbitrary-shape clusters, finding the closest cluster to a new incoming sample using all objects of the clusters is neither accurate nor efficient. In this paper, we present a new algorithm to summarize arbitrary-shape clusters. Our proposed method, called SGMM, summarizes a cluster using a set of objects as core objects, then represents each cluster with a corresponding Gaussian Mixture Model (GMM). Using the GMM, the closest cluster to a new test sample is identified with low computational cost. We compared the proposed method with ABACUS, a well-known algorithm, in terms of time, space and accuracy for both categorization and summarization purposes. The experimental results confirm that the proposed method outperforms ABACUS on various datasets, including synthetic and real datasets.


Canadian Conference on Artificial Intelligence | 2015

Incremental Cluster Updating Using Gaussian Mixture Model

Elnaz Bigdeli; Mahdi Mohammadi; Bijan Raahemi; Stan Matwin

In this paper, we present a new approach for updating clusters incrementally. The proposed incremental approach preserves comprehensive statistical information about the clusters in the form of Gaussian Mixture Models (GMMs). As each GMM needs the number of Gaussian components as an input parameter, we propose a method to determine this number automatically by introducing the concept of core points. In the updating phase, instead of processing each new sample individually, we collect the new incoming samples and cluster them. By employing the concepts of core points and GMMs, we build a number of GMMs for the new samples and label the new GMMs based on their similarity to the already existing GMMs. To find the similarity among GMMs, we introduce a new modified version of the Kullback-Leibler divergence as a distance function. For merging the current GMMs and the new GMMs, we propose a new merging mechanism in which the closest components in both GMMs are merged to create a new GMM. Since the GMM structure is a compact representation of clusters, there is no increase in processing time, either in the clustering phase or in the updating phase. We measured the accuracy of the clusters using different clustering validity metrics (DB, Dunn, SD and purity), and the results show that our algorithm outperforms other incremental clustering algorithms in terms of the quality of the final clusters.
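The merging mechanism can be sketched under strong simplifying assumptions: each GMM is reduced to a list of (weight, mean) components in 1-D, "closest" means nearest in mean (a stand-in for the modified Kullback-Leibler distance), and merging two components takes their weighted mean. All names and data below are invented for illustration.

```python
def merge_gmms(gmm_a, gmm_b):
    """Merge two toy GMMs, each a list of (weight, mean) components:
    every component of gmm_b is folded into its closest component of
    gmm_a, producing one weighted component per matched pair."""
    merged = list(gmm_a)
    for wb, mb in gmm_b:
        # find the closest existing component (by mean, as a distance stand-in)
        i = min(range(len(merged)), key=lambda j: abs(merged[j][1] - mb))
        wa, ma = merged[i]
        w = wa + wb
        merged[i] = (w, (wa * ma + wb * mb) / w)   # weight-averaged mean
    return merged

# Existing model has components near 0 and 5; the new batch's model sits near 0.2,
# so it should be absorbed by the first component, leaving the second untouched.
result = merge_gmms([(0.5, 0.0), (0.5, 5.0)], [(1.0, 0.2)])
```

Because the merged model keeps a fixed number of components rather than accumulating one per batch, the cluster representation stays compact as updates stream in.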

Collaboration


Dive into Elnaz Bigdeli's collaborations.

Top Co-Authors

Flavien Bouillot

Centre national de la recherche scientifique


Mathieu Roche

University of Montpellier
