Sarah M. Erfani | Researchain

Publication

Featured research published by Sarah M. Erfani.

IEEE Transactions on Fuzzy Systems | 2018

Ensemble Fuzzy Clustering Using Cumulative Aggregation on Random Projections

Punit Rathore; James C. Bezdek; Sarah M. Erfani; Sutharshan Rajasegarar; Marimuthu Palaniswami

Random projection is a popular method for dimensionality reduction due to its simplicity and efficiency. In the past few years, random projection and fuzzy c-means based cluster ensemble approaches have been developed for high-dimensional data clustering. However, they require large amounts of space for storing a big affinity matrix, and incur large computation time while clustering in this affinity matrix. In this paper, we propose a new random projection, fuzzy c-means based cluster ensemble framework for high-dimensional data. Our framework uses cumulative agreement to aggregate fuzzy partitions. Fuzzy partitions of random projections are ranked using external and internal cluster validity indices. The best partition in the ranked queue is the core (or base) partition. Remaining partitions then provide cumulative inputs to the core, thus arriving at a consensus best overall partition built from the ensemble. Experimental results with Gaussian mixture datasets and a variety of real datasets demonstrate that our approach outperforms three state-of-the-art methods in terms of accuracy and space-time complexity. Our algorithm runs one to two orders of magnitude faster than other state-of-the-art algorithms.

Knowledge Discovery and Data Mining | 2016

Unsupervised Parameter Estimation for One-Class Support Vector Machines

Zahra Ghafoori; Sutharshan Rajasegarar; Sarah M. Erfani; Shanika Karunasekera; Christopher Leckie

Although the hyper-plane based One-Class Support Vector Machine (OCSVM) and the hyper-spherical based Support Vector Data Description (SVDD) algorithms have been shown to be very effective in detecting outliers, their performance on noisy and unlabeled training data has not been widely studied. Moreover, only a few heuristic approaches have been proposed to set the different parameters of these methods in an unsupervised manner. In this paper, we propose two unsupervised methods for estimating the optimal parameter settings to train OCSVM and SVDD models, based on analysing the structure of the data. We show that our heuristic is substantially faster than existing parameter estimation approaches while its accuracy is comparable with supervised parameter learning methods, such as grid-search with cross-validation on labeled data. In addition, our proposed approaches can be used to prepare a labeled data set for an OCSVM or an SVDD from unlabeled data.

Pacific-Asia Conference on Knowledge Discovery and Data Mining | 2014

Privacy-Preserving Collaborative Anomaly Detection for Participatory Sensing

Sarah M. Erfani; Yee Wei Law; Shanika Karunasekera; Christopher Leckie; Marimuthu Palaniswami

In collaborative anomaly detection, multiple data sources submit their data to an on-line service, in order to detect anomalies with respect to the wider population. A major challenge is how to achieve reasonable detection accuracy without disclosing the actual values of the participants’ data. We propose a lightweight and scalable privacy-preserving collaborative anomaly detection scheme called Random Multiparty Perturbation (RMP), which uses a combination of nonlinear and participant-specific linear perturbation. Each participant uses an individually perturbed uniformly distributed random matrix, in contrast to existing approaches that use a common random matrix. A privacy analysis is given for Bayesian Estimation and Independent Component Analysis attacks. Experimental results on real and synthetic datasets using an auto-encoder show that RMP yields comparable results to non-privacy preserving anomaly detection.

Australasian Telecommunication Networks and Applications Conference | 2011

An efficient approach to detecting concept-evolution in network data streams

Sarah M. Erfani; Sutharshan Rajasegarar; Christopher Leckie

An important challenge in network management and intrusion detection is the problem of data stream classification to identify new and abnormal traffic flows. An open research issue in this context is concept-evolution, which involves the emergence of a new class in the data stream. Most traditional data classification techniques are based on the assumption that the number of classes does not change over time. However, that is not the case in real-world networks, and existing methods generally do not have the capability of identifying the evolution of a new class in the data stream. In this paper, we present a new approach to detecting novel classes in data streams that exhibit concept-evolution. In particular, our approach is able to improve both accuracy and computational efficiency by eliminating “noise” clusters in the analysis of concept-evolution. Through an evaluation on simulated and benchmark data sets, we demonstrate that our approach achieves comparable accuracy to an existing scheme from the literature with a significant reduction in computational complexity.

International Symposium on Neural Networks | 2017

Improving load forecasting based on deep learning and K-shape clustering

Fateme Fahiman; Sarah M. Erfani; Sutharshan Rajasegarar; Marimuthu Palaniswami; Christopher Leckie

One of the most crucial tasks for utility companies is load forecasting in order to plan future demand for generation capacity and infrastructure. Improving load forecasting accuracy over a short period is a challenging open problem due to the variety of factors that influence the load, and the volume of data that needs to be considered. This paper proposes a new approach for short-term load forecasting using an effective new combination of clustering and deep learning methods, along with a new weighted aggregation mechanism. Our evaluation using smart meter data from a publicly available real-life dataset demonstrates the improved accuracy of our approach over existing methods.

IEEE International Conference on Fuzzy Systems | 2017

An efficient visual assessment of cluster tendency tool for large-scale time series data sets

Timothy B. Iredale; Sarah M. Erfani; Christopher Leckie

Data visualization has always been a vital tool to explore and understand underlying data structures and patterns. However, emerging technologies such as the Internet of Things (IoT) have enabled the collection of very large amounts of data over time. The sheer quantity of data available challenges existing time series visualisation methods. In this paper we present an introductory analysis of time series clustering with a focus on a novel shape-based measure of similarity, which is invariant under uniform time shift and uniform amplitude scaling. Based on this measure we develop a Visual Assessment of cluster Tendency (VAT) algorithm to assess large time series data sets and demonstrate its advantages in terms of complexity and propensity for implementation in a distributed computing environment. This algorithm is implemented as a cloud application using Spark, where the run-time of the high-complexity dissimilarity matrix calculations is reduced by up to 7.0 times in a 16-core computing cluster, with even higher speed-up factors expected for larger computing clusters.

International Conference on Pattern Recognition | 2016

Training robust models using Random Projection

Nguyen Xuan Vinh; Sarah M. Erfani; Sakrapee Paisitkriangkrai; James Bailey; Christopher Leckie; Kotagiri Ramamohanarao

Regularization plays an important role in machine learning systems. We propose a novel methodology for model regularization using random projection. We demonstrate the technique on neural networks, since such models usually comprise a very large number of parameters, calling for strong regularizers. It has been shown recently that neural networks are sensitive to two kinds of samples: (i) adversarial samples, which are generated by imperceptible perturbations of previously correctly classified samples, yet the network will misclassify them; and (ii) fooling samples, which are completely unrecognizable, yet the network will classify them with extremely high confidence. In this paper, we show how robust neural networks can be trained using random projection. We show that while random projection acts as a strong regularizer, boosting model accuracy similar to other regularizers, such as weight decay and dropout, it is far more robust to adversarial noise and fooling samples. We further show that random projection also helps to improve the robustness of traditional classifiers, such as Random Forest and Gradient Boosting Machines.

Australasian Conference on Information Security and Privacy | 2016

Improved Classification of Known and Unknown Network Traffic Flows Using Semi-supervised Machine Learning

Timothy Glennan; Christopher Leckie; Sarah M. Erfani

Modern network traffic classification approaches apply machine learning techniques to statistical flow properties, allowing accurate classification even when traditional approaches fail. We base our approach to the task on a state-of-the-art semi-supervised classifier to identify known and unknown flows with little labelled training data. We propose a new algorithm for mapping clusters to classes to target classes that were previously difficult to classify. We also apply alternative statistical features. We find our approach has an accuracy of 95.10%, over 17% above the technique on which it is based. Additionally, our approach improves the classification performance on every class.

International Joint Conference on Artificial Intelligence | 2018

Predicting Complex Activities from Ongoing Multivariate Time Series

Weihao Cheng; Sarah M. Erfani; Rui Zhang; Ramamohanarao Kotagiri

The rapid development of sensor networks enables recognition of complex activities (CAs) using multivariate time series. However, CAs are usually performed over long periods of time, which causes slow recognition by models based on fully observed data. Therefore, predicting CAs at early stages becomes an important problem. In this paper, we propose Simultaneous Complex Activities Recognition and Action Sequence Discovering (SimRAD), an algorithm which predicts a CA over time by mining a sequence of multivariate actions from sensor data using a Deep Neural Network. SimRAD simultaneously learns two probabilistic models for inferring CAs and action sequences, where the estimations of the two models are conditionally dependent on each other. SimRAD continuously predicts the CA and the action sequence, so the predictions are mutually updated until the end of the CA. We conduct evaluations on a real-world CA dataset consisting of a rich amount of sensor data, and the results show that SimRAD outperforms state-of-the-art methods by an average of 7.2% in prediction accuracy with high confidence.

Decision and Game Theory for Security | 2018

Reinforcement Learning for Autonomous Defence in Software-Defined Networking

Yi Han; Benjamin I. P. Rubinstein; Tamas Abraham; Tansu Alpcan; Olivier Y. de Vel; Sarah M. Erfani; David Hubczenko; Christopher Leckie; Paul Montague

Despite the successful application of machine learning (ML) in a wide range of domains, adaptability—the very property that makes machine learning desirable—can be exploited by adversaries to contaminate training and evade classification. In this paper, we investigate the feasibility of applying a specific class of machine learning algorithms, namely, reinforcement learning (RL) algorithms, for autonomous cyber defence in software-defined networking (SDN). In particular, we focus on how an RL agent reacts towards different forms of causative attacks that poison its training process, including indiscriminate and targeted, white-box and black-box attacks. In addition, we also study the impact of the attack timing, and explore potential countermeasures such as adversarial training.
