Xiangliang Zhang

King Abdullah University of Science and Technology

Publication

Featured research published by Xiangliang Zhang.

European Conference on Machine Learning | 2008

Data Streaming with Affinity Propagation

Xiangliang Zhang; Cyril Furtlehner; Michèle Sebag

This paper proposes StrAP (Streaming AP), extending Affinity Propagation (AP) to data streaming. AP, a new clustering algorithm, extracts the data items, or exemplars, that best represent the dataset using a message passing method. Several steps are taken to build StrAP. The first one (Weighted AP) extends AP to weighted items with no loss of generality. The second one (Hierarchical WAP) is concerned with reducing the quadratic AP complexity, by applying AP on data subsets and further applying Weighted AP on the exemplars extracted from all subsets. Finally, StrAP extends Hierarchical WAP to deal with changes in the data distribution. Experiments on artificial datasets, on the Intrusion Detection benchmark (KDD99) and on a real-world problem, clustering the stream of jobs submitted to the EGEE grid system, provide a comparative validation of the approach.

Network Operations and Management Symposium | 2012

Virtual machine migration in an over-committed cloud

Xiangliang Zhang; Zon-Yin Shae; Shuai Zheng; Hani Jamjoom

While the early emphasis of Infrastructure as a Service (IaaS) clouds was on providing resource elasticity to end users, providers are increasingly interested in over-committing their resources to maximize the utilization and returns of their capital investments. In principle, over-committing resources is a hedge that users, on average, need only a small portion of their leased resources. When such a hedge fails (i.e., resource demand far exceeds available physical capacity), providers must mitigate this provider-induced overload, typically by migrating virtual machines (VMs) to underutilized physical machines. Recent works on VM placement and migration assume the availability of target physical machines [1], [2]. However, in an over-committed cloud data center, this is not the case. VM migration can even trigger cascading overloads if performed haphazardly. In this paper, we design a new VM migration algorithm (called Scattered) that minimizes VM migrations in over-committed data centers. Compared to a traditional implementation, our algorithm can balance host utilization across all time epochs. Using real-world data traces from an enterprise cloud, we show that our migration algorithm reduces the risk of overload, minimizes the number of needed migrations, and has minimal impact on communication cost between VMs.

International Symposium on Pervasive Systems, Algorithms, and Networks | 2009

Attribute Normalization in Network Intrusion Detection

Wei Wang; Xiangliang Zhang; Sylvain Gombault; Svein Johan Knapskog

Anomaly intrusion detection is an important issue in computer network security. As a step of data preprocessing, attribute normalization is essential to detection performance. However, many anomaly detection methods do not normalize attributes before training and detection. A few methods do normalize the attributes, but the question of which normalization method is most effective remains open. In this paper, we introduce four different schemes of attribute normalization to preprocess the data for anomaly intrusion detection. Three methods, k-NN, PCA and SVM, are then employed on the normalized data to compare the detection results. KDD Cup 1999 data are used to evaluate the normalization schemes and the detection methods. The systematic evaluation shows that attribute normalization greatly improves detection performance. The statistical normalization scheme is the best choice for detection if the data set is large.
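
As a rough illustration of what attribute normalization does, the sketch below rescales each attribute (column) of a small record set with two common schemes: min-max scaling and statistical (z-score) normalization. The abstract does not spell out its four schemes, so the choice of these two, and all function names, are assumptions made for illustration only.

```python
import math

def min_max_normalize(column):
    """Scale values linearly into [0, 1]; a constant column maps to 0.0."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]

def statistical_normalize(column):
    """Subtract the mean and divide by the population standard deviation."""
    n = len(column)
    mean = sum(column) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in column) / n)
    if std == 0:
        return [0.0 for _ in column]
    return [(v - mean) / std for v in column]

def normalize_records(records, scheme):
    """Apply `scheme` to every attribute (column) of a list of records (rows)."""
    columns = list(zip(*records))
    normalized = [scheme(list(col)) for col in columns]
    return [list(row) for row in zip(*normalized)]
```

Without such a step, an attribute measured in bytes can dominate a distance-based detector such as k-NN simply because its numeric range is larger than that of, say, a connection count.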

IEEE Transactions on Knowledge and Data Engineering | 2014

Data Stream Clustering With Affinity Propagation

Xiangliang Zhang; Cyril Furtlehner; Cécile Germain-Renaud; Michèle Sebag

Data stream clustering provides insights into the underlying patterns of data flows. This paper focuses on selecting the best representatives from clusters of streaming data. There are two main challenges: how to cluster with the best representatives and how to handle the evolving patterns that are important characteristics of streaming data with dynamic distributions. We employ the Affinity Propagation (AP) algorithm presented in 2007 by Frey and Dueck for the first challenge, as it offers good guarantees of clustering optimality for selecting exemplars. The second challenging problem is solved by change detection. The presented StrAP algorithm combines AP with a statistical change point detection test; the clustering model is rebuilt whenever the test detects a change in the underlying data distribution. Besides the validation on two benchmark data sets, the presented algorithm is validated on a real-world application, monitoring the data flow of jobs submitted to the EGEE grid.
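
The rebuild-on-change idea can be sketched with a simple sequential test. The abstract only says "a statistical change point detection test", so the use of the Page-Hinkley test here is an assumption, as are the class name, the `delta` and `threshold` parameters, and the hypothetical `rebuild_model` callback that stands in for re-running the clustering.

```python
class PageHinkley:
    """Page-Hinkley test for an upward shift in the mean of a stream."""

    def __init__(self, delta=0.05, threshold=5.0):
        self.delta = delta          # tolerated drift magnitude
        self.threshold = threshold  # alarm level
        self.reset()

    def reset(self):
        self.n = 0
        self.mean = 0.0
        self.cumulative = 0.0  # running sum of deviations from the mean
        self.minimum = 0.0     # smallest value the running sum has reached

    def update(self, x):
        """Feed one observation; return True if a change is signalled."""
        self.n += 1
        self.mean += (x - self.mean) / self.n
        self.cumulative += x - self.mean - self.delta
        self.minimum = min(self.minimum, self.cumulative)
        return self.cumulative - self.minimum > self.threshold


def process_stream(values, rebuild_model, detector):
    """Call rebuild_model(index) and reset the detector on every alarm."""
    change_points = []
    for i, x in enumerate(values):
        if detector.update(x):
            change_points.append(i)
            rebuild_model(i)
            detector.reset()
    return change_points
```

In a streaming clusterer, the monitored value `x` might be something like the recent rate of items that fit no existing exemplar; that choice is also an assumption of this sketch.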

International Conference on Machine Learning and Cybernetics | 2004

Modeling program behaviors by hidden Markov models for intrusion detection

Wei Wang; Xiaohong Guan; Xiangliang Zhang

Intrusion detection is an important technique in the defense-in-depth network security framework and a hot topic in computer network security in recent years. In this paper, a new efficient intrusion detection method based on hidden Markov models (HMMs) is presented. HMMs are applied to model normal program behaviors using traces of system calls issued by processes. The output probability of a sequence of system calls is calculated using the trained model of normal behavior. If the probability of a sequence in a trace is below a certain threshold, the sequence is flagged as a mismatch. If the ratio between the mismatches and all the sequences in a trace exceeds another threshold, the trace is then considered a possible intrusion. The method is implemented and tested on the sendmail system call data from the University of New Mexico. Experimental results show that the performance of the proposed method in intrusion detection is better than other methods.
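
The two-threshold decision rule described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the sliding-window segmentation, the function names, the model layout `(start, trans, emit)`, and the use of log-probabilities instead of raw probabilities are all assumptions.

```python
import math

def sequence_log_prob(seq, start, trans, emit):
    """Log P(seq | HMM) via the scaled forward algorithm.

    start[i] and trans[i][j] are probabilities; emit[i] maps a symbol to its
    emission probability in state i (unseen symbols get probability 0).
    """
    n_states = len(start)
    alpha = [start[i] * emit[i].get(seq[0], 0.0) for i in range(n_states)]
    log_prob = 0.0
    for t in range(len(seq)):
        if t > 0:
            alpha = [
                sum(alpha[i] * trans[i][j] for i in range(n_states))
                * emit[j].get(seq[t], 0.0)
                for j in range(n_states)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # sequence is impossible under the model
        log_prob += math.log(scale)
        alpha = [a / scale for a in alpha]
    return log_prob

def is_intrusion(trace, model, window, log_prob_threshold, ratio_threshold):
    """Flag a trace when too many of its windows are unlikely under the model."""
    if len(trace) < window:
        raise ValueError("trace is shorter than the window length")
    windows = [trace[i:i + window] for i in range(len(trace) - window + 1)]
    mismatches = sum(
        1 for w in windows if sequence_log_prob(w, *model) < log_prob_threshold
    )
    return mismatches / len(windows) > ratio_threshold
```

The model parameters would come from training on normal traces (for example with Baum-Welch), which this sketch leaves out.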

International Conference on Data Mining | 2010

K-AP: Generating Specified K Clusters by Efficient Affinity Propagation

Xiangliang Zhang; Wei Wang; Kjetil Nørvåg; Michèle Sebag

The Affinity Propagation (AP) clustering algorithm proposed by Frey and Dueck (2007) provides an understandable, nearly optimal summary of a data set. However, it suffers from two major shortcomings: i) the number of clusters is only vaguely determined by the user-defined parameter called self-confidence, and ii) the computational complexity is quadratic. When aiming at a given number of clusters due to prior knowledge, AP has to be launched many times until an appropriate setting of self-confidence is found. Re-launching AP increases the computational cost by one order of magnitude. In this paper, we propose an algorithm, called K-AP, to exploit the immediate results of K clusters by introducing a constraint in the process of message passing. Through theoretical analysis and experimental validation, K-AP was shown to be able to directly generate K clusters as user defined, with a negligible increase of computational cost compared to AP. Meanwhile, K-AP preserves the clustering quality of AP in terms of distortion. K-AP is more effective than k-medoids with respect to distortion minimization and achieves higher clustering purity.

Journal of Network and Computer Applications | 2009

Fast intrusion detection based on a non-negative matrix factorization model

Xiaohong Guan; Wei Wang; Xiangliang Zhang

In this paper, we present an efficient, fast anomaly intrusion detection model incorporating a large amount of data from various data sources. A novel method based on non-negative matrix factorization (NMF) is presented to profile program and user behaviors of a computer system. A large amount of high-dimensional data is collected in our experiments and divided into smaller data blocks by a specific scheme. The system call data is divided into blocks by process, while command data is divided into consecutive blocks of a fixed length. The frequencies of individual elements in each block of data are computed and placed column by column as data vectors to construct a matrix representation. NMF is employed to reduce the dimensionality of the data vectors, so that anomaly detection can be realized with a very simple classifier in low dimensions. Experimental results show that the model presented in this paper is promising in terms of detection accuracy, computation efficiency and implementation for fast intrusion detection.
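
The matrix construction described above (fixed-length command blocks, per-block element frequencies, one column per block) can be sketched as below. The function names and the decision to drop a trailing partial block are assumptions; the NMF step itself is not shown.

```python
from collections import Counter

def fixed_length_blocks(items, block_size):
    """Split a command stream into consecutive blocks of a fixed length.

    A trailing partial block is dropped (an assumption of this sketch).
    """
    last_start = len(items) - block_size
    return [items[i:i + block_size] for i in range(0, last_start + 1, block_size)]

def frequency_matrix(blocks, vocabulary):
    """Return a matrix with one row per vocabulary element and one column per block."""
    counts = [Counter(block) for block in blocks]
    return [[c[element] for c in counts] for element in vocabulary]
```

For example, six commands split into blocks of three yield a matrix with two columns, each holding the command counts of one block. Every entry is a non-negative count, which is the precondition for applying NMF.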

Journal of Systems and Software | 2009

Constructing attribute weights from computer audit data for effective intrusion detection

Wei Wang; Xiangliang Zhang; Sylvain Gombault

Attribute construction and selection from audit data is the first and a very important step for anomaly intrusion detection. In this paper, we present several cross frequency attribute weights to model user and program behaviors for anomaly intrusion detection. The frequency attribute weights include plain term frequency (TF) and various forms of term frequency-inverse document frequency (tfidf), referred to as Ltfidf, Mtfidf and LOGtfidf. Nearest Neighbor (NN) and k-NN methods with Euclidean and Cosine distance measures, as well as principal component analysis (PCA) and a Chi-square test method, are applied to these frequency attribute weights for anomaly detection. Extensive experiments are performed based on command data from Schonlau et al. The testing results show that the LOGtfidf weight gives better detection performance compared with plain frequency and other types of weights. By using the LOGtfidf weight, the simple NN method and PCA method achieve better masquerade detection results than the other 7 methods in the literature, while the Chi-square test consistently returns the worst results. The PCA method is suitable for fast intrusion detection because of its capability of reducing data dimensionality, while the NN and k-NN methods are suitable for detection on a small data set because they require no training process. An HTTP log data set collected in a real environment and the sendmail system call data from the University of New Mexico (UNM) are used as well, and the results also demonstrate the effectiveness of the LOGtfidf weight for anomaly intrusion detection.
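
To make the weighting idea concrete, the sketch below computes a textbook tf-idf weight (term count times log(N/df)) for blocks of commands and scores a block by its cosine distance to the nearest training block. The abstract does not define Ltfidf, Mtfidf or LOGtfidf, so this generic formula is an assumption and not necessarily any of the paper's variants; all function names are hypothetical.

```python
import math
from collections import Counter

def tfidf_vectors(blocks):
    """Return (vocabulary, vectors) with weight = count * log(N / document frequency)."""
    n = len(blocks)
    vocabulary = sorted({item for block in blocks for item in block})
    df = Counter(item for block in blocks for item in set(block))
    vectors = []
    for block in blocks:
        tf = Counter(block)
        vectors.append([tf[item] * math.log(n / df[item]) for item in vocabulary])
    return vocabulary, vectors

def cosine_distance(u, v):
    """1 - cosine similarity; defined as 1.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)

def nearest_neighbor_distance(query, training):
    """Anomaly score: distance from `query` to its closest training vector."""
    return min(cosine_distance(query, t) for t in training)
```

A command that appears in every block gets weight zero under this formula, so the score is driven by the commands that distinguish one block from another.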

International Symposium on Neural Networks | 2004

A Novel Intrusion Detection Method Based on Principle Component Analysis in Computer Security

Wei Wang; Xiaohong Guan; Xiangliang Zhang

Intrusion detection is an important technique in the defense-in-depth network security framework and a hot topic in computer security in recent years. In this paper, a new intrusion detection method based on Principal Component Analysis (PCA) with low overhead and high efficiency is presented. System call data and command sequence data are used as information sources to validate the proposed method. The frequencies of individual system calls in a trace and individual commands in a data block are computed, and the column vectors that represent the traces and blocks are formed as data input. PCA is applied to reduce the dimensionality of the data vectors, and the distance between a vector and its projection onto the reduced subspace is used for anomaly detection. Experimental results show that the proposed method is promising in terms of detection accuracy, computational expense and implementation for real-time intrusion detection.
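
The detection score described above can be written compactly. The notation is introduced here for illustration and is not taken from the paper: x is a frequency vector, μ is the mean of the training vectors, P is the matrix whose orthonormal columns are the k leading principal components, and τ is a detection threshold. Centering by μ is an assumption of this sketch.

```latex
\[
  d(\mathbf{x}) \;=\;
  \bigl\lVert (\mathbf{x} - \boldsymbol{\mu})
  - P P^{\top} (\mathbf{x} - \boldsymbol{\mu}) \bigr\rVert_2 ,
  \qquad
  \text{flag } \mathbf{x} \text{ as anomalous if } d(\mathbf{x}) > \tau .
\]
```

A vector that resembles the training data lies close to the subspace spanned by P, so its residual d(x) is small; behavior unlike the training data leaves a large residual.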

Knowledge Discovery and Data Mining | 2009

Toward autonomic grids: analyzing the job flow with affinity streaming

Xiangliang Zhang; Cyril Furtlehner; Julien Perez; Cécile Germain-Renaud; Michèle Sebag

The Affinity Propagation (AP) clustering algorithm proposed by Frey and Dueck (2007) provides an understandable, nearly optimal summary of a dataset, albeit with quadratic computational complexity. This paper, motivated by Autonomic Computing, extends AP to the data streaming framework. Firstly a hierarchical strategy is used to reduce the complexity to O(N^(1+ε)); the distortion loss incurred is analyzed in relation with the dimension of the data items. Secondly, a coupling with a change detection test is used to cope with non-stationary data distribution, and rebuild the model as needed. The presented approach StrAP is applied to the stream of jobs submitted to the EGEE Grid, providing an understandable description of the job flow and enabling the system administrator to spot online some sources of failures.
