A. Nur Zincir-Heywood

Publication

Featured research published by A. Nur Zincir-Heywood.

computational intelligence and security | 2009

Machine learning based encrypted traffic classification: Identifying SSH and Skype

Riyad Alshammari; A. Nur Zincir-Heywood

The objective of this work is to assess the robustness of machine learning based traffic classification for encrypted traffic, with SSH and Skype taken as representative encrypted applications. By robustness we mean that the classifiers are trained on data from one network and tested on data from an entirely different network. To this end, five learning algorithms (AdaBoost, Support Vector Machine, Naïve Bayesian, RIPPER and C4.5) are evaluated using flow-based features, where IP addresses, source/destination ports and payload information are not employed. Results indicate that the C4.5-based approach performs much better than the other algorithms at identifying both SSH and Skype traffic on totally different networks.

international symposium on neural networks | 2002

Host-based intrusion detection using self-organizing maps

Peter Lichodzijewski; A. Nur Zincir-Heywood; Malcolm I. Heywood

Hierarchical SOMs are applied to the problem of host-based intrusion detection on computer networks. Unlike systems based on operating system audit trails, the approach operates on real-time data without extensive off-line training and with minimal expert knowledge. Specific recommendations are made regarding the representation of time, network parameters and SOM architecture.
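For readers unfamiliar with self-organizing maps, the following is a minimal sketch of training a single flat, one-dimensional SOM on numeric feature vectors. It is not the paper's hierarchical architecture, and it does not model its representation of time or network parameters. The map size, learning-rate schedule, Gaussian neighbourhood and function names are all illustrative assumptions.

```python
import math
import random

def best_matching_unit(weights, x):
    """Index of the unit whose weight vector is closest (squared Euclidean) to x."""
    return min(range(len(weights)),
               key=lambda i: sum((w - v) ** 2 for w, v in zip(weights[i], x)))

def train_som(data, n_units=4, epochs=50, lr0=0.5, radius0=2.0, seed=0):
    """Train a hypothetical 1-D self-organizing map and return its unit weight vectors."""
    rng = random.Random(seed)
    dim = len(data[0])
    # Initialise each unit's weight vector with random values in [0, 1).
    weights = [[rng.random() for _ in range(dim)] for _ in range(n_units)]
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in data:
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                    # linearly decaying learning rate
            radius = max(radius0 * (1.0 - frac), 0.5)  # shrinking neighbourhood radius
            bmu = best_matching_unit(weights, x)
            for i, w in enumerate(weights):
                # Gaussian neighbourhood measured along the 1-D map.
                h = math.exp(-((i - bmu) ** 2) / (2 * radius ** 2))
                for d in range(dim):
                    w[d] += lr * h * (x[d] - w[d])
            step += 1
    return weights
```

After training on two well-separated groups of points, the groups typically map to different units, so `best_matching_unit` acts as a coarse cluster label for new observations.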

knowledge discovery and data mining | 2009

Clustering event logs using iterative partitioning

Adetokunbo Makanju; A. Nur Zincir-Heywood; Evangelos E. Milios

The importance of event logs as a source of information in systems and network management cannot be overemphasized. With the ever-increasing size and complexity of today's event logs, analyzing them manually has become cumbersome. For this reason, recent research has focused on the automatic analysis of these log files. In this paper we present IPLoM (Iterative Partitioning Log Mining), a novel algorithm for mining clusters from event logs. Through a three-step hierarchical partitioning process, IPLoM partitions log data into its respective clusters. In its fourth and final stage, IPLoM produces cluster descriptions, or line formats, for each of the clusters produced. Unlike other similar algorithms, IPLoM is not based on the Apriori algorithm, and it is able to find clusters in data whether or not its instances appear frequently. Evaluations show that IPLoM statistically significantly outperforms the other algorithms, achieving an average F-Measure of 78% where the closest other algorithm achieves 10%.
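The sketch below is a heavily simplified Python version of the partition-then-describe idea. It assumes whitespace tokenization and follows our understanding of the paper: Step 1 partitions by event size (token count), a simplified Step 2 splits on the token position with the fewest distinct values, and Step 4 builds a line format by replacing variable positions with `*`. Step 3 and the paper's threshold parameters are omitted, and all function names are hypothetical.

```python
from collections import defaultdict

def partition_by_event_size(lines):
    """Step 1: group tokenised log lines by their number of tokens."""
    parts = defaultdict(list)
    for line in lines:
        tokens = line.split()
        parts[len(tokens)].append(tokens)
    return list(parts.values())

def partition_by_token_position(partition):
    """Simplified Step 2: split on the position with the fewest distinct tokens."""
    n = len(partition[0])
    if n == 0:
        return [partition]
    pos = min(range(n), key=lambda p: len({t[p] for t in partition}))
    groups = defaultdict(list)
    for tokens in partition:
        groups[tokens[pos]].append(tokens)
    return list(groups.values())

def line_format(cluster):
    """Step 4: keep tokens that are constant across the cluster, replace the rest with '*'."""
    return " ".join(col[0] if len(set(col)) == 1 else "*" for col in zip(*cluster))

def cluster_logs(lines):
    """Hypothetical driver: Step 1, then simplified Step 2, then Step 4 (Step 3 omitted)."""
    clusters = []
    for part in partition_by_event_size(lines):
        clusters.extend(partition_by_token_position(part))
    return sorted(line_format(c) for c in clusters)
```

For example, `cluster_logs(["Connection from 10.0.0.1 closed", "Connection from 10.0.0.2 closed", "Disk sda1 full", "Disk sdb1 full"])` yields the two line formats `Connection from * closed` and `Disk * full`.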

Computer Networks | 2011

Can encrypted traffic be identified without port numbers, IP addresses and payload inspection?

Riyad Alshammari; A. Nur Zincir-Heywood

Identifying encrypted application traffic is an important issue for many network tasks, including quality of service, firewall enforcement and security. Solutions should ideally be both simple (and therefore efficient to deploy) and accurate. This paper presents a machine learning based approach that employs simple packet header feature sets and statistical flow feature sets, without using IP addresses, source/destination ports or payload information, to unveil encrypted application tunnels in network traffic. We demonstrate the effectiveness of our approach as a forensic analysis tool on two encrypted applications, Secure Shell (SSH) and Skype, using traces captured from entirely different networks. Results indicate that it is possible to identify encrypted traffic tunnels with high accuracy without inspecting payload, IP addresses or port numbers. Moreover, it is also possible to identify which services run in the encrypted tunnels.
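To make "statistical flow features without IP addresses, ports or payload" concrete, here is a minimal Python sketch that summarizes one flow from per-packet timestamps, lengths and directions only. The `Packet` type and the feature names are hypothetical illustrations, not the exact feature set used in the paper, and the function assumes the flow contains at least one packet.

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class Packet:
    timestamp: float  # seconds
    length: int       # packet length in bytes (header-level information only)
    forward: bool     # True if sent by the flow initiator

def flow_features(packets):
    """Hypothetical per-flow statistics; no IP addresses, ports or payload are read."""
    packets = sorted(packets, key=lambda p: p.timestamp)
    lengths = [p.length for p in packets]
    # Inter-arrival times between consecutive packets.
    gaps = [b.timestamp - a.timestamp for a, b in zip(packets, packets[1:])]
    fwd = [p.length for p in packets if p.forward]
    bwd = [p.length for p in packets if not p.forward]
    return {
        "duration": packets[-1].timestamp - packets[0].timestamp,
        "total_packets": len(packets),
        "fwd_packets": len(fwd),
        "bwd_packets": len(bwd),
        "fwd_bytes": sum(fwd),
        "bwd_bytes": sum(bwd),
        "min_len": min(lengths),
        "mean_len": mean(lengths),
        "max_len": max(lengths),
        "mean_iat": mean(gaps) if gaps else 0.0,
    }
```

Each resulting dictionary can then be turned into a fixed-order feature vector and passed to any classifier, such as a decision tree.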

web information and data management | 2005

Narrative text classification for automatic key phrase extraction in web document corpora

Yongzheng Zhang; A. Nur Zincir-Heywood; Evangelos E. Milios

Automatic key phrase extraction is a useful tool in many text-related applications such as clustering and summarization. State-of-the-art methods are aimed at extracting key phrases from traditional text such as technical papers. Applying these methods to Web documents, which often contain diverse and heterogeneous content, is of particular interest and challenge in the information age. In this work, we investigate the significance of narrative text classification for automatic key phrase extraction in Web document corpora. We benchmark three methods, TFIDF, KEA and Keyterm, used to extract key phrases from all the plain text and from only the narrative text of Web pages. ANOVA tests are used to analyze the ranking data collected in a user study, using the quantitative measures of acceptable percentage and quality value. The evaluation shows that key phrases extracted from the narrative text alone are significantly better than those obtained from all the plain text of Web pages. This demonstrates that narrative text classification is indispensable for effective key phrase extraction in Web document corpora.
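As a rough illustration of the TFIDF baseline, the Python sketch below ranks the single words of one document by term frequency times inverse document frequency, using `log(N / df)` as the IDF. It assumes lowercase alphabetic tokens and unigrams only, whereas real key phrase extractors usually consider multi-word candidates. In the paper's setting, `doc` would be either all the plain text or only the narrative text of a page. The function names are hypothetical.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase alphabetic tokens (a deliberately simple assumption)."""
    return re.findall(r"[a-z]+", text.lower())

def tfidf_keywords(doc, corpus, top_k=3):
    """Rank the words of `doc` by TF-IDF against `corpus`, a list of documents."""
    n_docs = len(corpus)
    doc_freq = Counter()
    for d in corpus:
        doc_freq.update(set(tokenize(d)))
    tf = Counter(tokenize(doc))
    total = sum(tf.values())
    # Words absent from the corpus are skipped to avoid dividing by zero.
    scores = {w: (c / total) * math.log(n_docs / doc_freq[w])
              for w, c in tf.items() if doc_freq[w]}
    # Highest score first; ties broken alphabetically for a stable ranking.
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_k]
```

Words that occur in every document, such as "the", receive an IDF of zero and drop to the bottom of the ranking.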

genetic and evolutionary computation conference | 2003

A linear genetic programming approach to intrusion detection

Dong Song; Malcolm I. Heywood; A. Nur Zincir-Heywood

Page-based Linear Genetic Programming (GP) is proposed and implemented with two-layer Subset Selection to address a two-class intrusion detection classification problem, as defined by the KDD-99 benchmark dataset. By careful adjustment of the relationship between subset layers, overfitting by individuals to specific subsets is avoided. Moreover, efficient training on a dataset of 500,000 patterns is demonstrated. Unlike current approaches to this benchmark, the learning algorithm is also responsible for deriving useful temporal features. Following evolution, decoding of a GP individual demonstrates that the solution is unique and comparable to hand-coded solutions found by experts.
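In linear GP, an individual is a sequence of register-machine instructions executed in order. This is what makes the "decoding" of an evolved individual, mentioned above, feasible. The Python sketch below shows only how such a program could be evaluated as a two-class classifier. The instruction encoding, the protected division, and the "r0 > 0 means attack" rule are illustrative assumptions. It does not model page-based crossover, subset selection or the evolutionary loop.

```python
import operator

def protected_div(a, b):
    """Protected division, a common GP convention: return 1.0 when dividing by zero."""
    return a / b if b != 0 else 1.0

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": protected_div}

def run_program(program, inputs, n_registers=4):
    """Execute a hypothetical linear GP program sequentially and return register r0.

    Each instruction is (dest, op, src1, src2); a source is ("r", i) for a
    register or ("x", i) for an input feature.
    """
    regs = [0.0] * n_registers

    def read(src):
        kind, idx = src
        return regs[idx] if kind == "r" else inputs[idx]

    for dest, op, a, b in program:
        regs[dest] = OPS[op](read(a), read(b))
    return regs[0]

def classify(program, inputs):
    """Assumed two-class decision rule: 'attack' if r0 > 0, else 'normal'."""
    return "attack" if run_program(program, inputs) > 0 else "normal"
```

For example, the two-instruction program `[(1, "*", ("x", 0), ("x", 1)), (0, "-", ("r", 1), ("x", 2))]` computes `r0 = x0 * x1 - x2`. It therefore labels the input `[2, 3, 5]` as "attack" and `[1, 1, 5]` as "normal".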

international symposium on methodologies for intelligent systems | 2005

Evaluation of two systems on multi-class multi-label document classification

Xiao Luo; A. Nur Zincir-Heywood

In the world of text document classification, the most general case is that in which a document can be classified into more than one category, the multi-label problem. This paper investigates the performance of two document classification systems applied to the task of multi-class multi-label document classification. Both systems consider the pattern of co-occurrences in documents of multiple categories. One system is based on a novel sequential data representation combined with a kNN classifier designed to make use of sequence information. The other is based on the “Latent Semantic Indexing” analysis combined with the traditional kNN classifier. The experimental results show that the first system performs better than the second on multi-labeled documents, while the second performs better on uni-labeled documents. Performance therefore depends on the dataset applied and the objective of the application.
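The Python sketch below illustrates how a kNN classifier can be extended to the multi-label case. It assigns every label carried by at least a given fraction of the k nearest training documents. It uses plain bag-of-words cosine similarity rather than Latent Semantic Indexing or the paper's sequential representation. The voting threshold and the function names are assumptions for illustration.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def knn_multilabel(query, training, k=3, threshold=0.5):
    """Predict every label carried by at least `threshold` of the k nearest documents.

    `training` is a list of (text, set_of_labels) pairs.
    """
    q = Counter(query.lower().split())
    ranked = sorted(training,
                    key=lambda item: cosine(q, Counter(item[0].lower().split())),
                    reverse=True)
    neighbours = ranked[:k]
    votes = Counter(label for _, labels in neighbours for label in labels)
    return {label for label, v in votes.items() if v / len(neighbours) >= threshold}
```

Because each neighbour may vote for several labels, a single query can receive more than one category. This is the multi-label behaviour discussed in the abstract.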

congress on evolutionary computation | 2010

Unveiling Skype encrypted tunnels using GP

Riyad Alshammari; A. Nur Zincir-Heywood

The classification of encrypted traffic, namely Skype, from network traffic represents a particularly challenging problem. Solutions should ideally be both simple (and therefore efficient to deploy) and accurate. Recent advances in team-based Genetic Programming provide the opportunity to decompose the original problem into a set of classifiers with non-overlapping behaviors. Thus, in this work we investigate the identification of Skype encrypted traffic using the Symbiotic Bid-Based (SBB) paradigm of team-based Genetic Programming (GP), based on flow features and without using IP addresses, port numbers or payload data. Evaluation of SBB-GP against C4.5 and AdaBoost (representing current best practice) indicates that SBB-GP provides simpler solutions, in terms of the number of features used and the complexity of the solution/model, without sacrificing accuracy.

conference on network and service management | 2010

An investigation on the identification of VoIP traffic: Case study on Gtalk and Skype

Riyad Alshammari; A. Nur Zincir-Heywood

The classification of encrypted traffic on the fly from network traces represents a particularly challenging application domain. Recent advances in machine learning provide the opportunity to decompose the original problem into a set of classifiers with non-overlapping behaviors, in effect providing further insight into the problem domain. Thus, the objective of this work is to classify encrypted VoIP traffic, with Gtalk and Skype taken as representative applications. To this end, three machine learning based approaches, namely C4.5, AdaBoost and Genetic Programming (GP), are evaluated on data sets that are both common to and independent of the training conditions. Flow-based features are employed without using IP addresses, source/destination ports or payload information. Results indicate that the C4.5-based approach has the best performance.

2009 IEEE Symposium on Computational Intelligence in Cyber Security | 2009

Generalization of signatures for SSH encrypted traffic identification

Riyad Alshammari; A. Nur Zincir-Heywood

The objective of this work is to discover generalized signatures for identifying encrypted traffic, with SSH taken as an example application. By generalized signatures we mean that signatures learned by training on one network remain valid when applied to traffic from a totally different network. We identified 13 signatures and 14 flow attributes for SSH traffic classification, where IP addresses, source/destination ports and payload information are not employed. The signatures identify encrypted traffic with a high detection rate and a low false positive rate, achieving up to 97% detection rate (DR) and 0.8% false positive rate (FPR) for SSH traffic.
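The two reported metrics follow the standard definitions: DR = TP / (TP + FN), the fraction of SSH flows that are detected, and FPR = FP / (FP + TN), the fraction of non-SSH flows wrongly labeled as SSH. The Python sketch below computes both from true and predicted labels. The label strings and the function name are assumptions for illustration.

```python
def detection_metrics(y_true, y_pred, positive="SSH"):
    """Return (detection rate, false positive rate) for the `positive` class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    dr = tp / (tp + fn) if tp + fn else 0.0    # recall on the positive class
    fpr = fp / (fp + tn) if fp + tn else 0.0   # false alarms among negatives
    return dr, fpr
```

For instance, with four SSH flows of which three are detected, and five other flows of which one is misclassified as SSH, the function returns a DR of 0.75 and an FPR of 0.2.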
