Hai Thanh Nguyen | Researchain

Publication

Featured research published by Hai Thanh Nguyen.

Security and Communication Networks | 2015

Combining expert knowledge with automatic feature extraction for reliable web attack detection

Carmen Torrano-Gimenez; Hai Thanh Nguyen; Gonzalo Alvarez; Katrin Franke

In the detection of web attacks, Web Application Firewalls (WAFs) must be both effective and efficient. In this paper, we propose a new methodology for web attack detection that enhances these two aspects of WAFs. It involves both feature construction and feature selection. For the feature construction phase, many professionals rely on their expert knowledge to define a set of important features, which normally leads to high and reliable attack detection rates. Nevertheless, it is a manual process and does not adapt quickly to changing network environments. Alternatively, automatic feature construction methods such as n-grams overcome this drawback, but they provide unreliable results. Therefore, in this paper, we propose to combine expert knowledge with the n-gram feature construction method for reliable and efficient web attack detection. However, the number of n-grams grows exponentially with n, which usually leads to high dimensionality problems. Hence, we propose to apply feature selection to reduce the number of redundant and irrelevant features. In particular, we study the recently proposed Generic Feature Selection (GeFS) measure, which has been successfully tested in intrusion detection systems. Additionally, we use several decision tree algorithms as classifiers of WAFs. The experiments are conducted on the publicly available ECML/PKDD 2007 dataset. The results show that the combination of expert knowledge and n-grams outperforms each separate technique and that the GeFS measure can greatly reduce the number of features, thus enhancing both the effectiveness and efficiency of WAFs.

Archive | 2012

Reliability in A Feature-Selection Process for Intrusion Detection

Hai Thanh Nguyen; Katrin Franke; Slobodan Petrovic

Reliability of decision making performed by a real pattern-recognition system, such as intrusion-detection systems (IDSs), is a critical issue. Previous works have analyzed the reliability of a pattern classifier trained in the learning stage. However, reliability in the feature-selection stage has not been studied so far. As we believe that reliability should be taken into account at the earliest possible stages, in this chapter we focus on the reliability of feature selection. Firstly, we analyze the main factors that affect reliability in the feature-selection process: (i) the choice of feature-selection methods and (ii) the search strategies for relevant features. Further on, we introduce a formal definition of a reliable feature-selection process. The definition provides formal measurements of reliability in feature selection, i.e., the steadiness of a classifier’s performance and the consistency in the search for relevant features. Secondly, we propose new methods to address the main causes of an unreliable feature-selection process. In particular, we introduce a new methodology for determining appropriate instances from a class of feature-selection methods. We call this class a generic-feature-selection (GeFS) measure. We also propose a new search approach that ensures the globally optimal feature subset by means of the GeFS measure. Finally, we validate our newly proposed methods by applying the GeFS measure to intrusion detection systems.

soft computing | 2015

Detecting IMSI-Catcher Using Soft Computing

Thanh van Do; Hai Thanh Nguyen; Nikolov Momchil; Van Thuan Do

Lately, mobile communication has been degraded from a secure system providing adequate protection of users’ confidentiality and privacy to a less trustworthy one, owing to the revelation of IMSI catchers that enable mobile phone tapping. To fight these illegal infringements, many activities aim at detecting IMSI catchers. However, the existing solutions so far are only device-based and intended for users’ self-protection. This paper presents an innovative network-based IMSI catcher detection solution that makes use of machine learning techniques. After giving a brief description of the IMSI catcher, the paper identifies the attributes of the IMSI catcher anomaly. The challenges that the proposed system has to surmount are also explained. Last but not least, the overall architecture of the proposed machine-learning-based IMSI catcher detection system is described thoroughly.

international workshop on security | 2011

Applying feature selection to payload-based Web Application Firewalls

Carmen Torrano-Gimenez; Hai Thanh Nguyen; Gonzalo Alvarez; Slobodan Petrovic; Katrin Franke

Web Application Firewalls (WAFs) analyze HTTP traffic in order to protect Web applications from attacks. To be effective, WAFs need to analyze the payload of the packets. One of the techniques used for intrusion detection is to extract features from the payload by means of n-grams. An n-gram is a subsequence of n items from a given sequence. The number of possible n-grams is 256 to the nth power. Since it grows exponentially with n, the curse of dimensionality and computational complexity problems arise. In this paper we propose to apply feature selection in order to reduce the number of features extracted by n-grams and thus to improve the effectiveness of WAFs. We conduct experiments on our own HTTP data set. After extracting n-grams from this data set, we apply the Generic-Feature-Selection (GeFS) measure for intrusion detection [5] to select important features. We use four different classifiers to test the detection accuracy before and after feature selection. The experiments show that we can remove more than 95% of irrelevant and redundant features from the original data set (and thus improve the performance by more than 80% on average), while reducing the accuracy of WAFs only slightly (by less than 6%).
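
To make the growth claim concrete, the following minimal sketch counts byte n-grams in a payload and computes the size of the feature space, 256 to the nth power. It assumes that the "items" are bytes; the function names and the sample request line are hypothetical and are not taken from the paper.

```python
from collections import Counter


def byte_ngrams(payload: bytes, n: int) -> Counter:
    """Count every contiguous n-byte subsequence of the payload."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # A payload shorter than n yields an empty range, hence an empty Counter.
    return Counter(payload[i:i + n] for i in range(len(payload) - n + 1))


def feature_space_size(n: int) -> int:
    """Number of distinct byte n-grams that can exist: 256 to the nth power."""
    return 256 ** n


# Hypothetical request line, used only for illustration.
sample = b"GET /index.php?id=1"
bigrams = byte_ngrams(sample, 2)

for n in (1, 2, 3):
    print(n, feature_space_size(n))
```

A payload of length L contains only L - n + 1 n-grams, so most of the 256^n dimensions stay at zero for any single request, which is one reason pruning features can pay off.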

machine learning and data mining in pattern recognition | 2012

A general lp-norm support vector machine via mixed 0-1 programming

Hai Thanh Nguyen; Katrin Franke

Identifying a good feature subset that contributes most to the performance of Lp-norm Support Vector Machines (Lp-SVMs with p=1 or p=2) is an important task. We realize that the Lp-SVMs do not comprehensively consider irrelevant and redundant features, because the Lp-SVMs consider all n full-set features to be important for training while skipping the other 2^n − 1 possible feature subsets at the same time. In previous work, we have studied the L1-norm SVM and applied it to the feature selection problem. In this paper, we extend our research to the L2-norm SVM and propose to generalize the Lp-SVMs into one general Lp-norm Support Vector Machine (GLp-SVM) that takes into account all 2^n possible feature subsets. We represent the GLp-SVM as a mixed 0-1 nonlinear programming problem (M01NLP). We prove that solving the newly proposed M01NLP optimization problem results in a smaller error penalty and enlarges the margin between two support vector hyper-planes, thus possibly giving a better generalization capability of SVMs than solving the traditional Lp-SVMs. Moreover, by following the new formulation we can easily control the sparsity of the GLp-SVM by adding a linear constraint to the proposed M01NLP optimization problem. In order to reduce the computational complexity of directly solving the M01NLP problem, we propose to equivalently transform it into a mixed 0-1 linear programming (M01LP) problem if p=1 or into a mixed 0-1 quadratic programming (M01QP) problem if p=2. The M01LP and M01QP problems are then solved by using the branch and bound algorithm. Experimental results obtained over the UCI, LIBSVM, UNM and MIT Lincoln Lab datasets show that our newly proposed GLp-SVM outperforms the traditional Lp-SVMs by improving the classification accuracy by more than 13.49%.

Cluster Computing | 2017

A big data analytics approach to combat telecommunication vulnerabilities

Kristoffer Jensen; Hai Thanh Nguyen; Thanh van Do; André Årnes

Both telecommunication networks and mobile communication networks use the Signaling System No. 7 (SS7) as their nervous system. It allows mobile users to communicate using SMS and phone calls, manages billing for operators and much more. Primarily, it is a set of protocols that allows telecommunication network elements to communicate, collaborate and deliver services to their users. Deregulation and migration to IP have made SS7 vulnerable to serious attacks such as location tracking of subscribers, interception of calls and SMS, fraud, and denial of services. Unfortunately, current protection measures such as firewalls, filters, and blacklists are not able to provide adequate protection of SS7. In this paper, a method for detection of SS7 attacks using big data analytics and machine learning is proposed. The paper clarifies the vulnerabilities of SS7 networks and explains how the proposed techniques can help improve SS7 security. A proof-of-concept SS7 protection system based on big data techniques and machine learning is also described thoroughly.

IDC | 2015

Anomalous Web Payload Detection: Evaluating the Resilience of 1-Grams Based Classifiers

Sergio Pastrana; Carmen Torrano-Gimenez; Hai Thanh Nguyen; Agustin Orfila

Anomaly payload detection looks for payloads that deviate from a predefined model of normality. Defining normality requires an intelligent approach. Machine learning algorithms have been widely applied to build classifiers that distinguish normal from anomalous activity. These algorithms construct vectors of features extracted from raw payloads of a given dataset and train the classifier with them. The success of the detection highly depends on the potential of the training dataset to properly represent network traffic. In this paper we show that an adversary knowing the distribution of the dataset and the specific feature construction method may generate attack vectors evading the classifier. Particularly, in the case the classifier uses a simple feature construction method based on 1-grams, getting real-world payloads to evade the classifier is feasible. We present experimental results regarding four well-known classification algorithms, namely C4.5, CART, Support Vector Machines (SVM) and MultiLayer Perceptron (MLP).
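
The intuition behind the 1-gram weakness can be sketched with a toy example. It is not the evasion procedure evaluated in the paper: the payloads, the L1 distance and the padding strategy are all assumptions chosen for illustration. A 1-gram profile is just the relative frequency of each byte value, so appending normal-looking bytes pulls an attack's profile toward the normal one.

```python
from collections import Counter


def one_gram_profile(payload: bytes) -> list:
    """Relative frequency of each of the 256 byte values (1-grams)."""
    counts = Counter(payload)          # iterating bytes yields ints 0..255
    total = len(payload) or 1          # avoid division by zero on empty input
    return [counts.get(b, 0) / total for b in range(256)]


def l1_distance(p, q) -> float:
    """Sum of absolute differences between two frequency profiles."""
    return sum(abs(a - b) for a, b in zip(p, q))


# Hypothetical payloads, invented for this illustration.
normal = b"GET /catalog/item?id=42&lang=en HTTP/1.1"
attack = b"GET /item?id=1' OR '1'='1 HTTP/1.1"

# Toy mimicry: pad the attack with bytes drawn from the normal payload.
padded_attack = attack + normal * 5

normal_profile = one_gram_profile(normal)
before = l1_distance(one_gram_profile(attack), normal_profile)
after = l1_distance(one_gram_profile(padded_attack), normal_profile)
print(f"distance before padding: {before:.3f}, after: {after:.3f}")
```

Because the padded profile is a weighted average of the attack and normal profiles, its distance to the normal profile shrinks in proportion to the share of attack bytes. A real adversary would also have to keep the padded payload functional, which this sketch ignores.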

Information Sciences | 2018

BPRH: Bayesian Personalized Ranking for Heterogeneous Implicit Feedback

Huihuai Qiu; Yun Liu; Guibing Guo; Zhu Sun; Jie Zhang; Hai Thanh Nguyen

Personalized recommendation for online service systems aims to predict potential demand by analysing user preference. User preference can be inferred from heterogeneous implicit feedback (i.e. various user actions), especially when explicit feedback (i.e. ratings) is not available. However, most methods either focus merely on homogeneous implicit feedback (i.e. the target action), e.g., purchase in shopping websites and forward in Twitter, or handle heterogeneous implicit feedback without investigating its special characteristics. In this paper, we adopt two typical actions in online service systems, i.e., view and like, as auxiliary feedback to enhance recommendation performance, whereby we propose a Bayesian personalized ranking method for heterogeneous implicit feedback (BPRH). Specifically, items are first classified into different types according to the actions they received. Then, by analysing the co-occurrence of different types of actions, which is one of the fundamental characteristics of heterogeneous implicit feedback systems, we quantify their correlations, based on which the difference of users’ preference among different types of items is investigated. An adaptive sampling strategy is also proposed to tackle the unbalanced correlation among different actions. Extensive experimentation on three real-world datasets demonstrates that our approach significantly outperforms state-of-the-art algorithms.
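
For orientation, here is a minimal sketch of the standard Bayesian personalized ranking (BPR) pairwise criterion that BPRH builds on. It covers only the plain single-action case with a matrix-factorization score; the auxiliary view and like feedback and the adaptive sampling described above are not modeled. All function names, vectors and hyperparameters are assumptions for illustration.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def dot(a, b) -> float:
    return sum(x * y for x, y in zip(a, b))


def bpr_loss(user, pos_item, neg_item) -> float:
    """Negative log-likelihood that the user ranks pos_item above neg_item."""
    return -math.log(sigmoid(dot(user, pos_item) - dot(user, neg_item)))


def bpr_sgd_step(user, pos_item, neg_item, lr=0.05, reg=0.01):
    """One stochastic gradient step on a (user, positive, negative) triple."""
    x_uij = dot(user, pos_item) - dot(user, neg_item)
    g = 1.0 - sigmoid(x_uij)  # derivative of ln(sigmoid(x)) with respect to x
    new_user = [u + lr * (g * (p - n) - reg * u)
                for u, p, n in zip(user, pos_item, neg_item)]
    new_pos = [p + lr * (g * u - reg * p) for u, p in zip(user, pos_item)]
    new_neg = [n + lr * (-g * u - reg * n) for u, n in zip(user, neg_item)]
    return new_user, new_pos, new_neg


# Hypothetical two-dimensional latent factors.
user, pos, neg = [0.1, 0.2], [0.1, -0.1], [0.3, 0.1]
initial_loss = bpr_loss(user, pos, neg)
user, pos, neg = bpr_sgd_step(user, pos, neg)
loss_after_one_step = bpr_loss(user, pos, neg)
```

Each step raises the score of the observed item relative to the sampled unobserved one, which is the ranking behavior the pairwise criterion rewards.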

international conference on it convergence and security, icitcs | 2016

Better Protection of SS7 Networks with Machine Learning

Kristoffer Jensen; Thanh van Do; Hai Thanh Nguyen; André Årnes

Deregulation and migration to IP have made SS7 vulnerable to serious attacks such as location tracking of subscribers, interception of calls and SMS, fraud, and denial of services. Unfortunately, current protection measures such as firewalls, filters, and blacklists are not able to provide adequate protection of SS7. In this paper, a method for detection of SS7 attacks using machine learning is proposed. The paper clarifies the vulnerabilities of SS7 networks and explains how machine learning techniques can help improve SS7 security. A proof-of-concept SS7 protection system using machine learning is also described thoroughly.

intelligent systems design and applications | 2012

Generic feature selection measure for botnet malware detection

Peter Ekstrand Berg; Katrin Franke; Hai Thanh Nguyen

Feature selection for botnet malware detection is an important task. In this paper, we study the recently proposed Generic-Feature-Selection (GeFS) measure [18]. Since there is no benchmark dataset of botnet malware, we conduct experiments on a dataset generated by using publicly available tools. We utilize the static and dynamic approaches [24], [29], [12] to extract features from the generated dataset and to produce two separate feature sets. We analyze the statistical properties of these feature sets to provide more insight into their nature and quality. Subsequently we determine appropriate instances of the GeFS measure for feature selection. The GeFS measure was compared experimentally with two different methods regarding their feature selection capabilities in botnet malware detection: the genetic-algorithm-CFS and the best-first-CFS algorithms. We use five different classifiers to test the detection rates and false positive rates. The experiments show that we can remove 99.9% of irrelevant and redundant features from the datasets, while keeping or even improving classification performance. Moreover, the GeFS measure outperforms the genetic-algorithm-CFS and the best-first-CFS methods by removing many more redundant features.
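
The two CFS baselines named above search for a subset that maximizes a merit score. The sketch below shows the commonly cited correlation-based feature selection (CFS) merit, assuming that precomputed absolute correlations are given as plain lists. It illustrates the baseline criterion only, not the GeFS measure or the search strategies, and the function name and input values are hypothetical.

```python
import math


def cfs_merit(feature_class_corr, feature_feature_corr) -> float:
    """CFS merit of a feature subset.

    feature_class_corr: |correlation| of each selected feature with the class.
    feature_feature_corr: |correlation| of each pair of distinct selected features.
    """
    k = len(feature_class_corr)
    if k == 0:
        return 0.0
    r_cf = sum(feature_class_corr) / k
    r_ff = (sum(feature_feature_corr) / len(feature_feature_corr)
            if feature_feature_corr else 0.0)
    # High relevance raises the merit; high redundancy lowers it.
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


single = cfs_merit([0.6], [])                  # one feature
redundant = cfs_merit([0.6, 0.6], [1.0])       # second feature duplicates the first
complementary = cfs_merit([0.6, 0.6], [0.0])   # second feature is uncorrelated
print(single, redundant, complementary)
```

With these made-up inputs, adding a fully redundant feature leaves the merit unchanged while adding an uncorrelated one raises it, which is why subset search under such a criterion tends to discard redundant features.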
