Shigang Liu
Deakin University
Publication
Featured research published by Shigang Liu.
Proceedings of the Australasian Computer Science Week Multiconference on | 2017
Tingmin Wu; Shigang Liu; Jun Zhang; Yang Xiang
Twitter spam has long been a critical but difficult problem to address. So far, researchers have developed a series of machine learning-based methods and blacklisting techniques to detect spamming activities on Twitter. According to our investigation, current methods and techniques have achieved an accuracy of around 80%. However, due to the problems of spam drift and information fabrication, these machine learning-based methods cannot efficiently detect spam activities in real-life scenarios. Moreover, the blacklisting method cannot keep up with the variations of spamming activities, as manually inspecting suspicious URLs is extremely time-consuming. In this paper, we proposed a novel technique based on deep learning to address the above challenges. The syntax of each tweet is learned through WordVector Training Mode. We then constructed a binary classifier based on the preceding representation dataset. In experiments, we collected and used a 10-day dataset of real tweets to evaluate our proposed method. We first studied the performance of different classifiers, and then compared our method to other existing text-based methods. We found that our method largely outperformed existing methods. We further compared our method to non-text-based detection techniques. According to the experimental results, our proposed method was more accurate.
computer and communications security | 2016
Shigang Liu; Jun Zhang; Yang Xiang
Spam has become a critical problem in online social networks. This paper focuses on Twitter spam detection. Recent work focuses on applying machine learning techniques to Twitter spam detection, making use of the statistical features of tweets. We observe that existing machine learning-based detection methods suffer from the problem of Twitter spam drift, i.e., the statistical properties of spam tweets vary over time. To avoid this problem, an effective solution is to train one Twitter spam classifier every day. However, this approach faces the challenge of a small, imbalanced training set, because labelling spam samples is time-consuming. This paper proposes a new method to address this challenge. The new method employs two new techniques, fuzzy-based redistribution and asymmetric sampling. We develop a fuzzy-based information decomposition technique to redistribute the spam class and generate more spam samples. Moreover, an asymmetric sampling technique is proposed to rebalance the sizes of spam samples and non-spam samples in the training data. Finally, we apply the ensemble technique to combine the spam classifiers over two different training sets. A number of experiments are performed on a real-world 10-day ground-truth dataset to evaluate the new method. Experimental results show that the new method can significantly improve the detection performance for drifting Twitter spam.
Computers & Security | 2017
Shigang Liu; Yu Wang; Jun Zhang; Chao Chen; Yang Xiang
In recent years, microblogging sites like Twitter have become an important and popular source for real-time information and news dissemination, and they have inevitably become a prime target of spammers. A series of incidents have shown that the security threats caused by Twitter spam can reach far beyond the social media platform to impact the real world. To mitigate the threat, many recent studies apply machine learning techniques to classify Twitter spam, and promising results are reported. However, most of these studies overlook the class imbalance problem in real-world Twitter data. In this paper, we experimentally demonstrate that the unequal distribution between spam and non-spam classes has a great impact on the spam detection rate. To address the problem, we propose FOS, a fuzzy-based oversampling method that generates synthetic data samples from limited observed samples based on the idea of fuzzy-based information decomposition. Moreover, we develop an ensemble learning approach that learns more accurate classifiers from imbalanced data in three steps. In the first step, the class distribution in the imbalanced data set is adjusted by using various strategies, including random oversampling, random undersampling and FOS. In the second step, a classification model is built upon each of the redistributed data sets. In the final step, a majority voting scheme is introduced to combine the predictions from all the classification models. We conduct experiments on real-world Twitter data for the purpose of evaluation. The results indicate that the proposed learning approach can significantly improve the spam detection rate in data sets with imbalanced class distribution.
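The three-step ensemble described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: FOS itself is not reproduced here, so the unmodified training set stands in as the third redistribution strategy, and the nearest-centroid classifier, the toy feature vectors, and all function names are assumptions chosen only to keep the example self-contained.

```python
import random
from collections import Counter


def split_indices(y, minority):
    """Return (minority indices, majority indices) for a label list."""
    minority_idx = [i for i, label in enumerate(y) if label == minority]
    majority_idx = [i for i, label in enumerate(y) if label != minority]
    return minority_idx, majority_idx


def random_oversample(X, y, minority=1, seed=0):
    """Duplicate random minority samples until both classes are the same size."""
    rng = random.Random(seed)
    minority_idx, majority_idx = split_indices(y, minority)
    extra = [rng.choice(minority_idx)
             for _ in range(len(majority_idx) - len(minority_idx))]
    idx = majority_idx + minority_idx + extra
    return [X[i] for i in idx], [y[i] for i in idx]


def random_undersample(X, y, minority=1, seed=0):
    """Keep a random subset of majority samples equal in size to the minority."""
    rng = random.Random(seed)
    minority_idx, majority_idx = split_indices(y, minority)
    kept = rng.sample(majority_idx, len(minority_idx))
    idx = kept + minority_idx
    return [X[i] for i in idx], [y[i] for i in idx]


class NearestCentroid:
    """Toy classifier (an assumption, not from the paper): predict the closest class mean."""

    def fit(self, X, y):
        self.centroids = {}
        for label in set(y):
            rows = [x for x, l in zip(X, y) if l == label]
            self.centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
        return self

    def predict_one(self, x):
        def sq_dist(label):
            return sum((a - b) ** 2 for a, b in zip(x, self.centroids[label]))
        return min(self.centroids, key=sq_dist)


def majority_vote(predictions):
    """Return the label predicted by most models."""
    return Counter(predictions).most_common(1)[0][0]


def ensemble_predict(models, x):
    return majority_vote([m.predict_one(x) for m in models])


# Hypothetical imbalanced data: six non-spam (0) and two spam (1) feature vectors.
X = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.0, 0.3],
     [0.3, 0.0], [0.2, 0.2], [5.0, 5.0], [5.2, 4.9]]
y = [0, 0, 0, 0, 0, 0, 1, 1]

# Step 1: redistribute the classes. The raw set (X, y) is a placeholder for FOS.
training_sets = [random_oversample(X, y), random_undersample(X, y), (X, y)]
# Step 2: build one model per redistributed set.
models = [NearestCentroid().fit(Xs, ys) for Xs, ys in training_sets]
# Step 3: combine the models' predictions by majority voting.
print(ensemble_predict(models, [4.8, 5.1]))
```

Using an odd number of models on a binary task avoids tied votes; with an even number, a tie-breaking rule would have to be chosen.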
asian conference on intelligent information and database systems | 2016
Shigang Liu; Jun Zhang; Yu Wang; Yang Xiang
Severely skewed class distributions, in which some classes are under-represented, strongly affect the performance of learning algorithms and remain a challenge in data mining and machine learning. Much current research focuses on experimental comparison of existing re-sampling approaches. We believe new ways of constructing better algorithms are required to further balance and analyse the data set. This paper presents a Fuzzy-based Information Decomposition oversampling (FIDoS) algorithm for handling imbalanced data. Generally speaking, this is a new way of addressing imbalanced learning problems from a missing-data perspective. First, we assume that there are missing instances in the minority class that result in the imbalanced dataset. Then the proposed algorithm, which takes advantage of a fuzzy membership function, is used to transfer information to the missing minority class instances. Finally, the experimental results demonstrate that the proposed algorithm is more practical and applicable than sampling techniques.
International Journal of Machine Learning and Cybernetics | 2018
Shigang Liu; Honghua Dai; Min Gan
Missing data estimation is an important strategy for improving performance when learning from incomplete data, especially when there are non-discardable records with missing values. However, most existing algorithms focus on data missing at random (MAR) or missing completely at random (MCAR), and less attention has been paid to data not missing at random (NMAR). In this paper, an information decomposition imputation (IDIM) algorithm using a fuzzy membership function is proposed for addressing the missing value problem under NMAR. Firstly, the proposed IDIM algorithm is presented with detailed examples. Then, the proposed approach is evaluated with extensive experiments and compared with some typical algorithms. The experimental results demonstrate that the proposed algorithm has higher accuracy than the existing imputation approaches in terms of normal root mean square error (NRMSE) and TP+TN evaluation under different missing strategies.
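As a rough illustration of how an imputation error such as NRMSE might be computed, the Python sketch below uses one common convention: the root of the mean squared error divided by the population variance of the true values. The abstract does not state which normalisation the paper uses, so this formula and the function name `nrmse` are assumptions.

```python
import math
import statistics


def nrmse(true_values, imputed_values):
    """Root of (mean squared error / population variance of the true values).

    This normalisation is an assumption; other conventions divide by the
    range or the mean of the true values instead.
    """
    if len(true_values) != len(imputed_values):
        raise ValueError("inputs must have the same length")
    mse = sum((t - p) ** 2
              for t, p in zip(true_values, imputed_values)) / len(true_values)
    variance = statistics.pvariance(true_values)
    if variance == 0:
        raise ValueError("true values have zero variance")
    return math.sqrt(mse / variance)


true_values = [1.0, 2.0, 3.0, 4.0]
print(nrmse(true_values, true_values))  # perfect recovery of the hidden values
print(nrmse(true_values, [2.5] * 4))    # every value replaced by the mean
```

Under this convention a perfect imputation scores 0 and plain mean imputation scores 1, so values below 1 indicate an imputer that does better than filling in the mean.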
Concurrency and Computation: Practice and Experience | 2018
Chaoliang Li; Shigang Liu
Recently, online social networks (OSNs) such as Twitter have become an important and popular source for real-time information and news dissemination, and Twitter is inevitably a prime target of spammers. It has been shown that the security threats caused by Twitter spam can reach far beyond the social media platform itself. To mitigate the damage caused by Twitter spam, researchers and communities have employed machine learning classification algorithms to detect it. However, most of these studies have overlooked the class imbalance problem in Twitter spam detection, which is the subject of this paper. Firstly, we conducted a comparative study of popular methods for handling class imbalance in order to identify the most effective approach. Then, we conducted another comparative study of Twitter spam detection based on several classic techniques. Experimental results demonstrate that fuzzy-based ensemble learning can significantly improve classification performance on imbalanced ground-truth Twitter data.
Concurrency and Computation: Practice and Experience | 2017
Tingmin Wu; Sheng Wen; Shigang Liu; Jun Zhang; Yang Xiang; Majed A. AlRubaian; Mohammad Mehedi Hassan
Twitter spam has long been a critical but difficult problem to address. So far, researchers have developed a series of machine learning-based methods and blacklisting techniques to detect spamming activities on Twitter. According to our investigation, current methods and techniques have achieved an accuracy of around 87%. However, because of the problems of spam drift and information fabrication, these machine learning-based methods cannot efficiently detect spam activities in real-life scenarios. Meanwhile, the blacklisting method also cannot keep up with the variations of spamming activities, as manually inspecting suspicious URLs is extremely time-consuming. In this paper, we proposed a novel technique based on deep learning to address the above challenges. The syntax of each tweet is learned through WordVector and trained by deep learning. We then constructed a binary classifier to differentiate spam from regular tweets. In experiments, we collected and labeled a 10-day real tweet dataset as ground truth to evaluate our proposed method. We first conducted an empirical analysis with a series of comparisons to other methods: (1) performance of different classifiers, (2) other existing text-based methods, and (3) non-text-based detection techniques. According to the experimental results, our proposed method largely outperformed previous methods. We further conducted principal component analysis on typical methods to justify theoretically why our method performs better. We extracted all kinds of features via dimensionality reduction. It was found that our features were the most distinct among all the detection methods, which supports the superior performance of our method.
australasian conference on information security and privacy | 2016
Shigang Liu; Yu Wang; Chao Chen; Yang Xiang
Being an important source for real-time information dissemination in recent years, Twitter is inevitably a prime target of spammers. It has been shown that the damage caused by Twitter spam can reach far beyond the social media platform itself. To mitigate the threat, many recent studies use machine learning techniques to classify Twitter spam and report very satisfactory results. However, most of the studies overlook a fundamental issue that is widely seen in real-world Twitter data, i.e., the class imbalance problem. In this paper, we show that the unequal distribution between spam and non-spam classes in the data has a great impact on the spam detection rate. To address the problem, we propose an ensemble learning approach, which involves three steps. In the first step, we adjust the class distribution in the imbalanced data set using various strategies, including random oversampling, random undersampling and fuzzy-based oversampling. In the next step, a classification model is built upon each of the redistributed data sets. In the final step, a majority voting scheme is introduced to combine all the classification models. Experimental results obtained using real-world Twitter data indicate that the proposed approach can significantly improve the spam detection rate in data sets with imbalanced class distribution.
ITNAC '15 Proceedings of the 2015 International Telecommunication Networks and Applications Conference (ITNAC) | 2015
Keshav Sood; Shigang Liu; Shui Yu; Yong Xiang
Previous attempts to address Access Point (AP) association in the overlapping zones of IEEE 802.11 networks have shown some issues. They work passively and estimate load from network metrics such as frame delay, packet loss, and number of users, which may not always be accurate. Further, user behaviour is selfish, i.e., illegitimate users consume high network resources. This adversely affects existing or new users, which in turn motivates them to change locations. To alleviate these issues, we propose the use of a Software Defined Networking (SDN) enabled client-side (wireless end user) solution. In this paper, we start by proposing a dynamic AP selection algorithm/framework in the wireless user device. The device receives network resource statistics from the SDN controller, which guide it to associate with the best selected AP. We argue that the use of SDN discourages users from acting selfishly. Further, a mathematical model of the proposed scheme is derived using a fuzzy membership function, and a simulation is carried out. The simulation results indicate the need to implement SDN-enabled client-side methods.
international conference on data mining | 2014
Shigang Liu; Honghua Dai
Missing data imputation is an important task in cases where it is crucial to use all available data and not discard records with missing values. However, most existing algorithms focus on data missing at random (MAR) or missing completely at random (MCAR). In this paper, an information decomposition imputation (IDIM) algorithm using a fuzzy membership function is proposed for addressing the missing value problem when data are not missing at random (NMAR). The reliability of missing value recovery under NMAR is examined. Firstly, this paper discusses the proposed IDIM algorithm with detailed examples. Then, the reliability of the proposed approach is evaluated with extensive experiments and compared with some typical algorithms. The results demonstrate that the proposed algorithm has a higher accuracy rate than the existing imputation methods in terms of normal root mean square error (NRMSE) and predictive accuracy on different data sets with missing values, which shows that our method is more reliable in imputing missing values.
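The difference between the missingness mechanisms named in this abstract can be made concrete with a short Python sketch. It only generates missing entries and does not implement IDIM. The function names, the threshold rule used for NMAR, and the sample readings are illustrative assumptions.

```python
import random


def mask_mcar(values, rate, seed=0):
    """MCAR: each entry is hidden with the same probability, regardless of its value."""
    rng = random.Random(seed)
    return [None if rng.random() < rate else v for v in values]


def mask_nmar(values, threshold):
    """NMAR: an entry is hidden because of its own value (here, when it exceeds a threshold)."""
    return [None if v > threshold else v for v in values]


readings = [1, 5, 10, 2]                  # hypothetical observed values
print(mask_mcar(readings, rate=0.5))      # which entries vanish depends only on chance
print(mask_nmar(readings, threshold=4))   # large readings are always the missing ones
```

Under NMAR the surviving values are a biased sample of the original data, which is why imputers designed for MAR or MCAR data can be unreliable in this setting.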