Lifei Chen | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Lifei Chen is active.

Explore More

Publication

Featured researches published by Lifei Chen.

Journal in Computer Virology | 2009

SBMDS: an interpretable string based malware detection system using SVM ensemble with bagging

Yanfang Ye; Lifei Chen; Dingding Wang; Tao Li; Qingshan Jiang; Min Zhao

Malicious executables are programs designed to infiltrate or damage a computer system without the owner’s consent, which have become a serious threat to the security of computer systems. There is an urgent need for effective techniques to detect polymorphic, metamorphic and previously unseen malicious executables of which detection fails in most of the commercial anti-virus software. In this paper, we develop interpretable string based malware detection system (SBMDS), which is based on interpretable string analysis and uses support vector machine (SVM) ensemble with Bagging to classify the file samples and predict the exact types of the malware. Interpretable strings contain both application programming interface (API) execution calls and important semantic strings reflecting an attacker’s intent and goal. Our SBMDS is carried out with four major steps: (1) first constructing the interpretable strings by developing a feature parser; (2) performing feature selection to select informative strings related to different types of malware; (3) followed by using SVM ensemble with bagging to construct the classifier; (4) and finally conducting the malware detector, which not only can detect whether a program is malicious or not, but also can predict the exact type of the malware. Our case study on the large collection of file samples collected by Kingsoft Anti-virus lab illustrate that: (1) The accuracy and efficiency of our SBMDS outperform several popular anti-virus software; (2) Based on the signatures of interpretable strings, our SBMDS outperforms data mining based detection systems which employ single SVM, Naive Bayes with bagging, Decision Trees with bagging; (3) Compared with the IMDS which utilizes the objective-oriented association (OOA) based classification on API calls, our SBMDS achieves better performance. Our SBMDS system has already been incorporated into the scanning tool of a commercial anti-virus software.

International Journal of Machine Learning and Cybernetics | 2012

Soft subspace clustering with an improved feature weight self-adjustment mechanism

Gongde Guo; Si Chen; Lifei Chen

Traditional clustering algorithms are often defeated by high dimensionality. In order to find clusters hiding in different subspaces, soft subspace clustering has become an effective means of dealing with high dimensional data. However, most existing soft subspace clustering algorithms contain parameters which are difficult to be determined by users in real-world applications. A new soft subspace clustering algorithm named SC-IFWSA is proposed, which uses an improved feature weight self-adjustment mechanism IFWSA to update adaptively the weights of all features for each cluster according to the importance of the features to clustering quality and does not require users to set any parameter values. In addition, SC-IFWSA can overcome the traditional FWSA mechanism which may fail to calculate feature weights in some particular cases. In comparison with its related approaches, the experimental results carried out on ten data sets demonstrate the effectiveness and feasibility of the proposed method.

Expert Systems With Applications | 2016

Malicious sequential pattern mining for automatic malware detection

Yujie Fan; Yanfang Ye; Lifei Chen

An effective framework using sequence mining technique is proposed for automatic malware detection.An efficient sequential pattern mining algorithm for discovering discriminative patterns between malware and benign samples.A new nearest neighbor classifier as the detection module to identify unknown malware.The strong results of the proposed framework compared with the existing malware detection methods in detecting new malicious samples. Due to its damage to Internet security, malware (e.g., virus, worm, trojan) and its detection has caught the attention of both anti-malware industry and researchers for decades. To protect legitimate users from the attacks, the most significant line of defense against malware is anti-malware software products, which mainly use signature-based method for detection. However, this method fails to recognize new, unseen malicious executables. To solve this problem, in this paper, based on the instruction sequences extracted from the file sample set, we propose an effective sequence mining algorithm to discover malicious sequential patterns, and then All-Nearest-Neighbor (ANN) classifier is constructed for malware detection based on the discovered patterns. The developed data mining framework composed of the proposed sequential pattern mining method and ANN classifier can well characterize the malicious patterns from the collected file sample set to effectively detect newly unseen malware samples. A comprehensive experimental study on a real data collection is performed to evaluate our detection framework. Promising experimental results show that our framework outperforms other alternate data mining based detection methods in identifying new malicious executables.

IEEE Transactions on Knowledge and Data Engineering | 2012

Model-Based Method for Projective Clustering

Lifei Chen; Qingshan Jiang; Shengrui Wang

Clustering high-dimensional data is a major challenge due to the curse of dimensionality. To solve this problem, projective clustering has been defined as an extension to traditional clustering that attempts to find projected clusters in subsets of the dimensions of a data space. In this paper, a probability model is first proposed to describe projected clusters in high-dimensional data space. Then, we present a model-based algorithm for fuzzy projective clustering that discovers clusters with overlapping boundaries in various projected subspaces. The suitability of the proposal is demonstrated in an empirical study done with synthetic data set and some widely used real-world data set.

international conference on data mining | 2008

A Probability Model for Projective Clustering on High Dimensional Data

Lifei Chen; Qingshan Jiang; Shengrui Wang

Clustering high dimensional data is a big challenge in data mining due to the curse of dimensionality. To solve this problem, projective clustering has been defined as an extension of traditional clustering that seeks to find projected clusters in subsets of dimensions of a data space. In this paper, the problem of modeling projected clusters is first discussed, and an extended Gaussian model is proposed. Second, a general objective criterion used with k-means type projective clustering is presented based on the model. Finally, the expressions to learn model parameters are derived and then used in a new algorithm named FPC to perform fuzzy clustering on high dimensional data. The experimental results on document clustering show the effectiveness of the proposed clustering model.

web intelligence | 2016

Deep4MalDroid: A Deep Learning Framework for Android Malware Detection Based on Linux Kernel System Call Graphs

Shifu Hou; Aaron Saas; Lifei Chen; Yanfang Ye

With explosive growth of Android malware and due to its damage to smart phone users (e.g., stealing user credentials, resource abuse), Android malware detection is one of the cyber security topics that are of great interests. Currently, the most significant line of defense against Android malware is anti-malware software products, such as Norton, Lookout, and Comodo Mobile Security, which mainly use the signature-based method to recognize threats. However, malware attackers increasingly employ techniques such as repackaging and obfuscation to bypass signatures and defeat attempts to analyze their inner mechanisms. The increasing sophistication of Android malware calls for new defensive techniques that are harder to evade, and are capable of protecting users against novel threats. In this paper, we propose a novel dynamic analysis method named Component Traversal that can automatically execute the code routines of each given Android application (app) as completely as possible. Based on the extracted Linux kernel system calls, we further construct the weighted directed graphs and then apply a deep learning framework resting on the graph based features for newly unknown Android malware detection. A comprehensive experimental study on a real sample collection from Comodo Cloud Security Center is performed to compare various malware detection approaches. Promising experimental results demonstrate that our proposed method outperforms other alternative Android malware detection techniques. Our developed system Deep4MalDroid has also been integrated into a commercial Android anti-malware software.

web age information management | 2016

DroidDelver: An Android Malware Detection System Using Deep Belief Network Based on API Call Blocks

Shifu Hou; Aaron Saas; Yanfang Ye; Lifei Chen

Because of the explosive growth of Android malware and due to the severity of its damages, the detection of Android malware has become an increasing important topic in cyber security. Currently, the major defense against Android malware is commercial mobile security products which mainly use signature-based method for detection. However, attackers can easily devise methods, such as obfuscation and repackaging, to evade the detection, which calls for new defensive techniques that are harder to evade. In this paper, resting on the analysis of Application Programming Interface (API) calls extracted from the smali files, we further categorize the API calls which belong to the some method in the smali code into a block. Based on the generated code blocks, we then apply a deep learning framework (i.e., Deep Belief Network) for newly unknown Android malware detection. Using a real sample collection from Comodo Cloud Security Center, a comprehensive experimental study is performed to compare various malware detection approaches. Promising experimental results demonstrate that DroidDelver which integrates our proposed method outperform other alternative Android malware detection techniques.

advanced information networking and applications | 2008

A New Centroid-Based Classifier for Text Categorization

Lifei Chen; Yanfang Ye; Qingshan Jiang

In recent years, centroid-based document classifiers receive wide interests from text mining community because of their simplicity and linear-time complexity. However, the traditional centroid-based classifiers usually perform less effectively for Chinese text categorization. In this paper, we tackle the problem by developing a new way to calculate the class-specific weights for each term in the training phase; in the testing phase, the new documents are assigned to the centroid to which the document is most similar based on the weighted distance measurement. The experimental results demonstrate that the accuracy of our algorithm outperforms the traditional centroid-based classifiers, as well as outstanding efficiency compared with the Support Vector Machine (SVM) based classifiers for Chinese text categorization.

Journal of Theoretical Biology | 2016

A centroid-based gene selection method for microarray data classification.

Shun Guo; Donghui Guo; Lifei Chen; Qingshan Jiang

For classification problems based on microarray data, the data typically contains a large number of irrelevant and redundant features. In this paper, a new gene selection method is proposed to choose the best subset of features for microarray data with the irrelevant and redundant features removed. We formulate the selection problem as a L1-regularized optimization problem, based on a newly defined linear discriminant analysis criterion. Instead of calculating the mean of the samples, a kernel-based approach is used to estimate the class centroid to define both the between-class separability and the within-class compactness for the criterion. Theoretical analysis indicates that the global optimal solution of the L1-regularized criterion can be reached with a general condition, on which an efficient algorithm is derived to the feature selection problem in a linear time complexity with respect to the number of features and the number of samples. The experimental results on ten publicly available microarray datasets demonstrate that the proposed method performs effectively and competitively compared with state-of-the-art methods.

advanced information networking and applications | 2010

A New Over-Sampling Method Based on Cluster Ensembles

Si Chen; Gongde Guo; Lifei Chen

Most of the traditional classification methods behave undesirable, particularly producing poor predictive accuracy for the minority class of the imbalanced data from real world applications. This paper proposes a novel over-sampling strategy to handle imbalanced data based on cluster ensembles, named CE-SMOTE, which aims to provide a better training platform by introducing clustering consistency index to find out the cluster boundary minority samples and then over-sampling these minority samples to augment the original data set. Experiments carried out on some imbalanced public data sets show that the proposed method is effective and feasible to deal with the imbalanced data sets, and can produce high predictions for both minority and majority classes.

Explore More