Yanfang Ye | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Yanfang Ye is active.

Explore More

Publication

Featured researches published by Yanfang Ye.

knowledge discovery and data mining | 2007

IMDS: intelligent malware detection system

Yanfang Ye; Dingding Wang; Tao Li; Dongyi Ye

The proliferation of malware has presented a serious threat to the security of computer systems. Traditional signature-based anti-virus systems fail to detect polymorphic and new, previously unseen malicious executables. In this paper, resting on the analysis of Windows API execution sequences called by PE files, we develop the Intelligent Malware Detection System (IMDS) using Objective-Oriented Association (OOA) mining based classification. IMDS is an integrated system consisting of three major modules: PE parser, OOA rule generator, and rule based classifier. An OOA_Fast_FP-Growth algorithm is adapted to efficiently generate OOA rules for classification. A comprehensive experimental study on a large collection of PE files obtained from the anti-virus laboratory of King-Soft Corporation is performed to compare various malware detection approaches. Promising experimental results demonstrate that the accuracy and efficiency of our IMDS system out perform popular anti-virus software such as Norton AntiVirus and McAfee VirusScan, as well as previous data mining based detection systems which employed Naive Bayes, Support Vector Machine (SVM) and Decision Tree techniques.

Journal in Computer Virology | 2008

An intelligent PE-malware detection system based on association mining

Yanfang Ye; Dingding Wang; Tao Li; Dongyi Ye; Qingshan Jiang

The proliferation of malware has presented a serious threat to the security of computer systems. Traditional signature-based anti-virus systems fail to detect polymorphic/metamorphic and new, previously unseen malicious executables. Data mining methods such as Naive Bayes and Decision Tree have been studied on small collections of executables. In this paper, resting on the analysis of Windows APIs called by PE files, we develop the Intelligent Malware Detection System (IMDS) using Objective-Oriented Association (OOA) mining based classification. IMDS is an integrated system consisting of three major modules: PE parser, OOA rule generator, and rule based classifier. An OOA_Fast_FP-Growth algorithm is adapted to efficiently generate OOA rules for classification. A comprehensive experimental study on a large collection of PE files obtained from the anti-virus laboratory of KingSoft Corporation is performed to compare various malware detection approaches. Promising experimental results demonstrate that the accuracy and efficiency of our IMDS system outperform popular anti-virus software such as Norton AntiVirus and McAfee VirusScan, as well as previous data mining based detection systems which employed Naive Bayes, Support Vector Machine (SVM) and Decision Tree techniques. Our system has already been incorporated into the scanning tool of KingSoft’s Anti-Virus software.

knowledge discovery and data mining | 2010

Automatic malware categorization using cluster ensemble

Yanfang Ye; Tao Li; Yong Chen; Qingshan Jiang

In this paper, resting on the analysis of instruction frequency and function-based instruction sequences, we develop an Automatic Malware Categorization System (AMCS) for automatically grouping malware samples into families that share some common characteristics using a cluster ensemble by aggregating the clustering solutions generated by different base clustering algorithms. We propose a principled cluster ensemble framework for combining individual clustering solutions based on the consensus partition. The domain knowledge in the form of sample-level constraints can be naturally incorporated in the ensemble framework. In addition, to account for the characteristics of feature representations, we propose a hybrid hierarchical clustering algorithm which combines the merits of hierarchical clustering and k-medoids algorithms and a weighted subspace K-medoids algorithm to generate base clusterings. The categorization results of our AMCS system can be used to generate signatures for malware families that are useful for malware detection. The case studies on large and real daily malware collection from Kingsoft Anti-Virus Lab demonstrate the effectiveness and efficiency of our AMCS system.

Journal in Computer Virology | 2009

SBMDS: an interpretable string based malware detection system using SVM ensemble with bagging

Yanfang Ye; Lifei Chen; Dingding Wang; Tao Li; Qingshan Jiang; Min Zhao

Malicious executables are programs designed to infiltrate or damage a computer system without the owner’s consent, which have become a serious threat to the security of computer systems. There is an urgent need for effective techniques to detect polymorphic, metamorphic and previously unseen malicious executables of which detection fails in most of the commercial anti-virus software. In this paper, we develop interpretable string based malware detection system (SBMDS), which is based on interpretable string analysis and uses support vector machine (SVM) ensemble with Bagging to classify the file samples and predict the exact types of the malware. Interpretable strings contain both application programming interface (API) execution calls and important semantic strings reflecting an attacker’s intent and goal. Our SBMDS is carried out with four major steps: (1) first constructing the interpretable strings by developing a feature parser; (2) performing feature selection to select informative strings related to different types of malware; (3) followed by using SVM ensemble with bagging to construct the classifier; (4) and finally conducting the malware detector, which not only can detect whether a program is malicious or not, but also can predict the exact type of the malware. Our case study on the large collection of file samples collected by Kingsoft Anti-virus lab illustrate that: (1) The accuracy and efficiency of our SBMDS outperform several popular anti-virus software; (2) Based on the signatures of interpretable strings, our SBMDS outperforms data mining based detection systems which employ single SVM, Naive Bayes with bagging, Decision Trees with bagging; (3) Compared with the IMDS which utilizes the objective-oriented association (OOA) based classification on API calls, our SBMDS achieves better performance. Our SBMDS system has already been incorporated into the scanning tool of a commercial anti-virus software.

systems man and cybernetics | 2012

Ensemble Clustering for Internet Security Applications

Weiwei Zhuang; Yanfang Ye; Yong Chen; Tao Li

Due to their damage to Internet security, malware and phishing website detection has been the Internet security topics that are of great interests. Compared with malware attacks, phishing website fraud is a relatively new Internet crime. However, they share some common properties: 1) both malware samples and phishing websites are created at a rate of thousands per day driven by economic benefits; and 2) phishing websites represented by the term frequencies of the webpage content share similar characteristics with malware samples represented by the instruction frequencies of the program. Over the past few years, many clustering techniques have been employed for automatic malware and phishing website detection. In these techniques, the detection process is generally divided into two steps: 1) feature extraction, where representative features are extracted to capture the characteristics of the file samples or the websites; and 2) categorization, where intelligent techniques are used to automatically group the file samples or websites into different classes based on computational analysis of the feature representations. However, few have been applied in real industry products. In this paper, we develop an automatic categorization system to automatically group phishing websites or malware samples using a cluster ensemble by aggregating the clustering solutions that are generated by different base clustering algorithms. We propose a principled cluster ensemble framework to combine individual clustering solutions that are based on the consensus partition, which can not only be applied for malware categorization, but also for phishing website clustering. In addition, the domain knowledge in the form of sample-level/website-level constraints can be naturally incorporated into the ensemble framework. The case studies on large and real daily phishing websites and malware collection from the Kingsoft Internet Security Laboratory demonstrate the effectiveness and efficiency of our proposed method.

Expert Systems With Applications | 2016

Malicious sequential pattern mining for automatic malware detection

Yujie Fan; Yanfang Ye; Lifei Chen

An effective framework using sequence mining technique is proposed for automatic malware detection.An efficient sequential pattern mining algorithm for discovering discriminative patterns between malware and benign samples.A new nearest neighbor classifier as the detection module to identify unknown malware.The strong results of the proposed framework compared with the existing malware detection methods in detecting new malicious samples. Due to its damage to Internet security, malware (e.g., virus, worm, trojan) and its detection has caught the attention of both anti-malware industry and researchers for decades. To protect legitimate users from the attacks, the most significant line of defense against malware is anti-malware software products, which mainly use signature-based method for detection. However, this method fails to recognize new, unseen malicious executables. To solve this problem, in this paper, based on the instruction sequences extracted from the file sample set, we propose an effective sequence mining algorithm to discover malicious sequential patterns, and then All-Nearest-Neighbor (ANN) classifier is constructed for malware detection based on the discovered patterns. The developed data mining framework composed of the proposed sequential pattern mining method and ANN classifier can well characterize the malicious patterns from the collected file sample set to effectively detect newly unseen malware samples. A comprehensive experimental study on a real data collection is performed to evaluate our detection framework. Promising experimental results show that our framework outperforms other alternate data mining based detection methods in identifying new malicious executables.

ACM Computing Surveys | 2017

A Survey on Malware Detection Using Data Mining Techniques

Yanfang Ye; Tao Li; Donald A. Adjeroh; S. Sitharama Iyengar

In the Internet age, malware (such as viruses, trojans, ransomware, and bots) has posed serious and evolving security threats to Internet users. To protect legitimate users from these threats, anti-malware software products from different companies, including Comodo, Kaspersky, Kingsoft, and Symantec, provide the major defense against malware. Unfortunately, driven by the economic benefits, the number of new malware samples has explosively increased: anti-malware vendors are now confronted with millions of potential malware samples per year. In order to keep on combating the increase in malware samples, there is an urgent need to develop intelligent methods for effective and efficient malware detection from the real and large daily sample collection. In this article, we first provide a brief overview on malware as well as the anti-malware industry, and present the industrial needs on malware detection. We then survey intelligent malware detection methods. In these methods, the process of detection is usually divided into two stages: feature extraction and classification/clustering. The performance of such intelligent malware detection approaches critically depend on the extracted features and the methods for classification/clustering. We provide a comprehensive investigation on both the feature extraction and the classification/clustering techniques. We also discuss the additional issues and the challenges of malware detection using data mining techniques and finally forecast the trends of malware development.

intelligent information systems | 2010

Hierarchical associative classifier (HAC) for malware detection from the large and imbalanced gray list

Yanfang Ye; Tao Li; Kai Huang; Qingshan Jiang; Yong Chen

Nowadays, numerous attacks made by the malware (e.g., viruses, backdoors, spyware, trojans and worms) have presented a major security threat to computer users. Currently, the most significant line of defense against malware is anti-virus products which focus on authenticating valid software from a whitelist, blocking invalid software from a blacklist, and running any unknown software (i.e., the gray list) in a controlled manner. The gray list, containing unknown software programs which could be either normal or malicious, is usually authenticated or rejected manually by virus analysts. Unfortunately, along with the development of the malware writing techniques, the number of file samples in the gray list that need to be analyzed by virus analysts on a daily basis is constantly increasing. The gray list is not only large in size, but also has an imbalanced class distribution where malware is the minority class. In this paper, we describe our research effort on building automatic, effective, and interpretable classifiers resting on the analysis of Application Programming Interfaces (APIs) called by Windows Portable Executable (PE) files for detecting malware from the large and imbalanced gray list. Our effort is based on associative classifiers due to their high interpretability as well as their capability of discovering interesting relationships among API calls. We first adapt several different post-processing techniques of associative classification, including rule pruning and rule re-ordering, for building effective associative classifiers from large collections of training data. In order to help the virus analysts detect malware from the imbalanced gray list, we then develop the Hierarchical Associative Classifier (HAC). HAC constructs a two-level associative classifier to maximize precision and recall of the minority (malware) class: in the first level, it uses high precision rules of majority (benign file samples) class and low precision rules of minority class to achieve high recall; and in the second level, it ranks the minority class files and optimizes the precision. Finally, since our case studies are based on a large and real data collection obtained from the Anti-virus Lab of Kingsoft corporation, including 8,000,000 malware, 8,000,000 benign files, and 100,000 file samples from the gray list, we empirically examine the sampling strategy to build the classifiers for such a large data collection to avoid over-fitting and achieve great effectiveness as well as high efficiency. Promising experimental results demonstrate the effectiveness and efficiency of the HAC classifier. HAC has already been incorporated into the scanning tool of Kingsoft’s Anti-Virus software.

knowledge discovery and data mining | 2009

Intelligent file scoring system for malware detection from the gray list

Yanfang Ye; Tao Li; Qingshan Jiang; Zhixue Han; Li Wan

Currently, the most significant line of defense against malware is anti-virus products which focus on authenticating valid software from a white list, blocking invalid software from a black list, and running any unknown software (i.e., the gray list) in a controlled manner. The gray list, containing unknown software programs which could be either normal or malicious, is usually authenticated or rejected manually by virus analysts. Unfortunately, along with the development of the malware writing techniques, the number of file samples in the gray list that need to be analyzed by virus analysts on a daily basis is constantly increasing. In this paper, we develop an intelligent file scoring system (IFSS for short) for malware detection from the gray list by an ensemble of heterogeneous base-level classifiers derived by different learning methods, using different feature representations on dynamic training sets. To the best of our knowledge, this is the first work of applying such ensemble methods for malware detection. IFSS makes it practical for virus analysts to identify malware samples from the huge gray list and improves the detection ability of anti-virus software. It has already been incorporated into the scanning tool of Kingsofts Anti-Virus software. The case studies on large and real daily collection of the gray list illustrate that the detection ability and efficiency of our IFSS system outperforms other popular scanning tools such as NOD32 and Kaspersky.

web intelligence | 2016

Deep4MalDroid: A Deep Learning Framework for Android Malware Detection Based on Linux Kernel System Call Graphs

Shifu Hou; Aaron Saas; Lifei Chen; Yanfang Ye

With explosive growth of Android malware and due to its damage to smart phone users (e.g., stealing user credentials, resource abuse), Android malware detection is one of the cyber security topics that are of great interests. Currently, the most significant line of defense against Android malware is anti-malware software products, such as Norton, Lookout, and Comodo Mobile Security, which mainly use the signature-based method to recognize threats. However, malware attackers increasingly employ techniques such as repackaging and obfuscation to bypass signatures and defeat attempts to analyze their inner mechanisms. The increasing sophistication of Android malware calls for new defensive techniques that are harder to evade, and are capable of protecting users against novel threats. In this paper, we propose a novel dynamic analysis method named Component Traversal that can automatically execute the code routines of each given Android application (app) as completely as possible. Based on the extracted Linux kernel system calls, we further construct the weighted directed graphs and then apply a deep learning framework resting on the graph based features for newly unknown Android malware detection. A comprehensive experimental study on a real sample collection from Comodo Cloud Security Center is performed to compare various malware detection approaches. Promising experimental results demonstrate that our proposed method outperforms other alternative Android malware detection techniques. Our developed system Deep4MalDroid has also been integrated into a commercial Android anti-malware software.

Explore More