Yuxin Ding
Harbin Institute of Technology Shenzhen Graduate School
Publication
Featured research published by Yuxin Ding.
international conference on machine learning and cybernetics | 2009
Yuxin Ding; Min Xiao; Ai-Wu Liu
Most current intrusion detection systems (IDS) use only one of the two detection methods, misuse detection or anomaly detection, each of which has its own limitations. This paper combines a misuse detection system with an anomaly detection system (ADS). The hybrid intrusion detection system (HIDS) contains three sub-modules: a misuse detection module, an anomaly detection module, and a signature generation module. The misuse detection module is based on Snort. The anomaly detection module is built using frequent episode rules, and the signature generation module is based on a variant of the Apriori algorithm. The misuse detection module uses attack signatures to detect known attacks. The anomaly detection module detects unknown attacks, and the signature generation module extracts the signatures of attacks detected by the ADS module and maps them into Snort rules.
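The abstract does not describe the Apriori variant used for signature generation, so the following is only a minimal sketch of the textbook Apriori idea it builds on: mining attribute sets that recur across records flagged as anomalous. The function name `frequent_itemsets`, the record attributes, and the support threshold are all hypothetical.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: count} for itemsets found in at least min_support transactions.

    Textbook Apriori, not the paper's variant (which the abstract does not specify).
    """
    transactions = [frozenset(t) for t in transactions]
    # Level 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(current)
    k = 2
    while current:
        # Join step: unions of two frequent (k-1)-itemsets that have exactly k items.
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) != k:
                continue
            # Prune step: every (k-1)-subset of a candidate must itself be frequent.
            if all(frozenset(sub) in current for sub in combinations(union, k - 1)):
                candidates.add(union)
        # Count support of the surviving candidates.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        current = {s: c for s, c in counts.items() if c >= min_support}
        result.update(current)
        k += 1
    return result


# Hypothetical records flagged by an anomaly detector.
records = [
    {"tcp", "port80", "GET"},
    {"tcp", "port80", "POST"},
    {"tcp", "port80", "GET"},
    {"udp", "port53"},
]
patterns = frequent_itemsets(records, min_support=2)
```

In a pipeline like the one described, the largest frequent itemsets would be the candidates to translate into Snort rules; that mapping is not shown here because the abstract gives no detail on it.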
Journal of Computers | 2010
Yonghui Wu; Yuxin Ding; Xiaolong Wang; Jun Xu
In this paper we present our research on online hot topic detection and label extraction for our hot topic recommendation system. Using a new topical feature selection method, the feature space is compressed to a size suitable for an online system. The tolerance rough set model is used to enrich the small set of topical feature words into a topical approximation space. According to the distance defined on the topical approximation space, the web pages are clustered into groups, which are then merged according to document overlap. The topic labels are extracted from the topical approximation space, enriched with the useful but high-frequency topical words dropped by the clustering process. The experiments show that our method generates more informative classes and more topical class labels, and alleviates the topical drift caused by non-topical and noisy words.
international conference on computer science and information technology | 2010
Yonghui Wu; Yuxin Ding; Xiaolong Wang; Jun Xu
Topic models are an increasingly useful tool for analyzing semantic-level meaning and capturing topical features. However, there is little research comparing topic models. In this paper, we describe our comparative study of three topic models in the extrinsic application of topic clustering. The topic model distance is defined on the converged parameters of the topic models and is used in topic clustering. The topic models are then compared using the clustering results of the corresponding topic distance matrices. A series of comparative experiments is carried out on a corpus containing 5033 web news articles from 30 topics, using the cosine distance as the baseline. Web page collections with different numbers of topics and documents are used in the experiments. The experimental results show that topic clustering using topic distance achieves better precision and recall on data sets containing related topics. Topic clustering using topic distance benefits from the topic features captured by topic models. The complex topic model provides more help than the simple topic model in topic clustering.
AST/UCMA/ISA/ACN'10 Proceedings of the 2010 international conference on Advances in computer science and information technology | 2010
Yonghui Wu; Yuxin Ding; Xiaolong Wang; Jun Xu
Clustering is widely used in the topic detection task. However, distances based on the vector space (VS) model, such as cosine-like distances, yield low precision and recall when the corpus contains many related topics. In this paper, we propose a new distance measure: the Topic Model (TM) induced distance. Assuming that the distribution of words is different in each topic, the documents can be treated as a sample from a mixture of k topic models, which can be estimated using expectation maximization (EM). A biased initiation method is proposed for topic decomposition using EM, which produces a converged matrix from which the TM induced distance is generated. The collections of web news are clustered into classes using this TM distance. A series of experiments is described on a corpus containing 5033 web news articles from 30 topics. K-means clustering is run on test sets with different numbers of topics. A comparison of the clustering results using the TM induced distance and the traditional cosine-like distance is given. The experimental results show that the proposed topic decomposition method using biased initiation is more effective than topic decomposition using random values. The TM induced distance generates more topical groups than the VS model based cosine-like distance. On web news collections containing related topics, the TM induced distance achieves better precision and recall.
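To make the "mixture of k topic models estimated using EM" step concrete, here is a minimal sketch of EM for a mixture of unigram (multinomial) models. It is an illustration under assumptions, not the paper's method: it uses plain random initialization rather than the biased initiation the abstract proposes, and it does not reproduce the TM induced distance, neither of which the abstract defines. The function name, the smoothing constant, and the toy counts are hypothetical.

```python
import math
import random


def em_mixture_of_unigrams(docs, k, iters=30, seed=0, smoothing=0.01):
    """Fit a mixture of k unigram (multinomial) topic models with EM.

    docs: list of equal-length word-count vectors, one per document.
    Returns (priors, word_probs, resp), where resp[d][j] is the posterior
    probability that document d was generated by topic j.
    """
    rng = random.Random(seed)
    n_docs, vocab = len(docs), len(docs[0])
    # Random initial responsibilities (assumption: NOT the paper's biased initiation).
    resp = []
    for _ in range(n_docs):
        row = [rng.random() + 1e-3 for _ in range(k)]
        total = sum(row)
        resp.append([r / total for r in row])
    priors, word_probs = [], []
    for _ in range(iters):
        # M-step: re-estimate topic priors and per-topic word distributions.
        priors = [
            (sum(r[j] for r in resp) + smoothing) / (n_docs + k * smoothing)
            for j in range(k)
        ]
        word_probs = []
        for j in range(k):
            weights = [smoothing] * vocab  # additive smoothing avoids log(0)
            for r, doc in zip(resp, docs):
                for w, count in enumerate(doc):
                    weights[w] += r[j] * count
            total = sum(weights)
            word_probs.append([x / total for x in weights])
        # E-step: posterior over topics per document, computed in log space.
        resp = []
        for doc in docs:
            logs = [
                math.log(priors[j])
                + sum(c * math.log(word_probs[j][w]) for w, c in enumerate(doc) if c)
                for j in range(k)
            ]
            top = max(logs)
            exps = [math.exp(v - top) for v in logs]
            total = sum(exps)
            resp.append([e / total for e in exps])
    return priors, word_probs, resp


# Toy corpus: two documents use words 0-1, two use words 2-3.
toy_docs = [[5, 4, 0, 0], [4, 5, 0, 0], [0, 0, 5, 4], [0, 0, 4, 5]]
priors, word_probs, resp = em_mixture_of_unigrams(toy_docs, k=2)
```

Each row of `resp` is a document's distribution over topics; a distance defined on such converged parameters is the kind of quantity a topic-model-based clustering could use in place of cosine distance on raw word vectors.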
systems, man and cybernetics | 2008
Jun Xu; Yuxin Ding; Xiaolong Wang; Yonghui Wu
Document genre information is one of the most distinguishing features in information retrieval and brings order to search results. Genre classification is concerned not with the topic but with the genre of a document. In this paper, we examine the effectiveness of using machine learning techniques for genre classification of Chinese text on a single topic, namely finance. Based on the likelihood ratio test, we present a new method for selecting feature terms, which clearly improves performance and outperforms other methods with up to 80% of terms removed. In empirical results with an SVM classifier on real-world corpora, we find that this method selects features more effectively and that the likelihood ratio is a reliable measure for selecting informative features.
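The abstract does not give the exact statistic, so as an illustration only, the sketch below scores terms with the standard log-likelihood ratio (G²) over a 2x2 term-by-genre contingency table and keeps the highest-scoring terms. The function names, the example terms, and the counts are hypothetical.

```python
import math


def g2(in_with, out_with, in_without, out_without):
    """Log-likelihood ratio (G^2) statistic for a 2x2 term/genre table.

    in_with:     documents in the genre that contain the term
    out_with:    documents outside the genre that contain the term
    in_without:  documents in the genre that lack the term
    out_without: documents outside the genre that lack the term
    """
    a, b, c, d = in_with, out_with, in_without, out_without
    n = a + b + c + d
    total = 0.0
    # Each cell is paired with its row total and its column total.
    cells = (
        (a, a + b, a + c),
        (b, a + b, b + d),
        (c, c + d, a + c),
        (d, c + d, b + d),
    )
    for observed, row_total, col_total in cells:
        if observed > 0:  # a zero cell contributes nothing to the sum
            expected = row_total * col_total / n
            total += observed * math.log(observed / expected)
    return 2.0 * total


def select_terms(tables, keep):
    """Return the `keep` terms with the highest G^2 score."""
    ranked = sorted(tables, key=lambda term: g2(*tables[term]), reverse=True)
    return ranked[:keep]


# Hypothetical counts: term -> (in_with, out_with, in_without, out_without)
tables = {
    "stock": (20, 0, 0, 20),
    "the": (10, 10, 10, 10),
    "report": (12, 8, 8, 12),
}
top_terms = select_terms(tables, keep=2)
```

A term spread evenly across genres scores zero and is dropped first, which is the intuition behind removing a large share of terms without hurting the classifier.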
international conference on machine learning and cybernetics | 2006
Yuxin Ding; Xiaolong Wang; Le-bin Lin; Qi Zhang; Yonghui Wu
This paper discusses the design and implementation of Inar, a Web crawler written in C++ that runs on Linux. It is a single-threaded crawler based on asynchronous I/O technology and is still under development. This paper describes the architecture of the Web crawler and discusses the design and function of each of its components in detail. For some design problems that we met in practice, such as URL queue design and hash algorithm design, we propose our solutions.
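The abstract names URL queue design and hash algorithm design as practical problems but does not describe Inar's solutions. As a rough illustration of why the two are linked, here is a minimal sketch of a crawl frontier that stores a fixed-size hash fingerprint per URL to avoid queuing the same page twice. It is written in Python rather than Inar's C++, and the class name, the fingerprint size, and the fragment-stripping rule are all assumptions.

```python
import hashlib
from collections import deque
from urllib.parse import urldefrag


class UrlFrontier:
    """FIFO queue of URLs to crawl, with hash-based duplicate suppression."""

    def __init__(self):
        self._queue = deque()
        self._seen = set()  # 8-byte fingerprints instead of full URL strings

    @staticmethod
    def _fingerprint(url):
        # Fixed-size digest keeps memory per URL constant (assumed design choice).
        return hashlib.blake2b(url.encode("utf-8"), digest_size=8).digest()

    def push(self, url):
        """Queue url unless it was seen before; return True if it was queued."""
        url = urldefrag(url).url  # "#fragment" does not change the fetched page
        fp = self._fingerprint(url)
        if fp in self._seen:
            return False
        self._seen.add(fp)
        self._queue.append(url)
        return True

    def pop(self):
        """Return the next URL to fetch, or None when the frontier is empty."""
        return self._queue.popleft() if self._queue else None

    def __len__(self):
        return len(self._queue)


frontier = UrlFrontier()
frontier.push("http://example.com/a")
frontier.push("http://example.com/a#top")  # duplicate after fragment removal
frontier.push("http://example.com/b")
```

A short fingerprint trades a small chance of collision (a page wrongly treated as already seen) for a much smaller seen-set, which matters when a crawler tracks many URLs in memory.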
international conference on machine learning and cybernetics | 2017
Xiao-Ling Xia; Yuxin Ding; Jing-Zhi Jiang; Rong Zeng
Malware in the form of Internet worms, computer viruses, and Trojan horses poses a major threat to the security of networked systems. Describing the behavioral knowledge of malware is therefore interesting and meaningful work. In recent years, different ontology technologies have been proposed to represent domain knowledge. In this study, we apply ontology techniques to the field of malware detection and propose a malware detection method based on ontology. This method is based on the behavior of malicious code and builds a knowledge representation of malware behaviors from a variety of perspectives. We use the common behaviors of individuals to represent the behaviors of a malware family, and use the ontology reasoning mechanism to detect unknown malware samples. Experiments show that the method has a high malicious code detection rate and a low false alarm rate.
international conference on machine learning and cybernetics | 2015
Yang Xiao; Yuxin Ding
Text classification algorithms based on topic models represent documents as topic vectors and use the topic vectors to train classification models. One problem with topic-based representation is that the topics generated by topic models differ in quality, so poor-quality topics can seriously affect classification accuracy. To solve this problem, this paper proposes a Laplace weight algorithm for calculating the weights of topics. The Laplace weight is used as the weight of each topic and serves to evaluate the topic's importance. The experiments show that the Laplace weight can improve classification accuracy.
Computer Communications | 2016
Yuxin Ding; Shengli Yan; Yibin Zhang; Wei Dai; Li Dong
international conference on machine learning and cybernetics | 2011
Wen Zhang; Yuxin Ding; Yan Tang; Bin Zhao