Xiaochun Yun | Researchain

Publication

Featured research published by Xiaochun Yun.

International Conference on Network Protocols | 2012

A semantics aware approach to automated reverse engineering unknown protocols

Yipeng Wang; Xiaochun Yun; M. Zubair Shafiq; Liyan Wang; Alex X. Liu; Zhibin Zhang; Danfeng Yao; Yongzheng Zhang; Li Guo

Extracting the protocol message format specifications of unknown applications from network traces is important for a variety of applications such as application protocol parsing, vulnerability discovery, and system integration. In this paper, we propose ProDecoder, a network-trace-based protocol message format inference system that exploits the semantics of protocol messages without requiring the executable code of application protocols. ProDecoder is based on the key insight that the n-grams of protocol traces exhibit a highly skewed frequency distribution that can be leveraged for accurate protocol message format inference. In ProDecoder, we discover the latent relationships among n-grams by first grouping protocol messages with the same semantics and then inferring message formats by keyword-based clustering and cluster sequence alignment. We implemented and evaluated ProDecoder by inferring the message format specifications of SMB (a binary protocol) and SMTP (a textual protocol). Our experimental results show that ProDecoder accurately parses and infers the SMB protocol with 100% precision and recall. For SMTP, ProDecoder achieves approximately 95% precision and recall.
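The skewed n-gram frequency distribution mentioned in this abstract can be illustrated with a minimal sketch. It only counts byte n-grams and ranks them by frequency; it is not ProDecoder's inference pipeline. The function names and the sample messages are hypothetical, invented for illustration.

```python
from collections import Counter


def byte_ngrams(message: bytes, n: int = 3):
    """Yield every contiguous n-byte window of a message."""
    for i in range(len(message) - n + 1):
        yield message[i:i + n]


def ngram_frequency_rank(messages, n: int = 3):
    """Return (ngram, count) pairs sorted from most to least frequent."""
    counts = Counter()
    for msg in messages:
        counts.update(byte_ngrams(msg, n))
    return counts.most_common()


# Hypothetical SMTP-like messages, invented for illustration only.
SAMPLE_MESSAGES = [
    b"MAIL FROM:<a@example.com>",
    b"MAIL FROM:<b@example.com>",
    b"RCPT TO:<c@example.com>",
]
```

Calling `ngram_frequency_rank(SAMPLE_MESSAGES)` puts n-grams from recurring keywords and shared suffixes near the top of the ranking, while n-grams from variable fields fall into the long tail.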

International Conference on Information Technology | 2007

Analyzing the Characteristics of Gnutella Overlays

Yong Wang; Xiaochun Yun; Yifei Li

Mapping and analyzing the topological properties of P2P overlay networks benefits the further design and development of P2P networks. In this paper, the measured Gnutella network topology is taken as an example. The degree-rank and frequency-degree distributions of the measured topology graphs are analyzed in detail, and the small-world characteristics of the Gnutella network are discussed. The results indicate that each tier of the Gnutella network shows distinct characteristics: the top-level graph fits a power law in its degree-rank distribution but follows a Gaussian function in its frequency-degree distribution, whereas the bottom-level graph follows a power law in both its degree-rank distribution and its frequency-degree distribution. Fitting results confirm that a power law better describes both distributions of the bottom-level graphs, while a Gaussian describes the frequency-degree distribution of the top-level graphs. The Gnutella overlay network has small-world characteristics, but it is not a scale-free network: it has developed over time following a different set of growth processes from those of the BA (Barabási-Albert) model. The measured results show that the Gnutella network scales well and tolerates failures and attacks against peers, but has low routing efficiency.
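A degree-rank power-law check of the kind described in this abstract can be sketched as a least-squares line fit in log-log space. This is a generic fitting approach assumed here for illustration; the abstract does not say which fitting procedure the authors used. The function names are hypothetical, and all degrees must be positive.

```python
import math


def degree_rank(degrees):
    """Sort node degrees in descending order; position i holds rank i + 1."""
    return sorted(degrees, reverse=True)


def fit_power_law_exponent(ranked_degrees):
    """Least-squares slope of log(degree) against log(rank).

    A degree-rank distribution d(r) = C * r**a appears as a straight line
    with slope a in log-log space, so the slope estimates the exponent.
    """
    xs = [math.log(r) for r in range(1, len(ranked_degrees) + 1)]
    ys = [math.log(d) for d in ranked_degrees]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx
```

A slope close to a constant negative value with small residuals suggests a power law; a poor linear fit in log-log space suggests another form, such as the Gaussian reported for the top-level frequency-degree distribution.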

IEEE Transactions on Neural Networks | 2015

Bidirectional Active Learning: A Two-Way Exploration Into Unlabeled and Labeled Data Set

Xiaoyu Zhang; Shupeng Wang; Xiaochun Yun

In practical machine learning applications, human instruction is indispensable for model construction. To utilize the precious labeling effort effectively, active learning queries the user with selective sampling in an interactive way. Traditional active learning techniques focus only on the unlabeled data set under a unidirectional exploration framework and suffer from model deterioration in the presence of noise. To address this problem, this paper proposes a novel bidirectional active learning algorithm that explores both the unlabeled and labeled data sets simultaneously in a two-way process. For the acquisition of new knowledge, forward learning queries the most informative instances from the unlabeled data set. For the introspection of learned knowledge, backward learning detects the most suspiciously unreliable instances within the labeled data set. Under the two-way exploration framework, the generalization ability of the learning model can be greatly improved, which is demonstrated by the encouraging experimental results.
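The two directions described in this abstract can be sketched with simple stand-in criteria. The abstract does not state the paper's actual selection measures, so this sketch assumes predictive entropy for the forward step and the model's probability of the recorded label for the backward step. Both function names and the input layout (one list of class probabilities per instance) are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy of a class-probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def forward_query(unlabeled_probs):
    """Index of the unlabeled instance the model is least certain about."""
    return max(range(len(unlabeled_probs)),
               key=lambda i: entropy(unlabeled_probs[i]))


def backward_query(labeled_probs, labels):
    """Index of the labeled instance whose recorded label the model
    considers least likely, i.e. the most suspicious label."""
    return min(range(len(labels)),
               key=lambda i: labeled_probs[i][labels[i]])
```

In a full loop, the forward pick would be sent to the annotator for a new label and the backward pick would be sent back for re-checking, after which the model is retrained.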

Neurocomputing | 2015

Update vs. upgrade

Xiaoyu Zhang; Shupeng Wang; Xiaobin Zhu; Xiaochun Yun; Guangjun Wu; Yipeng Wang

This paper raises an important issue for active learning in practice. The traditional active learning mechanism is based on the assumption that the number of classes is known in advance, and thus selective sampling is confined to the determinate model. However, as is the case for many applications, the model class is usually indeterminate and there is every chance that the hypothesis itself is inappropriate. To address this problem, we propose a novel indeterminate multi-class active learning algorithm, which comprehensively evaluates each instance based on both its value in refining the existing model and its potential to trigger model rectification. In this way, a balance is effectively achieved between model update and model upgrade. The advantage of the proposed algorithm is demonstrated by experiments on classification tasks with both synthetic and real-world datasets.

IEEE International Conference on Cloud Computing Technology and Science | 2015

FastRAQ: A Fast Approach to Range-Aggregate Queries in Big Data Environments

Xiaochun Yun; Guangjun Wu; Guangyan Zhang; Keqin Li; Shupeng Wang

Range-aggregate queries apply a certain aggregate function to all tuples within given query ranges. Existing approaches to range-aggregate queries are insufficient to quickly provide accurate results in big data environments. In this paper, we propose FastRAQ, a fast approach to range-aggregate queries in big data environments. FastRAQ first divides big data into independent partitions with a balanced partitioning algorithm, and then generates a local estimation sketch for each partition. When a range-aggregate query request arrives, FastRAQ obtains the result directly by summarizing local estimates from all partitions. FastRAQ has O(1) time complexity for data updates and O(N/(P×B)) time complexity for range-aggregate queries, where N is the number of distinct tuples for all dimensions, P is the partition number, and B is the bucket number in the histogram. We implement the FastRAQ approach on the Linux platform, and evaluate its performance with about 10 billion data records. Experimental results demonstrate that FastRAQ provides range-aggregate query results within a time period two orders of magnitude lower than that of Hive, while the relative error is less than 3 percent within the given confidence interval.
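The partition-then-summarize idea in this abstract can be sketched for a one-dimensional COUNT query. This is a simplified illustration under stated assumptions, not the FastRAQ implementation: round-robin assignment stands in for the paper's balanced partitioning algorithm, an equi-width histogram stands in for its local estimation sketch, and values are assumed to lie in a known domain [lo, hi). The names `PartitionSketch`, `build_partitions`, and `range_count` are hypothetical.

```python
class PartitionSketch:
    """Equi-width histogram over [lo, hi) for one partition (assumed sketch)."""

    def __init__(self, lo, hi, buckets):
        self.lo, self.hi, self.buckets = lo, hi, buckets
        self.width = (hi - lo) / buckets
        self.counts = [0] * buckets

    def insert(self, value):
        """Constant-time update: bump the counter of the bucket holding value."""
        idx = min(int((value - self.lo) / self.width), self.buckets - 1)
        self.counts[idx] += 1

    def estimate_count(self, start, end):
        """Estimate how many values fall in [start, end), assuming values
        are spread uniformly inside each bucket."""
        total = 0.0
        for i, count in enumerate(self.counts):
            b_lo = self.lo + i * self.width
            b_hi = b_lo + self.width
            overlap = max(0.0, min(end, b_hi) - max(start, b_lo))
            total += count * overlap / self.width
        return total


def build_partitions(values, num_partitions, lo, hi, buckets):
    """Assign values round-robin (a stand-in for balanced partitioning)."""
    parts = [PartitionSketch(lo, hi, buckets) for _ in range(num_partitions)]
    for i, value in enumerate(values):
        parts[i % num_partitions].insert(value)
    return parts


def range_count(parts, start, end):
    """Answer the query by summing the local estimates of all partitions."""
    return sum(p.estimate_count(start, end) for p in parts)
```

Updates here touch a single counter, matching the constant-time update property in the abstract. The query in this sketch scans every bucket of every partition, so it does not reproduce the query bound the paper reports.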

International Conference on Machine Learning and Cybernetics | 2004

A risk assessment approach for network information system

Yongzheng Zhang; Binxing Fang; Xiaochun Yun

Risk assessment has become an effective technique for protecting network information systems. In order to achieve more accurate results, a new assessment approach is presented in this paper. Our approach introduces the idea of network node correlation (NNC), and based on NNC, we define the concept and characteristics of risk propagation. We also design a quantitative taxonomy of network nodes and describe the assessment process. Compared with other works, our approach more faithfully reflects the existence of correlative risk.

Web Age Information Management | 2008

A Survey of Alert Fusion Techniques for Security Incident

Tianning Zang; Xiaochun Yun; Yongzheng Zhang

Security incidents pose tremendous threats to today's network information systems. To protect these systems from the increasing threat of intrusion, various kinds of detection systems and sensors for security incidents have been developed. The main disadvantages of current systems and sensors are a high false detection rate and the lack of post-incident decision support capability. To minimize these drawbacks, various alert fusion technologies have been proposed in recent years. This paper presents a general summary of these technologies. Basic models and key technologies of alert fusion are analyzed and discussed, along with important aggregation and correlation algorithms. Finally, we conclude by predicting the development trends of alert correlation technologies.

IEEE/ACM Transactions on Networking | 2016

A semantics-aware approach to the automated network protocol identification

Xiaochun Yun; Yipeng Wang; Yongzheng Zhang; Yu Zhou

Traffic classification, a mapping of traffic to network applications, is important for a variety of networking and security issues, such as network measurement, network monitoring, and the detection of malware activities. In this paper, we propose Securitas, a network-trace-based protocol identification system, which exploits the semantic information in protocol message formats. Securitas requires no prior knowledge of protocol specifications. Deeming a protocol as a language between two processes, our approach is based upon the new insight that the n-grams of protocol traces, just like those of natural languages, exhibit a highly skewed frequency-rank distribution that can be leveraged in the context of protocol identification. In Securitas, we first extract the statistical protocol message formats by clustering n-grams with the same semantics, and then use the corresponding statistical formats to classify raw network traces. Our tool has the following key features: 1) it is applicable to both connection-oriented and connectionless protocols; 2) it is suitable for both text and binary protocols; 3) it does not need to assemble IP packets into TCP or UDP flows; and 4) it is effective for both long-lived and short-lived flows. We implement Securitas and conduct extensive evaluations on real-world network traces containing both textual and binary protocols. Our experimental results on BitTorrent, CIFS/SMB, DNS, FTP, PPLIVE, SIP, and SMTP traces show that Securitas accurately identifies the network traces of the target application protocol with an average recall of about 97.4% and an average precision of about 98.4%. Our experimental results show that Securitas is a robust system with competitive performance in practice.

Trust, Security and Privacy in Computing and Communications | 2011

CNSSA: A Comprehensive Network Security Situation Awareness System

Rongrong Xi; Shuyuan Jin; Xiaochun Yun; Yongzheng Zhang

With the enormous number of attacks on the Internet, network analysts have a strong need to understand network security situations effectively. Traditional network security tools lack the capability to analyze and assess network security situations comprehensively. In this paper, we introduce a novel network situation awareness tool, CNSSA (Comprehensive Network Security Situation Awareness), to perceive network security situations comprehensively. Based on the fusion of network information, CNSSA makes a quantitative assessment of network security situations. It visualizes these situations in multiple views, so that network analysts can understand them easily and comprehensively. The case studies demonstrate how CNSSA can be deployed in a real network and how it can effectively track changes in the network security situation in real time.

Networking, Architecture, and Storage | 2012

A General Framework of Trojan Communication Detection Based on Network Traces

Shicong Li; Xiaochun Yun; Yongzheng Zhang; Jun Xiao; Yipeng Wang

Because Trojans are widespread, Internet users are increasingly vulnerable to the threat of information leakage. Traditional Trojan detection techniques fall into two main categories: host-based and network-based. Unfortunately, existing techniques are limited for the following reasons: (1) they uncover only known Trojans and detect novel samples poorly, (2) they must be adjusted promptly whenever even a trivial change is applied, and (3) they become computationally more expensive. In our work, we focus on a network-behavior-based method to address the limitations of previous network-based approaches. We analyze the profile of network behavior at two levels: (i) flow level and (ii) IP level. Our approach presents two main advantages: (1) it captures more detailed information to describe the network behavior profile, and (2) it incurs lower computational overhead. We propose a system, Manto, which detects Trojan communication with high accuracy using clustering techniques. We evaluate Manto on real-world traces. The evaluation results show that Manto is suitable for detecting Trojan communication amid vast amounts of network traffic, with over 91% accuracy and a false positive ratio below 3.2%. We regard our approach as complementary to existing network-based techniques, because it addresses their main shortcomings.
