Yipeng Wang | Researchain

Publication

Featured researches published by Yipeng Wang.

international conference on network protocols | 2012

A semantics aware approach to automated reverse engineering unknown protocols

Yipeng Wang; Xiaochun Yun; M. Zubair Shafiq; Liyan Wang; Alex X. Liu; Zhibin Zhang; Danfeng Yao; Yongzheng Zhang; Li Guo

Extracting the protocol message format specifications of unknown applications from network traces is important for a variety of applications such as application protocol parsing, vulnerability discovery, and system integration. In this paper, we propose ProDecoder, a network-trace-based protocol message format inference system that exploits the semantics of protocol messages without the executable code of application protocols. ProDecoder is based on the key insight that the n-grams of protocol traces exhibit a highly skewed frequency distribution that can be leveraged for accurate protocol message format inference. In ProDecoder, we discover the latent relationship among n-grams by first grouping protocol messages with the same semantics and then inferring message formats by keyword-based clustering and cluster sequence alignment. We implemented and evaluated ProDecoder to infer message format specifications of SMB (a binary protocol) and SMTP (a textual protocol). Our experimental results show that ProDecoder accurately parses and infers the SMB protocol with 100% precision and recall. For SMTP, ProDecoder achieves approximately 95% precision and recall.
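The skewed n-gram frequency distribution mentioned in this abstract can be illustrated with a minimal Python sketch. It is not ProDecoder's implementation: the function names (`byte_ngrams`, `ngram_rank_frequency`), the choice of n = 3, and the toy trace are all assumptions made for illustration.

```python
from collections import Counter


def byte_ngrams(message: bytes, n: int):
    """Yield every contiguous n-byte window of a message."""
    for i in range(len(message) - n + 1):
        yield message[i:i + n]


def ngram_rank_frequency(messages, n=3):
    """Count n-grams over all messages; return (ngram, count) pairs, most frequent first."""
    counts = Counter()
    for msg in messages:
        counts.update(byte_ngrams(msg, n))
    return counts.most_common()


# Hypothetical toy trace (not data from the paper).
toy_trace = [
    b"HELO example.org\r\n",
    b"MAIL FROM:<a@example.org>\r\n",
    b"RCPT TO:<b@example.org>\r\n",
    b"MAIL FROM:<c@example.org>\r\n",
]
ranked = ngram_rank_frequency(toy_trace, n=3)

# Share of all n-gram occurrences covered by the ten most frequent n-grams.
top_share = sum(c for _, c in ranked[:10]) / sum(c for _, c in ranked)
```

On a real trace, plotting count against rank on log-log axes is a common way to inspect how skewed the distribution is; a large `top_share` would point the same way.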

applied cryptography and network security | 2011

Inferring protocol state machine from network traces: a probabilistic approach

Yipeng Wang; Zhibin Zhang; Danfeng Daphne Yao; Buyun Qu; Li Guo

Application-level protocol specifications (i.e., how a protocol should behave) are helpful for network security management, including intrusion detection and intrusion prevention. Knowledge of protocol specifications is also an effective way of detecting malicious code. However, current methods for obtaining unknown protocol specifications rely heavily on manual operations such as reverse engineering, which is a major instrument for extracting application-level specifications but is time-consuming and laborious. Several works have focused on automatically extracting protocol messages from real-world traces, leaving the protocol state machine unsolved. In this paper, we propose Veritas, a system that can automatically infer a protocol state machine from real-world network traces. The main feature of Veritas is that it requires no prior knowledge of protocol specifications; our technique is based on statistical analysis of the protocol formats. We also formally define a new model, the probabilistic protocol state machine (P-PSM), which is a probabilistic generalization of the protocol state machine. In our experiments, we evaluate a text-based protocol and two binary-based protocols to test the performance of Veritas. Our results show that the protocol state machines that Veritas infers can accurately represent 92% of the protocol flows on average. Our system is general and suitable for both text-based and binary-based protocols. Veritas can also be employed as an auxiliary tool for analyzing unknown behaviors in real-world applications.
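One simple way to picture a probabilistic state machine is a first-order transition table estimated from labeled flows. The sketch below assumes each flow has already been reduced to a sequence of message-type labels. It does not reproduce the paper's formal P-PSM definition or Veritas's inference procedure, and the names `estimate_transitions` and `flow_probability` and the toy flows are hypothetical.

```python
from collections import Counter, defaultdict


def estimate_transitions(flows):
    """Estimate P(next_state | state) by counting adjacent label pairs in each flow."""
    pair_counts = defaultdict(Counter)
    for flow in flows:
        for current, nxt in zip(flow, flow[1:]):
            pair_counts[current][nxt] += 1
    return {
        state: {nxt: c / sum(followers.values()) for nxt, c in followers.items()}
        for state, followers in pair_counts.items()
    }


def flow_probability(flow, transitions):
    """Product of transition probabilities along a flow (0.0 if any transition is unseen)."""
    prob = 1.0
    for current, nxt in zip(flow, flow[1:]):
        prob *= transitions.get(current, {}).get(nxt, 0.0)
    return prob


# Hypothetical message-type sequences (not data from the paper).
toy_flows = [
    ["HELO", "MAIL", "RCPT", "DATA", "QUIT"],
    ["HELO", "MAIL", "RCPT", "RCPT", "DATA", "QUIT"],
    ["HELO", "QUIT"],
]
model = estimate_transitions(toy_flows)
```

A flow whose probability under the model is zero contains a transition never seen in training, which is one plausible way to flag flows the inferred machine does not represent.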

IEEE/ACM Transactions on Networking | 2016

A semantics-aware approach to the automated network protocol identification

Xiaochun Yun; Yipeng Wang; Yongzheng Zhang; Yu Zhou

Traffic classification, a mapping of traffic to network applications, is important for a variety of networking and security issues, such as network measurement, network monitoring, and the detection of malware activities. In this paper, we propose Securitas, a network-trace-based protocol identification system that exploits the semantic information in protocol message formats. Securitas requires no prior knowledge of protocol specifications. Deeming a protocol a language between two processes, our approach is based upon the new insight that the n-grams of protocol traces, just like those of natural languages, exhibit a highly skewed frequency-rank distribution that can be leveraged in the context of protocol identification. In Securitas, we first extract the statistical protocol message formats by clustering n-grams with the same semantics, and then use the corresponding statistical formats to classify raw network traces. Our tool has the following key features: 1) it applies to both connection-oriented and connectionless protocols; 2) it suits both text and binary protocols; 3) it does not need to assemble IP packets into TCP or UDP flows; and 4) it is effective for both long-lived and short-lived flows. We implement Securitas and conduct extensive evaluations on real-world network traces containing both textual and binary protocols. Our experimental results on BitTorrent, CIFS/SMB, DNS, FTP, PPLIVE, SIP, and SMTP traces show that Securitas can accurately identify the network traces of the target application protocol with an average recall of about 97.4% and an average precision of about 98.4%. Our experimental results show that Securitas is a robust system that delivers competitive performance in practice.

parallel and distributed computing: applications and technologies | 2011

Biprominer: Automatic Mining of Binary Protocol Features

Yipeng Wang; Xingjian Li; Jiao Meng; Yong Zhao; Zhibin Zhang; Li Guo

Application-level protocol specifications are helpful for network security management, including intrusion detection and intrusion prevention, which rely on monitoring technologies such as deep packet inspection. Moreover, detailed knowledge of protocol specifications is also an effective way of detecting malicious code. However, current methods for obtaining unknown and proprietary protocol message formats (i.e., those with no publicly available protocol specification), especially for binary protocols, rely heavily on manual operations such as reverse engineering, which is time-consuming and laborious. In this paper, we propose Biprominer, a tool that can automatically extract the binary protocol message formats of an application from its real-world network trace. In addition, we present a transition probability model for a better description of the protocol. The chief feature of Biprominer is that it needs no prior knowledge of protocol formats, because it is based on the statistical nature of the protocol format. We evaluate the efficacy of Biprominer on three binary protocols, achieving an average precision above 99% and a recall above 96.7%.

networking architecture and storages | 2011

Using Entropy to Classify Traffic More Deeply

Yipeng Wang; Zhibin Zhang; Li Guo; Shuhao Li

The network community has long sought better methods for traffic classification, which is crucial for Internet Service Providers (ISPs) to provide better QoS for users. Prior work on traffic classification mainly focuses on dividing Internet traffic into categories based on application-layer protocols (such as HTTP and BitTorrent). Approaching traffic classification from another point of view, we divide Internet traffic into different content types. Our technique is an attempt to solve the classification problem for network traffic that contains unknown and proprietary protocols (i.e., those with no publicly available protocol specification). In this paper, we design a classifier that separates Internet traffic into different content types using machine learning techniques. The features of our classifier are the entropy of consecutive bytes and the frequencies of characters. Our method is capable of classifying real-world traces into different content types (including Text, Picture, Audio, Video, Compressed, Base64-encoded image, Base64-encoded text, and Encrypted). The chief advantages of our classifier are its small computing space (about 1 KB) and high classification accuracy (about 81%).
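The two feature families named in this abstract, entropy of consecutive bytes and character frequencies, can be sketched in a few lines of Python. The abstract does not give the exact feature definitions, so the overlapping k-byte windows, the 256-bin frequency vector, and the function names below are assumptions made for illustration.

```python
import math
from collections import Counter


def shannon_entropy(symbols):
    """Shannon entropy, in bits, of the empirical distribution over a sequence of symbols."""
    counts = Counter(symbols)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def block_entropy(payload: bytes, k: int = 1):
    """Entropy of the overlapping k-byte windows of a payload (k is an assumed parameter)."""
    windows = [payload[i:i + k] for i in range(len(payload) - k + 1)]
    return shannon_entropy(windows)


def byte_frequencies(payload: bytes):
    """Relative frequency of each of the 256 possible byte values."""
    counts = Counter(payload)
    total = len(payload) or 1  # avoid division by zero on empty payloads
    return [counts.get(b, 0) / total for b in range(256)]
```

Intuitively, encrypted or compressed payloads tend toward high byte entropy while plain text stays lower, so vectors built from `block_entropy` for a few values of k together with `byte_frequencies` could serve as input to an off-the-shelf classifier.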

networking architecture and storages | 2012

A General Framework of Trojan Communication Detection Based on Network Traces

Shicong Li; Xiaochun Yun; Yongzheng Zhang; Jun Xiao; Yipeng Wang

Because Trojans are widespread, Internet users are increasingly vulnerable to the threat of information leakage. Traditional Trojan detection techniques fall into two main categories: host-based and network-based. Unfortunately, existing techniques are insufficient and limited for the following reasons: (1) they uncover only known Trojans and detect novel samples inefficiently, (2) they must be adjusted promptly even when a trivial change is applied, and (3) they become computationally more expensive. In our work, we focus on a network-behavior-based method to address the limitations of previous network-based approaches. We analyze the profile of network behavior at two levels: (i) flow level and (ii) IP level. Our approach presents two main advantages: (1) it captures more detailed information to describe the network behavior profile, and (2) it incurs lower computational overhead. We propose a system, Manto, which detects Trojan communication with high accuracy using a clustering technique. We implement Manto and evaluate it on real-world traces. The evaluation results show that Manto is suitable for detecting Trojan communication amid vast amounts of network traffic, with over 91% accuracy and a false positive ratio below 3.2%. We regard our approach as a complement to existing network-based techniques because it addresses their main shortcomings.

international conference on network protocols | 2015

Rethinking Robust and Accurate Application Protocol Identification: A Nonparametric Approach

Yipeng Wang; Xiaochun Yun; Yongzheng Zhang

Protocol traffic analysis is important for a variety of networking and security infrastructures, such as intrusion detection and prevention systems, network management systems, and protocol specification parsers. In this paper, we propose ProHacker, a nonparametric approach that extracts robust and accurate protocol keywords from network traces and effectively identifies the protocol trace from mixed Internet traffic. ProHacker is based on the key insight that the n-grams of protocol traces have a highly predictable statistical nature that can be effectively captured by statistical language models and leveraged for robust and accurate protocol identification. In ProHacker, we first extract protocol keywords using a nonparametric Bayesian statistical model, and then use the corresponding protocol keywords to classify protocol traces with a semi-supervised learning algorithm. We implement and evaluate ProHacker on real-world traces, including SMTP, FTP, PPLive, SopCast, and PPStream, and our experimental results show that ProHacker can accurately identify the protocol trace with an average precision of about 99.42% and an average recall of about 98.64%. We also compare the results of ProHacker with two state-of-the-art approaches, ProWord and Securitas, using backbone traffic. We show that ProHacker provides significant improvements in precision and recall for online protocol identification.

Neurocomputing | 2015

Unsupervised adaptive sign language recognition based on hypothesis comparison guided cross validation and linguistic prior filtering

Yu Zhou; Xiaokang Yang; Yongzheng Zhang; Xiang Xu; Yipeng Wang; Xiujuan Chai; Weiyao Lin

Signer adaptation is important for sign language recognition systems because a fixed system cannot perform well on all kinds of signers. In supervised signer adaptation, the labeled adaptation data must be collected explicitly. To skip the data collection process in signer adaptation, we propose a novel unsupervised adaptation method, namely the hypothesis comparison guided cross validation method. The method not only addresses the problem of the overlap between the data set to be labeled and the data set for adaptation, but also employs an additional hypothesis comparison step to decrease the noise rate of the adaptation data set. We also utilize linguistic prior knowledge to downsample the adaptation data list to further decrease the noise rate. To evaluate the effectiveness of the proposed method, the CASIIE-SL-Database is formed, which is, to the best of our knowledge, the first specialized data set for unsupervised signer adaptation. Experimental results show that the proposed method can achieve relative word error rate reductions of 3.93% and 4.05% compared with the self-teaching method and the cross validation method, respectively. Though the method is proposed for signer adaptation, it can also be applied directly to speaker adaptation and writer adaptation.

trust security and privacy in computing and communications | 2014

Visual Similarity Based Anti-phishing with the Combination of Local and Global Features

Yu Zhou; Yongzheng Zhang; Jun Xiao; Yipeng Wang; Weiyao Lin

Phishing uses a fake Web page to steal sensitive personal information such as credit card numbers and passwords. Generally, the fake Web page is visually similar to the legitimate target Web page. Phishers can obtain financial benefits from this information. Anti-phishing is very important for a variety of applications such as phishing attack prevention, online transaction security, and user privacy protection. In this paper, we propose a novel and effective visual-similarity-based phishing detection approach that compares the snapshot image pair of the suspected Web page and the protected Web page. The proposed approach is based on the key insight that the local and global features of the Web page image can be used together to represent the visual characteristics of the Web page. This approach operates purely at the image level, and thus can effectively deal with non-text phishing tricks, including images or Flash objects in the HTML content. For the local feature, the existence of the target logo is detected. For the global feature, the similarity of the visible part of the Web page is considered. We implemented and evaluated the proposed approach on a large-scale dataset consisting of 2,129 real-world phishing Web pages and 1,367 irrelevant legitimate Web pages. The experimental results show that the proposed approach can achieve a true positive rate of over 90.00% and a true negative rate of over 97.00%. Our approach has been applied in the anti-phishing project of a major Internet Service Provider and provides periodic reports to potential users.

parallel and distributed computing: applications and technologies | 2011

A Propagation Model for Social Engineering Botnets in Social Networks

Shuhao Li; Xiaochun Yun; Zhiyu Hao; Xiang Cui; Yipeng Wang

With the rapid development of social networking services and the diversification of social engineering attacks, a new high-infection botnet (which we call SE-botnet), which exploits social engineering attacks to spread bots in social networks, has become an underlying threat. Predicting the threat of SE-botnets can help defenders mitigate it effectively. In this paper, we focus on SE-botnet infection and defense, presenting a propagation model for it. We take full account of social network characteristics and human dynamics, and abstract the general process of the social engineering attacks used by SE-botnets. Our preliminary simulation results demonstrate that an SE-botnet can capture tens of thousands of bots in one day with a great infection capacity. Our propagation model can accurately predict this process with less than 5% deviation.
