Tingwen Liu | Researchain

Publication

Featured research published by Tingwen Liu.

International Conference on Computer Communications | 2011

An efficient regular expressions compression algorithm from a new perspective

Tingwen Liu; Yifu Yang; Yanbing Liu; Yong Sun; Li Guo

Deep packet inspection plays an increasingly important role in network security devices and applications, which use more regular expressions to describe patterns. A DFA engine is the classical representation for regular expression matching because it needs only O(1) time to process one input character. However, the DFAs of regular expression sets require a large amount of memory, which limits the practical application of regular expressions in high-speed networks. Several compression algorithms have been proposed in the recent literature to address this issue. In this paper, we reconsider the problem from a new perspective: we observe the characteristics of the transition distribution inside each state, whereas previous algorithms observe transition characteristics among states. We then introduce a new compression algorithm that consistently reduces DFA memory usage by 95% without a significant impact on matching speed. Moreover, our work is orthogonal to previous compression algorithms such as D2FA and δFA. Our experimental results show that applying our work to them yields a several-fold memory reduction and, in a software implementation, a matching speed up to dozens of times that of the original δFA.
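To make the O(1)-per-character claim concrete, here is a minimal sketch of table-driven DFA matching. The three-state automaton, its alphabet, and the `default_transition_share` helper are illustrative assumptions, not the paper's engine or its compression algorithm; the helper is only one simple way to look at how transitions are distributed inside a single state.

```python
# Hypothetical DFA over the alphabet {a, b} that accepts any input containing "ab".
# States: 0 = start, 1 = last character was 'a', 2 = "ab" seen (accepting, absorbing).
TRANSITIONS = [
    {"a": 1, "b": 0},  # state 0
    {"a": 1, "b": 2},  # state 1
    {"a": 2, "b": 2},  # state 2
]
ACCEPTING = {2}


def dfa_matches(text: str) -> bool:
    """Run the DFA over text; each character costs exactly one table lookup."""
    state = 0
    for ch in text:
        state = TRANSITIONS[state][ch]
    return state in ACCEPTING


def default_transition_share(row: dict) -> float:
    """Fraction of one state's transitions that lead to its most common target.

    Illustrative assumption: a high share suggests the row could be stored as
    one default target plus a few exceptions.
    """
    targets = list(row.values())
    most_common = max(set(targets), key=targets.count)
    return targets.count(most_common) / len(targets)
```

The full transition table is what makes DFAs memory hungry: every state stores one entry per input symbol, even when most entries in a row point to the same next state.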

IEEE Journal on Selected Areas in Communications | 2014

Towards Fast and Optimal Grouping of Regular Expressions via DFA Size Estimation

Tingwen Liu; Alex X. Liu; Jinqiao Shi; Yong Sun; Li Guo

Regular Expression (RegEx) matching, as a core operation in many network and security applications, is typically performed on Deterministic Finite Automata (DFA) to process packets at wire speed; however, DFA size is often exponential in the number of RegExes. RegEx grouping is the practical way to address DFA state explosion. Prior RegEx grouping algorithms are extremely slow and memory intensive. In this paper, we first propose DFAestimator, an algorithm that can quickly estimate DFA size for a given RegEx set without building the actual DFA. Second, we propose RegexGrouper, a RegEx grouping algorithm based on DFA size estimation. In terms of speed and memory consumption, our work is orders of magnitude more efficient than prior art because DFA size estimation is much faster and more memory efficient than DFA construction. In terms of the resulting size sum of DFAs, our work is significantly more effective than prior art because we use a much finer-grained quantification of the degree of interaction between two RegExes. For example, to divide the RegEx set of the L7-filter system into 7 groups, prior art uses 279.3 minutes and the resulting 7 DFAs have a total of 29047 states, whereas RegexGrouper uses 3.2 minutes and the resulting 7 DFAs have a total of 15578 states.

Procedia Computer Science | 2013

EPLogCleaner: Improving Data Quality of Enterprise Proxy Logs for Efficient Web Usage Mining

Hongzhou Sha; Tingwen Liu; Peng Qin; Yong Sun; Qingyun Liu

Data cleaning is an important step performed in the preprocessing stage of web usage mining, and is widely used in many data mining systems. Despite many efforts on data cleaning for web server logs, it is still an open question for enterprise proxy logs. With unlimited access to websites, enterprise proxy logs trace web requests from multiple clients to multiple web servers, which makes them quite different from web server logs in both location and content. Therefore, many irrelevant items, such as software update requests, cannot be filtered out by traditional data cleaning methods. In this paper, we propose EPLogCleaner, the first method that can filter out many irrelevant items based on the common prefix of their URLs. We evaluate EPLogCleaner with a real network traffic trace captured from one enterprise proxy. Experimental results show that EPLogCleaner can improve the data quality of enterprise proxy logs by filtering out over 30% more URL requests than traditional data cleaning methods.
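The idea of filtering by a common URL prefix can be sketched as follows. This is an illustration under assumptions, not EPLogCleaner itself: the function names and the example URLs are hypothetical, and the abstract does not say how the prefixes are actually chosen.

```python
import os.path


def common_url_prefix(urls):
    """Longest character-wise prefix shared by all given URLs."""
    return os.path.commonprefix(list(urls))


def filter_by_prefixes(urls, prefixes):
    """Keep only the URLs that do not start with any irrelevant prefix."""
    prefixes = tuple(prefixes)  # str.startswith accepts a tuple of prefixes
    return [u for u in urls if not u.startswith(prefixes)]


# Hypothetical update requests that are already known to be irrelevant.
known_irrelevant = [
    "http://update.example.com/v1/a.cab",
    "http://update.example.com/v1/b.cab",
]
prefix = common_url_prefix(known_irrelevant)

# Hypothetical proxy log: the first entry shares the prefix and is dropped.
log = [
    "http://update.example.com/v1/c.cab",
    "http://news.example.org/index.html",
]
cleaned = filter_by_prefixes(log, [prefix])
```

A prefix-based rule generalizes to requests that were never seen before, such as `c.cab` here, which an exact-match blacklist would miss.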

Networking Architecture and Storages | 2010

Fast and Memory-Efficient Traffic Classification with Deep Packet Inspection in CMP Architecture

Tingwen Liu; Yong Sun; Li Guo

Traffic classification is important to many network applications, such as network monitoring. The classic way to identify flows, e.g., examining the port numbers in the packet headers, has become ineffective. In this context, deep packet inspection technology, which inspects not only the packet headers but also the packet payloads, plays a more important role in traffic classification. Meanwhile, regular expressions are replacing strings to represent patterns because of their expressive power, simplicity and flexibility. However, regular expression matching incurs high memory usage and processing cost, which result in low throughput. In this paper, we analyze the application-level protocol distribution of network traffic and summarize its characteristics. Furthermore, we design a fast and memory-efficient two-layer system for traffic classification with regular expressions on multi-core architectures, in contrast to previous one-layer architectures. To reduce the memory usage of the DFA, we perform regular expression matching with a compression algorithm called CSCA, which reduces DFA memory usage by 95%. We also introduce some optimizations to accelerate matching. We run our experiments on real-world traffic with all L7-filter protocol patterns, and the results show that the system achieves Gbps-level throughput on 4-core servers.

Applied Cryptography and Network Security | 2012

A prefiltering approach to regular expression matching for network security systems

Tingwen Liu; Yong Sun; Alex X. Liu; Li Guo; Binxing Fang

Regular expression (RegEx) matching has been widely used in various networking and security applications. Despite much effort, it remains a fundamentally difficult problem. DFA-based solutions can achieve high throughput, but require too much memory to be executed in high-speed SRAM. NFA-based solutions require little memory, but are too slow. In this paper, we propose RegexFilter, a prefiltering approach. The basic idea is to generate the RegEx print of a RegEx set and use it to prefilter out most unmatched items. There are two key technical challenges: the generation of the RegEx print and the matching process of the RegEx print. The generation of the RegEx print is tricky because we need to trade off between two conflicting goals: filtering effectiveness, which means that we want the RegEx print to filter out as many unmatched items as possible, and matching speed, which means that we want the matching speed of the RegEx print to be as high as possible. To address the first challenge, we propose some measurement tools for RegEx complexity and filtering effectiveness, and use them to guide the generation of the RegEx print. To address the second challenge, we propose a fast RegEx print matching solution using Ternary Content Addressable Memory (TCAM). We implemented our approach and conducted experiments on real-world data sets. Our experimental results show that RegexFilter can speed up the potential throughput of RegEx matching by 21.5 times and 20.3 times for the RegEx sets of the Snort and L7-Filter systems, respectively, at the cost of less than 0.2 Mb of TCAM.

Military Communications Conference | 2015

An automatic approach to extract the formats of network and security log messages

Jing Ya; Tingwen Liu; Haoliang Zhang; Jinqiao Shi; Li Guo

Analyzing massive network and security logs that record network events is crucial for diagnosing network anomalies in large-scale network environments. Extracting log message formats is an important and necessary step toward this goal. However, it is time-consuming and costly to automatically and efficiently extract log message formats from massive network and security logs of many different types, which are generated by the increasing number of network and security devices and services used in large-scale networks. In this paper, we propose log template extraction (LTE), an approach that is aware of the semantics of network and security logs, to address the problem. LTE first cleans log messages and then clusters the cleaned log messages using the DBSCAN algorithm. Finally, it infers message templates using the LDA Gibbs sampling algorithm. We evaluate our work on a massive amount of network log messages collected from a large production network. Experimental results show that LTE infers multiple log message formats at the same time with more than 90% accuracy and 100% recall.

International Conference on Conceptual Structures | 2016

Identifying Users across Different Sites Using Usernames

Yubin Wang; Tingwen Liu; Qingfeng Tan; Jinqiao Shi; Li Guo

Identifying users across different sites means finding the accounts that belong to the same individual. The problem is fundamental and important, and its results can benefit many applications such as social recommendation. Observing that 1) usernames are essential elements for all sites; 2) most users have a limited number of usernames on the Internet; and 3) usernames carry information that reflects an individual's characteristics and habits, this paper tries to identify users based on username similarity. Specifically, we introduce the self-information vector model to integrate our proposed content and pattern features extracted from usernames into vectors. We define the similarity of two usernames as the cosine similarity between their self-information vectors. We further propose an abbreviation detection method to discover the initialism phenomenon in usernames, which can improve our user identification results. Experimental results on real-world username sets show that we achieve 86.19% precision, 68.53% recall and 76.21% F1-measure on average, which is better than the state-of-the-art work.
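The cosine similarity between self-information vectors can be sketched as follows. The abstract does not list the paper's content and pattern features, so this sketch assumes character bigrams as a stand-in feature set and weights each bigram by its self-information, -log2(p), estimated from a small hypothetical corpus of usernames.

```python
import math
from collections import Counter


def bigrams(name):
    """Character bigrams of a lowercased username (stand-in feature set)."""
    name = name.lower()
    return [name[i:i + 2] for i in range(len(name) - 1)]


def self_information(corpus):
    """Map each bigram to -log2(p), where p is its relative frequency in the corpus."""
    counts = Counter(bg for name in corpus for bg in bigrams(name))
    total = sum(counts.values())
    return {bg: -math.log2(c / total) for bg, c in counts.items()}


def vectorize(name, info):
    """Sparse vector: each known bigram in the name, weighted by its self-information."""
    return {bg: info[bg] for bg in set(bigrams(name)) if bg in info}


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v[k] for k, w in u.items() if k in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical usernames used only to estimate bigram frequencies.
corpus = ["alice88", "alice_w", "bob1990", "bobby"]
info = self_information(corpus)
sim_related = cosine(vectorize("alice88", info), vectorize("alice_w", info))
sim_unrelated = cosine(vectorize("alice88", info), vectorize("bob1990", info))
```

Weighting by self-information means that rare features count for more than common ones, so two usernames that share an unusual substring score higher than two that share only a frequent one.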

ACM Symposium on Applied Computing | 2011

Improving matching performance of DPI traffic classifier

Tingwen Liu; Yong Sun; Li Guo; Binxing Fang

Traffic classification through DPI technology is considered to spend most of its CPU time in pattern matching, leading to the conclusion that it is not suitable for classifying traffic online on high-speed networks. In this paper, we focus on how to improve matching performance. We believe that performance can be improved by exploiting two characteristics of network traffic: the magic first symbol and the Zipf-like distribution of application traffic. To the best of our knowledge, we are the first to observe and utilize them in traffic classification. We analyze the expected number of matching attempts per flow before it is classified. Then, we introduce an enhanced traffic classification engine that uses the above characteristics and some optimizations, and has the same matching accuracy as the original L7-filter engine. Our evaluation shows that the enhanced engine improves matching performance by one order of magnitude, at the cost of a negligible increase in memory consumption. Furthermore, it does not depend on the network environment and does not require any training phase.

International Conference on Conceptual Structures | 2017

Mining Host Behavior Patterns From Massive Network and Security Logs

Jing Ya; Tingwen Liu; Quangang Li; Jinqiao Shi; Haoliang Zhang; Pin Lv; Li Guo

Mining host behavior patterns from massive logs plays a crucial role in anomaly diagnosis and management for large-scale networks. Almost all prior work gives a macroscopic link analysis of network events, but fails to microscopically analyze the evolution of behavior patterns for each host in a network. In this paper, we propose a novel approach, namely Log Mining for Behavior Pattern (LogM4BP), to address the limitations of prior work. LogM4BP builds a statistical model that captures each host's network behavior patterns with the nonnegative matrix factorization algorithm, which improves the interpretability and comparability of behavior patterns and reduces the complexity of analysis. The work is evaluated on a public data set captured from a large marketing company. Experimental results show that it can describe network behavior patterns clearly and accurately, and that significant evolution of behavior patterns can be mapped intuitively to anomaly events in the real world.

Knowledge Science, Engineering and Management | 2016

An Unsupervised Framework Towards Sci-Tech Compound Entity Recognition

Yang Yan; Tingwen Liu; Li Guo; Jiapeng Zhao; Jinqiao Shi

Classifying sci-tech compound named entities, such as the names of patents and projects, plays an important role in enhancing many high-level applications. However, there is very little work on this novel and hard problem. Traditional sequence labeling strategies cannot be applied to sci-tech compound entities due to the heavy cost of human annotation and low data redundancy. This paper summarizes three intrinsic characteristics of sci-tech compound entities, and further proposes a generic and unsupervised framework named SCSegVal to address the problem. SCSegVal consists of two components: text splitting and segment validating. We reduce finding the best split of a text to the problem of maximizing the stickiness sum of its segments. The construction of indicative words used in segment validating is reduced to the classical minimum set cover problem. Experimental results on classifying real-world science-technology entities show that SCSegVal achieves a sharp improvement compared with the classical supervised HMM-based approach.
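Maximizing the stickiness sum over all ways to split a text is a natural fit for dynamic programming. The sketch below is an illustration under assumptions, not SCSegVal's implementation: the `best_split` function, the `max_len` limit, and the toy stickiness table are hypothetical, and the abstract does not define how stickiness is actually computed.

```python
def best_split(tokens, stickiness, max_len=4):
    """Return (score, segments) maximizing the sum of stickiness over segments.

    stickiness is any function that maps a tuple of tokens to a number.
    """
    n = len(tokens)
    best = [float("-inf")] * (n + 1)  # best[i]: top score for tokens[:i]
    back = [0] * (n + 1)              # back[i]: start index of the last segment
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            score = best[start] + stickiness(tuple(tokens[start:end]))
            if score > best[end]:
                best[end], back[end] = score, start
    # Walk the back-pointers from the end to recover the chosen segments.
    segments, end = [], n
    while end > 0:
        segments.append(tokens[back[end]:end])
        end = back[end]
    return best[n], segments[::-1]


# Hypothetical stickiness scores; segments missing from the table score 0.
SCORES = {
    ("neural", "network"): 2.0,
    ("image", "recognition"): 1.5,
    ("neural",): 0.5,
    ("network",): 0.5,
}
score, segments = best_split(
    ["neural", "network", "image", "recognition"],
    lambda seg: SCORES.get(seg, 0.0),
)
```

With segments capped at `max_len` tokens, the search runs in O(n * max_len) stickiness evaluations instead of enumerating every possible split.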
