Enhua Tan
Ohio State University
Publications
Featured research published by Enhua Tan.
Internet Measurement Conference | 2005
Lei Guo; Songqing Chen; Zhen Xiao; Enhua Tan; Xiaoning Ding; Xiaodong Zhang
Existing studies on BitTorrent systems are single-torrent based, while more than 85% of all peers participate in multiple torrents according to our trace analysis. In addition, these studies are not sufficiently insightful and accurate even as single-torrent models, due to some unrealistic assumptions. Our analysis of representative BitTorrent traffic provides several new findings regarding the limitations of BitTorrent systems: (1) Due to the exponentially decreasing peer arrival rate in reality, service availability in such systems quickly becomes poor, after which it is difficult for the file to be located and downloaded. (2) Client performance in BitTorrent-like systems is unstable, and fluctuates widely with the peer population. (3) Existing systems can provide unfair services to peers, where peers with high downloading speeds tend to download more and upload less. In this paper, we study these limitations of torrent evolution in realistic environments. Motivated by the analysis and modeling results, we further build a graph-based multi-torrent model to study inter-torrent collaboration. Our model quantitatively provides strong motivation for inter-torrent collaboration instead of directly stimulating seeds to stay longer. We also discuss a system design to show the feasibility of multi-torrent collaboration.
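The torrent-lifetime finding lends itself to a quick simulation. The sketch below is a toy model with illustrative parameters (not the paper's code): peers arrive at an exponentially decaying rate, each seeds briefly after finishing, and the torrent is considered dead once its last seed departs.

```python
import math
import random

def torrent_lifetime(lam0=50.0, tau=10.0, linger=1.0, horizon=100.0, seed=42):
    """Approximate time at which no seed is left online (toy model)."""
    rng = random.Random(seed)
    t = 0.0
    seeds = [linger]                      # the initial seed departs at t = linger
    while t < horizon:
        rate = lam0 * math.exp(-t / tau)  # exponentially decaying arrival rate
        if rate < 1e-6:                   # arrivals have effectively stopped
            break
        t += rng.expovariate(rate)        # rough non-homogeneous Poisson step
        seeds = [d for d in seeds if d > t]
        if not seeds:
            return t                      # no seed online: file unavailable
        seeds.append(t + linger)          # assume the new peer downloads fast, then seeds
    return max(seeds)                     # alive until the last seed departs

print(f"file became unavailable around t = {torrent_lifetime():.1f}")
```

Even with a generous initial arrival rate, availability collapses soon after the decay makes inter-arrival gaps exceed the seeding time.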
IEEE Journal on Selected Areas in Communications | 2007
Lei Guo; Songqing Chen; Zhen Xiao; Enhua Tan; Xiaoning Ding; Xiaodong Zhang
This paper presents a performance study of BitTorrent-like P2P systems by modeling, based on extensive measurements and trace analysis. Existing studies of BitTorrent systems are single-torrent based and usually assume that the process of request arrivals to a torrent is Poisson-like. In reality, however, most BitTorrent peers participate in multiple torrents, and file popularity changes over time. Our study of representative BitTorrent traffic provides insights into the evolution of single-torrent systems and several new findings regarding the limitations of BitTorrent systems: (1) Due to the exponentially decreasing peer arrival rate in a torrent, the service availability of the corresponding file quickly becomes poor, and eventually it is hard to locate and download the file. (2) Client performance in BitTorrent-like systems is unstable, and fluctuates significantly with the number of online peers. (3) Existing systems can provide unfair services to peers, where a peer with a higher downloading speed tends to download more and upload less. Motivated by the analysis and modeling results, we further propose a graph-based model to study interactions among multiple torrents. Our model quantitatively demonstrates that inter-torrent collaboration is much more effective than stimulating seeds to serve longer in addressing the service unavailability of BitTorrent systems. An architecture for inter-torrent collaboration under an exchange-based instant incentive mechanism is also discussed and evaluated by simulations.
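To make the graph view of inter-torrent collaboration concrete, here is a small sketch (my own construction over hypothetical trace data, not the paper's exact model): peers that join multiple torrents induce a torrent-level graph, and a starving torrent can still be served if it is connected, through shared peers, to torrents that remain active.

```python
from collections import defaultdict

participation = {                 # hypothetical trace: peer -> torrents joined
    "p1": {"A", "B"}, "p2": {"B", "C"}, "p3": {"C"}, "p4": {"D"},
}

edges = defaultdict(set)          # torrent-torrent graph via shared peers
for torrents in participation.values():
    for t1 in torrents:
        edges[t1] |= torrents - {t1}

def reachable_active(start, active):
    """Torrents reachable from `start` that still have online peers."""
    seen, stack = set(), [start]
    while stack:
        t = stack.pop()
        if t in seen:
            continue
        seen.add(t)
        stack.extend(edges[t])
    return seen & active

print(reachable_active("A", active={"C"}))   # {'C'}: A can be revived via B's peers
print(reachable_active("D", active={"C"}))   # set(): D is isolated, no help available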
Principles of Distributed Computing | 2008
Lei Guo; Enhua Tan; Songqing Chen; Zhen Xiao; Xiaodong Zhang
The commonly agreed Zipf-like access pattern of Web workloads is mainly based on Internet measurements made when text-based content dominated Web traffic. However, with the dramatic increase of media traffic on the Internet, the inconsistency between the access patterns of media objects and the Zipf model has been observed in a number of studies. An insightful understanding of media access patterns is essential to guide Internet system design and management, including resource provisioning and performance optimization. In this paper, we study a large variety of media workloads collected from both the client and server sides of different media systems with different delivery methods. Through extensive analysis and modeling, we find that: (1) the object reference ranks of all these workloads follow the stretched exponential (SE) distribution despite their different media systems and delivery methods; (2) one parameter of this distribution characterizes the media file sizes well, while the other characterizes the aging of media accesses; (3) some biased measurements may lead to Zipf-like observations of media access patterns; and (4) the deviation of the media access pattern from the Zipf model in these workloads increases with the workload duration. We further analyze the effectiveness of media caching with a mathematical model. Compared with Web caching under the Zipf model, media caching under the SE model is far less effective unless the cache size is enormously large. This indicates that many previous studies based on a Zipf-like assumption have potentially overestimated the benefit of media caching, and that an effective media caching system must be able to scale its storage to accommodate the growth of media content over a long time. Our study provides an analytical basis for applying a P2P model rather than a client-server model to build large-scale Internet media delivery systems.
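A hedged reconstruction of the SE rank law referenced above, with my own symbols rather than the paper's notation: if the reference counts of N media objects have a Weibull-type tail, the rank-count relation becomes linear in log-rank after rescaling the counts.

```latex
% Assume the reference counts Y of N objects have a Weibull-type tail
% P(Y \ge y) = e^{-(y/y_0)^c}. The expected rank of an object with y_i
% references is then
\[
  i \;\approx\; N\, e^{-(y_i/y_0)^{c}}
  \qquad\Longrightarrow\qquad
  y_i^{\,c} \;\approx\; y_0^{\,c}\,\bigl(\ln N - \ln i\bigr),
\]
% so $y^c$ is linear in $\ln i$: SE data plots as a straight line when the
% y-axis is rescaled to $y^c$ over a logarithmic x-axis, whereas Zipf-like
% data ($y_i \propto i^{-\alpha}$) is straight on a log-log plot. Per the
% abstract, the stretch factor $c$ tracks media file sizes while the scale
% parameter tracks the aging of accesses.
```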
International Conference on Network Protocols | 2007
Enhua Tan; Lei Guo; Songqing Chen; Xiaodong Zhang
While the 802.11 power saving mode (PSM) and its enhancements can reduce power consumption by putting the wireless network interface (WNI) to sleep as much as possible, they either require additional infrastructure support, or may degrade transmission throughput and cause additional transmission delay. These schemes are not suitable for long, bulk data transmissions with strict QoS requirements on wireless devices. With increasingly abundant bandwidth available on the Internet, we have observed that TCP congestion control is often not the constraint on bulk data transmissions, as bandwidth throttling is widely used in practice. In this paper, instead of further manipulating the trade-off between power saving and the incurred delay, we exploit the power saving potential opened up by bandwidth throttling on streaming/downloading servers. We propose an application-independent protocol, called PSM-throttling. With a quick detection of the TCP flow throughput, a client can identify bandwidth-throttled connections at low cost. Since throttling enables us to reshape the TCP traffic into periodic bursts with the same average throughput as the server transmission rate, the client can accurately predict the arrival times of packets and turn the WNI on and off accordingly. PSM-throttling can minimize power consumption on TCP-based bulk traffic by effectively utilizing the available Internet bandwidth without degrading the application performance perceived by the user. Furthermore, PSM-throttling is client-centric and does not need any additional infrastructure support. Our lab-environment and Internet-based evaluation results show that PSM-throttling can effectively improve energy savings (by up to 75%) and/or the QoS of a broad range of TCP-based applications, including streaming, pseudo-streaming, and large file downloading, compared with existing PSM-like methods.
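The burst-scheduling arithmetic behind this idea can be sketched as follows (the parameter names and fixed period are my assumptions, not the protocol's actual message format): a flow throttled to rate r but delivered in bursts at link capacity cap only needs the WNI awake for about T*r/cap seconds per period T.

```python
def burst_schedule(r, cap, T, n_periods=3, t0=0.0):
    """Yield (wake_start, wake_end) intervals for the WNI (toy model)."""
    assert cap > r > 0, "throttled rate must be below link capacity"
    on = T * r / cap                  # awake time needed to drain one burst
    for k in range(n_periods):
        start = t0 + k * T
        yield (start, start + on)     # receive the burst, then sleep until next

r, cap, T = 2e6, 20e6, 1.0            # 2 Mb/s throttle on a 20 Mb/s WLAN, 1 s period
for wake, wake_end in burst_schedule(r, cap, T):
    print(f"awake {wake:.2f}-{wake_end:.2f}s, sleep until {wake + T:.2f}s")
print(f"duty cycle = {r / cap:.0%}")  # ~10% awake, so up to ~90% of the time asleep
```

The average throughput is unchanged (one burst of T*r bytes per period), which is why this saves energy without hurting application performance.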
Internet Measurement Conference | 2006
Lei Guo; Enhua Tan; Songqing Chen; Zhen Xiao; Oliver Spatscheck; Xiaodong Zhang
Modern Internet streaming services utilize various techniques to improve the quality of streaming media delivery. Despite the characterization of media access patterns and user behaviors in many measurement studies, few studies have focused on the streaming techniques themselves, particularly on the quality of the streaming experience they offer end users and on the resources of the media systems they consume. In order to gain insights into current streaming techniques and thus provide guidance on designing resource-efficient, high-quality streaming media systems, we collected a large streaming media workload from thousands of broadband home users and business users hosted by a major ISP, and analyzed the most commonly used streaming techniques, such as automatic protocol switching, Fast Streaming, MBR encoding, and rate adaptation. Our measurement and analysis results show that with these techniques, current streaming systems tend to over-utilize CPU and bandwidth resources to provide better services to end users, which is not necessarily the most desirable and effective way to improve the quality of streaming media delivery. Motivated by these results, we propose and evaluate a coordination mechanism that effectively takes advantage of both Fast Streaming and rate adaptation to better utilize server and Internet resources for streaming quality improvement.
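As a rough illustration of such a coordination mechanism (this is my own toy policy, not the mechanism proposed in the paper), a sender could bound Fast-Streaming-style bursts to buffer-filling phases and otherwise fall back to rate adaptation rather than consuming all available bandwidth:

```python
def choose_send_rate(encoding_rate, available_bw, buffer_s,
                     low_water=5.0, high_water=30.0, burst_factor=2.0):
    """Pick a send rate; all thresholds here are illustrative assumptions."""
    if buffer_s < low_water and available_bw > encoding_rate:
        # fast-cache phase: burst ahead, but bounded rather than using all bandwidth
        return min(available_bw, burst_factor * encoding_rate)
    if available_bw < encoding_rate:
        return available_bw           # rate-adapt down to avoid rebuffering
    if buffer_s > high_water:
        return encoding_rate          # buffer comfortable: stop over-utilizing
    return encoding_rate

print(choose_send_rate(1.0, 5.0, buffer_s=2.0))   # 2.0: bounded burst while filling
print(choose_send_rate(1.0, 5.0, buffer_s=40.0))  # 1.0: steady-state delivery
```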
Measurement and Modeling of Computer Systems | 2007
Lei Guo; Enhua Tan; Songqing Chen; Zhen Xiao; Xiaodong Zhang
It is commonly agreed that Web traffic follows a Zipf-like distribution, which is an analytical foundation for improving Web access performance with client-server based proxy caching systems on the Internet. However, some recent studies have observed non-Zipf-like distributions of Internet media traffic in different content delivery systems. Due to the variety of media delivery systems and the diversity of media content, existing studies of media traffic are largely workload-specific, and the observed access patterns often differ from or even conflict with each other. For Web media systems, study [3] reports that the access pattern of streaming media is Zipf-like in a university campus network, while study [2] finds that it is not Zipf-like on an enterprise media server. For VoD media systems, study [1] finds that it is not Zipf-like in a multicast-based Media-on-Demand server of a campus network, while study [9] reports that it is Zipf-like in a large VoD streaming system of an ISP. For P2P media systems, study [4] reports that the access pattern of a media workload collected from the KaZaa system in a campus network is not Zipf-like, while study [5] reports that it is Zipf-like in another campus network. For live streaming media systems, study [8] reports that it is Zipf-like while study [6] reports that it is not. A number of models have been proposed to explain the observed media access patterns, such as the generalized Zipf-like model [7], the “fetch-at-most-once” model [4], and the two-mode Zipf model [6]. However, each of these models can explain only a very limited scope of measurement results. A general model of Internet media access patterns is highly desirable for traffic engineering on the Internet and is critical to the design, benchmarking, and evaluation of Internet media delivery systems. In this study, we analyze a wide variety of media workloads on the Internet. The workloads were collected from both the client side and the server side in Web, VoD, P2P, and live streaming environments between 1998 and 2006, where the media content is delivered via Web/P2P downloading or unicast/multicast streaming. The duration of these workloads ranges from a few days to more than two years, and the user population ranges from several thousand to more than one hundred thousand. The number of client requests
Conference on Information and Knowledge Management | 2013
Enhua Tan; Lei Guo; Songqing Chen; Xiaodong Zhang; Yihong Eric Zhao
Social network spam has increased explosively with the rapid development and wide usage of various social networks on the Internet. To detect spam in large social network sites in a timely manner, it is desirable to find unsupervised schemes that can save the training cost of supervised schemes. In this work, we first show several limitations of existing unsupervised detection schemes. The main reason behind these limitations is that existing schemes rely heavily on spamming patterns that are constantly changing to avoid detection. Motivated by our observations, we propose a Sybil-defense-based spam detection scheme, SD2, that remarkably outperforms existing schemes by taking the social network relationship into consideration. In order to make detection highly robust against an increased level of spam attacks, we further design an unsupervised spam detection scheme, called UNIK. Instead of detecting spammers directly, UNIK works by deliberately removing non-spammers from the network, leveraging both the social graph and the user-link graph. The underpinning of UNIK is that while spammers constantly change their patterns to evade detection, non-spammers do not have to do so and thus exhibit a relatively non-volatile pattern. UNIK has performance comparable to SD2 when applied to a large social network site, and outperforms SD2 significantly when the level of spam attacks increases. Based on the detection results of UNIK, we further analyze several identified spam campaigns in this social network site. The results show that different spammer clusters demonstrate distinct characteristics, implying the volatility of spamming patterns and the ability of UNIK to automatically extract spam signatures.
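The UNIK workflow described above can be sketched roughly as follows (the data structures and the trusted-seed rule are my assumptions): peel off likely non-spammers by expanding from a trusted core of the social graph, then cluster the remaining users by the links they advertise.

```python
from collections import defaultdict, deque

def non_spammers(social_edges, trusted_seeds):
    """BFS from trusted seeds over the (undirected) social graph."""
    adj = defaultdict(set)
    for u, v in social_edges:
        adj[u].add(v); adj[v].add(u)
    keep, q = set(trusted_seeds), deque(trusted_seeds)
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in keep:
                keep.add(v); q.append(v)
    return keep

def spam_campaigns(posts, keep):
    """Group the remaining users by advertised domain (user-link graph)."""
    by_domain = defaultdict(set)
    for user, domain in posts:
        if user not in keep:
            by_domain[domain].add(user)
    return dict(by_domain)

edges = [("alice", "bob"), ("bob", "carol")]
posts = [("mallory", "spam.example"), ("eve", "spam.example"), ("alice", "blog.example")]
keep = non_spammers(edges, trusted_seeds={"alice"})
print(spam_campaigns(posts, keep))   # {'spam.example': {'mallory', 'eve'}}
```

The key property the scheme exploits is that the removal step depends on the stable non-spammer side of the graph, not on volatile spam patterns.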
International Conference on Distributed Computing Systems | 2007
Enhua Tan; Lei Guo; Songqing Chen; Xiaodong Zhang
The increasing number of wireless users of Internet P2P applications causes two new performance problems, due to the requirement to upload downloaded traffic for other peers, the limited bandwidth of wireless communications, and resource competition between the access point and wireless stations. First, an active P2P wireless user can significantly reduce the downloading throughput of other wireless users in the WLAN. Second, the slowdown of a P2P wireless user's communication can also delay its relay and data-sharing service for other dependent wired/wireless peers. To address these problems, in this paper we propose an efficient caching mechanism called SCAP (Smart Caching in Access Points). Conducting intensive Internet measurements of representative P2P streaming applications, we observe a high percentage of duplicated data packets in successive downloading and uploading data streams. Through duplication detection and caching at the access point, these duplicated packets can be compressed so that the uploading traffic in the WLAN is significantly reduced. Our prototype-based experimental evaluation demonstrates that by effectively reducing redundant P2P traffic in the WLAN, SCAP improves the throughput of the WLAN by up to 88% while also reducing the response delay for other Internet users.
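A toy sketch of SCAP-style duplicate suppression (the cache interface here is invented for illustration, not SCAP's actual protocol): the access point remembers digests of payloads it has already forwarded, so a station that later uploads an identical payload can send a short reference instead of the full packet, saving airtime.

```python
import hashlib

class APCache:
    def __init__(self):
        self.store = {}                        # digest -> cached payload

    def on_download(self, payload: bytes):
        """Remember payloads forwarded to wireless stations."""
        self.store[hashlib.sha1(payload).hexdigest()] = payload

    def expand(self, token):
        """Reconstruct an uploaded packet from a digest if cached."""
        kind, value = token
        return value if kind == "raw" else self.store[value]

def station_upload(cached_digests, payload: bytes):
    """Station sends a short reference when the AP already holds the payload."""
    d = hashlib.sha1(payload).hexdigest()
    return ("ref", d) if d in cached_digests else ("raw", payload)

ap = APCache()
chunk = b"P2P video chunk #17"
ap.on_download(chunk)                          # peer downloads the chunk via the AP
token = station_upload(set(ap.store), chunk)   # later uploads the same chunk
print(token[0], "->", ap.expand(token))        # 'ref' -> original bytes, airtime saved
```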
International Conference on Distributed Computing Systems | 2012
Enhua Tan; Lei Guo; Songqing Chen; Xiaodong Zhang; Yihong Eric Zhao
Spam content is surging with the explosive increase of user-generated content (UGC) on the Internet. Spammers often insert popular keywords or simply copy and paste recent articles from the Web with spam links inserted, attempting to defeat content-based detection. In order to effectively detect spam in user-generated content, we first conduct a comprehensive analysis of spamming activities on a large commercial UGC site over 325 days, covering more than 6 million posts and nearly 400 thousand users. Our analysis shows that UGC spammers exhibit unique non-textual patterns, such as posting activities, advertised spam link metrics, and spam hosting behaviors. Based on these non-textual features, we show via several classification methods that a high detection rate can be achieved offline. These results further motivate us to develop a runtime scheme, BARS, to detect spam posts based on these spamming patterns. The experimental results demonstrate the effectiveness and robustness of BARS.
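As a minimal illustration of scoring with non-textual features (the feature names follow the abstract, but the thresholds and weights are made up for this sketch), a runtime filter might look like this:

```python
from dataclasses import dataclass

@dataclass
class Post:
    posts_last_hour: int      # posting-activity feature
    domain_user_count: int    # how many distinct users advertised this domain
    free_hosting: bool        # spam-hosting behavior feature

def spam_score(p: Post) -> float:
    """Score a post purely from non-textual signals (illustrative weights)."""
    score = 0.0
    if p.posts_last_hour > 20:    # bursty, automated posting
        score += 0.4
    if p.domain_user_count > 50:  # domain pushed by many accounts
        score += 0.4
    if p.free_hosting:            # disposable hosting favored by spammers
        score += 0.2
    return score

print(spam_score(Post(100, 200, True)) >= 0.5)   # True: flag for removal
```

Because none of these signals depend on the post's text, rewriting or copying articles does not help a spammer evade them.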
Knowledge Discovery and Data Mining | 2009
Lei Guo; Enhua Tan; Songqing Chen; Xiaodong Zhang; Yihong Eric Zhao