Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Weixiong Rao is active.

Publication


Featured research published by Weixiong Rao.


IEEE Transactions on Parallel and Distributed Systems | 2010

Optimal Resource Placement in Structured Peer-to-Peer Networks

Weixiong Rao; Lei Chen; Ada Wai-Chee Fu; Guoren Wang

Exploiting the skewed popularity distribution common in P2P systems such as Gnutella and KaZaA-like applications, we propose an optimal resource (replica or link) placement strategy that optimally trades off performance gain against the cost paid. The proposed strategy, which improves on existing work, applies generally to both randomized P2P systems (e.g., Symphony) and deterministic P2P systems (e.g., Chord, Pastry, Tapestry). We apply the proposed resource placement strategy to two novel applications: PCache (a P2P-based caching scheme) and PRing (a P2P ring structure). Simulation results, as well as a real deployment on PlanetLab, demonstrate the effectiveness of the proposed placement strategy in reducing the average search cost of the whole system.
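The skew that the placement strategy exploits can be illustrated with a short computation. The sketch below uses a hypothetical catalogue size and Zipf exponent (not values from the paper) to show how concentrated a Zipf-like request distribution is.

```python
# Illustrative sketch with hypothetical parameters: how skewed a
# Zipf-like popularity distribution is, the property the placement
# strategy exploits.

def zipf_popularity(n, s=1.0):
    """Normalized Zipf probabilities for n items with exponent s."""
    weights = [1.0 / (rank ** s) for rank in range(1, n + 1)]
    total = sum(weights)
    return [w / total for w in weights]

n_items = 1000
p = zipf_popularity(n_items)

# Share of requests absorbed by the most popular 10% of items.
top_share = sum(p[: n_items // 10])
print(f"top 10% of items receive {top_share:.1%} of requests")
```

With these parameters, roughly two thirds of all requests land on the top tenth of the items, which is why placing replicas of popular items pays off.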


Scientific Reports | 2015

Explaining the power-law distribution of human mobility through transportation modality decomposition.

Kai Zhao; Mirco Musolesi; Pan Hui; Weixiong Rao; Sasu Tarkoma

Human mobility has been empirically observed to exhibit Lévy flight characteristics, with power-law distributed jump sizes. The fundamental mechanisms behind this behaviour have not yet been fully explained. In this paper, we propose to explain the Lévy walk behaviour observed in human mobility patterns by decomposing them into different classes according to transportation mode, such as Walk/Run, Bike, Train/Subway or Car/Taxi/Bus. Our analysis is based on two real-life GPS datasets containing approximately 10 and 20 million GPS samples with transportation mode information. We show that human mobility can be modelled as a mixture of different transportation modes, and that each single movement pattern is better approximated by a lognormal distribution than by a power-law distribution. We then demonstrate that the mixture of the decomposed lognormal flight distributions, one per modality, is a power-law distribution, explaining the emergence of the Lévy walk patterns that characterize human mobility.
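The decomposition idea can be sketched in a few lines: draw jump sizes from per-mode lognormals and mix them. The mode parameters and weights below are made up for illustration, not fitted to the paper's datasets.

```python
import random
import statistics

# Sketch of the decomposition idea with hypothetical lognormal
# parameters per transportation mode (not fitted to the real datasets).
random.seed(42)

modes = {                      # (mu, sigma) of log jump size, hypothetical
    "walk": (4.0, 0.5),        # short jumps
    "bike": (6.0, 0.6),
    "car":  (8.5, 0.9),        # long jumps
}
weights = {"walk": 0.5, "bike": 0.2, "car": 0.3}

def sample_mixture(n):
    """Draw n jump sizes: pick a mode by weight, then a lognormal jump."""
    names = list(modes)
    out = []
    for _ in range(n):
        m = random.choices(names, weights=[weights[k] for k in names])[0]
        mu, sigma = modes[m]
        out.append(random.lognormvariate(mu, sigma))
    return out

jumps = sample_mixture(20000)
walk_only = [random.lognormvariate(*modes["walk"]) for _ in range(20000)]

# The mixture's upper tail is far heavier than any single mode's:
q99_mix = statistics.quantiles(jumps, n=100)[98]     # 99th percentile
q99_walk = statistics.quantiles(walk_only, n=100)[98]
print(q99_mix > q99_walk)
```

Each component alone is light-tailed on a log scale, but superposing modes with very different scales stretches the upper tail, which is the mechanism the paper identifies behind the apparent power law.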


international conference on computer communications | 2009

On Efficient Content Matching in Distributed Pub/Sub Systems

Weixiong Rao; Lei Chen; Ada Wai-Chee Fu; Hanhua Chen; Futai Zou

The efficiency of the matching structure is the key issue for content-based publish/subscribe systems. In this paper, we propose an efficient matching tree structure, named CobasTree, for distributed environments. In particular, we model each predicate in a subscription filter as an interval and each published content value as a data point. CobasTree indexes all subscription intervals, and a matching algorithm matches data points against the indexed intervals. Through a set of techniques, including selective multicast with bounding intervals, cost-model-based interval division, and CobasTree merging, CobasTree matches published content against subscription filters with high efficiency. We call the whole framework, comprising CobasTree and the associated techniques, COBAS. Performance evaluation in both a simulation environment and on PlanetLab shows that COBAS significantly outperforms two counterpart approaches, with lower cost and faster forwarding.
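The interval-vs-point model at the core of the paper can be shown with a minimal centralized stand-in (the real CobasTree distributes the interval index over a tree of nodes; the subscriber names below are hypothetical).

```python
# Minimal sketch of the interval-matching core: subscriptions are
# intervals, a published value is a data point, and matching returns
# every subscription whose interval covers the point. CobasTree indexes
# the intervals in a distributed tree; this stand-in scans them linearly.

class IntervalIndex:
    def __init__(self):
        self.intervals = []            # (lo, hi, subscriber_id)

    def subscribe(self, lo, hi, sid):
        self.intervals.append((lo, hi, sid))

    def match(self, value):
        """Return subscriber ids whose interval contains value."""
        return [sid for lo, hi, sid in self.intervals if lo <= value <= hi]

idx = IntervalIndex()
idx.subscribe(10, 50, "alice")     # e.g. price filters; names hypothetical
idx.subscribe(40, 90, "bob")
idx.subscribe(60, 70, "carol")
print(idx.match(45))               # -> ['alice', 'bob']
```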


conference on information and knowledge management | 2007

Optimal proactive caching in peer-to-peer network: analysis and application

Weixiong Rao; Lei Chen; Ada Wai-Chee Fu; Yingyi Bu

As a promising technology with properties such as high efficiency, scalability, and fault tolerance, Peer-to-Peer (P2P) technology is used as the underlying network for new Internet-scale applications. However, a well-known issue in such applications (for example, the WWW) is that the distribution of data popularity is heavy-tailed, following a Zipf-like distribution. Taking this skewed popularity into account, we adopt a proactive caching approach and focus on two key problems: where (the placement problem: where to place the replicas) and how (the degree problem: how many replicas to assign to a specific content item). For the where problem, we propose a novel approach that applies generally to structured P2P networks. Next, we solve two optimization objectives related to the how problem: MAX_PERF and MIN_COST. Our solution, called PoPCache, has two interesting properties: (1) the number of replicas assigned to each content item is proportional to its popularity; (2) the derived optimal solutions are related to the entropy of the popularity distribution. To our knowledge, no previous work has reported such results. Finally, we apply the results of PoPCache to a P2P-based web caching scheme, called Web-PoPCache. Through simulations driven by real web cache traces, our extensive evaluation demonstrates the advantages of PoPCache and Web-PoPCache.
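The "replicas proportional to popularity" property can be sketched as a budgeted allocation. The popularity vector and replica budget below are hypothetical; largest-remainder rounding is one simple way to make integer counts sum to the budget.

```python
import math

# Sketch of proportional replica allocation under a global budget,
# using largest-remainder rounding. Budget and popularities are
# hypothetical illustration values.

def allocate_replicas(popularity, budget):
    """Give each content floor(budget * p) replicas, then hand the
    leftover replicas to the largest fractional remainders."""
    raw = [budget * p for p in popularity]
    alloc = [int(r) for r in raw]
    leftover = budget - sum(alloc)
    order = sorted(range(len(raw)), key=lambda i: raw[i] - alloc[i], reverse=True)
    for i in order[:leftover]:
        alloc[i] += 1
    return alloc

popularity = [0.5, 0.25, 0.15, 0.07, 0.03]     # sums to 1
replicas = allocate_replicas(popularity, budget=100)
print(replicas)                                # -> [50, 25, 15, 7, 3]

# The entropy of the popularity distribution, which the paper relates
# to the derived optimal solutions:
entropy = -sum(p * math.log2(p) for p in popularity)
```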


IEEE Transactions on Mobile Computing | 2015

Towards Maximizing Timely Content Delivery in Delay Tolerant Networks

Weixiong Rao; Kai Zhao; Yan Zhang; Pan Hui; Sasu Tarkoma

Many applications, such as product promotion advertisements and traffic congestion notifications, benefit from opportunistic content exchange in Delay Tolerant Networks (DTNs). An important requirement of such applications is timely delivery. However, the intermittent connectivity of DTNs may significantly delay content exchange and cannot guarantee timely delivery. State-of-the-art solutions capture the mobility patterns or social properties of mobile devices, but do not capture the properties of the delivered content itself in order to optimize delivery. Without such optimization, content demanded by a large number of subscribers could follow the same forwarding path as content demanded by only one subscriber, leading to traffic congestion and packet drops. To address this challenge, we develop a solution framework, named Ameba, for timely delivery. In detail, we first leverage content properties to derive an optimal routing hop count for each content item, maximizing the number of nodes needing the content that are reached. Next, we develop node utilities to capture the interests, capacity, and locations of mobile devices. Finally, a distributed forwarding scheme leverages the optimal routing hop count and the node utilities to deliver content towards the nodes that need it in a timely manner. Illustrative results verify that Ameba achieves a delivery ratio comparable to Epidemic routing, but with much lower overhead.
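A utility-driven relay choice in the spirit of this scheme can be sketched as follows. The utility function (interest times capacity times proximity) and all node values are assumptions for illustration, not the paper's actual formulation.

```python
# Sketch of utility-based relay selection: a node forwards a content
# item to the neighbour with the highest utility, and forwarding stops
# once the hop budget is spent. The utility formula and node values are
# hypothetical.

def utility(node, content_topic):
    interest = 1.0 if content_topic in node["interests"] else 0.1
    return interest * node["capacity"] * node["proximity"]

def forward(content_topic, neighbours, hops_left):
    """Pick the best next hop, or None if out of hops or neighbours."""
    if hops_left == 0 or not neighbours:
        return None
    return max(neighbours, key=lambda n: utility(n, content_topic))

neighbours = [
    {"id": "n1", "interests": {"traffic"}, "capacity": 0.9, "proximity": 0.5},
    {"id": "n2", "interests": {"ads"},     "capacity": 0.8, "proximity": 0.9},
    {"id": "n3", "interests": {"traffic"}, "capacity": 0.4, "proximity": 0.8},
]
best = forward("traffic", neighbours, hops_left=3)
print(best["id"])      # -> n1
```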


World Wide Web | 2014

Evaluating continuous top-k queries over document streams

Weixiong Rao; Lei Chen; Shudong Chen; Sasu Tarkoma

In the age of Web 2.0, Web content has become live, and users would like to receive content of interest automatically. The popular RSS subscription approach cannot offer fine-grained filtering. In this paper, we propose a personalized subscription approach over live Web content. Each document is represented by pairs of terms and weights, and each user defines a continuous top-k query. Based on an aggregation function that measures the relevance between a document and a query, the user continuously receives the k most relevant documents inside a sliding window. The challenge of this subscription approach is the high processing cost, especially when the number of queries is very large. Our basic idea is to share evaluation results among queries. Based on a defined covering relationship between queries, we identify the relations between their aggregation scores and develop a graph indexing structure (GIS) to maintain the queries. Next, based on the GIS, we propose a document evaluation algorithm that shares query results among queries. We then reuse previously evaluated documents, and design a document indexing structure (DIS) to maintain them. Finally, we adopt a cost-model-based approach to unify the GIS and DIS approaches. The experimental results show that our solution outperforms previous work based on the classic inverted list structure.
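The query model (term weights, sum-of-weights relevance, sliding window) can be sketched for a single query; the paper's sharing structures (GIS/DIS) are omitted, and the query and stream below are hypothetical.

```python
import heapq
from collections import deque

# Single-query sketch of the continuous top-k model: a sliding window
# over the last W documents, a term-weight query, and a sum-of-weights
# relevance score. The GIS/DIS sharing structures are not modelled.

def score(query, doc_terms):
    """Aggregate relevance: sum of query weights over terms in the doc."""
    return sum(w for term, w in query.items() if term in doc_terms)

def topk_over_stream(query, stream, k, window):
    window_docs = deque()                 # (doc_id, score), oldest first
    results = []
    for doc_id, terms in stream:
        window_docs.append((doc_id, score(query, terms)))
        if len(window_docs) > window:
            window_docs.popleft()         # slide the window
        results.append(heapq.nlargest(k, window_docs, key=lambda d: d[1]))
    return results

query = {"p2p": 0.6, "cache": 0.4}        # hypothetical subscription
stream = [
    ("d1", {"p2p"}),
    ("d2", {"cache"}),
    ("d3", {"p2p", "cache"}),
    ("d4", {"news"}),
]
final_topk = topk_over_stream(query, stream, k=2, window=3)[-1]
print([doc for doc, _ in final_topk])     # -> ['d3', 'd2']
```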


IEEE Transactions on Knowledge and Data Engineering | 2013

Toward Efficient Filter Privacy-Aware Content-Based Pub/Sub Systems

Weixiong Rao; Lei Chen; Sasu Tarkoma

In recent years, content-based publish/subscribe [12], [22] has become a popular paradigm for decoupling information producers and consumers with the help of brokers. Unfortunately, when users register their personal interests with the brokers, the privacy of filters defined by honest subscribers can easily be exposed by untrusted brokers, and this situation is further aggravated by collusion attacks between untrusted brokers and compromised subscribers. To protect filter privacy, we introduce an anonymizer engine that separates the role of the broker into two parts, and we adapt the k-anonymity and ℓ-diversity models to content-based pub/sub. When this anonymization model is applied to protect filter privacy, there is an inherent tradeoff between the anonymization level and the publication redundancy. By leveraging a partial-order-based generalization of filters to track filters satisfying k-anonymity and ℓ-diversity, we design algorithms that minimize the publication redundancy. Our experiments show that the proposed scheme, compared with the studied counterparts, has a smaller forwarding cost while achieving comparable attack resilience.
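The anonymization-vs-redundancy tradeoff can be illustrated with a toy generalization step: group at least k interval filters together and publish only each group's bounding interval, so the broker never sees an individual filter. The grouping rule (sort and chunk) and the filter values are illustrative assumptions, not the paper's partial-order algorithm.

```python
# Toy sketch of filter generalization for k-anonymity: sort interval
# filters, group them k at a time, and replace each group by its
# bounding interval. Wider bounding intervals mean more redundant
# publications, which is the tradeoff the paper minimizes.

def generalize(filters, k):
    """filters: list of (lo, hi). Returns one bounding interval per
    group of at least k filters."""
    ordered = sorted(filters)
    groups = [ordered[i:i + k] for i in range(0, len(ordered), k)]
    if len(groups) > 1 and len(groups[-1]) < k:      # merge a short tail
        groups[-2].extend(groups.pop())
    return [(min(lo for lo, _ in g), max(hi for _, hi in g)) for g in groups]

filters = [(1, 3), (2, 5), (8, 9), (10, 12), (11, 15)]
print(generalize(filters, k=2))    # -> [(1, 5), (8, 15)]
```

Each published generalized interval now stands for at least two subscribers, but it also matches publications that no member filter wanted, which is exactly the publication redundancy being traded off.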


international conference on distributed computing systems | 2012

MOVE: A Large Scale Keyword-Based Content Filtering and Dissemination System

Weixiong Rao; Lei Chen; Pan Hui; Sasu Tarkoma

The Web 2.0 era is characterized by the emergence of a very large amount of live content. A real-time, fine-grained content filtering approach can precisely keep users up to date with the information they are interested in. The key to such an approach is a scalable matching algorithm. One might treat content matching as a special kind of content search and resort to the classic algorithm of [5]. However, due to blind flooding, [5] cannot simply be adapted for scalable content matching. To increase matching throughput, we propose an adaptive approach to allocate (i.e., replicate and partition) filters. The allocation is based on our observations on real datasets: most users prefer short queries, consisting of around 2-3 terms, while web content typically contains tens to thousands of terms per article. Thus, by reducing the number of processed documents, we can reduce the latency of matching large articles against filters and have the chance to achieve higher throughput. We implement our approach on an open-source project, Apache Cassandra. Experiments with real datasets show that our approach achieves severalfold higher throughput than two state-of-the-art counterpart solutions.
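The short-filter/long-document asymmetry can be sketched with a simple filter index: each 2-3 term filter is registered under one of its terms, and a long document is matched in one pass over its terms. The indexing rule (alphabetically first term) and all filters are illustrative assumptions; the paper instead allocates filters adaptively across nodes.

```python
from collections import defaultdict

# Sketch of matching short filters against long documents: index each
# filter under one of its terms, then match a document with a single
# pass over the document's terms. Filters and documents are hypothetical.

class FilterIndex:
    def __init__(self):
        self.by_term = defaultdict(list)

    def add(self, fid, terms):
        # Index the filter under its alphabetically first term (a
        # simplification of the paper's adaptive allocation).
        self.by_term[min(terms)].append((fid, set(terms)))

    def match(self, doc_terms):
        doc = set(doc_terms)
        hits = []
        for t in doc:                          # one pass over the document
            for fid, f in self.by_term.get(t, []):
                if f <= doc:                   # all filter terms present
                    hits.append(fid)
        return sorted(hits)

idx = FilterIndex()
idx.add("f1", {"p2p", "cache"})
idx.add("f2", {"dht"})
idx.add("f3", {"cache", "news"})
print(idx.match({"p2p", "cache", "dht", "web"}))   # -> ['f1', 'f2']
```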


very large data bases | 2011

STAIRS: Towards efficient full-text filtering and dissemination in DHT environments

Weixiong Rao; Lei Chen; Ada Wai-Chee Fu

Nowadays, “live” content such as weblogs, Wikipedia, and news is ubiquitous on the Internet, and providing users with relevant content in a timely manner has become a challenging problem. Differing from Web search technologies and RSS feed/reader applications, this paper envisions a personalized full-text content filtering and dissemination system in a highly distributed environment such as a Distributed Hash Table (DHT) based Peer-to-Peer (P2P) network. Users subscribe to content of interest by specifying keywords and thresholds as filters; content is then disseminated to the users interested in it. In the literature, full-text document publishing in DHTs has long suffered from the high cost of forwarding a document to the home nodes of all of its distinct terms, aggravated by the fact that a document typically contains a large number of distinct terms (tens to thousands per document). In this paper, we propose a set of novel techniques to overcome this high forwarding cost by carefully selecting a very small number of meaningful terms (or key features) among the candidate terms inside each document. To further reduce the average hop count per forwarding, we also prune irrelevant documents along the forwarding path. Experiments based on two real query logs and two real datasets demonstrate the effectiveness of our solution.
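The "few meaningful terms" idea can be sketched by ranking a document's terms with a TF-IDF-style weight and forwarding only to the home nodes of the top-m terms. The weighting, the term counts, and m are assumptions for illustration; the paper's actual selection additionally guarantees no false negatives with respect to filter thresholds.

```python
import math

# Sketch of key-feature selection: forward a document only to the home
# nodes of its m highest-weighted terms instead of all distinct terms.
# Term statistics and m below are hypothetical.

def select_key_features(term_freq, doc_freq, n_docs, m):
    """Pick the m terms with the highest tf-idf weight."""
    weight = {t: tf * math.log(n_docs / doc_freq[t])
              for t, tf in term_freq.items()}
    return sorted(weight, key=weight.get, reverse=True)[:m]

term_freq = {"dht": 5, "filter": 3, "the": 20, "p2p": 4}   # in this doc
doc_freq = {"dht": 10, "filter": 50, "the": 1000, "p2p": 30}
keys = select_key_features(term_freq, doc_freq, n_docs=1000, m=2)
print(keys)        # -> ['dht', 'p2p']
```

Common words like "the" get near-zero weight and are never forwarded, which is how the forwarding cost drops from thousands of home nodes to a handful.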


international conference on peer-to-peer computing | 2011

Towards optimal keyword-based content dissemination in DHT-based P2P networks

Weixiong Rao; Roman Vitenberg; Sasu Tarkoma

Keyword-based content alert services, e.g., Google Alerts and Microsoft Live Alerts, empower end users with the ability to automatically receive useful, recent content. In this paper, we leverage the favorable properties of DHTs, such as scalability, and propose the design of a scalable keyword-based content alert service. The DHT-based architecture matches textual documents with queries based on document terms: for each term, the implementation assigns a home node responsible for handling documents and queries containing that term. The main challenge of this keyword-based matching scheme is the high number of terms appearing in a typical document, which results in a high publication cost. Fortunately, a document can be forwarded to the home nodes of a carefully selected subset of its terms without incurring false negatives. In this paper, we focus on the MTAF problem of minimizing the number of terms selected to forward published content. We show that the problem is NP-hard, and we consider centralized and DHT-based solutions. Experimental results based on real datasets indicate that the proposed solutions are efficient compared to existing approaches. In particular, the similarity-based replication of filters, a key element of our solution, is shown to mitigate the hotspots that arise because some terms are substantially more popular than others, both inside documents and inside queries.
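The term-minimization flavour of this problem can be sketched as a hitting set: pick the fewest document terms such that every matching query contains at least one picked term. Since the problem is NP-hard, a greedy pass (pick the term hitting the most still-unhit queries) gives the classic logarithmic approximation; the queries below are hypothetical.

```python
# Sketch of term selection as a greedy hitting set: every query known
# to match the document must contain at least one selected term, so no
# subscriber is missed. Queries and terms are hypothetical examples.

def greedy_term_selection(doc_terms, queries):
    """queries: list of term sets, each already known to match the doc.
    Greedily pick the doc term hitting the most still-unhit queries."""
    unhit = [q & doc_terms for q in queries]
    chosen = []
    while any(unhit):
        # sorted() makes tie-breaking deterministic across runs
        best = max(sorted(doc_terms),
                   key=lambda t: sum(t in q for q in unhit))
        chosen.append(best)
        unhit = [q for q in unhit if best not in q]
    return chosen

doc_terms = {"p2p", "dht", "cache", "alert"}
queries = [{"p2p", "dht"}, {"dht", "cache"}, {"alert", "p2p"}]
picked = greedy_term_selection(doc_terms, queries)
print(picked)      # -> ['dht', 'alert']
```

Here two terms cover all three queries, so the document is published to two home nodes instead of four, with no false negatives.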

Collaboration


Dive into Weixiong Rao's collaborations.

Top Co-Authors

Lei Chen
Hong Kong University of Science and Technology

Pan Hui
Hong Kong University of Science and Technology

Fanyuan Ma
Shanghai Jiao Tong University

Ada Wai-Chee Fu
The Chinese University of Hong Kong

Kai Zhao
University of Helsinki