Quanqing Xu
Peking University
Publication
Featured research published by Quanqing Xu.
International Conference on Data Engineering | 2008
Bin Cui; Hua Lu; Quanqing Xu; Lijiang Chen; Yafei Dai; Yongluan Zhou
Skyline queries are capable of retrieving interesting points from a large data set according to multiple criteria. Most work on skyline queries so far has assumed a centralized storage, whereas in practice relevant data are often distributed among geographically scattered sites. In this work, we tackle constrained skyline queries in large-scale distributed environments without the assumption of any overlay structures, and propose a novel algorithm named PaDSkyline (Parallel distributed Skyline query processing). PaDSkyline significantly shortens the response time by performing parallel processing over site groups produced by a partition algorithm. Within each group, it locally optimizes the query processing over distributed sites. It also drastically enhances the network transmission efficiency by performing early reduction of skyline candidates with deliberately selected multiple filtering points. Results of extensive experiments demonstrate the efficiency and robustness of our proposals.
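The skyline notion used in this abstract can be illustrated with a minimal sketch. It is not the PaDSkyline algorithm: the function names, the "smaller is better" convention, and the two-site example are assumptions made for illustration. It shows dominance, a local skyline per site, pruning with a filtering point, and a merge of the local results.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b everywhere and strictly better somewhere.

    Assumption: smaller values are better in every dimension.
    """
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline(points: List[Point]) -> List[Point]:
    """Return the points that no other point dominates (simple nested-loop scan)."""
    result: List[Point] = []
    for p in points:
        if any(dominates(q, p) for q in result):
            continue  # p is dominated by a current candidate, so drop it
        # p survives; remove candidates that p dominates
        result = [q for q in result if not dominates(p, q)]
        result.append(p)
    return result


def prune_with_filter(points: List[Point], filter_point: Point) -> List[Point]:
    """Drop local points dominated by a filtering point sent from another site."""
    return [p for p in points if not dominates(filter_point, p)]


def merge_site_skylines(sites: List[List[Point]]) -> List[Point]:
    """Compute each site's local skyline, then the skyline of their union."""
    candidates = [p for site in sites for p in skyline(site)]
    return skyline(candidates)


# Hypothetical data: (price, distance) pairs held by two sites.
site_a = [(50, 8), (70, 5)]
site_b = [(60, 3), (40, 9), (55, 3)]
global_skyline = merge_site_skylines([site_a, site_b])
```

Because a point dominated locally is also dominated globally, merging local skylines gives the same answer as a centralized scan; a filtering point only reduces how many candidates each site has to ship.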
International Conference on Distributed Computing Systems | 2008
Lijiang Chen; Bin Cui; Hua Lu; Linhao Xu; Quanqing Xu
An interesting problem in peer-based data management is efficient support for skyline queries within a multiattribute space. A skyline query retrieves from a set of multidimensional data points a subset of interesting points, compared to which no other points are better. Skyline queries play an important role in multi-criteria decision making and user preference applications. In this paper, we address the skyline computing problem in a structured P2P network. We exploit the iMinMax(θ) transformation to map high-dimensional data points to one-dimensional values. All transformed data points are then distributed on a structured P2P network called BATON, where all peers are virtually organized as a balanced binary search tree. Subsequently, a progressive algorithm is proposed to compute the skyline in the distributed P2P network. Further, we propose an adaptive skyline filtering technique to reduce both processing cost and communication cost during distributed skyline computing. Our performance study, with both synthetic and real datasets, shows that the proposed approach can dramatically reduce the transferred data volume and achieve quick response times.
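As a rough illustration of the mapping step, the sketch below follows the iMinMax(θ) rule as it is commonly stated for points in the unit hypercube: a point is keyed by its minimum coordinate when it lies closer to a "min" edge, and by its maximum coordinate otherwise, with θ biasing the choice. The function name and the unit partition width are assumptions; the abstract does not spell out the formula, so the original iMinMax paper should be consulted for the exact definition.

```python
from typing import Sequence


def iminmax(point: Sequence[float], theta: float = 0.0) -> float:
    """Map a point in [0, 1]^d to a single value (sketch of iMinMax(theta)).

    Assumption: each dimension owns a key range of width 1, so the result
    is `dimension index + coordinate value`.
    """
    x_min = min(point)
    x_max = max(point)
    d_min = list(point).index(x_min)  # dimension holding the smallest value
    d_max = list(point).index(x_max)  # dimension holding the largest value
    if x_min + theta < 1.0 - x_max:
        return d_min + x_min  # closer to a min edge
    return d_max + x_max      # closer to a max edge


key_near_min_edge = iminmax((0.1, 0.5, 0.6))
key_near_max_edge = iminmax((0.4, 0.5, 0.9))
key_biased_to_max = iminmax((0.1, 0.5, 0.6), theta=0.5)
```

The resulting one-dimensional keys can then be range-partitioned across peers, which is the role BATON plays in the abstract.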
IEEE Transactions on Knowledge and Data Engineering | 2009
Bin Cui; Lijiang Chen; Linhao Xu; Hua Lu; Guojie Song; Quanqing Xu
An increasing number of large-scale applications exploit peer-to-peer network architecture to provide highly scalable and flexible services. Among these applications, data management in peer-to-peer systems is one of the interesting domains. In this paper, we investigate the multidimensional skyline computation problem on a structured peer-to-peer network. In order to achieve low communication cost and quick response time, we utilize the iMinMax(θ) method to transform high-dimensional data to one-dimensional values and distribute the data in a structured peer-to-peer network called BATON. Thereafter, we propose a progressive algorithm with an adaptive filter technique for efficient skyline computation in this environment. We further discuss some optimization techniques for the algorithm, and summarize the key principles of our algorithm into a query routing protocol with detailed analysis. Finally, we conduct an extensive experimental evaluation to demonstrate the efficiency of our approach.
International Journal of Parallel Programming | 2015
Quanqing Xu; Liang Zhao; Mingzhong Xiao; Anna Liu; Yafei Dai
In this paper, we present YuruBackup, a space-efficient and highly scalable incremental backup system in the cloud. YuruBackup enables fine-grained data de-duplication with hierarchical partitioning, which improves space efficiency and reduces both storage costs and the bandwidth used by backup and restore processes. In addition, YuruBackup explores a highly scalable architecture for fingerprint servers that allows one or more fingerprint servers to be added dynamically to cope with a growing number of clients. In this architecture, the fingerprint servers in a DB cluster are used for scaling writes of the fingerprint catalog, while the slaves are used for scaling reads of the fingerprint catalog. We present the system architecture of YuruBackup and its components, and we have implemented a proof-of-concept prototype of YuruBackup. A performance evaluation conducted in a public cloud demonstrates the efficiency of the system.
International IFIP TC Networking Conference | 2009
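The general idea of fingerprint-based de-duplication can be sketched as follows. This is a simplified illustration, not YuruBackup's design: it uses fixed-size chunks and an in-memory dictionary in place of the hierarchical partitioning and fingerprint servers described above, and all names are hypothetical.

```python
import hashlib
from typing import Dict, List


def backup(data: bytes, store: Dict[str, bytes], chunk_size: int = 4) -> List[str]:
    """Split data into fixed-size chunks and store only chunks not seen before.

    Returns the "recipe": the ordered list of chunk fingerprints.
    Assumption: `store` stands in for a fingerprint catalog plus chunk storage.
    """
    recipe: List[str] = []
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        fingerprint = hashlib.sha256(chunk).hexdigest()
        if fingerprint not in store:
            store[fingerprint] = chunk  # new content: would be uploaded
        recipe.append(fingerprint)
    return recipe


def restore(recipe: List[str], store: Dict[str, bytes]) -> bytes:
    """Rebuild the original data by concatenating chunks in recipe order."""
    return b"".join(store[fingerprint] for fingerprint in recipe)


store: Dict[str, bytes] = {}
first = backup(b"aaaabbbbaaaa", store)   # 3 chunks, 2 distinct
chunks_after_first = len(store)
second = backup(b"aaaabbbbcccc", store)  # only the "cccc" chunk is new
chunks_after_second = len(store)
```

In an incremental backup, only chunks whose fingerprints are missing from the catalog need to cross the network, which is where the bandwidth and storage savings come from.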
Quanqing Xu; Heng Tao Shen; Bin Cui; Xiaoxiao Hou; Yafei Dai
DHT (Distributed Hash Table) is a structured overlay network that is widely utilized in P2P systems. Existing content distribution approaches do not completely exploit the features of DHT and incur heavy network bandwidth consumption. This paper analyzes existing content distribution approaches, including the synchronous content distribution method in eMule based on DHT overlay networks, and points out that their network loads are too heavy. We propose a novel content distribution algorithm for DHT networks: asynchronous distribution. Compared with traditional distribution approaches, it is more effective and scalable with lower network load. We apply the techniques of vector space model and search frequency to the asynchronous distribution algorithm, which effectively improves the search hit ratio and reduces network load. Simulation results based on real data from the Maze system show that this approach has low network overhead and publishing cost, and high search and download hit ratios.
International Conference on Computational Science | 2009
Quanqing Xu; Heng Tao Shen; Zaiben Chen; Bin Cui; Xiaofang Zhou; Yafei Dai
Mobile P2P networks have potential applications in many fields, making them a focus of current research. However, mobile P2P networks are subject to the limitations of transmission range, wireless bandwidth, and highly dynamic network topology, giving rise to many new challenges for efficient search. In this paper, we propose a hybrid search approach, which is automatic and economical in mobile P2P networks. The region covered by a mobile P2P network is partitioned into subregions, each of which can be identified by a unique ID and known to all peers. All the subregions then construct a mobile Kademlia (MKad) network. The proposed hybrid retrieval approach aims to utilize flooding-based and DHT-based schemes in MKad for indexing and searching according to designed utility functions. Our experiments show that the proposed approach is more accurate and efficient than existing methods.
Frontiers of Computer Science in China | 2009
Quanqing Xu; Heng Tao Shen; Zaiben Chen; Bin Cui; Xiaofang Zhou; Yafei Dai
The concept of Peer-to-Peer (P2P) has been introduced into mobile networks, which has led to the emergence of mobile P2P networks and opened up potential applications in many fields. However, mobile P2P networks are subject to the limitations of transmission range, and highly dynamic and unpredictable network topology, giving rise to many new challenges for efficient information retrieval. In this paper, we propose an automatic and economical hybrid information retrieval approach based on cooperative cache. In this method, the region covered by a mobile P2P network is partitioned into subregions, each of which is identified by a unique ID and known to all peers. All the subregions then constitute a mobile Kademlia (MKad) network. The proposed hybrid retrieval approach aims to utilize the flooding-based and Distributed Hash Table (DHT)-based schemes in MKad for indexing and searching according to the designed utility functions. To further facilitate information retrieval, we present an effective cache update method by considering all relevant factors. At the same time, the combination of two different methods for cache update is also introduced. One of them is a pull based on time stamps, which includes two different pulls: an on-demand pull and a periodical pull; the other is a push strategy using update records. Furthermore, we provide detailed mathematical analysis on the cache hit ratio of our approach. Simulation experiments in NS-2 showed that the proposed approach is more accurate and efficient than the existing methods.
Future Generation Computer Systems | 2018
Yongjun Li; Zhen Zhang; You Peng; Hongzhi Yin; Quanqing Xu
Matching user accounts can help us build better user profiles and benefit many applications. It has attracted much attention from both industry and academia. Most existing works are mainly based on rich user profile attributes. However, in many cases, user profile attributes are unavailable, incomplete or unreliable, either due to the privacy settings or just because users decline to share their information. This makes the existing schemes quite fragile. Users often share their activities on different social networks. This provides an opportunity to overcome the above problem. We aim to address the problem of user identification based on User Generated Content (UGC). We first formulate the problem of user identification based on UGCs and then propose a UGC-based user identification model. A supervised machine learning based solution is presented. It has three steps: first, we propose several algorithms to measure the spatial similarity, temporal similarity and content similarity of two UGCs; second, we extract the spatial, temporal and content features to exploit these similarities; finally, we employ the machine learning method to match user accounts, and conduct the experiments on three ground truth datasets. The results show that the proposed method achieves excellent performance, with F1 values reaching 89.79%, 86.78% and 86.24% on three ground truth datasets, respectively. This work presents the possibility of matching user accounts with highly accessible online data.
International World Wide Web Conferences | 2017
Yongjun Li; You Peng; Zhen Zhang; Quanqing Xu; Hongzhi Yin
The display names that an individual uses in various online social networks always contain some redundant information because some people tend to use similar names across different networks to make them easier to remember or to build their online reputation. In this paper, we aim to measure the redundant information between different display names of the same individual. Based on the cross-site linking function, we first develop a specific distributed crawler to extract the display names that individuals select for different social networks, and we give an overview of the display names we extracted. Then we measure and analyze the redundant information in three ways: length similarity, character similarity and letter distribution similarity, compared with display names of different individuals. We also analyze the evolution of redundant information over time. We find that 45% of users tend to use the same display name across OSNs. Our findings also demonstrate that display names of the same individual show high similarity. The evolution analysis results show that redundant information is time-independent. Awareness of the redundant information between the display names can benefit many applications, such as user identification across social networks.
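The three kinds of similarity named in this abstract could be computed along the following lines. The abstract does not define the measures, so the formulas here (length ratio, `difflib` sequence ratio, cosine similarity of character counts) are illustrative stand-ins chosen for this sketch, not the authors' definitions.

```python
from collections import Counter
from difflib import SequenceMatcher
from math import sqrt


def length_similarity(a: str, b: str) -> float:
    """Ratio of the shorter length to the longer one (1.0 for two empty names)."""
    if not a and not b:
        return 1.0
    return min(len(a), len(b)) / max(len(a), len(b))


def character_similarity(a: str, b: str) -> float:
    """Case-insensitive matching-subsequence ratio from the standard library."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def letter_distribution_similarity(a: str, b: str) -> float:
    """Cosine similarity between the character-count vectors of two names."""
    counts_a, counts_b = Counter(a.lower()), Counter(b.lower())
    dot = sum(counts_a[ch] * counts_b[ch] for ch in counts_a)
    norm = sqrt(sum(v * v for v in counts_a.values())) * sqrt(
        sum(v * v for v in counts_b.values())
    )
    return dot / norm if norm else 0.0


# Hypothetical display names of one person on two networks.
features = (
    length_similarity("qqxu", "qqxu88"),
    character_similarity("qqxu", "qqxu88"),
    letter_distribution_similarity("qqxu", "qqxu88"),
)
```

Feature tuples of this kind are what a downstream classifier would consume when deciding whether two accounts belong to the same person.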
IEEE Access | 2017
Yongjun Li; You Peng; Wenli Ji; Zhen Zhang; Quanqing Xu
User identification is very helpful for building a better profile of a user. Some works have been devoted to this issue. However, the existing works with good performance are mainly based on rich online data and do not consider the cost of online data acquisition. In this paper, we aim to address this issue with a lower cost of data acquisition. A machine learning-based solution is proposed solely based on the user’s display names. It consists of three key steps: first, we analyze the users’ unique naming patterns that lead to information redundancies across sites; second, we construct features that exploit information redundancies; finally, we employ a machine learning method for user identification. The experiment shows that the proposed solution can provide excellent performance, with F1 scores reaching 96.24%, 92.49%, and 90.68% on three different real data sets, respectively. This paper shows the possibility of user identification with a lower cost of data acquisition.