Yubao Liu | Researchain

Publication

Featured research published by Yubao Liu.

Journal of Computer Science and Technology | 2008

Clustering Text Data Streams

Yubao Liu; Jiarong Cai; Jian Yin; Ada Wai-Chee Fu

Clustering text data streams is an important issue in the data mining community and has a number of applications, such as news group filtering, text crawling, document organization, and topic detection and tracing. However, most methods are similarity-based approaches that use only the TF*IDF scheme to represent the semantics of text data, which often leads to poor clustering quality. Recently, researchers have argued that the semantic smoothing model is more efficient than the existing TF*IDF scheme for improving text clustering quality. However, the existing semantic smoothing model is not suitable for a dynamic text data context. In this paper, we first extend the semantic smoothing model to the text data stream context. Based on the extended model, we then present two online clustering algorithms, OCTS and OCTSM, for clustering massive text data streams. In both algorithms, we also present a new cluster statistics structure named cluster profile, which can capture the semantics of text data streams dynamically and at the same time speed up the clustering process. Some efficient implementations of our algorithms are also given. Finally, we present a series of experimental results illustrating the effectiveness of our technique.

Very Large Data Bases | 2008

Efficient skyline querying with variable user preferences on nominal attributes

Raymond Chi-Wing Wong; Ada Wai-Chee Fu; Jian Pei; Yip Sing Ho; Tai Wong; Yubao Liu

Current skyline evaluation techniques assume a fixed ordering on the attributes. However, dynamic preferences on nominal attributes are more realistic in known applications. In order to generate an online response for any such preference issued by a user, one obvious solution is to enumerate all possible preferences and materialize the results of all of them. However, the pre-processing and storage requirements of a full materialization are typically prohibitive. Instead, we propose a semi-materialization method called IPO-tree Search, which stores only partial useful results. With these partial results, the result of each possible preference can be returned efficiently. We have also conducted experiments to show the efficiency of our proposed algorithm.
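To make the problem setting concrete, here is a minimal sketch of a skyline computed under a user-supplied preference on a nominal attribute. It is a naive quadratic scan, not the IPO-tree Search method from the paper. The schema (price, distance, brand), the function names, and the preference semantics (listed values are preferred in list order and beat unlisted values; unlisted values are mutually incomparable) are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical schema: (price, distance, brand); lower numeric values are better.
Point = Tuple[float, float, str]


def nominal_at_least(a: str, b: str, rank: Dict[str, int]) -> bool:
    """Return True if nominal value a is equal to or preferred over b."""
    if a == b:
        return True
    if a in rank:
        # Listed values beat unlisted ones; among listed values, lower rank wins.
        return b not in rank or rank[a] < rank[b]
    return False  # an unlisted value is never preferred over a different value


def dominates(p: Point, q: Point, rank: Dict[str, int]) -> bool:
    """p dominates q if p is no worse on every attribute and better on one."""
    no_worse = p[0] <= q[0] and p[1] <= q[1] and nominal_at_least(p[2], q[2], rank)
    strictly_better = p[0] < q[0] or p[1] < q[1] or p[2] != q[2]
    return no_worse and strictly_better


def skyline(points: Sequence[Point], preference: Sequence[str]) -> List[Point]:
    """Naive O(n^2) skyline under the given nominal preference order."""
    rank = {value: i for i, value in enumerate(preference)}
    return [p for p in points if not any(dominates(q, p, rank) for q in points)]


points = [
    (100, 2.0, "A"),
    (120, 2.0, "B"),
    (90, 3.0, "C"),
    (100, 2.0, "C"),
    (80, 5.0, "B"),
]
with_preference = skyline(points, ["A", "B"])  # user prefers A over B over the rest
without_preference = skyline(points, [])       # no ordering on the nominal attribute
```

Changing the preference list changes which points survive, which is why a result computed for one fixed ordering cannot simply be reused for every user.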

International Conference on Management of Data | 2014

Efficient algorithms for optimal location queries in road networks

Zhitong Chen; Yubao Liu; Raymond Chi-Wing Wong; Jiamin Xiong; Ganglin Mai; Cheng Long

In this paper, we study the optimal location query problem based on road networks. Specifically, we have a road network on which some clients and servers are located. Each client finds the server that is closest to her for service, and her cost of getting served is equal to the (network) distance between the client and the server serving her, multiplied by her weight or importance. The optimal location query problem is to find a location for setting up a new server such that the maximum cost of clients being served by the servers (including the new server) is minimized. This problem has been studied before, but the state-of-the-art is still not efficient enough. In this paper, we propose an efficient algorithm for the optimal location query problem, which is based on a novel idea of nearest location component. We also discuss three extensions of the optimal location query problem, namely the optimal multiple-location query problem, the optimal location query problem on 3D road networks, and the optimal location query problem with another objective. Extensive experiments showed that our algorithms are faster than the state-of-the-art by at least an order of magnitude on large real benchmark datasets. For example, on our largest real datasets, the state-of-the-art ran for more than 10 hours, whereas our algorithm finished within 3 minutes (i.e., >200 times faster).
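The objective described above can be illustrated with a brute-force baseline. This is only a sketch of the problem definition, not the nearest-location-component algorithm from the paper. It assumes that clients, servers, and candidate locations all sit on vertices of a small undirected graph (the actual problem also allows locations along edges), and all names are hypothetical.

```python
import heapq
from typing import Dict, List, Tuple

# Adjacency list: vertex -> [(neighbor, edge length)]; every vertex must be a key.
Graph = Dict[str, List[Tuple[str, float]]]


def dijkstra(graph: Graph, source: str) -> Dict[str, float]:
    """Shortest network distances from source to every reachable vertex."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, length in graph[u]:
            nd = d + length
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def best_vertex_location(
    graph: Graph, clients: Dict[str, float], servers: List[str]
) -> Tuple[str, float]:
    """Try each vertex as the new server; return the one minimizing the max cost.

    clients maps a client vertex to its weight. Ties go to the first vertex
    in sorted order.
    """
    inf = float("inf")
    server_dist = [dijkstra(graph, s) for s in servers]
    # Distance from each client to its nearest existing server.
    current = {c: min(d.get(c, inf) for d in server_dist) for c in clients}
    best_vertex, best_cost = "", inf
    for candidate in sorted(graph):
        d_new = dijkstra(graph, candidate)
        cost = max(
            weight * min(current[c], d_new.get(c, inf))
            for c, weight in clients.items()
        )
        if cost < best_cost:
            best_vertex, best_cost = candidate, cost
    return best_vertex, best_cost


# Path graph a - b - c - d - e with unit-length edges.
graph: Graph = {
    "a": [("b", 1.0)],
    "b": [("a", 1.0), ("c", 1.0)],
    "c": [("b", 1.0), ("d", 1.0)],
    "d": [("c", 1.0), ("e", 1.0)],
    "e": [("d", 1.0)],
}
result = best_vertex_location(graph, clients={"c": 1.0, "e": 2.0}, servers=["a"])
```

This baseline runs one shortest-path search per candidate vertex, which hints at why exhaustive evaluation becomes expensive on large road networks.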

Very Large Data Bases | 2011

Maximizing bichromatic reverse nearest neighbor for Lp-norm in two- and three-dimensional spaces

Raymond Chi-Wing Wong; M. Tamer Özsu; Ada Wai-Chee Fu; Philip S. Yu; Lian Liu; Yubao Liu

Bichromatic reverse nearest neighbor (BRNN) has been extensively studied in the spatial database literature. In this paper, we study a related problem called MaxBRNN: find an optimal region that maximizes the size of BRNNs for Lp-norm in two- and three-dimensional spaces. Such a problem has many real-life applications, including the problem of finding a new server point that attracts as many customers as possible by proximity. A straightforward approach is to determine the BRNNs for all possible points, which is not feasible since there is a large (or infinite) number of possible points. To the best of our knowledge, there are no existing algorithms that solve MaxBRNN for any Lp-norm space of two or three dimensions. Based on some interesting properties of the problem, we come up with an efficient algorithm called MaxOverlap to solve this problem. Extensive experiments are conducted to show that our algorithm is efficient.

Knowledge and Information Systems | 2013

A new approach for maximizing bichromatic reverse nearest neighbor search

Yubao Liu; Raymond Chi-Wing Wong; Ke Wang; Zhijie Li; Cheng Chen; Zhitong Chen

Maximizing bichromatic reverse nearest neighbor (MaxBRNN) is a variant of bichromatic reverse nearest neighbor (BRNN). The purpose of the MaxBRNN problem is to find an optimal region that maximizes the size of BRNNs. This problem has lots of real applications such as location planning and profile-based marketing. The best-known algorithm for the MaxBRNN problem is called MaxOverlap. In this paper, we study the MaxBRNN problem and propose a new approach called MaxSegment for a two-dimensional space when the L2-norm is used. Then, we extend our algorithm to other variations of the MaxBRNN problem such as the MaxBRNN problem with other metric spaces, and a three-dimensional space. Finally, we conducted experiments on real and synthetic datasets to compare our proposed algorithm with existing algorithms. The experimental results verify the efficiency of our proposed approach.

Web-Age Information Management | 2007

(α, k)-anonymity based privacy preservation by lossy join

Raymond Chi-Wing Wong; Yubao Liu; Jian Yin; Zhilan Huang; Ada Wai-Chee Fu; Jian Pei

Privacy-preserving data publication for data mining is to protect sensitive information of individuals in published data while the distortion to the data is minimized. Recently, it has been shown that (α, k)-anonymity is a feasible technique when we are given some sensitive attribute(s) and quasi-identifier attributes. In previous work, generalization of the given data table has been used for the anonymization. In this paper, we show that we can project the data onto two tables for publishing in such a way that the privacy protection for (α, k)-anonymity can be achieved with less distortion. In the two tables, one table contains the undisturbed non-sensitive values and the other table contains the undisturbed sensitive values. Privacy preservation is guaranteed by the lossy join property of the two tables. We show by experiments that the results are better than previous approaches.

Advanced Data Mining and Applications | 2007

Clustering Massive Text Data Streams by Semantic Smoothing Model

Yubao Liu; Jiarong Cai; Jian Yin; Ada Wai-Chee Fu

Information Sciences | 2015

Rotating MaxRS queries

Zhitong Chen; Yubao Liu; Raymond Chi-Wing Wong; Jiamin Xiong; Xiuyuan Cheng; Peihuan Chen

Frontiers of Computer Science in China | 2012

An efficient method for privacy preserving location queries

Yubao Liu; Xiuwei Chen; Zhan Li; Zhijie Li; Raymond Chi-Wing Wong

Web-Age Information Management | 2009

Efficient Detection of Discords for Time Series Stream

Yubao Liu; Xiuwei Chen; Fei Wang; Jian Yin
