Cheqing Jin | Researchain

Publication

Featured research published by Cheqing Jin.

Conference on Information and Knowledge Management | 2003

Dynamically maintaining frequent items over a data stream

Cheqing Jin; Weining Qian; Chaofeng Sha; Jeffrey Xu Yu; Aoying Zhou

It is challenging to maintain frequent items over a data stream with small, bounded memory in a dynamic environment where both insertions and deletions of items are allowed. In this paper, we propose a novel algorithm, called hCount, which handles both insertion and deletion of items using much less memory than the best previously reported algorithm. Our algorithm is also superior in terms of precision, recall, and processing time. In addition, our approach does not require prior knowledge of the size of the item range of a data stream, and it can handle range extension dynamically. With a small modification, hCount can be improved to hCount*, which performs significantly better still.
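The abstract does not describe hCount's internals, so the sketch below uses a generic Count-Min-style sketch (a swapped-in technique, not hCount itself) to illustrate the underlying idea: hashed counters that accept both insertions (+1) and deletions (-1) in bounded memory, with frequent items found by scanning a candidate item range. The class name, `width`, `depth`, and the hash family are illustrative assumptions.

```python
import random
from typing import Iterable, List


class CountMinSketch:
    """Count-Min-style sketch supporting insertions and deletions.

    Illustrative only: a generic sketch, not the hCount structure from the paper.
    """

    def __init__(self, width: int, depth: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.width = width
        self.depth = depth
        self.prime = 2_147_483_647  # Mersenne prime 2^31 - 1
        # One (a, b) pair per row for the hash ((a * x + b) mod p) mod width.
        self.params = [(rng.randrange(1, self.prime), rng.randrange(self.prime))
                       for _ in range(depth)]
        self.table = [[0] * width for _ in range(depth)]
        self.total = 0  # net number of items currently in the stream

    def _bucket(self, row: int, item: int) -> int:
        a, b = self.params[row]
        return ((a * item + b) % self.prime) % self.width

    def update(self, item: int, delta: int = 1) -> None:
        """delta=+1 for an insertion, delta=-1 for a deletion."""
        for row in range(self.depth):
            self.table[row][self._bucket(row, item)] += delta
        self.total += delta

    def estimate(self, item: int) -> int:
        """Never underestimates while every item's net count is non-negative."""
        return min(self.table[row][self._bucket(row, item)] for row in range(self.depth))

    def frequent(self, candidates: Iterable[int], threshold: float) -> List[int]:
        """Candidates whose estimated count exceeds threshold * total."""
        return [x for x in candidates if self.estimate(x) > threshold * self.total]
```

For example, after inserting item 7 fifty times, inserting each of 0..99 once, and then deleting 7 thirty times, `estimate(7)` is at least its true count of 21, and `frequent(range(100), 0.1)` reports 7. Collisions can only inflate estimates, so the scan may return false positives but not false negatives.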

Very Large Data Bases | 2008

Sliding-window top-k queries on uncertain streams

Cheqing Jin; Ke Yi; Lei Chen; Jeffrey Xu Yu; Xuemin Lin

Query processing on uncertain data streams has attracted a lot of attention lately, due to the imprecise nature of the data generated by a variety of streaming applications, such as readings from a sensor network. However, all existing work on uncertain data streams studies unbounded streams. This paper takes the first step towards the important and challenging problem of answering sliding-window queries on uncertain data streams, with a focus on arguably one of the most important types of queries: top-k queries. The challenge of answering sliding-window top-k queries on uncertain data streams stems from the strict space and time requirements of processing both arriving and expiring tuples in high-speed streams, combined with the difficulty of coping with the exponential blowup in the number of possible worlds induced by the uncertain data model. We design a unified framework for processing sliding-window top-k queries on uncertain streams. We show that all the existing top-k definitions in the literature can be plugged into our framework, resulting in several succinct synopses that use space much smaller than the window size while also being highly efficient in terms of processing time. In addition to the theoretical space and time bounds that we prove for these synopses, we present a thorough experimental report verifying their practical efficiency on both synthetic and real data.
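To make the possible-worlds difficulty concrete, here is a naive baseline, not the paper's synopses, that recomputes over the window on every arrival. It assumes tuple-level independence (each tuple exists with its own probability), distinct scores, and one common top-k semantics: the probability that a tuple exists and ranks among the top k of the existing tuples. A dynamic program over "how many higher-scored tuples exist" avoids enumerating possible worlds. All names are hypothetical.

```python
from collections import deque
from typing import Deque, Dict, Iterable, Iterator, List, Tuple

Item = Tuple[str, float, float]  # (tuple id, score, existence probability)


def topk_probabilities(window: List[Item], k: int) -> Dict[str, float]:
    """Pr[tuple exists and is among the top-k existing tuples], tuples independent."""
    ordered = sorted(window, key=lambda t: t[1], reverse=True)
    # dist[j] = Pr[exactly j of the higher-scored tuples exist], tracked only for j < k.
    dist = [1.0] + [0.0] * (k - 1)
    result = {}
    for tid, _score, p in ordered:
        # The tuple is in the top-k iff it exists and fewer than k higher tuples exist.
        result[tid] = p * sum(dist)
        new = [0.0] * k
        for j in range(k):
            new[j] += dist[j] * (1 - p)
            if j + 1 < k:  # mass reaching k higher tuples is no longer needed
                new[j + 1] += dist[j] * p
        dist = new
    return result


def sliding_window_topk(stream: Iterable[Item], window_size: int,
                        k: int) -> Iterator[Dict[str, float]]:
    """Naive baseline: recompute over the last window_size tuples on each arrival."""
    window: Deque[Item] = deque(maxlen=window_size)  # oldest tuple expires automatically
    for item in stream:
        window.append(item)
        yield topk_probabilities(list(window), k)
```

Each recomputation costs O(w log w + wk) for window size w, which is exactly the per-tuple cost that succinct synopses aim to avoid in high-speed streams.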

Knowledge and Information Systems | 2008

Tracking clusters in evolving data streams over sliding windows

Aoying Zhou; Feng Cao; Weining Qian; Cheqing Jin

Mining data streams poses great challenges due to limited memory and real-time query response requirements. Clustering an evolving data stream is especially interesting because it captures not only the changing distribution of clusters but also the evolving behavior of individual clusters. In this paper, we present a novel method for tracking the evolution of clusters over sliding windows. In our SWClustering algorithm, we combine the exponential histogram with temporal cluster features to form a novel data structure, the Exponential Histogram of Cluster Features (EHCF). The exponential histogram handles in-cluster evolution, and the temporal cluster features represent changes in the cluster distribution. Our approach has several advantages over existing methods: (1) cluster quality is improved because the EHCF captures the distribution of recent records precisely; (2) compared with previous methods, the mechanism for adaptively maintaining the in-cluster synopsis tracks cluster evolution better while consuming much less memory; (3) the EHCF provides a flexible framework for analyzing cluster evolution and for tracking a specific cluster efficiently without interfering with other clusters, thus reducing the computing resources consumed by data stream clustering. Both theoretical analysis and extensive experiments show the effectiveness and efficiency of the proposed method.
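The combination described above rests on one property: cluster features (count, linear sum, squared sum) are additive, so exponential-histogram buckets can be merged by summing them. The sketch below maintains one cluster's records this way. It is a simplified illustration under stated assumptions, not the paper's EHCF: the `per_level` merge limit, the timestamp-based expiry rule, and all names are hypothetical, and the paper's actual parameters are not given in the abstract.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ClusterFeature:
    """Temporal cluster feature: count, linear sum, squared sum, newest timestamp."""
    n: int
    ls: List[float]
    ss: float
    t: int

    @staticmethod
    def of_point(x: List[float], t: int) -> "ClusterFeature":
        return ClusterFeature(1, list(x), sum(v * v for v in x), t)

    def merge(self, other: "ClusterFeature") -> "ClusterFeature":
        # CFs are additive: two buckets combine by component-wise sums.
        return ClusterFeature(self.n + other.n,
                              [a + b for a, b in zip(self.ls, other.ls)],
                              self.ss + other.ss,
                              max(self.t, other.t))


class EHCF:
    """Exponential histogram of CFs for one cluster over the last `window` time units."""

    def __init__(self, window: int, per_level: int = 2) -> None:
        self.window = window
        self.per_level = per_level  # max buckets allowed per bucket size
        self.buckets: List[ClusterFeature] = []  # newest first

    def insert(self, x: List[float], t: int) -> None:
        self.buckets.insert(0, ClusterFeature.of_point(x, t))
        self._cascade()
        self._expire(t)

    def _cascade(self) -> None:
        size = 1
        while True:
            idx = [i for i, b in enumerate(self.buckets) if b.n == size]
            if len(idx) <= self.per_level:
                break
            i, j = idx[-2], idx[-1]  # the two oldest buckets of this size
            self.buckets[i] = self.buckets[i].merge(self.buckets[j])
            del self.buckets[j]
            size *= 2

    def _expire(self, now: int) -> None:
        # Drop the oldest buckets whose newest record has left the window.
        while self.buckets and self.buckets[-1].t <= now - self.window:
            self.buckets.pop()

    def summary(self) -> ClusterFeature:
        if not self.buckets:
            raise ValueError("no records in the current window")
        total = self.buckets[0]
        for b in self.buckets[1:]:
            total = total.merge(b)
        return total

    def centroid(self) -> List[float]:
        s = self.summary()
        return [v / s.n for v in s.ls]
```

Because only the oldest bucket can straddle the window boundary, the summary over-counts by at most that bucket's size, while the number of buckets grows only logarithmically with the number of records in the window.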

Database Systems for Advanced Applications | 2014

Probabilistic Reverse Top-k Queries

Cheqing Jin; Rong Zhang; Qiangqiang Kang; Zhao Zhang; Aoying Zhou

Ranking-aware queries are among the most fundamental queries in database management. The top-k query, which returns the k elements with the highest scores under a ranking function, has been widely studied for decades. Recently, researchers have also studied the reverse top-k query, which finds all customers who regard a given query object as one of their top-k favorite elements. In such applications, each customer is described as a vector. However, no existing work has considered reverse top-k queries over uncertain data, which is our focus. In this paper, we propose two methods for probabilistic reverse top-k queries: BLS and ALS. As a basic solution, BLS checks every (user, product) pair to find the query result. As an advanced solution, ALS uses two pruning rules and historical information to improve efficiency significantly. Detailed analysis and experiments on real and synthetic data sets demonstrate the efficiency of the proposed methods.
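A basic pairwise check in the spirit of BLS can be sketched as follows. The uncertainty model here is an assumption, since the abstract does not state it: each product exists independently with a given probability, the query object is certain, a user's score for a product is the dot product of the user's weight vector and the product's attributes, and higher scores are better. For each user, the query object is in the top-k exactly when fewer than k higher-scored products exist, a Poisson-binomial tail computed by dynamic programming. All names and the threshold `tau` are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def dot(w: Sequence[float], x: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(w, x))


def prob_fewer_than_k(probs: List[float], k: int) -> float:
    """Pr[fewer than k of these independent events occur], via DP."""
    dist = [1.0] + [0.0] * (k - 1)  # dist[j] = Pr[exactly j events so far], j < k
    for p in probs:
        new = [0.0] * k
        for j in range(k):
            new[j] += dist[j] * (1 - p)
            if j + 1 < k:
                new[j + 1] += dist[j] * p
        dist = new
    return sum(dist)


def prob_reverse_topk(users: Dict[str, Sequence[float]],
                      products: List[Tuple[Sequence[float], float]],
                      q: Sequence[float], k: int, tau: float) -> Dict[str, float]:
    """Check every (user, product) pair; keep users with Pr[q in top-k] >= tau."""
    result = {}
    for uid, w in users.items():
        q_score = dot(w, q)
        # Existence probabilities of products this user ranks above q.
        beaters = [p for x, p in products if dot(w, x) > q_score]
        prob = prob_fewer_than_k(beaters, k)
        if prob >= tau:
            result[uid] = prob
    return result
```

This costs O(|U| · |P| · k), which is the kind of exhaustive pairwise work that pruning rules such as those in ALS are meant to cut down.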

Database Systems for Advanced Applications | 2016

Popular Route Planning with Travel Cost Estimation

Huiping Liu; Cheqing Jin; Aoying Zhou

With the increasing number of GPS-equipped vehicles, more and more trajectories are generated continuously, making urban applications such as route planning feasible. In general, route planning aims to find a path from a source to a destination that meets specific requirements, e.g., minimal travel time, fee, or fuel consumption. In particular, some users may prefer a popular route, that is, one that has been travelled frequently. However, existing work on finding popular routes does not consider how to estimate travel cost. In this paper, we address this issue by devising a novel structure, called the popular traverse graph, to summarize historical trajectories. Based on this structure, we propose an efficient route planning algorithm that searches for the popular route with minimal travel cost. Extensive experiments show that our method is both effective and efficient.
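The abstract does not define the popular traverse graph, so the following is only a hypothetical stand-in showing the general pattern: summarize historical trajectories into a graph whose edges keep only transitions seen at least `min_support` times (a simple popularity filter), estimate each edge's cost as its mean observed travel time, and run Dijkstra's algorithm for the cheapest route. The trajectory format, `min_support`, and all function names are assumptions.

```python
import heapq
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Trajectory = List[Tuple[str, float]]  # (node id, timestamp) in visiting order
Graph = Dict[str, List[Tuple[str, float]]]


def build_graph(trajectories: List[Trajectory], min_support: int) -> Graph:
    """Summarize trajectories as edges with traversal counts and mean travel times."""
    stats = defaultdict(lambda: [0, 0.0])  # (u, v) -> [count, total travel time]
    for traj in trajectories:
        for (u, t0), (v, t1) in zip(traj, traj[1:]):
            stats[(u, v)][0] += 1
            stats[(u, v)][1] += t1 - t0
    graph: Graph = defaultdict(list)
    for (u, v), (count, total) in stats.items():
        if count >= min_support:  # keep only frequently travelled transitions
            graph[u].append((v, total / count))
    return graph


def cheapest_route(graph: Graph, src: str, dst: str) -> Optional[Tuple[float, List[str]]]:
    """Dijkstra over the summarized graph using estimated (mean) travel times."""
    best = {src: 0.0}
    prev: Dict[str, str] = {}
    heap = [(0.0, src)]
    while heap:
        cost, u = heapq.heappop(heap)
        if u == dst:
            path = [u]
            while path[-1] != src:
                path.append(prev[path[-1]])
            return cost, path[::-1]
        if cost > best.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph.get(u, []):
            if cost + w < best.get(v, float("inf")):
                best[v] = cost + w
                prev[v] = u
                heapq.heappush(heap, (cost + w, v))
    return None
```

Raising `min_support` trades route cost for popularity: a faster path that was travelled only once is discarded in favour of a well-trodden one.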

International Conference on Data Engineering | 2015

Hotel recommendation based on user preference analysis

Kai Zhang; Keqiang Wang; Xiaoling Wang; Cheqing Jin; Aoying Zhou

Recommender systems offer personalized suggestions by analyzing user preferences. However, their performance falls sharply on sparse data, especially for cold-start users. Hotels are a kind of product that suffers greatly from sparsity because of their extremely low rating frequency. To address these issues, this paper proposes a novel hotel recommendation framework. The main contributions are: 1) we combine collaborative filtering (CF) with content-based filtering (CBF) to overcome sparsity while ensuring high accuracy; 2) travel intents are introduced to provide additional information for user preference analysis; 3) to make recommendations as broad as possible, diversity techniques are employed; 4) several experiments are conducted on the real Ctrip dataset, and the results show that the proposed hybrid framework is competitive with classical approaches.
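One simple way to combine CF with CBF is a weighted blend that falls back to content similarity when collaborative signal is missing, as for a cold-start user. The sketch below is a hypothetical illustration of that pattern, not the paper's framework: ratings are assumed normalized to [0, 1], `profiles` and `hotel_features` are assumed sparse feature vectors (which could, for instance, encode travel intents), and the blend weight `alpha` is arbitrary.

```python
import math
from typing import Dict, Optional

Vector = Dict[str, float]


def cosine(a: Vector, b: Vector) -> float:
    num = sum(a[k] * b[k] for k in set(a) & set(b))
    den = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0


def cf_score(user: str, hotel: str, ratings: Dict[str, Vector]) -> Optional[float]:
    """User-based CF: similarity-weighted average of other users' ratings of `hotel`."""
    mine = ratings.get(user, {})
    num = den = 0.0
    for other, theirs in ratings.items():
        if other == user or hotel not in theirs:
            continue
        sim = cosine(mine, theirs)
        num += sim * theirs[hotel]
        den += abs(sim)
    return num / den if den else None  # None: no usable neighbours (sparse or cold start)


def hybrid_score(user: str, hotel: str, ratings: Dict[str, Vector],
                 profiles: Dict[str, Vector], hotel_features: Dict[str, Vector],
                 alpha: float = 0.5) -> float:
    """Blend CF and content similarity; use content alone when CF has no signal."""
    content = cosine(profiles.get(user, {}), hotel_features[hotel])
    cf = cf_score(user, hotel, ratings)
    if cf is None:
        return content
    return alpha * cf + (1 - alpha) * content
```

For a user with no ratings, every neighbour similarity is zero, so the score degrades gracefully to pure content matching instead of failing.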

Database Systems for Advanced Applications | 2016

TSCluWin: Trajectory Stream Clustering over Sliding Window

Jiali Mao; Qiuge Song; Cheqing Jin; Zhigang Zhang; Aoying Zhou

The popularity of GPS-embedded devices facilitates online monitoring of moving objects and real-time analysis of movement behavior. Trajectory clustering is one of the most important trajectory analysis tasks, and it has been studied extensively over the past decade. However, owing to the rapid arrival rate and evolving nature of stream data, little effort has been devoted to online clustering of trajectory data streams. In this paper, we propose a two-phase framework: a micro-clustering phase, in which a number of micro-clusters represented by compact synopsis data structures are incrementally maintained, and a macro-clustering phase, in which a small number of macro-clusters are generated from the micro-clusters. Experimental results show that our proposal is effective and efficient at handling streaming trajectories without compromising clustering quality.
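The two-phase structure described above can be illustrated with a generic online scheme, not TSCluWin's actual synopses: each incoming point is absorbed by the nearest micro-cluster within `radius` (or starts a new one), micro-clusters not updated within the window expire, and macro-clusters are formed on demand by linking micro-cluster centres closer than `link_dist`. For brevity, trajectories are simplified to 2D points; real trajectory clustering typically works on segments. The class name, parameters, and expiry rule are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Tuple

Point = Tuple[float, float]


@dataclass
class MicroCluster:
    n: int
    sx: float  # sum of x coordinates
    sy: float  # sum of y coordinates
    last_t: int

    def center(self) -> Point:
        return (self.sx / self.n, self.sy / self.n)


class TwoPhaseClusterer:
    """Online micro-clustering plus on-demand macro-clustering (generic sketch)."""

    def __init__(self, radius: float, window: int) -> None:
        self.radius = radius
        self.window = window
        self.micro: List[MicroCluster] = []

    def insert(self, p: Point, t: int) -> None:
        # Simplified expiry: drop micro-clusters not updated within the window.
        self.micro = [m for m in self.micro if m.last_t > t - self.window]
        nearest = min(self.micro, key=lambda m: math.dist(m.center(), p), default=None)
        if nearest is not None and math.dist(nearest.center(), p) <= self.radius:
            nearest.n += 1
            nearest.sx += p[0]
            nearest.sy += p[1]
            nearest.last_t = t
        else:
            self.micro.append(MicroCluster(1, p[0], p[1], t))

    def macro(self, link_dist: float) -> List[List[Point]]:
        """Group micro-cluster centres within link_dist (connected components)."""
        centers = [m.center() for m in self.micro]
        parent = list(range(len(centers)))

        def find(i: int) -> int:
            while parent[i] != i:
                parent[i] = parent[parent[i]]  # path halving
                i = parent[i]
            return i

        for i in range(len(centers)):
            for j in range(i + 1, len(centers)):
                if math.dist(centers[i], centers[j]) <= link_dist:
                    parent[find(i)] = find(j)
        groups: Dict[int, List[Point]] = {}
        for i, c in enumerate(centers):
            groups.setdefault(find(i), []).append(c)
        return list(groups.values())
```

The design keeps per-point work small (one nearest-centre search) and defers the more expensive global grouping to the macro phase, which runs only when a clustering result is requested.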

Database Systems for Advanced Applications | 2016

Real-Time Personalized Taxi-Sharing

Xiaoyi Duan; Cheqing Jin; Xiaoling Wang; Aoying Zhou; Kun Yue

Taxi-sharing is an efficient way to improve the utility of taxis by allowing multiple passengers to share a taxi. It also helps relieve traffic jams and air pollution. Different users commonly have different attitudes toward a taxi-sharing schedule, for example regarding the fee to be paid and the additional time needed to reach the destination. However, traditional taxi-sharing systems have paid little attention to this property; their focus has been on decreasing travel distance. In this paper, we study the problem of personalized taxi-sharing, taking into account each passenger's preferences regarding payment, travel time, and waiting time. We first define the satisfaction degree of each party involved in the scheduling plan, and based on it we define two goals for evaluating the overall plan: MaxMin and MaxSum. We then devise a two-phase framework for this problem, in which statistical information gathered during the offline phase is used to speed up query processing during the online phase. Experiments on a real dataset demonstrate the effectiveness and efficiency of the proposed method.
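The two evaluation goals differ in which plan they prefer. As a minimal sketch, assuming the satisfaction degree of every party in each candidate plan has already been computed (the abstract does not give the satisfaction formula, and the plan ids and values below are hypothetical), MaxMin picks the plan whose least-satisfied party is best off, while MaxSum picks the plan with the highest total satisfaction. This exhaustive comparison is only an illustration, not the paper's two-phase framework.

```python
from typing import Dict, List


def choose_plan(plans: Dict[str, List[float]], goal: str) -> str:
    """Pick the plan whose per-party satisfaction degrees maximize the goal.

    plans maps a plan id to the satisfaction degree of each party in that plan.
    """
    if goal == "MaxMin":
        objective = min  # protect the worst-off party
    elif goal == "MaxSum":
        objective = sum  # maximize overall satisfaction
    else:
        raise ValueError(f"unknown goal: {goal}")
    return max(plans, key=lambda pid: objective(plans[pid]))
```

For instance, with `plans = {"P1": [0.9, 0.9, 0.2], "P2": [0.6, 0.6, 0.6]}`, MaxSum selects P1 (total 2.0 versus about 1.8), while MaxMin selects P2, because P1 leaves one passenger with a satisfaction of only 0.2.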

International Symposium on Information Science and Engineering | 2008

High Performance Lattice Boltzmann Algorithms for Fluid Flows

Weibin Guo; Cheqing Jin; Jianhua Li

While the lattice Boltzmann method (LBM) has attracted much attention in computational fluid dynamics (CFD) in recent years, it is also recognized as both computationally demanding and memory intensive. Extensive studies on improving the performance of LBM have been carried out. In this work, various efficient implementation algorithms for LBM are investigated in terms of computational performance and memory consumption. More precisely, we consider four types of high-performance LB algorithms: efficient grid refinement, parallel, cache-optimized, and GPU-based algorithms.
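All four families of algorithms optimize the same core kernel: a local collision step followed by streaming to neighbouring lattice nodes. To show that kernel in its simplest form, the sketch below runs a pure-Python BGK collide-and-stream loop on a periodic 1D lattice with three velocities (D1Q3, weights 1/6, 2/3, 1/6), solving pure diffusion rather than fluid flow. The choice of D1Q3 diffusion, the relaxation time `tau`, and all names are illustrative assumptions for brevity; this is not one of the high-performance algorithms studied in the paper.

```python
from typing import List

# D1Q3 lattice: velocities -1, 0, +1 with the standard weights 1/6, 2/3, 1/6.
VELOCITIES = (-1, 0, 1)
WEIGHTS = (1 / 6, 2 / 3, 1 / 6)


def init(rho: List[float]) -> List[List[float]]:
    """Start every node at equilibrium: f_i = w_i * rho (pure diffusion, no advection)."""
    return [[w * r for r in rho] for w in WEIGHTS]


def step(f: List[List[float]], tau: float) -> List[List[float]]:
    """One BGK collide-and-stream step on a periodic 1D lattice."""
    n = len(f[0])
    rho = [sum(f[i][x] for i in range(3)) for x in range(n)]
    out = [[0.0] * n for _ in range(3)]
    for i, (c, w) in enumerate(zip(VELOCITIES, WEIGHTS)):
        for x in range(n):
            # Collision: relax towards the local equilibrium with relaxation time tau.
            post = f[i][x] - (f[i][x] - w * rho[x]) / tau
            # Streaming: move the post-collision value to the neighbouring node.
            out[i][(x + c) % n] = post
    return out


def density(f: List[List[float]]) -> List[float]:
    """Macroscopic density at each node: the sum over all velocity directions."""
    return [sum(col) for col in zip(*f)]
```

Starting from a unit spike of density at one node and calling `step` repeatedly spreads the density symmetrically while conserving total mass. The separate full-lattice copy per step (`out`) is exactly the memory cost that cache-optimized and in-place LBM variants try to reduce.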

International Conference on Data Engineering | 2013

Similarity query processing for probabilistic sets

Ming Gao; Cheqing Jin; Wei Wang; Xuemin Lin; Aoying Zhou

Evaluating similarity between sets is a fundamental task in computer science. However, in many applications the elements of a set may be uncertain for various reasons. Existing work on modeling such probabilistic sets and computing their similarities suffers from huge model sizes or significant similarity evaluation cost, and hence is only applicable to small probabilistic sets. In this paper, we propose a simple yet expressive model that supports many applications where one probabilistic set may have thousands of elements. We define two types of similarity between two probabilistic sets using possible world semantics; they complement each other in capturing the similarity distributions in the cross product of possible worlds. We design efficient dynamic programming-based algorithms to calculate both types of similarity. We also propose novel individual and batch pruning techniques based on upper bounds of the similarity values. To accommodate extremely large probabilistic sets, we also design sampling-based approximate query processing methods with strong probabilistic guarantees. Extensive experiments on both synthetic and real datasets demonstrate the effectiveness and efficiency of our proposed methods.
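The abstract does not spell out its model or its two similarity types, so as a hypothetical stand-in the sketch below computes one natural possible-world similarity, the expected Jaccard similarity, assuming each element belongs to each set independently with a given probability. Rather than enumerating the exponentially many possible worlds, a dynamic program tracks the joint distribution of (intersection size, union size); a brute-force enumerator is included only to check the result on tiny inputs. Assigning similarity 0 to worlds with an empty union is an assumed convention.

```python
from itertools import product
from typing import Dict, Tuple


def expected_jaccard(a: Dict[str, float], b: Dict[str, float]) -> float:
    """E[|A & B| / |A | B|] over possible worlds, elements independent.

    a and b map each element to its membership probability.
    """
    dist: Dict[Tuple[int, int], float] = {(0, 0): 1.0}  # (|A & B|, |A | B|) -> prob
    for e in set(a) | set(b):
        pa, pb = a.get(e, 0.0), b.get(e, 0.0)
        both = pa * pb                             # e in both sets
        either = pa * (1 - pb) + (1 - pa) * pb     # e in exactly one set
        neither = (1 - pa) * (1 - pb)              # e in neither set
        new: Dict[Tuple[int, int], float] = {}
        for (i, u), p in dist.items():
            for key, q in (((i + 1, u + 1), both), ((i, u + 1), either), ((i, u), neither)):
                if q:
                    new[key] = new.get(key, 0.0) + p * q
        dist = new
    return sum(p * i / u for (i, u), p in dist.items() if u)


def expected_jaccard_bruteforce(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Enumerate all possible worlds; exponential, for checking only."""
    elems = sorted(set(a) | set(b))
    total = 0.0
    for bits in product([0, 1], repeat=2 * len(elems)):
        p = 1.0
        sa, sb = set(), set()
        for k, e in enumerate(elems):
            xa, xb = bits[2 * k], bits[2 * k + 1]
            pa, pb = a.get(e, 0.0), b.get(e, 0.0)
            p *= (pa if xa else 1 - pa) * (pb if xb else 1 - pb)
            if xa:
                sa.add(e)
            if xb:
                sb.add(e)
        union = sa | sb
        if union:
            total += p * len(sa & sb) / len(union)
    return total
```

With n distinct elements the DP state space is O(n²) and the total work O(n³), versus 4ⁿ worlds for enumeration. For deterministic sets it reduces to ordinary Jaccard similarity, e.g. `expected_jaccard({"x": 1.0, "y": 1.0}, {"x": 1.0})` gives 0.5.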
