Bin Cui | Researchain

Publication

Featured research published by Bin Cui.

Very Large Data Bases | 2003

Supporting frequent updates in R-trees: a bottom-up approach

Mong Li Lee; Wynne Hsu; Christian S. Jensen; Bin Cui; Keng Lik Teo

Advances in hardware-related technologies promise to enable new data management applications that monitor continuous processes. In these applications, enormous amounts of state samples are obtained via sensors and are streamed to a database. Further, updates are very frequent and may exhibit locality. While the R-tree is the index of choice for multi-dimensional data with low dimensionality, and is thus relevant to these applications, R-tree updates are also relatively inefficient. We present a bottom-up update strategy for R-trees that generalizes existing update techniques and aims to improve update performance. It has different levels of reorganization--ranging from global to local--during updates, avoiding expensive top-down updates. A compact main-memory summary structure that allows direct access to the R-tree index nodes is used together with efficient bottom-up algorithms. Empirical studies indicate that the bottom-up strategy outperforms the traditional top-down technique, leads to indices with better query performance, achieves higher throughput, and is scalable.
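The bottom-up idea in the abstract above can be sketched roughly as follows. This is a minimal illustration, not the paper's algorithm: the `Leaf` class, the `leaf_of` dictionary (standing in for the main-memory summary structure), and the return labels are all hypothetical names chosen here. It shows only the cheapest case, where an object's new position still falls inside its current leaf's bounding rectangle and the entry can be updated in place without a top-down traversal.

```python
from dataclasses import dataclass, field


@dataclass
class Leaf:
    """Hypothetical R-tree leaf: a bounding rectangle plus its entries."""
    mbr: tuple  # (xmin, ymin, xmax, ymax)
    entries: dict = field(default_factory=dict)  # object id -> (x, y)


def contains(mbr, point):
    """Return True if the point lies inside the rectangle (borders included)."""
    xmin, ymin, xmax, ymax = mbr
    return xmin <= point[0] <= xmax and ymin <= point[1] <= ymax


def bottom_up_update(leaf_of, obj_id, new_pos):
    """Try to update an object starting from its leaf rather than from the root.

    leaf_of is an assumed direct-access table mapping object ids to leaves.
    """
    leaf = leaf_of[obj_id]
    if contains(leaf.mbr, new_pos):
        # Locality case: the object stays in the same leaf, so no tree
        # traversal or reorganization is needed.
        leaf.entries[obj_id] = new_pos
        return "local"
    # Otherwise a wider reorganization (or a top-down update) would be
    # required; that part is deliberately left out of this sketch.
    return "needs_reorganization"


leaf = Leaf(mbr=(0, 0, 10, 10), entries={"sensor-1": (1, 1)})
leaf_of = {"sensor-1": leaf}
status_inside = bottom_up_update(leaf_of, "sensor-1", (2, 3))
status_outside = bottom_up_update(leaf_of, "sensor-1", (20, 20))
```

When updates exhibit locality, most of them would take the first branch, which is where the savings over a top-down update would come from.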

Conference on Information and Knowledge Management | 2009

The use of categorization information in language models for question retrieval

Xin Cao; Gao Cong; Bin Cui; Christian S. Jensen; Ce Zhang

Community Question Answering (CQA) has emerged as a popular type of service meeting a wide range of information needs. Such services enable users to ask and answer questions and to access existing question-answer pairs. CQA archives contain very large volumes of valuable user-generated content and have become important information resources on the Web. To make the body of knowledge accumulated in CQA archives accessible, effective and efficient question search is required. Question search in a CQA archive aims to retrieve historical questions that are relevant to new questions posed by users. This paper proposes a category-based framework for search in CQA archives. The framework embodies several new techniques that use language models to exploit categories of questions for improving question-answer search. Experiments conducted on real data from Yahoo! Answers demonstrate that the proposed techniques are effective and efficient and are capable of outperforming baseline methods significantly.

International World Wide Web Conferences | 2010

A generalized framework of exploring category information for question retrieval in community question answer archives

Xin Cao; Gao Cong; Bin Cui; Christian S. Jensen

Community Question Answering (CQA) has emerged as a popular type of service where users ask and answer questions and access historical question-answer pairs. CQA archives contain very large volumes of questions organized into a hierarchy of categories. As an essential function of CQA services, question retrieval in a CQA archive aims to retrieve historical question-answer pairs that are relevant to a query question. In this paper, we present a new approach to exploiting category information of questions for improving the performance of question retrieval, and we apply the approach to existing question retrieval models, including a state-of-the-art question retrieval model. Experiments conducted on real CQA data demonstrate that the proposed techniques are capable of outperforming a variety of baseline methods significantly.

International Conference on Data Engineering | 2008

Parallel Distributed Processing of Constrained Skyline Queries by Filtering

Bin Cui; Hua Lu; Quanqing Xu; Lijiang Chen; Yafei Dai; Yongluan Zhou

Skyline queries are capable of retrieving interesting points from a large data set according to multiple criteria. Most work on skyline queries so far has assumed a centralized storage, whereas in practice relevant data are often distributed among geographically scattered sites. In this work, we tackle constrained skyline queries in large-scale distributed environments without the assumption of any overlay structures, and propose a novel algorithm named PaDSkyline (Parallel distributed Skyline query processing). PaDSkyline significantly shortens the response time by performing parallel processing over site groups produced by a partition algorithm. Within each group, it locally optimizes the query processing over distributed sites. It also drastically enhances the network transmission efficiency by performing early reduction of skyline candidates with deliberately selected multiple filtering points. Results of extensive experiments demonstrate the efficiency and robustness of our proposals.
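For readers unfamiliar with the terms, the sketch below illustrates what a constrained skyline and a filtering point are. It is a centralized brute-force illustration, not PaDSkyline itself. It assumes that smaller values are better in every dimension and that constraints are per-dimension ranges; the function names and the sample data are hypothetical.

```python
def dominates(a, b):
    """a dominates b if a is no worse in every dimension and strictly
    better in at least one (assuming smaller is better)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def constrained_skyline(points, lower, upper):
    """Skyline of the points that satisfy the per-dimension range constraints."""
    in_range = [
        p for p in points
        if all(lo <= v <= hi for v, lo, hi in zip(p, lower, upper))
    ]
    return [p for p in in_range if not any(dominates(q, p) for q in in_range)]


def prune_with_filters(local_points, filters):
    """Discard local points dominated by any filtering point received from
    another site, so they never need to be sent over the network."""
    return [p for p in local_points if not any(dominates(f, p) for f in filters)]


# Hypothetical data: (price, distance to the beach) for a few hotels.
hotels = [(50, 8), (60, 3), (80, 1), (70, 5), (90, 9), (200, 0.5)]
skyline = constrained_skyline(hotels, lower=(0, 0), upper=(100, 10))
survivors = prune_with_filters([(70, 5), (40, 9), (90, 9)], filters=[(60, 3)])
```

Here `skyline` is `[(50, 8), (60, 3), (80, 1)]`: the hotel at price 200 violates the range constraint, and the remaining two are dominated. In `survivors`, only `(40, 9)` remains, because the filtering point `(60, 3)` dominates the other two candidates.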

ACM Transactions on Knowledge Discovery From Data | 2015

Modeling Location-Based User Rating Profiles for Personalized Recommendation

Hongzhi Yin; Bin Cui; Ling Chen; Zhiting Hu; Chengqi Zhang

This article proposes LA-LDA, a location-aware probabilistic generative model that exploits location-based ratings to model user profiles and produce recommendations. Most of the existing recommendation models do not consider the spatial information of users or items; however, LA-LDA supports three classes of location-based ratings, namely spatial user ratings for nonspatial items, nonspatial user ratings for spatial items, and spatial user ratings for spatial items. LA-LDA consists of two components, ULA-LDA and ILA-LDA, which are designed to take into account user and item location information, respectively. The component ULA-LDA explicitly incorporates and quantifies the influence from local public preferences to produce recommendations by considering user home locations, whereas the component ILA-LDA recommends items that are closer in both taste and travel distance to the querying users by capturing item co-occurrence patterns, as well as item location co-occurrence patterns. The two components of LA-LDA can be applied either separately or collectively, depending on the available types of location-based ratings. To demonstrate the applicability and flexibility of the LA-LDA model, we deploy it to both top-k recommendation and cold start recommendation scenarios. Experimental evidence on large-scale real-world data, including the data from Gowalla (a location-based social network), DoubanEvent (an event-based social network), and MovieLens (a movie recommendation system), reveals that LA-LDA models user profiles more accurately by outperforming existing recommendation models for top-k recommendation and the cold start problem.

Information & Software Technology | 2007

Efficient index-based KNN join processing for high-dimensional data

Cui Yu; Bin Cui; Shuguang Wang; Jianwen Su

In many advanced database applications (e.g., multimedia databases), data objects are transformed into high-dimensional points and manipulated in high-dimensional space. One of the most important but costly operations is the similarity join that combines similar points from multiple datasets. In this paper, we examine the problem of processing K-nearest neighbor similarity join (KNN join). KNN join between two datasets, R and S, returns for each point in R its K most similar points in S. We propose a new index-based KNN join approach using the iDistance as the underlying index structure. We first present its basic algorithm and then propose two different enhancements. In the first enhancement, we optimize the original KNN join algorithm by using approximation bounding cubes. In the second enhancement, we exploit the reduced dimensions of data space. We conducted an extensive experimental study using both synthetic and real datasets, and the results verify the performance advantage of our schemes over existing KNN join algorithms.
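The KNN join definition above (for each point in R, its K most similar points in S) can be made concrete with a brute-force baseline. This sketch does not use iDistance or any of the paper's enhancements; it assumes Euclidean distance as the similarity measure, and the function name and sample points are hypothetical.

```python
import heapq
import math


def knn_join(R, S, k):
    """For each point r in R, return its k nearest points in S.

    Brute force: every r is compared with every s, so the cost grows with
    len(R) * len(S). An index-based method aims to avoid most of these
    comparisons.
    """
    return {
        r: heapq.nsmallest(k, S, key=lambda s: math.dist(r, s))
        for r in R
    }


R = [(0, 0), (10, 10)]
S = [(1, 0), (0, 2), (9, 9), (5, 5)]
joined = knn_join(R, S, k=2)
```

With these inputs, `joined[(0, 0)]` is `[(1, 0), (0, 2)]` and `joined[(10, 10)]` is `[(9, 9), (5, 5)]`. Note that the join is asymmetric: swapping R and S generally gives a different result.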

ACM Transactions on Information Systems | 2009

Bounded coordinate system indexing for real-time video clip search

Zi Huang; Heng Tao Shen; Jie Shao; Xiaofang Zhou; Bin Cui

Recently, video clips have become very popular online. The massive influx of video clips has created an urgent need for video search engines to facilitate retrieving relevant clips. Different from traditional long videos, a video clip is a short video often expressing a moment of significance. Due to the high complexity of video data, efficient video clip search from large databases turns out to be very challenging. We propose a novel video clip representation model called the Bounded Coordinate System (BCS), which is the first single representative capturing the dominating content and content-changing trends of a video clip. It summarizes a video clip by a coordinate system, where each of its coordinate axes is identified by principal component analysis (PCA) and bounded by the range of data projections along the axis. The similarity measure of BCS considers the operations of translation, rotation, and scaling for coordinate system matching. Particularly, rotation and scaling reflect the difference of content tendencies. Compared with the quadratic time complexity of existing methods, the time complexity of measuring BCS similarity is linear. The compact video representation together with its linear similarity measure makes real-time search from video clip collections feasible. To further improve the retrieval efficiency for large video databases, a two-dimensional transformation method called Bidistance Transformation (BDT) is introduced to utilize a pair of optimal reference points with respect to bidirectional axes in BCS. Our extensive performance study on a large database of more than 30,000 video clips demonstrates that BCS achieves very high search accuracy according to human judgment. This indicates that content tendencies are important in determining the meanings of video clips and confirms that BCS can capture the inherent moment of a video clip to some extent that better resembles human perception. In addition, BDT outperforms existing indexing methods greatly. Integration of the BCS model and BDT indexing can achieve real-time search from large video clip databases.

ACM Transactions on Information Systems | 2016

Joint Modeling of User Check-in Behaviors for Real-time Point-of-Interest Recommendation

Hongzhi Yin; Bin Cui; Xiaofang Zhou; Weiqing Wang; Zi Huang; Shazia Wasim Sadiq

Point-of-Interest (POI) recommendation has become an important means to help people discover attractive and interesting places, especially when users travel out of town. However, the extreme sparsity of a user-POI matrix creates a severe challenge. To cope with this challenge, we propose a unified probabilistic generative model, the Topic-Region Model (TRM), to simultaneously discover the semantic, temporal, and spatial patterns of users’ check-in activities, and to model their joint effect on users’ decision making for selection of POIs to visit. To demonstrate the applicability and flexibility of TRM, we investigate how it supports two recommendation scenarios in a unified way, that is, hometown recommendation and out-of-town recommendation. TRM effectively overcomes data sparsity by the complementarity and mutual enhancement of the diverse information associated with users’ check-in activities (e.g., check-in content, time, and location) in the processes of discovering heterogeneous patterns and producing recommendations. To support real-time POI recommendations, we further extend the TRM model to an online learning model, TRM-Online, to track changing user interests and speed up the model training. In addition, based on the learned model, we propose a clustering-based branch and bound algorithm (CBB) to prune the POI search space and facilitate fast retrieval of the top-k recommendations. We conduct extensive experiments to evaluate the performance of our proposals on two real-world datasets, including recommendation effectiveness, overcoming the cold-start problem, recommendation efficiency, and model-training efficiency. The experimental results demonstrate the superiority of our TRM models, especially TRM-Online, compared with state-of-the-art competitive methods, by making more effective and efficient mobile recommendations. In addition, we study the importance of each type of pattern in the two recommendation scenarios, respectively, and find that exploiting temporal patterns is most important for the hometown recommendation scenario, while the semantic patterns play a dominant role in improving the recommendation effectiveness for out-of-town users.

International Conference on Data Engineering | 2013

A unified model for stable and temporal topic detection from social media data

Hongzhi Yin; Bin Cui; Hua Lu; Yuxin Huang; Junjie Yao

Web 2.0 users generate and spread huge amounts of messages in online social media. Such user-generated content is a mixture of temporal topics (e.g., breaking events) and stable topics (e.g., user interests). Due to their different natures, it is important and useful to distinguish temporal topics from stable topics in social media. However, such discrimination is very challenging because the user-generated texts in social media are very short in length and thus lack useful linguistic features for precise analysis using traditional approaches. In this paper, we propose a novel solution to detect both stable and temporal topics simultaneously from social media data. Specifically, a unified user-temporal mixture model is proposed to distinguish temporal topics from stable topics. To improve this model's performance, we design a regularization framework that exploits prior spatial information in a social network, as well as a burst-weighted smoothing scheme that exploits temporal prior information in the time dimension. We conduct extensive experiments to evaluate our proposal on two real data sets obtained from Del.icio.us and Twitter. The experimental results verify that our mixture model is able to distinguish temporal topics from stable topics in a single detection process. Our mixture model enhanced with the spatial regularization and the burst-weighted smoothing scheme significantly outperforms competitor approaches, in terms of topic detection accuracy and discrimination in stable and temporal topics.

IEEE Transactions on Multimedia | 2010

Practical Online Near-Duplicate Subsequence Detection for Continuous Video Streams

Zi Huang; Heng Tao Shen; Jie Shao; Bin Cui; Xiaofang Zhou

Online video content is surging to an unprecedented level. Massive video publishing and sharing impose heavy demands on online near-duplicate detection for many novel video applications. This paper presents an accurate and practical system for online near-duplicate subsequence detection over continuous video streams. We propose to transform a video stream into a one-dimensional video distance trajectory (VDT) monitoring the continuous changes of consecutive frames with respect to a reference point, which is further segmented and represented by a sequence of compact signatures called linear smoothing functions (LSFs). LSFs of each subsequence of the incoming video stream are continuously generated and temporally stored in a buffer for comparison with query LSFs. LSF adopts compound probability to combine three independent video factors for effective segment similarity measure, which is then utilized to compute sequence similarity for near-duplicate detection. To avoid unnecessary sequence similarity computations, an efficient sequence skipping strategy is also embedded. Experimental results on detecting diverse near-duplicates of TV commercials in real video streams show the superior performance of our system on both effectiveness and efficiency over existing methods.
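The first step described above, turning a video stream into a one-dimensional distance trajectory, can be illustrated with a short sketch. It covers only that step, not the segmentation, the LSF signatures, or the similarity measure. It assumes each frame is already represented as a numeric feature vector and that the distance is Euclidean; the function name, the feature vectors, and the reference point are hypothetical.

```python
import math


def distance_trajectory(frames, reference):
    """Map a sequence of frame feature vectors to a 1-D series of distances
    from a fixed reference point, one value per frame."""
    return [math.dist(frame, reference) for frame in frames]


# Hypothetical 2-D frame features; real features would have many dimensions.
frames = [(0, 0), (3, 4), (6, 8)]
trajectory = distance_trajectory(frames, reference=(0, 0))
```

Here `trajectory` is `[0.0, 5.0, 10.0]`. Whatever the feature dimensionality, the result is a single number per frame, which is what makes the later segmentation and signature steps compact.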
