Zhifeng Bao | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Zhifeng Bao is active.

Explore More

Publication

Featured researches published by Zhifeng Bao.

international conference on data engineering | 2009

Effective XML Keyword Search with Relevance Oriented Ranking

Zhifeng Bao; Tok Wang Ling; Bo Chen; Jiaheng Lu

Inspired by the great success of information retrieval (IR) style keyword search on the web, keyword search on XML has emerged recently. The difference between text database and XML database results in three new challenges: (1) Identify the user search intention, i.e. identify the XML node types that user wants to search for and search via. (2) Resolve keyword ambiguity problems: a keyword can appear as both a tag name and a text value of some node; a keyword can appear as the text values of different XML node types and carry different meanings. (3) As the search results are sub-trees of the XML document, new scoring function is needed to estimate its relevance to a given query. However, existing methods cannot resolve these challenges, thus return low result quality in term of query relevance. In this paper, we propose an IR-style approach which basically utilizes the statistics of underlying XML data to address these challenges. We first propose specific guidelines that a search engine should meet in both search intention identification and relevance oriented ranking for search results. Then based on these guidelines, we design novel formulae to identify the search for nodes and search via nodes of a query, and present a novel XML TF*IDF ranking strategy to rank the individual matches of all possible search intentions. Lastly, the proposed techniques are implemented in an XML keyword search engine called XReal, and extensive experiments show the effectiveness of our approach.

international conference on management of data | 2009

DDE: from dewey to a fully dynamic XML labeling scheme

Liang Xu; Tok Wang Ling; Huayu Wu; Zhifeng Bao

Labeling schemes lie at the core of query processing for many XML database management systems. Designing labeling schemes for dynamic XML documents is an important problem that has received a lot of research attention. Existing dynamic labeling schemes, however, often sacrifice query performance and introduce additional labeling cost to facilitate arbitrary updates even when the documents actually seldom get updated. Since the line between static and dynamic XML documents is often blurred in practice, we believe it is important to design a labeling scheme that is compact and efficient regardless of whether the documents are frequently updated or not. In this paper, we propose a novel labeling scheme called DDE (for Dynamic DEwey) which is tailored for both static and dynamic XML documents. For static documents, the labels of DDE are the same as those of dewey which yield compact size and high query performance. When updates take place, DDE can completely avoid re-labeling and its label quality is most resilient to the number and order of insertions compared to the existing approaches. In addition, we introduce Compact DDE (CDDE) which is designed to optimize the performance of DDE for insertions. Both DDE and CDDE can be incorporated into existing systems and applications that are based on dewey labeling scheme with minimum efforts. Experiment results demonstrate the benefits of our proposed labeling schemes over the previous approaches.

IEEE Transactions on Knowledge and Data Engineering | 2011

Extended XML Tree Pattern Matching: Theories and Algorithms

Jiaheng Lu; Tok Wang Ling; Zhifeng Bao; Chen Wang

As business and enterprises generate and exchange XML data more often, there is an increasing need for efficient processing of queries on XML data. Searching for the occurrences of a tree pattern query in an XML database is a core operation in XML query processing. Prior works demonstrate that holistic twig pattern matching algorithm is an efficient technique to answer an XML tree pattern with parent-child (P-C) and ancestor-descendant (A-D) relationships, as it can effectively control the size of intermediate results during query processing. However, XML query languages (e.g., XPath and XQuery) define more axes and functions such as negation function, order-based axis, and wildcards. In this paper, we research a large set of XML tree pattern, called extended XML tree pattern, which may include P-C, A-D relationships, negation functions, wildcards, and order restriction. We establish a theoretical framework about “matching cross” which demonstrates the intrinsic reason in the proof of optimality on holistic algorithms. Based on our theorems, we propose a set of novel algorithms to efficiently process three categories of extended XML tree patterns. A set of experimental results on both real-life and synthetic data sets demonstrate the effectiveness and efficiency of our proposed theories and algorithms.

IEEE Transactions on Knowledge and Data Engineering | 2010

Towards an Effective XML Keyword Search

Zhifeng Bao; Jiaheng Lu; Tok Wang Ling; Bo Chen

Inspired by the great success of information retrieval (IR) style keyword search on the web, keyword search on XML has emerged recently. The difference between text database and XML database results in three new challenges: 1) Identify the user search intention, i.e., identify the XML node types that user wants to search for and search via. 2) Resolve keyword ambiguity problems: a keyword can appear as both a tag name and a text value of some node; a keyword can appear as the text values of different XML node types and carry different meanings; a keyword can appear as the tag name of different XML node types with different meanings. 3) As the search results are subtrees of the XML document, new scoring function is needed to estimate its relevance to a given query. However, existing methods cannot resolve these challenges, thus return low result quality in term of query relevance. In this paper, we propose an IR-style approach which basically utilizes the statistics of underlying XML data to address these challenges. We first propose specific guidelines that a search engine should meet in both search intention identification and relevance oriented ranking for search results. Then, based on these guidelines, we design novel formulae to identify the search for nodes and search via nodes of a query, and present a novel XML TF*IDF ranking strategy to rank the individual matches of all possible search intentions. To complement our result ranking framework, we also take the popularity into consideration for the results that have comparable relevance scores. Lastly, extensive experiments have been conducted to show the effectiveness of our approach.

international conference on data engineering | 2012

Fast SLCA and ELCA Computation for XML Keyword Queries Based on Set Intersection

Junfeng Zhou; Zhifeng Bao; Wei Wang; Tok Wang Ling; Ziyang Chen; Xudong Lin; Jingfeng Guo

In this paper, we focus on efficient keyword query processing for XML data based on the SLCA and ELCA semantics. We propose a novel form of inverted lists for keywords which include IDs of nodes that directly or indirectly contain a given keyword. We propose a family of efficient algorithms that are based on the set intersection operation for both semantics. We show that the problem of SLCA/ELCA computation becomes finding a set of nodes that appear in all involved inverted lists and satisfy certain conditions. We also propose several optimization techniques to further improve the query processing performance. We have conducted extensive experiments with many alternative methods. The results demonstrate that our proposed methods outperform previous methods by up to two orders of magnitude in many cases.

IEEE Transactions on Image Processing | 2015

A Framework of Joint Graph Embedding and Sparse Regression for Dimensionality Reduction

Xiaoshuang Shi; Zhenhua Guo; Zhihui Lai; Yujiu Yang; Zhifeng Bao; David Zhang

Over the past few decades, a large number of algorithms have been developed for dimensionality reduction. Despite the different motivations of these algorithms, they can be interpreted by a common framework known as graph embedding. In order to explore the significant features of data, some sparse regression algorithms have been proposed based on graph embedding. However, the problem is that these algorithms include two separate steps: (1) embedding learning and (2) sparse regression. Thus their performance is largely determined by the effectiveness of the constructed graph. In this paper, we present a framework by combining the objective functions of graph embedding and sparse regression so that embedding learning and sparse regression can be jointly implemented and optimized, instead of simply using the graph spectral for sparse regression. By the proposed framework, supervised, semisupervised, and unsupervised learning algorithms could be unified. Furthermore, we analyze two situations of the optimization problem for the proposed framework. By adopting an ℓ2,1-norm regularization for the proposed framework, it can perform feature selection and subspace learning simultaneously. Experiments on seven standard databases demonstrate that joint graph embedding and sparse regression method can significantly improve the recognition performance and consistently outperform the sparse regression method.

international conference on management of data | 2015

Location-Aware Pub/Sub System: When Continuous Moving Queries Meet Dynamic Event Streams

Long Guo; Dongxiang Zhang; Guoliang Li; Kian-Lee Tan; Zhifeng Bao

In this paper, we propose a new location-aware pub/sub system, Elaps, that continuously monitors moving users subscribing to dynamic event streams from social media and E-commerce applications. Users are notified instantly when there is a matching event nearby. To the best of our knowledge, Elaps is the first to take into account continuous moving queries against dynamic event streams. Like existing works on continuous moving query processing,Elaps employs the concept of safe region to reduce communication overhead. However, unlike existing works which assume data from publishers are static, updates to safe regions may be triggered by newly arrived events. In Elaps, we develop a concept called \textit{impact region} that allows us to identify whether a safe region is affected by newly arrived events. Moreover, we propose a novel cost model to optimize the safe region size to keep the communication overhead low. Based on the cost model, we design two incremental methods, iGM and idGM, for safe region construction. In addition, Elaps uses boolean expression, which is more expressive than keywords, to model user intent and we propose a novel index, BEQ-Tree, to handle spatial boolean expression matching. In our experiments, we use geo-tweets from Twitter and venues from Foursquare to simulate publishers and boolean expressions generated from AOL search log to represent users intentions. We test user movement in both synthetic trajectories and real taxi trajectories. The results show that Elaps can significantly reduce the communication overhead and disseminate events to users in real-time.

international conference on data engineering | 2015

Real time personalized search on social networks

Yuchen Li; Zhifeng Bao; Guoliang Li; Kian-Lee Tan

Internet users are shifting from searching on traditional media to social network platforms (SNPs) to retrieve up-to-date and valuable information. SNPs have two unique characteristics: frequent content update and small world phenomenon. However, existing works are not able to support these two features simultaneously. To address this problem, we develop a general framework to enable real time personalized top-k query. Our framework is based on a general ranking function that incorporates time freshness, social relevance and textual similarity. To ensure efficient update and query processing, there are two key challenges. The first is to design an index structure that is update-friendly while supporting instant query processing. The second is to efficiently compute the social relevance in a complex graph. To address these challenges, we first design a novel 3D cube inverted index to support efficient pruning on the three dimensions simultaneously. Then we devise a cube based threshold algorithm to retrieve the top-k results, and propose several pruning techniques to optimize the social distance computation, whose cost dominates the query processing. Furthermore, we optimize the 3D index via a hierarchical partition method to enhance our pruning on the social dimension. Extensive experimental results on two real world large datasets demonstrate the efficiency and the robustness of our proposed solution.

very large data bases | 2014

Efficient query processing for XML keyword queries based on the IDList index

Junfeng Zhou; Zhifeng Bao; Wei Wang; Jinjia Zhao; Xiaofeng Meng

Keyword search over XML data has attracted a lot of research efforts in the last decade, where one of the fundamental research problems is how to efficiently answer a given keyword query w.r.t. a certain query semantics. We found that the key factor resulting in the inefficiency for existing methods is that they all heavily suffer from the common-ancestor-repetition problem. In this paper, we propose a novel form of inverted list, namely the IDList; the IDList for keyword

database and expert systems applications | 2007