Jingfei Li
Tianjin University
Publications
Featured research published by Jingfei Li.
International ACM SIGIR Conference on Research and Development in Information Retrieval | 2014
Thanh Vu; Dawei Song; Alistair Willis; Son N. Tran; Jingfei Li
Recent research has shown that the performance of search engines can be improved by enriching a user's personal profile with information about other users with shared interests. In the existing approaches, groups of similar users are often statically determined, e.g., based on the common documents that users clicked. However, these static grouping methods are query-independent and neglect the fact that users in a group may have different interests with respect to different topics. In this paper, we argue that common interest groups should be dynamically constructed in response to the user's input query. We propose a personalisation framework in which a user profile is enriched using information from other users dynamically grouped with respect to an input query. The experimental results on query logs from a major commercial web search engine demonstrate that our framework improves the performance of the web search engine and also achieves better performance than the static grouping method.
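The dynamic-grouping idea can be sketched as follows: restrict each user's profile to the terms of the incoming query, then group the target user with the others whose query-restricted profiles are most similar. This is a minimal illustration with hypothetical profile dictionaries, not the paper's actual framework:

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(u[t] * v.get(t, 0.0) for t in u)
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def dynamic_group(target_profile, other_profiles, query_terms, k=2):
    """Rank other users by the similarity of the query-relevant part of
    their profiles to the target user's, and keep the top-k matches."""
    def project(profile):
        # Restrict a profile to the terms of the current query.
        return {t: w for t, w in profile.items() if t in query_terms}
    tp = project(target_profile)
    scored = [(uid, cosine(tp, project(p))) for uid, p in other_profiles.items()]
    scored.sort(key=lambda x: x[1], reverse=True)
    return [uid for uid, s in scored[:k] if s > 0]
```

Under the same query, different sets of users would be grouped together, which is the query-dependence the paper argues for.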
Entropy | 2016
Benyou Wang; Peng Zhang; Jingfei Li; Dawei Song; Yuexian Hou; Zhenguo Shang
Quantum theory has been applied in a number of fields outside physics, e.g., cognitive science and information retrieval (IR). Recently, it has been shown that quantum theory can subsume various key IR models into a single mathematical formalism of Hilbert vector spaces. While a series of quantum-inspired IR models has been proposed, limited effort has been devoted to verifying the existence of quantum-like phenomena in real users' information retrieval processes, from a real user study perspective. In this paper, we aim to explore and model the quantum interference in users' relevance judgements about documents, caused by the presentation order of documents. A user study in the context of IR tasks has been carried out. The existence of quantum interference is tested by the violation of the law of total probability and the validity of the order effect. Our main findings are: (1) there is an apparent judging discrepancy across different users and document presentation orders, and the empirical data violate the law of total probability; (2) most search trials recorded in the user study show the existence of the order effect, and the incompatible decision perspectives in the quantum question (QQ) model are valid in some trials. We further explain the judgement discrepancy in more depth, in terms of four effects (comparison, unfamiliarity, attraction and repulsion), and also analyse the dynamics of document relevance judgement in terms of the evolution of the information need subspace.
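The violation test reduces to checking the classical law of total probability. With A the event that a first document is judged relevant and B the judgement on a second document, classically P(B) = P(B|A)P(A) + P(B|not A)P(not A); a nonzero gap between the observed P(B) and this prediction signals interference. A minimal sketch with hypothetical probabilities, not the study's actual data:

```python
def total_probability_gap(p_a, p_b_given_a, p_b_given_not_a, p_b):
    """Interference term: observed P(B) minus the classical
    law-of-total-probability prediction. Nonzero => violation."""
    classical = p_b_given_a * p_a + p_b_given_not_a * (1.0 - p_a)
    return p_b - classical
```

In a quantum account the gap corresponds to the cross (interference) term that appears when probabilities are computed from amplitudes rather than combined classically.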
International ACM SIGIR Conference on Research and Development in Information Retrieval | 2015
Qiuchi Li; Jingfei Li; Peng Zhang; Dawei Song
The quantum probabilistic framework has recently been applied to Information Retrieval (IR). A representative is the Quantum Language Model (QLM), which is developed for ad-hoc retrieval with single queries and has achieved significant improvements over traditional language models. In QLM, a density matrix, defined on the quantum probabilistic space, is estimated as a representation of the user's search intention with respect to a specific query. However, QLM is unable to capture the dynamics of the user's information need across the query history. This limitation restricts its further application to dynamic search tasks, e.g., session search. In this paper, we propose a Session-based Quantum Language Model (SQLM) that deals with the multi-query session search task. In SQLM, a transformation model of density matrices is proposed to model the evolution of the user's information need in response to the user's interaction with the search engine, by incorporating features extracted from both positive feedback (clicked documents) and negative feedback (skipped documents). Extensive experiments conducted on TREC 2013 and 2014 session track data demonstrate the effectiveness of SQLM in comparison with the classic QLM.
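A density matrix over a term space can be sketched as a weighted mixture of rank-1 projectors onto unit term vectors, with a document scored by the trace rule tr(rho |d><d|). This is a simplified illustration of the representation, not QLM's actual maximum-likelihood estimation or SQLM's transformation model; the vectors and weights are hypothetical:

```python
import numpy as np

def density_matrix(term_vectors, weights):
    """Build a density matrix as a weighted mixture of rank-1
    projectors onto (normalised) term vectors; trace is 1."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                      # mixture weights sum to 1
    dim = len(term_vectors[0])
    rho = np.zeros((dim, dim))
    for wi, v in zip(w, term_vectors):
        v = np.asarray(v, dtype=float)
        v = v / np.linalg.norm(v)        # unit vector
        rho += wi * np.outer(v, v)       # rank-1 projector
    return rho

def quantum_score(rho, doc_vector):
    """Probability assigned to the document's projector under rho:
    tr(rho |d><d|) = d^T rho d for a unit document vector d."""
    d = np.asarray(doc_vector, dtype=float)
    d = d / np.linalg.norm(d)
    return float(d @ rho @ d)
```

SQLM's contribution is, roughly, updating such a matrix between queries in a session using clicked and skipped documents.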
Asia Information Retrieval Symposium | 2013
Dazhao Pan; Peng Zhang; Jingfei Li; Dawei Song; Ji-Rong Wen; Yuexian Hou; Bin Hu; Yuan Jia; Anne N. De Roeck
Query expansion is generally a useful technique for improving search performance. However, some expanded query terms obtained by traditional statistical methods (e.g., pseudo-relevance feedback) may not be relevant to the user's information need, while some relevant terms may not be contained in the feedback documents at all. Recent studies utilize external resources to detect terms that are related to the query, and then adopt these terms in query expansion. In this paper, we present a study on the use of Freebase [6], an open source general-purpose ontology, as a source for deriving expansion terms. Freebase provides a graph-based model of human knowledge, from which a rich and multi-step structure of instances related to the query concept can be extracted, as a complement to the traditional statistical approaches to query expansion. We propose a novel method, based on the well-principled Dempster-Shafer (D-S) evidence theory, to measure the certainty of expansion terms derived from the Freebase structure. The expanded query model is then combined with a state-of-the-art statistical query expansion model, the Relevance Model (RM3). Experiments show that the proposed method achieves significant improvements over RM3.
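Dempster's rule of combination, the core of D-S evidence theory, fuses two mass functions over focal sets while renormalising away conflicting mass. A minimal sketch with hypothetical focal sets, not the paper's exact certainty measure:

```python
def dempster_combine(m1, m2):
    """Dempster's rule: combine two mass functions whose keys are
    frozenset focal elements; conflicting mass is renormalised away."""
    combined = {}
    conflict = 0.0
    for a, wa in m1.items():
        for b, wb in m2.items():
            inter = a & b
            if inter:
                combined[inter] = combined.get(inter, 0.0) + wa * wb
            else:
                conflict += wa * wb        # mass on disjoint sets
    if conflict >= 1.0:
        raise ValueError("total conflict: sources are incompatible")
    norm = 1.0 - conflict
    return {s: w / norm for s, w in combined.items()}
```

In the query-expansion setting, the mass functions would encode evidence for candidate terms gathered from different parts of the Freebase graph.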
Entropy | 2016
Peng Zhang; Jingfei Li; Benyou Wang; Xiaozhao Zhao; Dawei Song; Yuexian Hou; Massimo Melucci
Recently, Quantum Theory (QT) has been employed to advance the theory of Information Retrieval (IR). Various analogies between QT and IR have been established. Among them, a typical one is applying the idea of photon polarization in IR tasks, e.g., for document ranking and query expansion. In this paper, we aim to further extend this work by constructing a new superposed state of each document in the information need space, based on which we can incorporate the quantum interference idea in query expansion. We then apply the new quantum query expansion model to session search, which is a typical Web search task. Empirical evaluation on the large-scale ClueWeb12 dataset has shown that the proposed model is effective in session search tasks, demonstrating the potential of developing novel and effective IR models based on intuitions and formalisms of QT.
Asia Information Retrieval Symposium | 2014
Jingfei Li; Dawei Song; Peng Zhang; Ji-Rong Wen; Zhicheng Dou
Personalized search has recently attracted increasing attention. This paper focuses on utilizing click-through data to personalize web search results, from a novel perspective based on subspace projection. Specifically, we represent a user profile as a vector subspace spanned by a basis generated from a word-correlation matrix, which is able to capture the dependencies between words in the "satisfied click" (SAT Click) documents. A personalized score for each document in the original result list returned by a search engine is computed by projecting the document (represented as a vector or another word-correlation subspace) onto the user profile subspace. The personalized scores are then used to re-rank the documents through the Borda ranking fusion method. Empirical evaluation is carried out on a real user log dataset collected from a prominent search engine (Bing). Experimental results demonstrate the effectiveness of our methods, especially for queries with high click entropy.
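The subspace-projection scoring can be sketched as follows: take the top-k eigenvectors of a word-correlation matrix as the basis of the user-profile subspace, then score a document by the squared norm of its projection onto that subspace. A simplified illustration with a hypothetical correlation matrix, not the paper's exact construction:

```python
import numpy as np

def profile_basis(correlation_matrix, k):
    """Top-k eigenvectors of the (symmetric) word-correlation matrix
    span the user-profile subspace; columns are basis vectors."""
    vals, vecs = np.linalg.eigh(np.asarray(correlation_matrix, dtype=float))
    order = np.argsort(vals)[::-1][:k]   # largest eigenvalues first
    return vecs[:, order]

def projection_score(basis, doc_vector):
    """Squared norm of the document's projection onto the subspace,
    for a unit-normalised document vector (orthonormal basis)."""
    d = np.asarray(doc_vector, dtype=float)
    d = d / np.linalg.norm(d)
    coords = basis.T @ d                 # coordinates in the basis
    return float(coords @ coords)
```

Documents aligned with the subspace score near 1, orthogonal ones near 0; these scores would then be fused with the original ranking (e.g., by Borda fusion).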
NLPCC/ICCPOL | 2016
Benyou Wang; Jiabin Niu; Liqun Ma; Yuhua Zhang; Lipeng Zhang; Jingfei Li; Peng Zhang; Dawei Song
Document-based Question Answering, which requires semantically matching short text pairs, has gradually become an important topic in the fields of natural language processing and information retrieval. Question Answering systems based on English corpora have developed rapidly with the adoption of deep learning, whereas effective Chinese-customized systems have received comparatively little attention. Thus, we explore a Question Answering system tailored to Chinese for the QA task of NLPCC. In our approach, the ordered sequential information of the text and the deep semantic matching of Chinese textual pairs are captured by count-based traditional methods and an embedding-based neural network, respectively. The ensemble strategy achieves a good performance that is much stronger than the provided baselines.
Information Sciences | 2017
Jingfei Li; Yue Wu; Peng Zhang; Dawei Song; Benyou Wang
Search diversification (also called diversity search) is an important approach to tackling the query ambiguity problem in information retrieval. It aims to diversify the search results that are originally ranked according to their probabilities of relevance to a given query, by re-ranking them to cover as many different aspects (or subtopics) of the query as possible. Most existing diversity search models heuristically balance the relevance ranking and the diversity ranking, yet lack an efficient learning mechanism to reach an optimized parameter setting. To address this problem, we propose a learning-to-diversify approach which can directly optimize the search diversification performance (in terms of any effectiveness metric). We first extend the ranking function of a widely used learning-to-rank framework, i.e., LambdaMART, so that the extended ranking function can correlate relevance and diversity indicators. Furthermore, we develop an effective learning algorithm, namely the Document Repulsion Model (DRM), to train the ranking function based on a Document Repulsion Theory (DRT). DRT assumes that two result documents covering similar query aspects (i.e., subtopics) should be mutually repulsive, for the purpose of search diversification. Accordingly, the proposed DRM exerts a repulsion force between each pair of similar documents in the learning process, and includes the diversity effectiveness metric to be optimized as part of the loss function. Although there are existing learning-based diversity search methods, they often involve an iterative sequential selection process during ranking, which is computationally complex and time-consuming to train, whereas our proposed learning strategy largely reduces the time cost. Extensive experiments are conducted on the TREC diversity track data (2009, 2010 and 2011). The results demonstrate that our model significantly outperforms a number of baselines in terms of effectiveness and robustness.
Further, an efficiency analysis shows that the proposed DRM has a lower computational complexity than state-of-the-art learning-to-diversify methods.
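The repulsion idea can be illustrated with a toy penalty: pairs of documents that share subtopics but receive similar scores contribute a large penalty, pushing a learner to separate them in the ranking. This is a hypothetical illustration of the intuition only, not the DRM loss actually optimised in the paper:

```python
def repulsion_penalty(scores, subtopics, strength=1.0):
    """Toy repulsion term: for each pair of documents, penalty grows
    with subtopic overlap and shrinks as their scores move apart."""
    penalty = 0.0
    n = len(scores)
    for i in range(n):
        for j in range(i + 1, n):
            overlap = len(subtopics[i] & subtopics[j])
            if overlap:
                # Similar scores on overlapping docs => larger penalty.
                penalty += strength * overlap / (1.0 + abs(scores[i] - scores[j]))
    return penalty
```

Minimising such a term alongside a relevance loss drives redundant documents apart, which is the diversification effect DRT postulates.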
ACM Transactions on Intelligent Systems and Technology | 2017
Peng Zhang; Qian Yu; Yuexian Hou; Dawei Song; Jingfei Li; Bin Hu
In many research and application areas, such as information retrieval and machine learning, we often have to deal with a probability distribution that is a mixture of a distribution relevant to the task at hand and another that is irrelevant and should be removed. Thus, it is an essential problem to separate the irrelevant distribution from the mixture distribution. This article focuses on the application in Information Retrieval, where relevance feedback is a widely used technique to build a refined query model based on a set of feedback documents. However, in practice, the relevance feedback set, whether provided by users explicitly or implicitly, is often a mixture of relevant and irrelevant documents. Consequently, the resultant query model (typically a term distribution) is often a mixture rather than the true relevance term distribution, leading to a negative impact on the retrieval performance. To tackle this problem, we recently proposed a Distribution Separation Method (DSM), which aims to approximate the true relevance distribution by separating a seed irrelevance distribution from the mixture one. While it achieved a promising performance in an empirical evaluation with simulated explicit irrelevance feedback data, it has not been deployed in scenarios where the irrelevance feedback data must be obtained automatically. In this article, we propose a substantial extension of the basic DSM from two perspectives: developing a further regularization framework and deploying DSM in the automatic irrelevance feedback scenario. Specifically, in order to prevent the output distribution of DSM from drifting away from the true relevance distribution when the quality of the seed irrelevance distribution (as the input to DSM) is not guaranteed, we propose a DSM regularization framework to constrain the estimation of the relevance distribution.
This regularization framework includes three algorithms, each corresponding to a regularization strategy incorporated in the objective function of DSM. In addition, we exploit DSM in automatic (i.e., pseudo) irrelevance feedback, by automatically detecting the seed irrelevant documents via three different document reranking methods. We have carried out extensive experiments based on various TREC datasets, in order to systematically evaluate the proposed methods. The experimental results demonstrate the effectiveness of our proposed approaches in comparison with various strong baselines.
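The basic separation step can be sketched algebraically: if the observed term distribution is the mixture M = lam * R + (1 - lam) * I, then an estimate of the relevance distribution R is (M - (1 - lam) * I) / lam, clipped at zero and renormalised. A minimal illustration; here lam is treated as a hypothetical known mixing weight, whereas DSM estimates it, and the regularization framework above adds further constraints:

```python
def separate_distribution(mixture, irrelevant, lam):
    """Recover an approximate relevance distribution R from
    M = lam * R + (1 - lam) * I, given seed irrelevance distribution I.
    Negative residuals are clipped and the result renormalised."""
    terms = set(mixture) | set(irrelevant)
    raw = {t: (mixture.get(t, 0.0) - (1.0 - lam) * irrelevant.get(t, 0.0)) / lam
           for t in terms}
    clipped = {t: max(w, 0.0) for t, w in raw.items()}
    z = sum(clipped.values())
    return {t: w / z for t, w in clipped.items()} if z else clipped
```

With a perfect seed irrelevance distribution, the mixture terms contributed by irrelevant documents are cancelled exactly; in practice the seed is noisy, which is why the article's regularized variants are needed.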
Asia Information Retrieval Symposium | 2016
Yue Wu; Jingfei Li; Peng Zhang; Dawei Song
Search diversification plays an important role in modern search engines, especially when user-issued queries are ambiguous and the top-ranked results are redundant. Several diversity search approaches have been proposed to reduce the information redundancy of the retrieved results, but they do not consider maximizing topic coverage. To solve this problem, the Affinity ranking model was developed, aiming to maximize topic coverage while reducing information redundancy. However, the original model does not involve a learning algorithm for parameter tuning, which limits performance optimization. In order to further improve the diversity performance of the Affinity ranking model, inspired by its ranking principle, we propose a learning approach based on the learning-to-rank framework. Our learning model not only accounts for topic coverage maximization and redundancy reduction by formalizing a series of features, but also optimizes the diversity metric by extending a well-known learning-to-rank algorithm, LambdaMART. Comparative experiments conducted on TREC diversity tracks show the effectiveness of our model.