Kai Hui
Max Planck Society
Publications
Featured research published by Kai Hui.
European Conference on Information Retrieval | 2013
Kai Hui; Bin Gao; Ben He; Tiejian Luo
In sponsored search, the ad selection algorithm picks out the best candidate ads for ranking, namely those whose bid keywords best match the user query. Existing ad selection methods mainly focus on the relevance between the user query and the selected ads, so the monetization ability of the results is not necessarily maximized. To this end, instead of selecting keywords as a whole, our work takes advantage of the different impacts, revealed in our data study, that different components inside the keywords have on both relevance and monetization ability. In particular, we select keyword components so as to maximize relevance and revenue at the component level, and then combine the selected components to generate the bid keywords. The experiments reveal that our method significantly outperforms two baseline algorithms in terms of recall, precision, and monetization ability.
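As an illustration of the component-level idea, here is a minimal sketch; the scoring functions, the weighting between relevance and revenue, and the pairwise combination rule are stand-in assumptions, not the paper's actual models.

```python
from itertools import combinations

# Minimal sketch of component-level bid keyword selection (hypothetical
# scoring functions and combination rule; the paper's actual relevance and
# monetization models are not reproduced here).

def select_bid_keywords(query_terms, components, relevance, revenue,
                        alpha=0.5, top_k=3):
    """Score each keyword component by a weighted mix of query relevance and
    expected revenue, keep the top-k components, and combine them pairwise
    into candidate bid keywords."""
    scored = sorted(components,
                    key=lambda c: alpha * relevance(query_terms, c)
                                  + (1 - alpha) * revenue(c),
                    reverse=True)
    selected = scored[:top_k]
    return [" ".join(pair) for pair in combinations(selected, 2)]

# Toy usage with stand-in scoring functions.
rel = lambda q, c: len(set(q) & set(c.split())) / max(len(c.split()), 1)
rev = lambda c: {"flights": 0.9, "cheap": 0.7, "rome": 0.8}.get(c, 0.1)
print(select_bid_keywords(["cheap", "flights", "to", "rome"],
                          ["cheap", "flights", "hotel", "rome"], rel, rev))
```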
International Conference on the Theory of Information Retrieval | 2011
Kai Hui; Ben He; Tiejian Luo; Bin Wang
This paper presents an initial investigation into the relative effectiveness of different popular pseudo relevance feedback (PRF) methods. The retrieval performance of the relevance model and of two KL-divergence-based divergence from randomness (DFR) feedback methods generalized from Rocchio's algorithm is compared through extensive experiments on standard TREC test collections. Results show that a KL-divergence-based DFR method (denoted KL1), combined with the classical Rocchio algorithm, has the best retrieval effectiveness of the three methods studied in this paper.
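For orientation, the general form of the two ingredients is sketched below in assumed notation (feedback document set F, collection C, expansion weight β); the exact weighting and normalization used in the paper may differ.

```latex
% KL-based DFR term weight over the feedback set F versus the collection C,
% plugged into Rocchio-style query term reweighting (notation assumed).
w_{\mathrm{KL}}(t) = P(t \mid F)\,\log_2 \frac{P(t \mid F)}{P(t \mid C)},
\qquad
q'_t = \frac{q_t}{\max_{t'} q_{t'}}
       + \beta \cdot \frac{w_{\mathrm{KL}}(t)}{\max_{t'} w_{\mathrm{KL}}(t')}
```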
Web Search and Data Mining | 2018
Kai Hui; Andrew Yates; Klaus Berberich; Gerard de Melo
Neural IR models, such as DRMM and PACRR, have achieved strong results by successfully capturing relevance matching signals. We argue that the context of these matching signals is also important. Intuitively, when extracting, modeling, and combining matching signals, one would like to consider the surrounding text (local context) as well as other signals from the same document that can contribute to the overall relevance score. In this work, we highlight three potential shortcomings caused by not considering context information and propose three neural ingredients to address them: a disambiguation component, cascade k-max pooling, and a shuffling combination layer. Incorporating these components into the PACRR model yields Co-PACRR, a novel context-aware neural IR model. Extensive comparisons with established models on TREC Web Track data confirm that the proposed model can achieve superior search results. In addition, an ablation analysis is conducted to gain insights into the impact of and interactions between the different components. We release our code to enable future comparisons.
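Of the three ingredients, cascade k-max pooling is the easiest to sketch in isolation: rather than one k-max pool over the whole document, pooling is repeated over growing document prefixes so that the position of strong matches is preserved. The cut-off percentages, shapes, and random inputs below are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np

def cascade_kmax_pooling(sim_matrix, k=2, cutoffs=(0.25, 0.5, 0.75, 1.0)):
    """sim_matrix: (num_query_terms, doc_len) array of match signals.
    Returns a (num_query_terms, len(cutoffs) * k) array of pooled signals."""
    num_q, doc_len = sim_matrix.shape
    pooled = []
    for c in cutoffs:
        prefix = sim_matrix[:, : max(1, int(round(c * doc_len)))]
        # k largest values per query term within this document prefix
        topk = np.sort(prefix, axis=1)[:, ::-1][:, :k]
        if topk.shape[1] < k:  # pad very short prefixes with zeros
            topk = np.pad(topk, ((0, 0), (0, k - topk.shape[1])))
        pooled.append(topk)
    return np.concatenate(pooled, axis=1)

sim = np.random.rand(4, 40)  # toy 4-term query, 40-term document
print(cascade_kmax_pooling(sim, k=2).shape)  # (4, 8)
```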
String Processing and Information Retrieval | 2015
Kai Hui; Klaus Berberich
Information retrieval evaluation relies heavily on human effort to assess the relevance of result documents. Recent years have seen efforts and good progress to reduce this human effort and thus lower the cost of evaluation. Selective labeling strategies carefully choose a subset of result documents to label, for instance based on their aggregate rank in the results; strategies to mitigate incomplete labels seek to make up for missing labels, for instance by predicting them using machine learning methods. How different strategies interact, though, is unknown. In this work, we study the interaction of several state-of-the-art strategies for selective labeling and incomplete label mitigation on four years of TREC Web Track data (2011–2014). Moreover, we propose and evaluate MaxRep, a novel selective labeling strategy designed to select effective training data for missing label prediction.
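A rough sketch of a greedy maximum-representativeness selection in the spirit of MaxRep follows; the actual objective and optimization used in the paper may differ, and the document-document similarity matrix here is a random stand-in.

```python
import numpy as np

def greedy_maxrep(similarity, budget):
    """Greedily pick `budget` documents so that the selected set best covers
    the pool under a document-document similarity matrix (facility-location
    style objective; an assumption, not necessarily the paper's exact one)."""
    n = similarity.shape[0]
    selected, coverage = [], np.zeros(n)
    for _ in range(budget):
        # marginal gain of each candidate: improvement in best-coverage sum
        gains = np.maximum(similarity, coverage).sum(axis=1) - coverage.sum()
        gains[selected] = -np.inf  # never re-select a document
        best = int(np.argmax(gains))
        selected.append(best)
        coverage = np.maximum(coverage, similarity[best])
    return selected

sim = np.random.rand(10, 10)
sim = (sim + sim.T) / 2  # toy symmetric similarities
print(greedy_maxrep(sim, budget=3))
```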
International World Wide Web Conference | 2017
Kai Hui; Andrew Yates; Klaus Berberich; Gerard de Melo
To deploy deep learning models for ad-hoc information retrieval, suitable representations of query-document pairs are needed. Such representations ought to capture all relevant information required to assess the relevance of a document for a given query, including unigram term overlap as well as positional information such as proximity and term dependencies. In this work, we investigate the use of similarity matrices that are able to encode such position-specific information. Extensive experiments on TREC Web Track data confirm that such representations can yield good results.
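A minimal sketch of such a position-preserving similarity matrix, using cosine similarity between term embeddings; the embeddings here are random stand-ins rather than trained vectors.

```python
import numpy as np

def similarity_matrix(query_vecs, doc_vecs):
    """Entry (i, j) is the cosine similarity between query term i and document
    term j, so proximity and term dependencies remain visible downstream."""
    q = query_vecs / np.linalg.norm(query_vecs, axis=1, keepdims=True)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    return q @ d.T  # shape: (query_len, doc_len)

query = np.random.randn(3, 50)   # 3 query terms, 50-dim embeddings
doc = np.random.randn(80, 50)    # 80 document terms
print(similarity_matrix(query, doc).shape)  # (3, 80)
```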
International Conference on Pervasive Computing | 2013
Yu Huang; Tiejian Luo; Xiang Wang; Kai Hui; Wenjie Wang; Ben He
Query performance prediction (QPP) aims to estimate query difficulty without access to relevance assessments. The quality of a predictor is evaluated by the correlation between its predicted values and the actual Average Precision (AP). The Pearson correlation coefficient, Spearman's Rho, and Kendall's Tau are the most popular measures for computing this correlation. Previous work showed that these measures are not sufficiently equitable and appropriate for evaluating predictor quality. In this paper, we add two further measures, the Maximal Information Coefficient (MIC) and Brownian Distance Correlation (Dcor), for evaluating predictor quality and compare them with the three traditional measures to observe the differences. We conduct a series of experiments on several standard TREC datasets and analyze the results. The experimental results reveal that MIC and Dcor lead to different conclusions in some cases, which makes them useful supplements when evaluating predictor quality. Furthermore, the different measures show distinct sensitivity to changes in the predictors' parameters in our experiments, and we analyze these differences.
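The three classical measures are available in SciPy; distance correlation can be written in a few lines, while MIC typically requires an extra library (e.g., minepy) and is omitted from this sketch. The toy data below is a stand-in for predictor outputs and per-query AP.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr, kendalltau

def distance_correlation(x, y):
    """Brownian distance correlation via double-centered distance matrices."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    a = np.abs(x[:, None] - x[None, :])
    b = np.abs(y[:, None] - y[None, :])
    A = a - a.mean(0) - a.mean(1)[:, None] + a.mean()
    B = b - b.mean(0) - b.mean(1)[:, None] + b.mean()
    dcov = np.sqrt((A * B).mean())
    return dcov / np.sqrt(np.sqrt((A * A).mean() * (B * B).mean()))

predicted = np.random.rand(50)                    # toy predictor outputs
ap = 0.6 * predicted + 0.4 * np.random.rand(50)   # toy per-query AP values
print(pearsonr(predicted, ap)[0], spearmanr(predicted, ap)[0],
      kendalltau(predicted, ap)[0], distance_correlation(predicted, ap))
```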
European Conference on Information Retrieval | 2017
Kai Hui; Klaus Berberich
Preference judgments have been demonstrated to be a better alternative to graded judgments for assessing the relevance of documents relative to queries. Existing work has verified transitivity among preference judgments when collected from trained judges, which reduces the number of required judgments dramatically. Moreover, both strict preference judgments and weak preference judgments, where the latter additionally allow judges to state that two documents are equally relevant for a given query, are widely used in the literature. However, it remains unclear whether transitivity still holds when judgments are collected via crowdsourcing, and whether the two kinds of preference judgments behave similarly in that setting. In this work, we collect judgments from multiple judges using a crowdsourcing platform and aggregate them to compare the two kinds of preference judgments in terms of transitivity, time consumption, and quality. That is, we look into whether aggregated judgments are transitive, how long it takes judges to make them, and whether judges agree with each other and with judgments from TREC. Our key finding is that only strict preference judgments are transitive; weak preference judgments behave differently in terms of transitivity, time consumption, and judgment quality.
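As a rough sketch of the transitivity check, one can aggregate crowd votes per document pair by majority and then test whether the induced strict preferences are transitive; the majority-vote aggregation rule here is an assumption, not necessarily the one used in the paper.

```python
from itertools import permutations

def aggregate(votes):
    """votes: dict (a, b) -> list of +1 (a preferred), -1 (b preferred), 0 (tie).
    Returns dict (x, y) -> True iff x is strictly preferred to y after voting."""
    pref = {}
    for (a, b), vs in votes.items():
        s = sum(vs)
        pref[(a, b)], pref[(b, a)] = s > 0, s < 0
    return pref

def is_transitive(pref, docs):
    """Check: whenever a > b and b > c, a > c must also hold."""
    return all(not (pref.get((a, b)) and pref.get((b, c))) or pref.get((a, c))
               for a, b, c in permutations(docs, 3))

votes = {("d1", "d2"): [1, 1, -1], ("d2", "d3"): [1, 1, 1], ("d1", "d3"): [1, -1, 1]}
print(is_transitive(aggregate(votes), ["d1", "d2", "d3"]))  # True
```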
European Conference on Information Retrieval | 2017
Kai Hui; Klaus Berberich
Preference judgment, as an alternative to graded judgment, leads to more accurate labels and avoids the need to define relevance levels. However, it also requires a larger number of judgments. Prior research has successfully reduced that number to \(\mathcal{O}(N_d \log N_d)\) for \(N_d\) documents by assuming transitivity, which is still too expensive in practice. In this work, by analytically deriving the number of judgments and by empirically simulating the ground-truth ranking of documents from the TREC Web Track, we demonstrate that the number of judgments can be dramatically reduced when allowing for ties.
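A toy simulation sketch of why ties help: documents judged equally relevant can share a tie group, so each new document only needs comparisons against one representative per existing group rather than against every other document. The grouping procedure and toy grades below are illustrative assumptions, not the paper's analysis.

```python
import random

random.seed(0)
judgments = 0
grade = {f"d{i}": random.choice([0, 1, 2, 3]) for i in range(100)}  # toy ground truth

def judge(a, b):
    """Simulated weak preference judgment: -1, 0 (tie), or +1."""
    global judgments
    judgments += 1
    return (grade[a] > grade[b]) - (grade[a] < grade[b])

groups = []  # each group holds documents judged equally relevant
for doc in grade:
    for g in groups:
        if judge(doc, g[0]) == 0:   # tied with this group's representative
            g.append(doc)
            break
    else:
        groups.append([doc])

# Ordering the few groups needs only a handful of extra judgments; here we
# order them by the toy grade for brevity.
groups.sort(key=lambda g: grade[g[0]], reverse=True)
print("documents:", len(grade), "groups:", len(groups), "judgments:", judgments)
```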
International Conference on the Theory of Information Retrieval | 2017
Kai Hui; Klaus Berberich; Ida Mele
Cascade measures like alpha-nDCG, ERR-IA, and NRBP take into account the novelty and diversity of query results and are computed using judgments provided by humans, which are costly to collect. These measures expect that all documents in the result list of a query are judged and cannot make use of judgments beyond the assigned labels. Existing work has demonstrated that condensing the query results by removing documents without judgments can address this problem to some extent. However, how highly incomplete judgments affect cascade measures and how to cope with such incompleteness have not been addressed yet. In this paper, we propose an approach that mitigates incomplete judgments by leveraging the content of documents relevant to the query's subtopics: language models are estimated at each rank, taking into account the document at that rank and the higher-ranked ones, and gain values are then determined based on the Kullback-Leibler divergence between these language models. Experiments on the diversity tasks of the TREC Web Track 2009–2012 show that with only 15% of the judgments our method accurately reconstructs the original rankings determined by the established cascade measures.
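The core ingredient can be sketched compactly: smoothed unigram language models for two texts and the Kullback-Leibler divergence between them. The smoothing scheme and the exact models compared (rank prefixes versus subtopic models) are simplified assumptions here.

```python
import math
from collections import Counter

def language_model(text, vocab, mu=0.5):
    """Unigram language model with simple additive smoothing so that the KL
    divergence stays finite (the paper's smoothing scheme may differ)."""
    counts, total = Counter(text), len(text)
    return {t: (counts[t] + mu) / (total + mu * len(vocab)) for t in vocab}

def kl_divergence(p, q):
    return sum(p[t] * math.log(p[t] / q[t]) for t in p if p[t] > 0)

doc_at_rank = "cheap flights rome budget airlines".split()
subtopic = "rome flight deals cheap tickets".split()
vocab = set(doc_at_rank) | set(subtopic)
p, q = language_model(subtopic, vocab), language_model(doc_at_rank, vocab)
print(round(kl_divergence(p, q), 4))
```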
Archive | 2017
Kai Hui
An information retrieval (IR) system assists people in consuming huge amounts of data, which makes both the evaluation and the construction of such systems important. Two difficulties stand in the way: the overwhelmingly large number of query-document pairs to judge, which makes IR evaluation a manually laborious task; and the complicated patterns to model due to the non-symmetric, heterogeneous relationships within a query-document pair, where different interaction patterns such as term dependency and proximity have been demonstrated to be useful, yet are non-trivial for a single IR model to encode. In this thesis we address both difficulties, from the perspective of IR evaluation and of the retrieval model respectively, by reducing the manual cost with automatic methods, by investigating the use of crowdsourcing for collecting preference judgments, and by proposing novel neural retrieval models.

In particular, to address the large number of query-document pairs in IR evaluation, a low-cost selective labeling method is proposed to pick out a small subset of representative documents for manual judgment, in support of the follow-up prediction of labels for the remaining query-document pairs; furthermore, a language-model-based cascade measure framework is developed to evaluate novelty and diversity, utilizing the content of the labeled documents to mitigate incomplete labels. In addition, we attempt to make preference judgments practically usable by empirically investigating different properties of such judgments when collected via crowdsourcing, and by proposing a novel judgment mechanism that strikes a compromise between judgment quality and the number of judgments.

Finally, to model different complicated patterns in a single retrieval model, and inspired by recent advances in deep learning, we develop novel neural IR models that incorporate patterns like term dependency, query proximity, density of relevance, and query coverage in a single model. We demonstrate their superior performance through evaluations on different datasets.