Featured Researches

Information Retrieval

"And the Winner Is...": Dynamic Lotteries for Multi-group Fairness-Aware Recommendation

As recommender systems are being designed and deployed for an increasing number of socially-consequential applications, it has become important to consider what properties of fairness these systems exhibit. There has been considerable research on recommendation fairness. However, we argue that the previous literature has been based on simple, uniform and often uni-dimensional notions of fairness assumptions that do not recognize the real-world complexities of fairness-aware applications. In this paper, we explicitly represent the design decisions that enter into the trade-off between accuracy and fairness across multiply-defined and intersecting protected groups, supporting multiple fairness metrics. The framework also allows the recommender to adjust its performance based on the historical view of recommendations that have been delivered over a time horizon, dynamically rebalancing between fairness concerns. Within this framework, we formulate lottery-based mechanisms for choosing between fairness concerns, and demonstrate their performance in two recommendation domains.

Read more
Information Retrieval

A Comparison of Question Rewriting Methods for Conversational Passage Retrieval

Conversational passage retrieval relies on question rewriting to modify the original question so that it no longer depends on the conversation history. Several methods for question rewriting have recently been proposed, but they were compared under different retrieval pipelines. We bridge this gap by thoroughly evaluating those question rewriting methods on the TREC CAsT 2019 and 2020 datasets under the same retrieval pipeline. We analyze the effect of different types of question rewriting methods on retrieval performance and show that by combining question rewriting methods of different types we can achieve state-of-the-art performance on both datasets.

Read more
Information Retrieval

A Comparison of Supervised Learning to Match Methods for Product Search

The vocabulary gap is a core challenge in information retrieval (IR). In e-commerce applications like product search, the vocabulary gap is reported to be a bigger challenge than in more traditional application areas in IR, such as news search or web search. As recent learning to match methods have made important advances in bridging the vocabulary gap for these traditional IR areas, we investigate their potential in the context of product search. In this paper we provide insights into using recent learning to match methods for product search. We compare both effectiveness and efficiency of these methods in a product search setting and analyze their performance on two product search datasets, with 50,000 queries each. One is an open dataset made available as part of a community benchmark activity at CIKM 2016. The other is a proprietary query log obtained from a European e-commerce platform. This comparison is conducted towards a better understanding of trade-offs in choosing a preferred model for this task. We find that (1) models that have been specifically designed for short text matching, like MV-LSTM and DRMMTKS, are consistently among the top three methods in all experiments; however, taking efficiency and accuracy into account at the same time, ARC-I is the preferred model for real world use cases; and (2) the performance from a state-of-the-art BERT-based model is mediocre, which we attribute to the fact that the text BERT is pre-trained on is very different from the text we have in product search. We also provide insights into factors that can influence model behavior for different types of query, such as the length of retrieved list, and query complexity, and discuss the implications of our findings for e-commerce practitioners, with respect to choosing a well performing method.

Read more
Information Retrieval

A Comprehensive Pipeline for Hotel Recommendation System

This paper addresses a comprehensive pipeline to build a hotel recommendation system with the raw data collected by Apps in users' smartphones. The pipeline mainly consists of pre-processing of the raw data and training prediction models. We use two methods, Support Vector Machine (SVM) and Recurrent Neural Network (RNN). The results show that two methods achieved a reasonable accuracy with the pre-processing of the raw data. Therefore, we conclude that this paper provides a comprehensive pipeline, in which a hotel recommendation system was successfully built from the raw data to specific applications.

Read more
Information Retrieval

A Comprehensive Survey and Experimental Comparison of Graph-Based Approximate Nearest Neighbor Search

Approximate nearest neighbor search (ANNS) constitutes an important operation in a multitude of applications, including recommendation systems, information retrieval, and pattern recognition. In the past decade, graph-based ANNS algorithms have been the leading paradigm in this domain, with dozens of graph-based ANNS algorithms proposed. Such algorithms aim to provide effective, efficient solutions for retrieving the nearest neighbors for a given query. Nevertheless, these efforts focus on developing and optimizing algorithms with different approaches, so there is a real need for a comprehensive survey about the approaches' relative performance, strengths, and pitfalls. Thus here we provide a thorough comparative analysis and experimental evaluation of 13 representative graph-based ANNS algorithms via a new taxonomy and fine-grained pipeline. We compared each algorithm in a uniform test environment on eight real-world datasets and 12 synthetic datasets with varying sizes and characteristics. Our study yields novel discoveries, offerings several useful principles to improve algorithms, thus designing an optimized method that outperforms the state-of-the-art algorithms. This effort also helped us pinpoint algorithms' working portions, along with rule-of-thumb recommendations about promising research directions and suitable algorithms for practitioners in different fields.

Read more
Information Retrieval

A Concept Knowledge-Driven Keywords Retrieval Framework for Sponsored Search

In sponsored search, retrieving synonymous keywords for exact match type is important for accurately targeted advertising. Data-driven deep learning-based method has been proposed to tackle this problem. An apparent disadvantage of this method is its poor generalization performance on entity-level long-tail instances, even though they might share similar concept-level patterns with frequent instances. With the help of a large knowledge base, we find that most commercial synonymous query-keyword pairs can be abstracted into meaningful conceptual patterns through concept tagging. Based on this fact, we propose a novel knowledge-driven conceptual retrieval framework to mitigate this problem, which consists of three parts: data conceptualization, matching via conceptual patterns and concept-augmented discrimination. Both offline and online experiments show that our method is very effective. This framework has been successfully applied to Baidu's sponsored search system, which yields a significant improvement in revenue.

Read more
Information Retrieval

A Differentiable Ranking Metric Using Relaxed Sorting Operation for Top-K Recommender Systems

A recommender system generates personalized recommendations for a user by computing the preference score of items, sorting the items according to the score, and filtering top-K items with high scores. While sorting and ranking items are integral for this recommendation procedure, it is nontrivial to incorporate them in the process of end-to-end model training since sorting is nondifferentiable and hard to optimize with gradient descent. This incurs the inconsistency issue between existing learning objectives and ranking metrics of recommenders. In this work, we present DRM (differentiable ranking metric) that mitigates the inconsistency and improves recommendation performance by employing the differentiable relaxation of ranking metrics. Via experiments with several real-world datasets, we demonstrate that the joint learning of the DRM objective upon existing factor based recommenders significantly improves the quality of recommendations, in comparison with other state-of-the-art recommendation methods.

Read more
Information Retrieval

A Federated Multi-View Deep Learning Framework for Privacy-Preserving Recommendations

Privacy-preserving recommendations are recently gaining momentum, since the decentralized user data is increasingly harder to collect, by recommendation service providers, due to the serious concerns over user privacy and data security. This situation is further exacerbated by the strict government regulations such as Europe's General Data Privacy Regulations(GDPR). Federated Learning(FL) is a newly developed privacy-preserving machine learning paradigm to bridge data repositories without compromising data security and privacy. Thus many federated recommendation(FedRec) algorithms have been proposed to realize personalized privacy-preserving recommendations. However, existing FedRec algorithms, mostly extended from traditional collaborative filtering(CF) method, cannot address cold-start problem well. In addition, their performance overhead w.r.t. model accuracy, trained in a federated setting, is often non-negligible comparing to centralized recommendations. This paper studies this issue and presents FL-MV-DSSM, a generic content-based federated multi-view recommendation framework that not only addresses the cold-start problem, but also significantly boosts the recommendation performance by learning a federated model from multiple data source for capturing richer user-level features. The new federated multi-view setting, proposed by FL-MV-DSSM, opens new usage models and brings in new security challenges to FL in recommendation scenarios. We prove the security guarantees of \xxx, and empirical evaluations on FL-MV-DSSM and its variations with public datasets demonstrate its effectiveness. Our codes will be released if this paper is accepted.

Read more
Information Retrieval

A Framework for Capturing and Analyzing Unstructured and Semi-structured Data for a Knowledge Management System

Mainstream knowledge management researchers generally agree that knowledge extracted from unstructured data and semi-structured data have become imperative for organizational strategic decision making. In this research, we develop a framework that captures and analyses unstructured data using machine learning techniques and integrates knowledge and insight gained from the data into traditional knowledge management systems. Unlike most frameworks published in the literature that focuses on a specific type of unstructured data, our frameworks cut across the varieties of unstructured data ranging from textual data from social network sites, online forums, discussion boards, reviews to audio data, image data and video data. We highlight some pre-processing and processing techniques for these data and also highlight some standard output. We evaluate the framework by developing a textual data application programming interface (API) using python and beautiful soup and we perform sentiment analysis on the students review data collected through the API.

Read more
Information Retrieval

A Graph-based Relevance Matching Model for Ad-hoc Retrieval

To retrieve more relevant, appropriate and useful documents given a query, finding clues about that query through the text is crucial. Recent deep learning models regard the task as a term-level matching problem, which seeks exact or similar query patterns in the document. However, we argue that they are inherently based on local interactions and do not generalise to ubiquitous, non-consecutive contextual relationships. In this work, we propose a novel relevance matching model based on graph neural networks to leverage the document-level word relationships for ad-hoc retrieval. In addition to the local interactions, we explicitly incorporate all contexts of a term through the graph-of-word text format. Matching patterns can be revealed accordingly to provide a more accurate relevance score. Our approach significantly outperforms strong baselines on two ad-hoc benchmarks. We also experimentally compare our model with BERT and show our advantages on long documents.

Read more

Ready to get started?

Join us today