Featured Researches

Information Retrieval

Analysing the Effect of Clarifying Questions on Document Ranking in Conversational Search

Recent research on conversational search highlights the importance of mixed-initiative in conversations. To enable mixed-initiative, the system should be able to ask clarifying questions to the user. However, the ability of the underlying ranking models (which support conversational search) to account for these clarifying questions and answers has not been analysed when ranking documents, at large. To this end, we analyse the performance of a lexical ranking model on a conversational search dataset with clarifying questions. We investigate, both quantitatively and qualitatively, how different aspects of clarifying questions and user answers affect the quality of ranking. We argue that there needs to be some fine-grained treatment of the entire conversational round of clarification, based on the explicit feedback which is present in such mixed-initiative settings. Informed by our findings, we introduce a simple heuristic-based lexical baseline, that significantly outperforms the existing naive baselines. Our work aims to enhance our understanding of the challenges present in this particular task and inform the design of more appropriate conversational ranking models.

Read more
Information Retrieval

Analysis of Multivariate Scoring Functions for Automatic Unbiased Learning to Rank

Leveraging biased click data for optimizing learning to rank systems has been a popular approach in information retrieval. Because click data is often noisy and biased, a variety of methods have been proposed to construct unbiased learning to rank (ULTR) algorithms for the learning of unbiased ranking models. Among them, automatic unbiased learning to rank (AutoULTR) algorithms that jointly learn user bias models (i.e., propensity models) with unbiased rankers have received a lot of attention due to their superior performance and low deployment cost in practice. Despite their differences in theories and algorithm design, existing studies on ULTR usually use uni-variate ranking functions to score each document or result independently. On the other hand, recent advances in context-aware learning-to-rank models have shown that multivariate scoring functions, which read multiple documents together and predict their ranking scores jointly, are more powerful than uni-variate ranking functions in ranking tasks with human-annotated relevance labels. Whether such superior performance would hold in ULTR with noisy data, however, is mostly unknown. In this paper, we investigate existing multivariate scoring functions and AutoULTR algorithms in theory and prove that permutation invariance is a crucial factor that determines whether a context-aware learning-to-rank model could be applied to existing AutoULTR framework. Our experiments with synthetic clicks on two large-scale benchmark datasets show that AutoULTR models with permutation-invariant multivariate scoring functions significantly outperform those with uni-variate scoring functions and permutation-variant multivariate scoring functions.

Read more
Information Retrieval

Annotation of epidemiological information in animal disease-related news articles: guidelines

This paper describes a method for annotation of epidemiological information in animal disease-related news articles. The annotation guidelines are generic and aim to embrace all animal or zoonotic infectious diseases, regardless of the pathogen involved or its way of transmission (e.g. vector-borne, airborne, by contact). The framework relies on the successive annotation of all the sentences from a news article. The annotator evaluates the sentences in a specific epidemiological context, corresponding to the publication of the news article.

Read more
Information Retrieval

Answer Ranking for Product-Related Questions via Multiple Semantic Relations Modeling

Many E-commerce sites now offer product-specific question answering platforms for users to communicate with each other by posting and answering questions during online shopping. However, the multiple answers provided by ordinary users usually vary diversely in their qualities and thus need to be appropriately ranked for each question to improve user satisfaction. It can be observed that product reviews usually provide useful information for a given question, and thus can assist the ranking process. In this paper, we investigate the answer ranking problem for product-related questions, with the relevant reviews treated as auxiliary information that can be exploited for facilitating the ranking. We propose an answer ranking model named MUSE which carefully models multiple semantic relations among the question, answers, and relevant reviews. Specifically, MUSE constructs a multi-semantic relation graph with the question, each answer, and each review snippet as nodes. Then a customized graph convolutional neural network is designed for explicitly modeling the semantic relevance between the question and answers, the content consistency among answers, and the textual entailment between answers and reviews. Extensive experiments on real-world E-commerce datasets across three product categories show that our proposed model achieves superior performance on the concerned answer ranking task.

Read more
Information Retrieval

Application of Deep Learning in Recognizing Bates Numbers and Confidentiality Stamping from Images

In eDiscovery, it is critical to ensure that each page produced in legal proceedings conforms with the requirements of court or government agency production requests. Errors in productions could have severe consequences in a case, putting a party in an adverse position. The volume of pages produced continues to increase, and tremendous time and effort has been taken to ensure quality control of document productions. This has historically been a manual and laborious process. This paper demonstrates a novel automated production quality control application which leverages deep learning-based image recognition technology to extract Bates Number and Confidentiality Stamping from legal case production images and validate their correctness. Effectiveness of the method is verified with an experiment using a real-world production data.

Read more
Information Retrieval

Application of Knowledge Graphs to Provide Side Information for Improved Recommendation Accuracy

Personalized recommendations are popular in these days of Internet driven activities, specifically shopping. Recommendation methods can be grouped into three major categories, content based filtering, collaborative filtering and machine learning enhanced. Information about products and preferences of different users are primarily used to infer preferences for a specific user. Inadequate information can obviously cause these methods to fail or perform poorly. The more information we provide to these methods, the more likely it is that the methods perform better. Knowledge graphs represent the current trend in recording information in the form of relations between entities, and can provide additional (side) information about products and users. Such information can be used to improve nearest neighbour search, clustering users and products, or train the neural network, when one is used. In this work, we present a new generic recommendation systems framework, that integrates knowledge graphs into the recommendation pipeline. We describe its software design and implementation, and then show through experiments, how such a framework can be specialized for a domain, say movie recommendations, and the improvements in recommendation results possible due to side information obtained from knowledge graphs representation of such information. Our framework supports different knowledge graph representation formats, and facilitates format conversion, merging and information extraction needed for training recommendation methods.

Read more
Information Retrieval

Applying the Affective Aware Pseudo Association Method to Enhance the Top-N Recommendations Distribution to Users in Group Emotion Recommender Systems

Recommender Systems are a subclass of information retrieval systems, or more succinctly, a class of information filtering systems that seeks to predict how close is the match of the user's preference to a recommended item. A common approach for making recommendations for a user group is to extend Personalized Recommender Systems' capability. This approach gives the impression that group recommendations are retrofits of the Personalized Recommender Systems. Moreover, such an approach not taken the dynamics of group emotion and individual emotion into the consideration in making top_N recommendations. Recommending items to a group of two or more users has certainly raised unique challenges in group behaviors that influence group decision-making that researchers only partially understand. This study applies the Affective Aware Pseudo Association Method in studying group formation and dynamics in group decision-making. The method shows its adaptability to group's moods change when making recommendations.

Read more
Information Retrieval

Approximate Nearest Neighbor Negative Contrastive Learning for Dense Text Retrieval

Conducting text retrieval in a dense learned representation space has many intriguing advantages over sparse retrieval. Yet the effectiveness of dense retrieval (DR) often requires combination with sparse retrieval. In this paper, we identify that the main bottleneck is in the training mechanisms, where the negative instances used in training are not representative of the irrelevant documents in testing. This paper presents Approximate nearest neighbor Negative Contrastive Estimation (ANCE), a training mechanism that constructs negatives from an Approximate Nearest Neighbor (ANN) index of the corpus, which is parallelly updated with the learning process to select more realistic negative training instances. This fundamentally resolves the discrepancy between the data distribution used in the training and testing of DR. In our experiments, ANCE boosts the BERT-Siamese DR model to outperform all competitive dense and sparse retrieval baselines. It nearly matches the accuracy of sparse-retrieval-and-BERT-reranking using dot-product in the ANCE-learned representation space and provides almost 100x speed-up.

Read more
Information Retrieval

ArXivDigest: A Living Lab for Personalized Scientific Literature Recommendation

Providing personalized recommendations that are also accompanied by explanations as to why an item is recommended is a research area of growing importance. At the same time, progress is limited by the availability of open evaluation resources. In this work, we address the task of scientific literature recommendation. We present arXivDigest, which is an online service providing personalized arXiv recommendations to end users and operates as a living lab for researchers wishing to work on explainable scientific literature recommendations.

Read more
Information Retrieval

Are Top School Students More Critical of Their Professors? Mining Comments on RateMyProfessor.com

Student reviews and comments on this http URL reflect realistic learning experiences of students. Such information provides a large-scale data source to examine the teaching quality of the lecturers. In this paper, we propose an in-depth analysis of these comments. First, we partition our data into different comparison groups. Next, we perform exploratory data analysis to delve into the data. Furthermore, we employ Latent Dirichlet Allocation and sentiment analysis to extract topics and understand the sentiments associated with the comments. We uncover interesting insights about the characteristics of both college students and professors. Our study proves that student reviews and comments contain crucial information and can serve as essential references for enrollment in courses and universities.

Read more

Ready to get started?

Join us today