Publication


Featured research published by David Carmel.


Conference on Information and Knowledge Management | 2014

Improving Term Weighting for Community Question Answering Search Using Syntactic Analysis

David Carmel; Avihai Mejer; Yuval Pinter; Idan Szpektor

Query term weighting is a fundamental task in information retrieval, and most popular term weighting schemes are primarily based on statistical analysis of term occurrences within the document collection. In this work we study how term weighting may benefit from syntactic analysis of the corpus. Focusing on community question answering (CQA) sites, we take into account the syntactic function of the terms within CQA texts as an important factor affecting their relative importance for retrieval. We analyze a large log of web queries that landed on the Yahoo Answers site, showing a strong deviation between the tendencies of different document words to appear in a landing (click-through) query given their syntactic function. To this end, we propose a novel term weighting method that makes use of the syntactic information available for each query term occurrence in the document, on top of term occurrence statistics. The relative importance of each feature is learned via a learning-to-rank algorithm that utilizes a click-through query log. We examine the new weighting scheme using manual evaluation based on editorial data and using automatic evaluation over the query log. Our experimental results show consistent improvement in retrieval when syntactic information is taken into account.


International ACM SIGIR Conference on Research and Development in Information Retrieval | 2016

Document Retrieval Using Entity-Based Language Models

Hadas Raviv; Oren Kurland; David Carmel

We address the ad hoc document retrieval task by devising novel types of entity-based language models. The models utilize information about single terms in the query and documents as well as term sequences marked as entities by some entity-linking tool. The key principle of the language models is accounting, simultaneously, for the uncertainty inherent in the entity-markup process and the balance between using entity-based and term-based information. Empirical evaluation demonstrates the merits of using the language models for retrieval. For example, the performance transcends that of a state-of-the-art term proximity method. We also show that the language models can be effectively used for cluster-based document retrieval and query expansion.


Conference on Information and Knowledge Management | 2015

Rank by Time or by Relevance?: Revisiting Email Search

David Carmel; Guy Halawi; Liane Lewin-Eytan; Yoelle Maarek; Ariel Raviv

With Web mail services offering larger and larger storage capacity, most users do not feel the need to systematically delete messages anymore and inboxes keep growing. It is quite surprising that in spite of the huge progress of relevance ranking in Web Search, mail search results are still typically ranked by date. This can probably be explained by the fact that users demand perfect recall in order to re-find a previously seen message, and would not trust relevance ranking. Yet mail search is still considered a difficult and frustrating task, especially when trying to locate older messages. In this paper, we study the current search traffic of Yahoo Mail, a major commercial Web mail service, and discuss the limitations of ranking search results by date. We argue that this sort-by-date paradigm needs to be revisited in order to account for the specific structure and nature of mail messages, as well as the high-recall needs of users. We describe a two-phase ranking approach, in which the first phase is geared towards maximizing recall and the second phase follows a learning-to-rank approach that considers a rich set of mail-specific features to maintain precision. We present our results obtained on real mail search query traffic, for three different datasets, via manual as well as automatic evaluation. We demonstrate that the default time-driven ranking can be significantly improved in terms of both recall and precision, by taking into consideration time recency and textual similarity to the query, as well as mail-specific signals such as users' actions.
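The two-phase idea in this abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `Message` fields, the "any term matches" recall filter, the three features, and the hand-set weights. In the paper the second-phase scoring is learned with a learning-to-rank method over a much richer feature set; none of the names below come from the paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Message:
    msg_id: str
    text: str
    age_days: float
    starred: bool = False  # hypothetical stand-in for a user-action signal


def phase_one(query: str, mailbox: List[Message]) -> List[Message]:
    """Recall-oriented phase: keep every message sharing at least one query term."""
    terms = set(query.lower().split())
    return [m for m in mailbox if terms & set(m.text.lower().split())]


def phase_two(query: str, candidates: List[Message],
              w_text: float = 1.0, w_recency: float = 0.5,
              w_action: float = 0.3) -> List[Message]:
    """Precision-oriented phase: re-rank candidates by a weighted feature sum.

    The weights are hand-set placeholders, not learned values.
    """
    terms = set(query.lower().split())

    def score(m: Message) -> float:
        words = set(m.text.lower().split())
        text_sim = len(terms & words) / len(terms) if terms else 0.0
        recency = 1.0 / (1.0 + m.age_days)  # newer messages score higher
        return w_text * text_sim + w_recency * recency + w_action * float(m.starred)

    return sorted(candidates, key=score, reverse=True)


mailbox = [
    Message("m1", "flight itinerary paris", age_days=300, starred=True),
    Message("m2", "paris newsletter", age_days=1),
    Message("m3", "lunch tomorrow", age_days=0),
]
ranked = phase_two("flight paris", phase_one("flight paris", mailbox))
print([m.msg_id for m in ranked])
```

In this toy mailbox the older but fully matching, starred message is ranked above the recent partial match, which a pure sort-by-date view would place first.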


International World Wide Web Conferences | 2016

Identifying Web Queries with Question Intent

Gilad Tsur; Yuval Pinter; Idan Szpektor; David Carmel

Vertical selection is the task of predicting relevant verticals for a Web query so as to enrich the Web search results with complementary vertical results. We investigate a novel variant of this task, where the goal is to detect queries with a question intent. Specifically, we address queries for which the user would like an answer with a human touch. We call these CQA-intent queries, since answers to them are typically found in community question answering (CQA) sites. A typical approach in vertical selection is using a vertical-specific language model of relevant queries and computing the query-likelihood for each vertical as a selective criterion. This works quite well for many domains like Shopping, Local and Travel. Yet, we claim that queries with CQA intent are harder to distinguish by modeling content alone, since they cover many different topics. We propose to also take the structure of queries into consideration, reasoning that queries with question intent have quite a different structure than other queries. We present a supervised classification scheme, random forest over word-clusters for variable length texts, which can model the query structure. Our experiments show that it substantially improves classification performance in the CQA-intent selection task compared to content-oriented classification, especially as query length grows.
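The content-based baseline mentioned in this abstract, a per-vertical language model scored by query likelihood, can be sketched as follows. This is a minimal illustration, assuming unigram models with add-one smoothing and whitespace tokenization; the function names and the tiny training sets are hypothetical, and this is not the random-forest classifier the paper proposes.

```python
import math
from collections import Counter
from typing import Dict, List


def train_unigram(queries: List[str]) -> Counter:
    """Count token occurrences over the example queries of one vertical."""
    return Counter(tok for q in queries for tok in q.lower().split())


def log_likelihood(query: str, counts: Counter, vocab_size: int) -> float:
    """Log query likelihood under a unigram model with add-one smoothing."""
    total = sum(counts.values())
    return sum(
        math.log((counts[tok] + 1) / (total + vocab_size))
        for tok in query.lower().split()
    )


def select_vertical(query: str, models: Dict[str, Counter]) -> str:
    """Pick the vertical whose language model gives the query the highest likelihood."""
    vocab = set().union(*models.values())
    vocab_size = len(vocab) + 1  # one extra slot for unseen tokens
    return max(models, key=lambda v: log_likelihood(query, models[v], vocab_size))


models = {
    "Shopping": train_unigram(["buy cheap laptop", "laptop deals"]),
    "Travel": train_unigram(["cheap flights paris", "hotels in paris"]),
}
print(select_vertical("cheap laptop deals", models))
```

A model of this kind looks only at which words occur, which is why the abstract argues it struggles with CQA-intent queries that span many topics.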


International ACM SIGIR Conference on Research and Development in Information Retrieval | 2016

Novelty based Ranking of Human Answers for Community Questions

Adi Omari; David Carmel; Oleg Rokhlenko; Idan Szpektor

Questions and their corresponding answers within a community-based question answering (CQA) site are frequently presented as top search results for Web search queries and viewed by millions of searchers daily. The number of answers for CQA questions ranges from a handful to dozens, and a searcher would be typically interested in the different suggestions presented in various answers for a question. Yet, especially when many answers are provided, the viewer may not want to sift through all answers but to read only the top ones. Prior work on answer ranking in CQA considered the qualitative notion of each answer separately, mainly whether it should be marked as best answer. We propose to promote CQA answers not only by their relevance to the question but also by the diversification and novelty qualities they hold compared to other answers. Specifically, we aim at ranking answers by the number of new aspects they introduce with respect to higher ranked answers (novelty), on top of their relevance estimation. This approach is common in Web search and information retrieval, yet it was not addressed within the CQA setting before, which is quite different from classic document retrieval. We propose a novel answer ranking algorithm that borrows ideas from aspect ranking and multi-document summarization, but adapts them to our scenario. Answers are ranked in a greedy manner, taking into account their relevance to the question as well as their novelty compared to higher ranked answers and their coverage of important aspects. An experiment over a collection of Health questions, using a manually annotated gold-standard dataset, shows that considering novelty for answer ranking improves the quality of the ranked answer list.
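The greedy selection described in this abstract can be illustrated with a small sketch. It is not the authors' algorithm: it assumes each answer already comes with a relevance score and a set of aspect labels, scores novelty as the fraction of an answer's aspects not yet covered, and leaves out the aspect-importance component. The names `greedy_novelty_rank` and `novelty_weight` are hypothetical.

```python
from typing import Dict, List, Set


def greedy_novelty_rank(relevance: Dict[str, float],
                        aspects: Dict[str, Set[str]],
                        novelty_weight: float = 0.5) -> List[str]:
    """Repeatedly pick the answer with the best mix of relevance and novelty.

    Novelty is measured against the aspects covered by answers already ranked.
    """
    covered: Set[str] = set()
    remaining = set(relevance)
    ranking: List[str] = []

    def score(answer: str) -> float:
        own = aspects.get(answer, set())
        new_fraction = len(own - covered) / len(own) if own else 0.0
        return (1 - novelty_weight) * relevance[answer] + novelty_weight * new_fraction

    while remaining:
        # sorted() makes tie-breaking deterministic
        best = max(sorted(remaining), key=score)
        ranking.append(best)
        covered |= aspects.get(best, set())
        remaining.remove(best)
    return ranking


relevance = {"a1": 0.9, "a2": 0.85, "a3": 0.6}
aspects = {
    "a1": {"rest", "fluids"},
    "a2": {"rest", "fluids"},  # repeats what a1 already says
    "a3": {"see a doctor"},    # less relevant, but introduces a new aspect
}
print(greedy_novelty_rank(relevance, aspects))
```

With `novelty_weight=0.0` the sketch reduces to a plain relevance sort, which makes the effect of the novelty term easy to compare.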


ACM Transactions on Information Systems | 2016

Query Performance Prediction Using Reference Lists

Anna Shtok; Oren Kurland; David Carmel

The task of query performance prediction is to estimate the effectiveness of search performed in response to a query when no relevance judgments are available. We present a novel probabilistic analysis of the performance prediction task. The analysis gives rise to a general prediction framework that uses pseudo-effective or ineffective document lists that are retrieved in response to the query. These lists serve as reference to the result list at hand, the effectiveness of which we want to predict. We show that many previously proposed prediction methods can be explained using our framework. More generally, we shed new light on existing prediction methods and establish formal common grounds to seemingly different prediction approaches. In addition, we formally demonstrate the connection between prediction using reference lists and fusion of retrieved lists, and provide empirical support to this connection. Through an extensive empirical exploration, we study various factors that affect the quality of prediction using reference lists.


International ACM SIGIR Conference on Research and Development in Information Retrieval | 2013

The cluster hypothesis for entity oriented search

Hadas Raviv; Oren Kurland; David Carmel

In this work we study the cluster hypothesis for entity oriented search (EOS). Specifically, we show that the hypothesis can hold to a substantial extent for several entity similarity measures. We also demonstrate the retrieval effectiveness merits of using clusters of similar entities for EOS.


International World Wide Web Conferences | 2017

The Demographics of Mail Search and their Application to Query Suggestion

David Carmel; Liane Lewin-Eytan; Alex Libov; Yoelle Maarek; Ariel Raviv

Web mail search is an emerging topic, which has not been the object of as many studies as traditional Web search. In particular, little is known about the characteristics of mail searchers and of the queries they issue. We study here the characteristics of Web mail searchers, and explore how demographic signals such as location, age, gender, and inferred income, influence their search behavior. We try to understand, for instance, whether women exhibit different mail search patterns than men, or whether senior people formulate more precise queries than younger people. We compare our results, obtained from the analysis of a Yahoo Web mail search query log, to similar work conducted in Web and Twitter search. In addition, we demonstrate the value of the user's personal query log, as well as of the global query log and of the demographic signals, in a key search task: dynamic query auto-completion. We discuss how going beyond users' personal query logs (their search history) significantly improves the quality of suggestions, in spite of the fact that a user's mailbox is perceived as being highly personal. In particular, we note the striking value of demographic features for queries relating to companies/organizations, thus verifying our assumption that query completion benefits from leveraging queries issued by "people like me". We believe that demographics and other such global features can be leveraged in other mail applications, and hope that this work is a first step in this direction.


European Conference on Information Retrieval | 2016

Supporting Human Answers for Advice-Seeking Questions in CQA Sites

Liora Braunstain; Oren Kurland; David Carmel; Idan Szpektor; Anna Shtok

In many questions on Community Question Answering sites, users look for the advice or opinion of other users who might offer diverse perspectives on a topic at hand. The novel task we address is providing supportive evidence for human answers to such questions, which will potentially help the asker in choosing answers that fit her needs. We present a support retrieval model that ranks sentences from Wikipedia by their presumed support for a human answer. The model outperforms a state-of-the-art textual entailment system designed to infer factual claims from texts. An important aspect of the model is the integration of relevance-oriented and support-oriented features.


International World Wide Web Conferences | 2017

Promoting Relevant Results in Time-Ranked Mail Search

David Carmel; Liane Lewin-Eytan; Alex Libov; Yoelle Maarek; Ariel Raviv

Mail search has traditionally served time-ranked results, even if it has been shown that relevance ranking provides higher retrieval quality on average. Some Web mail services have recently started to provide relevance ranking options such as the relevance toggle in the search results page of Yahoo Mail, or the "top results" section in Inbox by Gmail. Yet, ranking results by relevance is not accepted by all, either in mail search, or in other domains such as social media, where it has even triggered some public outcry. Given the sensitivity of the topic, we propose here to investigate a mixed approach of promoting the most relevant results, to which we refer as "heroes", on top of time-ranked results. We argue that this approach represents a good compromise to mail searchers, supporting on one hand the time-sorted paradigm they are familiar with, while being almost as effective as the full relevance-ranking view that Web mail users seem to be reluctant to adopt. We describe three hero-selection algorithms we have devised and the associated experiments we have conducted in Yahoo Mail. We measure retrieval success via two metrics: MRR (Mean Reciprocal Rank) and Success@k, and verify agreement between these metrics and users' direct feedback. We demonstrate that supplementing time-sorted results with hero results leads to a higher MRR than the traditional time-sorted view. We additionally show that MRR better reflects users' perception of quality than Success@k. Finally, we report on online results following the successful launch of one of our hero-selection algorithms for all Yahoo enterprise mail users and a few million Yahoo Web mail users.
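For readers unfamiliar with the two metrics named in this abstract, here is a minimal sketch using their standard definitions. It assumes each query is summarized by the 1-based rank of its first relevant (for example, clicked) result, or `None` when no relevant result was shown; how the paper derives relevance from user behavior is not reproduced here, and the sample ranks are made up for illustration.

```python
from typing import Optional, Sequence


def mean_reciprocal_rank(ranks: Sequence[Optional[int]]) -> float:
    """Average of 1/rank over queries; a query with no relevant result contributes 0."""
    if not ranks:
        return 0.0
    return sum(0.0 if r is None else 1.0 / r for r in ranks) / len(ranks)


def success_at_k(ranks: Sequence[Optional[int]], k: int) -> float:
    """Fraction of queries whose first relevant result appears within the top k."""
    if not ranks:
        return 0.0
    return sum(1 for r in ranks if r is not None and r <= k) / len(ranks)


# Hypothetical ranks of the first relevant message for four queries.
ranks = [1, 3, None, 2]
print(round(mean_reciprocal_rank(ranks), 4))  # (1 + 1/3 + 0 + 1/2) / 4
print(success_at_k(ranks, k=2))
```

MRR rewards moving a relevant message from position 3 to position 1, whereas Success@k only registers a change when a result crosses the cutoff k, which gives one intuition for why the two metrics can disagree.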

Collaboration


Dive into David Carmel's collaborations.

Top Co-Authors

Oren Kurland

Technion – Israel Institute of Technology

Anna Shtok

Technion – Israel Institute of Technology
