Gilad Mishne
Yahoo!
Publications
Featured research published by Gilad Mishne.
web search and data mining | 2008
Eugene Agichtein; Carlos Castillo; Debora Donato; Aristides Gionis; Gilad Mishne
The quality of user-generated content varies drastically, from excellent to abuse and spam. As the availability of such content increases, the task of identifying high-quality content in sites based on user contributions (social media sites) becomes increasingly important. Social media in general exhibit a rich variety of information sources: in addition to the content itself, there is a wide array of non-content information available, such as links between items and explicit quality ratings from members of the community. In this paper we investigate methods for exploiting such community feedback to automatically identify high-quality content. As a test case, we focus on Yahoo! Answers, a large community question/answering portal that is particularly rich in the amount and types of content and social interactions available in it. We introduce a general classification framework for combining the evidence from different sources of information, which can be tuned automatically for a given social media type and quality definition. In particular, for the community question/answering domain, we show that our system is able to separate high-quality items from the rest with an accuracy close to that of humans.
web search and data mining | 2010
Anlei Dong; Yi Chang; Zhaohui Zheng; Gilad Mishne; Jing Bai; Ruiqiang Zhang; Karolina Buchner; Ciya Liao; Fernando Diaz
In web search, recency ranking refers to ranking documents by relevance while taking freshness into account. In this paper, we propose a retrieval system that automatically detects and responds to recency-sensitive queries. The system detects recency-sensitive queries using a high-precision classifier, and responds to them using a machine-learned ranking model trained for such queries. We use multiple recency features to provide temporal evidence that effectively represents document recency. Furthermore, we propose several training methodologies important for training recency-sensitive rankers. Finally, we develop new evaluation metrics for recency-sensitive queries. Our experiments demonstrate the efficacy of the proposed approaches.
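The detect-then-respond structure described in this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the system from the paper: `route_query`, the `0.9` threshold, and the toy classifier and rankers are all hypothetical stand-ins chosen to show how a high-precision classifier can gate a specialized ranker.

```python
from typing import Callable, List, Tuple

# A ranker takes a query and candidate documents and returns them in ranked order.
Ranker = Callable[[str, List[str]], List[str]]


def route_query(
    query: str,
    docs: List[str],
    recency_score: Callable[[str], float],
    default_ranker: Ranker,
    recency_ranker: Ranker,
    threshold: float = 0.9,  # hypothetical; a high cutoff favors precision over recall
) -> Tuple[str, List[str]]:
    """Send a query to the recency ranker only when the classifier is confident."""
    if recency_score(query) >= threshold:
        return "recency", recency_ranker(query, docs)
    return "default", default_ranker(query, docs)


# Toy stand-ins (assumptions for illustration, not real models).
def toy_recency_score(query: str) -> float:
    return 0.95 if "latest" in query.lower() else 0.1


def toy_default_ranker(query: str, docs: List[str]) -> List[str]:
    return sorted(docs)


def toy_recency_ranker(query: str, docs: List[str]) -> List[str]:
    return sorted(docs, reverse=True)


if __name__ == "__main__":
    candidates = ["2009-report", "2010-report", "2008-report"]
    print(route_query("latest report", candidates, toy_recency_score,
                      toy_default_ranker, toy_recency_ranker))
    print(route_query("report archive", candidates, toy_recency_score,
                      toy_default_ranker, toy_recency_ranker))
```

Raising the threshold trades recall for precision: fewer queries reach the recency ranker, but those that do are more likely to be genuinely recency-sensitive.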
ACM Transactions on The Web | 2012
Guangyu Zhu; Gilad Mishne
User browsing information, particularly non-search-related activity, reveals important contextual information on the preferences and intents of Web users. In this article, we demonstrate the importance of mining general Web user behavior data to improve ranking and other Web-search experience, with an emphasis on analyzing individual user sessions for creating aggregate models. In this context, we introduce ClickRank, an efficient, scalable algorithm for estimating Webpage and Website importance from general Web user-behavior data. We lay out the theoretical foundation of ClickRank based on an intentional surfer model and discuss its properties. We quantitatively evaluate its effectiveness regarding the problem of Web-search ranking, showing that it contributes significantly to retrieval performance as a novel Web-search feature. We demonstrate that the results produced by ClickRank for Web-search ranking are highly competitive with those produced by other approaches, yet achieved at better scalability and substantially lower computational costs. Finally, we discuss novel applications of ClickRank in providing enriched user Web-search experience, highlighting the usefulness of our approach for nonranking tasks.
knowledge discovery and data mining | 2009
Guangyu Zhu; Gilad Mishne
User browsing information, particularly their non-search related activity, reveals important contextual information on the preferences and the intent of web users. In this paper, we expand the use of browsing information for web search ranking and other applications, with an emphasis on analyzing individual user sessions for creating aggregate models. In this context, we introduce ClickRank, an efficient, scalable algorithm for estimating web page and web site importance from browsing information. We lay out the theoretical foundation of ClickRank based on an intentional surfer model and analyze its properties. We evaluate its effectiveness for the problem of web search ranking, showing that it contributes significantly to retrieval performance as a novel web search feature. We demonstrate that the results produced by ClickRank for web search ranking are highly competitive with those produced by other approaches, yet achieved at better scalability and substantially lower computational costs. Finally, we discuss novel applications of ClickRank in providing enriched user web search experience, highlighting the usefulness of our approach for non-ranking tasks.
conference on information and knowledge management | 2010
Alpa Jain; Gilad Mishne
All state-of-the-art web search engines implement an auto-completion mechanism, an assistive technology enabling users to effectively formulate their search queries by predicting the next characters or words that they are likely to type. Query completions (or suggestions) are typically mined from past user interactions with the search engine, e.g., from query logs, clickthrough patterns, or query reformulations; they are ranked by some measure of query popularity, e.g., query frequency or clickthrough rate. Current query suggestion tools largely assume that the set of suggestions provided to the users is homogeneous, corresponding to a single real-world interpretation of the query. In this paper, we hypothesize that, in some cases, users would benefit from an alternative presentation of the suggestions, one where suggestions are not only ordered by likelihood but also organized by high-level user intent. Rich search suggestion interaction frameworks that reduce the user effort in identifying the set of relevant suggestions open new and promising directions towards improving user experience. Along these lines, we propose clustering the set of suggestions presented to a search engine user, and assigning an appropriate label to each subset of suggestions to help users quickly identify useful ones. For this, we present a variety of unsupervised clustering techniques for search suggestions, based on the information available to a large-scale web search engine. We evaluate our novel search suggestion presentation techniques on a real-world dataset of query logs. Based on a set of user studies, we show that by extending the existing assistance layer to effectively group suggestions and label them, while accounting for query popularity, we substantially increase user satisfaction.
empirical methods in natural language processing | 2009
Yumao Lu; Fuchun Peng; Gilad Mishne; Xing Wei; Benoit Dumoulin
Most existing information retrieval (IR) systems do not take much advantage of natural language processing (NLP) techniques due to the complexity and limited observed effectiveness of applying NLP to IR. In this paper, we demonstrate that substantial gains can be obtained over a strong baseline using NLP techniques, if properly handled. We propose a framework for deriving semantic text matching features from named entities identified in Web queries; we then utilize these features in a supervised machine-learned ranking approach, applying a set of emerging machine learning techniques. Our approach is especially useful for queries that contain multiple types of concepts. Compared with a major commercial Web search engine, we observe a substantial 4% DCG5 gain over the affected queries.
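For readers unfamiliar with the metric, DCG5 is discounted cumulative gain computed over the top five results. The sketch below shows one common formulation and how a relative gain between two rankings is measured. It assumes the exponential-gain variant, (2^rel - 1) / log2(rank + 1), which may differ from the exact definition used in the paper; the relevance grades are made up for illustration.

```python
import math
from typing import Sequence


def dcg_at_k(relevances: Sequence[float], k: int = 5) -> float:
    """DCG over the top-k results: sum of (2^rel - 1) / log2(rank + 1), ranks from 1."""
    return sum(
        (2 ** rel - 1) / math.log2(rank + 1)
        for rank, rel in enumerate(relevances[:k], start=1)
    )


def relative_gain(baseline: float, candidate: float) -> float:
    """Fractional improvement of candidate over baseline (0.04 means a 4% gain)."""
    return (candidate - baseline) / baseline


# Hypothetical graded relevance labels (0 = bad ... 3 = excellent) for one query,
# listed in the order each system ranked the documents.
baseline_ranking = [3, 2, 0, 1, 0]
reranked = [3, 2, 1, 0, 0]

if __name__ == "__main__":
    base = dcg_at_k(baseline_ranking)
    new = dcg_at_k(reranked)
    print(f"baseline DCG5={base:.3f}, reranked DCG5={new:.3f}, "
          f"gain={relative_gain(base, new):.1%}")
```

Because the discount grows with rank, moving a relevant document from position 4 to position 3 raises DCG5 even though the set of returned documents is unchanged.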
international world wide web conferences | 2010
Ana Maria Popescu; Patrick Pantel; Gilad Mishne
We describe improvements to the use of semantic lexicons by a state-of-the-art query interpretation system powering a major search engine. We successfully compute concept label importance information for lexicon strings; lexicon augmentation with such information leads to a 6.4% precision increase on affected queries with no query coverage loss. Finally, lexicon filtering based on label importance leads to a 13% precision increase, but at the expense of query coverage.
Archive | 2009
Anlei Dong; Yi Chang; Ruiqiang Zhang; Zhaohui Zheng; Gilad Mishne; Jing Bai; Karolina Buchner; Ciya Liao; Shihao Ji; Gilbert Leung; Georges-Eric Albert Marie Robert Dupret; Ling Liu
Archive | 2009
Gilad Mishne; Alpa Jain
Archive | 2009
Gilad Mishne; Raymond P. Stata; Fuchun Peng