Vassilis Plachouras | Researchain

Publication

Featured researches published by Vassilis Plachouras.

international acm sigir conference on research and development in information retrieval | 2007

The impact of caching on search engines

Ricardo A. Baeza-Yates; Aristides Gionis; Flavio Junqueira; Vanessa Murdock; Vassilis Plachouras; Fabrizio Silvestri

In this paper we study the trade-offs in designing efficient caching systems for Web search engines. We explore the impact of different approaches, such as static vs. dynamic caching, and caching query results vs. caching posting lists. Using a query log spanning a whole year, we explore the limitations of caching and we demonstrate that caching posting lists can achieve higher hit rates than caching query answers. We propose a new algorithm for static caching of posting lists, which outperforms previous methods. We also study the problem of finding the optimal way to split the static cache between answers and posting lists. Finally, we measure how the changes in the query log affect the effectiveness of static caching, given our observation that the distribution of the queries changes slowly over time. Our results and observations are applicable to different levels of the data-access hierarchy, for instance, for a memory/disk layer or a broker/remote server layer.
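The static vs. dynamic distinction for query-result caching can be illustrated with a minimal sketch. It assumes a static cache filled once with the most frequent queries of a training log and a dynamic cache using LRU eviction. The function names and the toy logs are hypothetical, and this is not the posting-list algorithm proposed in the paper.

```python
from collections import Counter, OrderedDict
from typing import Sequence


def static_hit_rate(train: Sequence[str], test: Sequence[str], capacity: int) -> float:
    """Hit rate of a static cache holding the `capacity` most frequent training queries."""
    if not test:
        return 0.0
    cached = {query for query, _ in Counter(train).most_common(capacity)}
    hits = sum(1 for query in test if query in cached)
    return hits / len(test)


def lru_hit_rate(test: Sequence[str], capacity: int) -> float:
    """Hit rate of a dynamic cache that evicts the least recently used query."""
    if not test:
        return 0.0
    cache: "OrderedDict[str, bool]" = OrderedDict()
    hits = 0
    for query in test:
        if query in cache:
            hits += 1
            cache.move_to_end(query)  # mark as most recently used
        else:
            cache[query] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used entry
    return hits / len(test)


if __name__ == "__main__":
    # Toy logs, purely illustrative.
    train_log = ["a", "a", "b", "c"]
    test_log = ["a", "b", "a", "d"]
    print(static_hit_rate(train_log, test_log, capacity=1))
    print(lru_hit_rate(test_log, capacity=1))
```

With a slowly changing query distribution, as the abstract observes, a static cache built from past traffic can stay useful for a long time, which is what the first function models.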

european conference on information retrieval | 2005

Terrier information retrieval platform

Iadh Ounis; Gianni Amati; Vassilis Plachouras; Ben He; Craig Macdonald; Douglas Johnson

Terrier is a modular platform for the rapid development of large-scale Information Retrieval (IR) applications. It can index various document collections, including TREC and Web collections. Terrier also offers a range of document weighting and query expansion models, based on the Divergence From Randomness framework. It has been successfully used for ad-hoc retrieval, cross-language retrieval, Web IR and intranet search, in a centralised or distributed setting.

international world wide web conferences | 2008

Online learning from click data for sponsored search

Massimiliano Ciaramita; Vanessa Murdock; Vassilis Plachouras

Sponsored search is one of the enabling technologies for today's Web search engines. It corresponds to matching and showing ads related to the user query on the search engine results page. Users are likely to click on topically related ads and the advertisers pay only when a user clicks on their ad. Hence, it is important to be able to predict if an ad is likely to be clicked, and maximize the number of clicks. We investigate the sponsored search problem from a machine learning perspective with respect to three main sub-problems: how to use click data for training and evaluation, which learning framework is more suitable for the task, and which features are useful for existing models. We perform a large-scale evaluation based on data from a commercial Web search engine. Results show that it is possible to learn and evaluate directly and exclusively on click data encoding pairwise preferences following simple and conservative assumptions. We find that online multilayer perceptron learning, based on a small set of features representing content similarity of different kinds, significantly outperforms an information retrieval baseline and other learning models, providing a suitable framework for the sponsored search task.
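Learning online from pairwise preferences can be sketched as follows. Note that this is a plain linear ranking perceptron, swapped in for brevity, and not the multilayer perceptron evaluated in the paper. The feature vectors, the function names, and the way pairs are built from clicks are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def dot(w: Vector, x: Vector) -> float:
    """Inner product of a weight vector and a feature vector."""
    return sum(wi * xi for wi, xi in zip(w, x))


def train_pairwise_perceptron(
    pairs: List[Tuple[Vector, Vector]], epochs: int = 5
) -> List[float]:
    """Learn weights so that the preferred ad scores higher than the other ad.

    Each pair is (features of the preferred ad, features of the other ad).
    """
    if not pairs:
        return []
    weights = [0.0] * len(pairs[0][0])
    for _ in range(epochs):
        for preferred, other in pairs:
            # Update only when the current model ranks the pair wrongly (or ties).
            if dot(weights, preferred) <= dot(weights, other):
                weights = [w + p - o for w, p, o in zip(weights, preferred, other)]
    return weights


if __name__ == "__main__":
    # Hypothetical two-feature example: the clicked ad is the preferred one.
    clicked, skipped = [1.0, 0.0], [0.0, 1.0]
    w = train_pairwise_perceptron([(clicked, skipped)])
    print(dot(w, clicked) > dot(w, skipped))
```

The update is applied one pair at a time, so the model can be trained incrementally as new click data arrives.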

international conference on data engineering | 2007

Challenges on Distributed Web Retrieval

Ricardo A. Baeza-Yates; Carlos Castillo; Flavio Junqueira; Vassilis Plachouras; Fabrizio Silvestri

In the ocean of Web data, Web search engines are the primary way to access content. As the data is on the order of petabytes, current search engines are very large centralized systems based on replicated clusters. Web data, however, is always evolving. The number of Web sites continues to grow rapidly and there are currently more than 20 billion indexed pages. In the near future, centralized systems are likely to become ineffective against such a load, thus suggesting the need of fully distributed search engines. Such engines need to achieve the following goals: high quality answers, fast response time, high query throughput, and scalability. In this paper we survey and organize recent research results, outlining the main challenges of designing a distributed Web retrieval system.

ACM Transactions on The Web | 2008

Design trade-offs for search engine caching

Ricardo A. Baeza-Yates; Aristides Gionis; Flavio Junqueira; Vanessa Murdock; Vassilis Plachouras; Fabrizio Silvestri

In this article we study the trade-offs in designing efficient caching systems for Web search engines. We explore the impact of different approaches, such as static vs. dynamic caching, and caching query results vs. caching posting lists. Using a query log spanning a whole year, we explore the limitations of caching and we demonstrate that caching posting lists can achieve higher hit rates than caching query answers. We propose a new algorithm for static caching of posting lists, which outperforms previous methods. We also study the problem of finding the optimal way to split the static cache between answers and posting lists. Finally, we measure how the changes in the query log influence the effectiveness of static caching, given our observation that the distribution of the queries changes slowly over time. Our results and observations are applicable to different levels of the data-access hierarchy, for instance, for a memory/disk layer or a broker/remote server layer.

international workshop on data mining and audience intelligence for advertising | 2007

A noisy-channel approach to contextual advertising

Vanessa Murdock; Massimiliano Ciaramita; Vassilis Plachouras

Contextual advertising is a growing category of search advertising. It presents a particular challenge to ad placement systems because of the sparseness of the language of advertising. We present a system that is language independent and knowledge free based on SVM ranking. We evaluate it on a large number of advertisements appearing on real Web pages. Our contribution is two new classes of features of similarity between ads and Web pages based on machine translation technologies. We show that our features significantly improve performance over baseline techniques.

international world wide web conferences | 2010

A refreshing perspective of search engine caching

Berkant Barla Cambazoglu; Flavio Junqueira; Vassilis Plachouras; Scott Alexander Banachowski; Baoqiu Cui; Swee Lim; Bill Bridge

Commercial Web search engines have to process user queries over huge Web indexes under tight latency constraints. In practice, to achieve low latency, large result caches are employed and a portion of the query traffic is served using previously computed results. Moreover, search engines need to update their indexes frequently to incorporate changes to the Web. After every index update, however, the content of cache entries may become stale, thus decreasing the freshness of served results. In this work, we first argue that the real problem in today's caching for large-scale search engines is not eviction policies, but the ability to cope with changes to the index, i.e., cache freshness. We then introduce a novel algorithm that uses a time-to-live value to set cache entries to expire and selectively refreshes cached results by issuing refresh queries to back-end search clusters. The algorithm prioritizes the entries to refresh according to a heuristic that combines the frequency of access with the age of an entry in the cache. In addition, for setting the rate at which refresh queries are issued, we present a mechanism that takes into account idle cycles of back-end servers. Evaluation using a real workload shows that our algorithm can achieve hit rate improvements as well as reduction in average hit ages. An implementation of this algorithm is currently in production use at Yahoo!.
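The time-to-live and refresh idea can be sketched roughly as below. The abstract does not give the exact heuristic, so the score used here (hit count multiplied by entry age) is an assumption, as are the class name `RefreshingCache`, the `budget` parameter standing in for idle back-end capacity, and the `backend` callable.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Entry:
    results: str
    stored_at: float  # time at which the results were computed
    hits: int = 0     # number of times the entry served a query


class RefreshingCache:
    """Result cache whose entries expire after `ttl` and can be refreshed early."""

    def __init__(self, ttl: float):
        self.ttl = ttl
        self.entries: Dict[str, Entry] = {}

    def put(self, query: str, results: str, now: float) -> None:
        old = self.entries.get(query)
        # Keep the access count across refreshes so popular queries stay prioritized.
        self.entries[query] = Entry(results, now, old.hits if old else 0)

    def get(self, query: str, now: float) -> Optional[str]:
        entry = self.entries.get(query)
        if entry is None or now - entry.stored_at >= self.ttl:
            return None  # miss, or the entry has expired
        entry.hits += 1
        return entry.results

    def refresh_candidates(self, now: float, budget: int) -> List[str]:
        """Pick up to `budget` queries, favouring frequently accessed, old entries."""
        ranked = sorted(
            self.entries.items(),
            key=lambda item: item[1].hits * (now - item[1].stored_at),  # assumed heuristic
            reverse=True,
        )
        return [query for query, _ in ranked[:budget]]

    def refresh(self, now: float, budget: int, backend: Callable[[str], str]) -> None:
        """Re-issue the selected queries to the back end and reset their age."""
        for query in self.refresh_candidates(now, budget):
            self.put(query, backend(query), now)
```

In this sketch the caller would choose `budget` from the current back-end load, so that refresh queries are only issued when spare capacity exists.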

string processing and information retrieval | 2007

Admission policies for caches of search engine results

Ricardo A. Baeza-Yates; Flavio Junqueira; Vassilis Plachouras; Hans Friedrich Witschel

This paper studies the impact of the tail of the query distribution on caches of Web search engines, and proposes a technique for achieving higher hit ratios compared to traditional heuristics such as LRU. The main problem we solve is that of identifying infrequent queries, which cause a reduction in hit ratio because caching them often does not lead to hits. To mitigate this problem, we introduce a cache management policy that employs an admission policy to prevent infrequent queries from taking space of more frequent queries in the cache. The admission policy uses either stateless features, which depend only on the query, or stateful features based on usage information. The proposed management policy is more general than existing policies for caching of search engine results, and it is fully dynamic. The evaluation results on two different query logs show that our policy achieves higher hit ratios when compared to previously proposed cache management policies.
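A minimal sketch of an admission policy in front of an LRU cache is given below. The two predicates are illustrative assumptions: the stateless one looks only at the number of terms in the query, and the stateful one counts how often a query has been seen. The thresholds and all names are hypothetical and are not the features or cache layout used in the paper.

```python
from collections import Counter, OrderedDict
from typing import Callable, Optional


class AdmissionLRUCache:
    """LRU result cache that stores only queries accepted by an admission predicate."""

    def __init__(self, capacity: int, admit: Callable[[str], bool]):
        self.capacity = capacity
        self.admit = admit
        self.entries: "OrderedDict[str, str]" = OrderedDict()

    def get(self, query: str) -> Optional[str]:
        if query not in self.entries:
            return None
        self.entries.move_to_end(query)  # mark as most recently used
        return self.entries[query]

    def put(self, query: str, results: str) -> bool:
        """Store the results if the query is admitted; return whether it was stored."""
        if not self.admit(query):
            return False
        self.entries[query] = results
        self.entries.move_to_end(query)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used entry
        return True


def stateless_admit(query: str, max_terms: int = 3) -> bool:
    """Stateless feature (assumed): admit only short queries."""
    return len(query.split()) <= max_terms


class StatefulAdmit:
    """Stateful feature (assumed): admit a query once it has been seen `min_seen` times."""

    def __init__(self, min_seen: int = 2):
        self.min_seen = min_seen
        self.seen: Counter = Counter()

    def __call__(self, query: str) -> bool:
        self.seen[query] += 1
        return self.seen[query] >= self.min_seen
```

For example, `AdmissionLRUCache(1000, StatefulAdmit(min_seen=2))` would keep singleton queries out of the cache, at the cost of always missing on the second occurrence of a query.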

international acm sigir conference on research and development in information retrieval | 2008

ResIn: a combination of results caching and index pruning for high-performance web search engines

Gleb Skobeltsyn; Flavio Junqueira; Vassilis Plachouras; Ricardo A. Baeza-Yates

Results caching is an efficient technique for reducing the query processing load, hence it is commonly used in real search engines. This technique, however, bounds the maximum hit rate due to the large fraction of singleton queries, which is an important limitation. In this paper we propose ResIn - an architecture that uses a combination of results caching and index pruning to overcome this limitation. We argue that results caching is an inexpensive and efficient way to reduce the query processing load and show that it is cheaper to implement compared to a pruned index. At the same time, we show that index pruning performance is fundamentally affected by the changes in the query traffic that the results cache induces. We experiment with real query logs and a large document collection, and show that the combination of both techniques enables efficient reduction of the query processing costs and thus is practical to use in Web search engines.

international acm sigir conference on research and development in information retrieval | 2007

Incorporating term dependency in the DFR framework

Jie Peng; Craig Macdonald; Ben He; Vassilis Plachouras; Iadh Ounis

Term dependency, or co-occurrence, has been studied in language modelling, for instance by Metzler & Croft who showed that retrieval performance could be significantly enhanced using term dependency information. In this work, we show how term dependency can be modelled within the Divergence From Randomness (DFR) framework. We evaluate our term dependency model on the two adhoc retrieval tasks using the TREC .GOV2 Terabyte collection. Furthermore, we examine the effect of varying the term dependency window size on the retrieval performance of the proposed model. Our experiments show that term dependency can indeed be successfully incorporated within the DFR framework.
