Hans Friedrich Witschel

Publication

Featured research published by Hans Friedrich Witschel.

string processing and information retrieval | 2007

Admission policies for caches of search engine results

Ricardo A. Baeza-Yates; Flavio Junqueira; Vassilis Plachouras; Hans Friedrich Witschel

This paper studies the impact of the tail of the query distribution on caches of Web search engines, and proposes a technique for achieving higher hit ratios compared to traditional heuristics such as LRU. The main problem we solve is identifying infrequent queries, which reduce the hit ratio because caching them often does not lead to hits. To mitigate this problem, we introduce a cache management policy that employs an admission policy to prevent infrequent queries from taking the space of more frequent queries in the cache. The admission policy uses either stateless features, which depend only on the query, or stateful features based on usage information. The proposed management policy is more general than existing policies for caching of search engine results, and it is fully dynamic. The evaluation results on two different query logs show that our policy achieves higher hit ratios when compared to previously proposed cache management policies.
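
The idea of placing an admission check in front of an LRU cache can be illustrated with a minimal sketch. It is not the paper's implementation: the class name `AdmissionLRUCache`, the `min_past_frequency` threshold, and the choice of "times seen so far" as the stateful feature are all assumptions made for illustration.

```python
from collections import Counter, OrderedDict


class AdmissionLRUCache:
    """LRU result cache that only stores queries passing an admission check.

    Hypothetical sketch: the admission feature (past frequency) and the
    threshold are illustrative assumptions, not taken from the paper.
    """

    def __init__(self, capacity, min_past_frequency=2):
        self.capacity = capacity
        self.min_past_frequency = min_past_frequency
        self.entries = OrderedDict()  # query -> results, oldest first
        self.seen = Counter()         # stateful feature: usage counts
        self.hits = 0
        self.misses = 0

    def admit(self, query):
        # Stateful admission: cache only queries seen often enough so far.
        return self.seen[query] >= self.min_past_frequency

    def lookup(self, query, compute_results):
        self.seen[query] += 1
        if query in self.entries:
            self.entries.move_to_end(query)  # mark as most recently used
            self.hits += 1
            return self.entries[query]
        self.misses += 1
        results = compute_results(query)
        if self.admit(query):
            self.entries[query] = results
            if len(self.entries) > self.capacity:
                self.entries.popitem(last=False)  # evict least recently used
        return results


if __name__ == "__main__":
    cache = AdmissionLRUCache(capacity=2)
    for q in ["a", "a", "a", "x", "y", "z", "a"]:
        cache.lookup(q, lambda query: f"results for {query}")
    print(cache.hits, cache.misses, list(cache.entries))
```

In this toy stream the one-off queries `x`, `y` and `z` are never admitted, so they cannot evict the repeated query `a`. A stateless variant would replace `admit` with a test that looks only at the query string, for example its length.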

international conference natural language processing | 2006

Ord i dag: mining Norwegian daily newswire

Unni Cathrine Eiken; Anja Therese Liseth; Hans Friedrich Witschel; Matthias Richter; Chris Biemann

We present Ord i Dag, a new service that displays today’s most important keywords. These are extracted fully automatically from Norwegian online newspapers. Describing the complete process, we provide an entirely disclosed method for media monitoring and news summarization. For keyword extraction, a reference corpus serves as background about average language use, which is contrasted with the current day’s word frequencies. Having detected the most prominent keywords of a day, we introduce several ways of grouping and displaying them in intuitive ways. A discussion about possible applications concludes. Up to now, the service is available for Norwegian and German. As only some shallow language-specific processing is needed, it can easily be set up for other languages.
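
Contrasting a day's word frequencies with a reference corpus can be sketched as follows. The abstract does not name the statistic used, so the smoothed relative-frequency ratio below is an assumption, as are the function name `top_keywords` and its parameters.

```python
from collections import Counter


def top_keywords(today_tokens, reference_counts, reference_total, k=5, min_count=2):
    """Rank today's words by how over-represented they are vs. a reference corpus.

    Assumption: a simple ratio of relative frequencies stands in for whatever
    significance measure the actual service uses.
    """
    today = Counter(today_tokens)
    today_total = sum(today.values())
    scores = {}
    for word, count in today.items():
        if count < min_count:
            continue  # ignore words too rare today to be reliable
        p_today = count / today_total
        # Adding one to the reference count avoids division by zero for
        # words that never occur in the reference corpus.
        p_ref = (reference_counts.get(word, 0) + 1) / (reference_total + 1)
        scores[word] = p_today / p_ref
    return sorted(scores, key=scores.get, reverse=True)[:k]


if __name__ == "__main__":
    tokens = ["the", "the", "the", "election", "election", "storm"]
    reference = {"the": 1000, "election": 2, "storm": 1}
    print(top_keywords(tokens, reference, reference_total=10000))
```

Function words such as "the" are frequent today but equally frequent in the reference corpus, so they score low; a word like "election" that is unusually frequent today rises to the top.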

information retrieval in peer to peer networks | 2005

Evaluating profiling and query expansion methods for P2P information retrieval

Hans Friedrich Witschel; Thomas Böhme

This paper addresses the issue of peer profiles, i.e. compact representations of a peer's locally offered content, and their use in P2P information retrieval and routing. Experiments with different profile building and query expansion methods show that compression of profiles is possible without losing too much retrieval performance and that query expansion using global co-occurrence data can improve results by approximately 10%.

large scale distributed systems for information retrieval | 2008

Ranking information resources in peer-to-peer text retrieval: an experimental study

Hans Friedrich Witschel

This paper experimentally studies approaches to the problem of ranking information resources w.r.t. user queries in peer-to-peer information retrieval. In distributed environments, for each given user query and a set of information resources that are available, we need to select the right subset of these resources to forward the query to. Here, we study the problem of pruning descriptions of resources to acceptable lengths in a peer-to-peer scenario and two approaches to overcome the mismatch problem that may arise as a consequence of the pruning, namely query expansion and learning better resource descriptions from query streams. The results show that resource descriptions can be pruned to a large extent without ill effects and that learning better descriptions from query streams works much better than query expansion.

european conference on information retrieval | 2008

An evaluation measure for distributed information retrieval systems

Hans Friedrich Witschel; Florian Holz; Gregor Heinrich; Sven Teresniak

This paper is concerned with the evaluation of distributed and peer-to-peer information retrieval systems. A new measure is introduced that compares results of a distributed retrieval system to those of a centralised system, fully exploiting the ranking of the latter as an indicator of gradual relevance. Problems with existing evaluation approaches are verified experimentally.

document engineering | 2006

Carrot and stick: combining information retrieval models

Hans Friedrich Witschel

Quite recently, the idea of language models [4] has been proposed as a new information retrieval model and has been shown to outperform most other term weighting schemes. Language models are based on calculating the probability of a query q being generated from a document d’s language model. To do this, relative frequencies of terms in a document are smoothed with collection frequencies, e.g. using Dirichlet priors: $p(q|d) = \prod_{t \in q} \frac{tf_{t,d} + \mu \, p(t|C)}{|d| + \mu}$, where $p(t|C)$ is term t’s relative frequency in the collection, $tf_{t,d}$ its frequency in document d and $\mu$ a smoothing parameter. Analytically, since each factor in the above product is smaller than one, it can be thought of as a penalty. This penalty has a strong impact if a term is not contained in a document ($tf_{t,d} = 0$) and even stronger if that term is a rare one ($p(t|C)$ very small). This means that language models can be interpreted as penalising documents for the absence of informative terms whereas traditional weighting schemes like tf.idf reward the presence of these terms.
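
The Dirichlet-smoothed query likelihood described above can be computed in log space, as in the following minimal sketch. The function name, the argument layout, and the default value of `mu` are assumptions for illustration; the abstract does not specify them.

```python
import math
from collections import Counter


def dirichlet_log_likelihood(query_terms, doc_terms, collection_counts,
                             collection_size, mu=2000.0):
    """Return log p(q|d) under Dirichlet-smoothed unigram language models.

    Each query term contributes log((tf + mu * p(t|C)) / (|d| + mu)).
    The default mu is an illustrative assumption.
    """
    tf = Counter(doc_terms)
    doc_len = len(doc_terms)
    log_p = 0.0
    for term in query_terms:
        p_tc = collection_counts.get(term, 0) / collection_size
        factor = (tf[term] + mu * p_tc) / (doc_len + mu)
        if factor == 0.0:
            return float("-inf")  # term unseen in document and collection
        log_p += math.log(factor)  # each factor is below one: a penalty
    return log_p


if __name__ == "__main__":
    collection = {"rare": 1, "common": 100}
    doc = ["other", "other"]  # contains neither query term
    print(dirichlet_log_likelihood(["rare"], doc, collection, 1000))
    print(dirichlet_log_likelihood(["common"], doc, collection, 1000))
```

Running the example shows the effect discussed in the abstract: for a document that contains neither term, the missing rare term yields a lower (more negative) score than the missing common term.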

international acm sigir conference on research and development in information retrieval | 2007

Global resources for peer-to-peer text retrieval

Hans Friedrich Witschel

The thesis presented in this paper tackles selected issues in unstructured peer-to-peer information retrieval (P2PIR) systems, using world knowledge for solving P2PIR problems. A first part uses so-called reference corpora for estimating global term weights such as IDF instead of sampling them from the distributed collection. A second part of the work will be dedicated to the question of query routing in unstructured P2PIR systems using peer resource descriptions and world knowledge for query expansion.

Archive | 2004