Publication


Featured research published by Valentin Jijkoun.


Archive | 2008

Advances in Multilingual and Multimodal Information Retrieval

Carol Peters; Valentin Jijkoun; Thomas Mandl; Henning Müller; Douglas W. Oard; Anselmo Peñas; Vivien Petras; Diana Santos

This book constitutes the thoroughly refereed proceedings of the 8th Workshop of the Cross-Language Evaluation Forum, CLEF 2007, held in Budapest, Hungary, September 2007. The revised and extended papers were carefully reviewed and selected for inclusion in the book. There are 115 contributions in total and an introduction. The seven distinct evaluation tracks in CLEF 2007 are designed to test the performance of a wide range of multilingual information access systems or system components. The papers are organized in topical sections on Multilingual Textual Document Retrieval (Ad Hoc), Domain-Specific Information Retrieval (Domain-Specific), Multiple Language Question Answering (QA@CLEF), cross-language retrieval in image collections (ImageCLEF), cross-language speech retrieval (CL-SR), multilingual Web retrieval (WebCLEF), cross-language geographical retrieval (GeoCLEF), and CLEF in other evaluations.


international conference on computational linguistics | 2004

Information extraction for question answering: improving recall through syntactic patterns

Valentin Jijkoun; Maarten de Rijke; Jori Mur

We investigate the impact of the precision/recall trade-off of information extraction on the performance of an offline corpus-based question answering (QA) system. One of our findings is that, because of the robust final answer selection mechanism of the QA system, recall is more important. We show that the recall of the extraction component can be improved using syntactic parsing instead of more common surface text patterns, substantially increasing the number of factoid questions answered by the QA system.


european conference on information retrieval | 2008

The impact of named entity normalization on information retrieval for question answering

Mahboob Alam Khalid; Valentin Jijkoun; Maarten de Rijke

In the named entity normalization task, a system identifies a canonical unambiguous referent for names like Bush or Alabama. Resolving synonymy and ambiguity of such names can benefit end-to-end information access tasks. We evaluate two entity normalization methods based on Wikipedia in the context of both passage and document retrieval for question answering. We find that even a simple normalization method leads to improvements of early precision, both for document and passage retrieval. Moreover, better normalization results in better retrieval performance.


analytics for noisy unstructured text data | 2008

Named entity normalization in user generated content

Valentin Jijkoun; Mahboob Alam Khalid; Maarten Marx; Maarten de Rijke

Named entity recognition is important for semantically oriented retrieval tasks, such as question answering, entity retrieval, biomedical retrieval, trend detection, and event and entity tracking. In many of these tasks it is important to be able to accurately normalize the recognized entities, i.e., to map surface forms to unambiguous references to real world entities. Within the context of structured databases, this task (known as record linkage and data de-duplication) has been a topic of active research for more than five decades. For edited content, such as news articles, the named entity normalization (NEN) task is one that has recently attracted considerable attention. We consider the task in the challenging context of user generated content (UGC), where it forms a key ingredient of tracking and media-analysis systems. A baseline NEN system from the literature (that normalizes surface forms to Wikipedia pages) performs considerably worse on UGC than on edited news: accuracy drops from 80% to 65% for a Dutch language data set and from 94% to 77% for English. We identify several sources of errors: entity recognition errors, multiple ways of referring to the same entity and ambiguous references. To address these issues we propose five improvements to the baseline NEN algorithm, to arrive at a language independent NEN system that achieves overall accuracy scores of 90% on the English data set and 89% on the Dutch data set. We show that each of the improvements contributes to the overall score of our improved NEN algorithm, and conclude with an error analysis on both Dutch and English language UGC. The NEN system is computationally efficient and runs with very modest computational requirements.


european conference on information retrieval | 2004

Answer selection in a multi-stream open domain question answering system

Valentin Jijkoun; Maarten de Rijke

Question answering systems aim to meet users’ information needs by returning exact answers in response to a question. Traditional open domain question answering systems are built around a single pipeline architecture. In an attempt to exploit multiple resources as well as multiple answering strategies, systems based on a multi-stream architecture have recently been introduced. Such systems face the challenging problem of having to select a single answer from pools of answers obtained using essentially different techniques. We report on experiments aimed at understanding and evaluating the effect of different options for answer selection in a multi-stream question answering system. We examine the impact of local tiling techniques, assignments of weights to streams based on past performance and/or question type, as well as redundancy-based ideas. Our main finding is that redundancy-based ideas in combination with naively learned stream weights conditioned on question type work best, and improve significantly over a number of baselines.
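The combination of redundancy and per-question-type stream weights described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the stream names, the weights, and the lowercase-based answer normalization are all assumptions made for illustration. Each candidate answer accumulates the weight of every stream that proposed it, so answers returned by several streams (redundancy) rise to the top.

```python
from collections import defaultdict

def select_answer(candidates, stream_weights, question_type, default_weight=1.0):
    """Pick the answer with the highest summed stream weight.

    candidates: list of (stream_name, answer_string) pairs.
    stream_weights: dict mapping (stream_name, question_type) -> weight.
    Both structures are hypothetical, chosen for this sketch only.
    """
    scores = defaultdict(float)
    for stream, answer in candidates:
        key = answer.strip().lower()  # crude normalization so duplicates merge
        scores[key] += stream_weights.get((stream, question_type), default_weight)
    if not scores:
        return None
    # Highest score wins; ties are broken alphabetically for determinism.
    return min(scores.items(), key=lambda kv: (-kv[1], kv[0]))[0]

# Toy input: three hypothetical streams answering a location question.
candidates = [
    ("table_lookup", "Paris"),
    ("web_patterns", "paris"),
    ("ngram_mining", "Lyon"),
]
weights = {
    ("table_lookup", "location"): 0.4,
    ("web_patterns", "location"): 0.3,
    ("ngram_mining", "location"): 0.6,
}
best = select_answer(candidates, weights, "location")
```

In this toy input the single strongest stream proposes "Lyon", but two weaker streams agree on "Paris", and their combined weight is larger.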


meeting of the association for computational linguistics | 2009

Generating a Non-English Subjectivity Lexicon: Relations That Matter

Valentin Jijkoun; Katja Hofmann

We describe a method for creating a non-English subjectivity lexicon based on an English lexicon, an online translation service and a general purpose thesaurus: Wordnet. We use a PageRank-like algorithm to bootstrap from the translation of the English lexicon and rank the words in the thesaurus by polarity using the network of lexical relations in Wordnet. We apply our method to the Dutch language. The best results are achieved when using synonymy and antonymy relations only, and ranking positive and negative words simultaneously. Our method achieves an accuracy of 0.82 at the top 3,000 negative words, and 0.62 at the top 3,000 positive words.
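As a rough illustration of PageRank-like polarity ranking over synonymy and antonymy relations, the sketch below propagates signed scores from a few seed words through a small graph. It is an assumed simplification, not the algorithm from the paper: the update rule, the damping value, and the four-word Dutch toy graph are all illustrative choices. Synonym edges pass polarity on unchanged, antonym edges flip its sign, and positive and negative words are scored in the same pass.

```python
def propagate_polarity(edges, seeds, damping=0.85, iterations=50):
    """Spread seed polarity over a graph of lexical relations.

    edges: list of (word_a, word_b, sign); sign is +1 for synonymy
           and -1 for antonymy. Relations are treated as symmetric.
    seeds: dict word -> initial polarity in [-1, 1], e.g. obtained by
           translating an existing lexicon (assumed input).
    """
    neighbors = {}
    for a, b, sign in edges:
        neighbors.setdefault(a, []).append((b, sign))
        neighbors.setdefault(b, []).append((a, sign))
    for word in seeds:
        neighbors.setdefault(word, [])

    scores = {word: seeds.get(word, 0.0) for word in neighbors}
    for _ in range(iterations):
        updated = {}
        for word, links in neighbors.items():
            # Each neighbor shares its score equally among its own links.
            spread = sum(sign * scores[other] / len(neighbors[other])
                         for other, sign in links)
            updated[word] = (1 - damping) * seeds.get(word, 0.0) + damping * spread
        scores = updated
    return scores

# Toy graph: goed (good), prima (fine), slecht (bad), beroerd (awful).
edges = [
    ("goed", "prima", +1),
    ("goed", "slecht", -1),
    ("slecht", "beroerd", +1),
]
seeds = {"goed": 1.0, "slecht": -1.0}
scores = propagate_polarity(edges, seeds)
ranked = sorted(scores, key=scores.get, reverse=True)
```

Sorting by score puts the most positive words first and the most negative words last, which is one way to read off both ends of a lexicon from a single ranking.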


cross language evaluation forum | 2008

Overview of WebCLEF 2007

Valentin Jijkoun; Maarten de Rijke

This paper describes the WebCLEF 2007 task. The task definition--which goes beyond traditional navigational queries and is concerned with undirected information search goals--combines insights gained at previous editions of WebCLEF and of the WiQA pilot that was run at CLEF 2006. We detail the task, the assessment procedure and the results achieved by the participants.


Journal of Applied Logic | 2007

Data-driven Type Checking in Open Domain Question Answering

Stefan Schlobach; David Ahn; Maarten de Rijke; Valentin Jijkoun

Many open domain question answering systems answer questions by first harvesting a large number of candidate answers, and then picking the most promising one from the list. One criterion for this answer selection is type checking: deciding whether the candidate answer is of the semantic type expected by the question. We define a general strategy for building redundancy-based type checkers, built around the notions of comparison set and scoring method, where the former provides a set of potential answer types and the latter is meant to capture the relation between a candidate answer and an answer type. Our focus is on scoring methods. We discuss nine such methods, provide a detailed experimental comparison and analysis of these methods, and find that the best performing scoring method performs at the same level as knowledge-intensive methods, although our experiments do not reveal a clear-cut answer on the question whether any of the scoring methods we consider should be preferred over the others.


cross language evaluation forum | 2007

Overview of the WiQA task at CLEF 2006

Valentin Jijkoun; Maarten de Rijke

We describe WiQA 2006, a pilot task aimed at studying question answering using Wikipedia. Going beyond traditional factoid questions, the task considered at WiQA 2006 was, given a source page from Wikipedia, to identify snippets from other Wikipedia pages, possibly in languages different from the language of the source page, that add new and important information to the source page, and that do so without repetition. A total of 7 teams took part, submitting 20 runs. Our main findings are two-fold: (i) while challenging, the tasks considered at WiQA are doable, as participants achieved impressive scores as measured in terms of yield, mean reciprocal rank, and precision; (ii) on the bilingual task, substantially higher scores were achieved than on the monolingual tasks.
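Of the measures named in this abstract, mean reciprocal rank is the easiest to show in a few lines. The sketch below uses the standard textbook definition; the exact variant used in the WiQA assessment may differ, and the input format (one list of relevance flags per topic) is an assumption made for illustration.

```python
def mean_reciprocal_rank(ranked_relevance):
    """Average of 1/rank of the first relevant item per topic.

    ranked_relevance: one list of booleans per topic, in ranked order;
    True marks a snippet judged relevant. A topic with no relevant
    snippet contributes 0 to the average.
    """
    if not ranked_relevance:
        return 0.0
    total = 0.0
    for flags in ranked_relevance:
        for rank, relevant in enumerate(flags, start=1):
            if relevant:
                total += 1.0 / rank
                break  # only the first relevant snippet counts
    return total / len(ranked_relevance)

# Three toy topics: first hit at rank 2, at rank 1, and no hit at all.
judgements = [[False, True], [True], [False, False]]
mrr = mean_reciprocal_rank(judgements)
```

For the three toy topics the reciprocal ranks are 1/2, 1, and 0, so the mean is 0.5.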


international conference on machine learning | 2005

Recognizing textual entailment: is word similarity enough?

Valentin Jijkoun; Maarten de Rijke

We describe the system we used at the PASCAL-2005 Recognizing Textual Entailment Challenge. Our method for recognizing entailment is based on calculating “directed” sentence similarity: checking the directed “semantic” word overlap between the text and the hypothesis. We use frequency-based term weighting in combination with two different word similarity measures. Although one version of the system shows significant improvement over randomly guessing decisions (with an accuracy score of 57.3), we show that this is only due to a subset of the data that can be equally well handled by simple word overlap. Furthermore, we give an in-depth analysis of the system and the data of the challenge.
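The idea of "directed" similarity with frequency-based term weighting can be sketched as follows. This is a simplified illustration under stated assumptions, not the system from the paper: exact string match stands in for the word similarity measures, the tokenizer is a plain whitespace split, and the smoothed inverse document frequency formula is one common choice among several. The score measures how much of the hypothesis is covered by the text, so swapping the two arguments generally changes the result.

```python
import math
from collections import Counter

def tokenize(sentence):
    return sentence.lower().split()

def idf_weights(documents):
    """Smoothed inverse document frequency; rarer words weigh more."""
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(tokenize(doc)))
    n = len(documents)
    return {word: math.log((n + 1) / (df + 1)) + 1.0
            for word, df in doc_freq.items()}

def directed_overlap(text, hypothesis, weights, default_weight=1.0):
    """Share of the hypothesis's total word weight found in the text."""
    text_words = set(tokenize(text))
    hyp_words = tokenize(hypothesis)
    total = sum(weights.get(w, default_weight) for w in hyp_words)
    if total == 0:
        return 0.0  # empty hypothesis
    covered = sum(weights.get(w, default_weight)
                  for w in hyp_words if w in text_words)
    return covered / total

text = "the cat sat on the mat"
hypothesis = "the cat sat"
weights = idf_weights([text, hypothesis])
forward = directed_overlap(text, hypothesis, weights)   # hypothesis covered by text
backward = directed_overlap(hypothesis, text, weights)  # text covered by hypothesis
```

An entailment decision could then be made by comparing the forward score with a threshold, which would have to be tuned on development data.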

Collaboration


Valentin Jijkoun's collaborations.

Top Co-Authors

Gilad Mishne
University of Amsterdam

David Ahn
University of Amsterdam

M. de Rijke
University of Amsterdam

Maarten Marx
University of Amsterdam