Publication


Featured research published by Christian Wartena.


international conference on multimedia retrieval | 2011

Automatic tagging and geotagging in video collections and communities

Martha Larson; Mohammad Soleymani; Pavel Serdyukov; Stevan Rudinac; Christian Wartena; Vanessa Murdock; Gerald Friedland; Roeland Ordelman; Gareth J. F. Jones

Automatically generated tags and geotags hold great promise to improve access to video collections and online communities. We give an overview of three tasks offered in the MediaEval 2010 benchmarking initiative, describing for each its use scenario, its definition, and the data set released. For each task, a reference algorithm is presented that was used within MediaEval 2010 and comments are included on lessons learned. The Tagging Task, Professional involves automatically matching episodes in a collection of Dutch television with subject labels drawn from the keyword thesaurus used by the archive staff. The Tagging Task, Wild Wild Web involves automatically predicting the tags that are assigned by users to their online videos. Finally, the Placing Task requires automatically assigning geo-coordinates to videos. The specification of each task admits the use of the full range of available information including user-generated metadata, speech recognition transcripts, audio, and visual features.


database and expert systems applications | 2008

Topic Detection by Clustering Keywords

Christian Wartena; Rogier Brussee

We consider topic detection without any prior knowledge of category structure or possible categories. Keywords are extracted and clustered based on different similarity measures using the induced k-bisecting clustering algorithm. Evaluation on Wikipedia articles shows that clusters of keywords correlate strongly with the Wikipedia categories of the articles. In addition, we find that a distance measure based on the Jensen-Shannon divergence of probability distributions outperforms the cosine similarity. In particular, a newly proposed term distribution taking co-occurrence of terms into account gives best results.
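
The comparison mentioned in this abstract, between a distance based on the Jensen-Shannon divergence and the cosine similarity, can be illustrated with a minimal sketch. It is not the authors' implementation: the function names and the toy distributions are assumptions made for illustration, and the co-occurrence based term distribution proposed in the paper is not reproduced here.

```python
import math


def jensen_shannon_divergence(p, q):
    """Jensen-Shannon divergence (log base 2) between two discrete
    distributions given as dicts mapping term -> probability."""
    terms = set(p) | set(q)
    # Mixture distribution m = (p + q) / 2
    m = {t: 0.5 * (p.get(t, 0.0) + q.get(t, 0.0)) for t in terms}

    def kl_to_mixture(a):
        # Kullback-Leibler divergence D(a || m); zero-probability terms contribute 0
        return sum(pa * math.log2(pa / m[t]) for t, pa in a.items() if pa > 0)

    return 0.5 * kl_to_mixture(p) + 0.5 * kl_to_mixture(q)


def cosine_similarity(p, q):
    """Cosine similarity between two sparse vectors given as dicts."""
    dot = sum(p[t] * q[t] for t in set(p) & set(q))
    norm = math.sqrt(sum(v * v for v in p.values())) * math.sqrt(
        sum(v * v for v in q.values())
    )
    return dot / norm if norm else 0.0


# Hypothetical term distributions of two keywords (illustrative values only)
dist_a = {"cluster": 0.5, "keyword": 0.5}
dist_b = {"keyword": 0.5, "topic": 0.5}
print(jensen_shannon_divergence(dist_a, dist_b))
print(cosine_similarity(dist_a, dist_b))
```

With base-2 logarithms the divergence lies between 0 (identical distributions) and 1 (distributions with no terms in common), so it can be used directly as a distance in a clustering step.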


database and expert systems applications | 2010

Keyword Extraction Using Word Co-occurrence

Christian Wartena; Rogier Brussee; Wout Slakhorst

A common strategy to assign keywords to documents is to select the most appropriate words from the document text. One of the most important criteria for a word to be selected as keyword is its relevance for the text. The tf.idf score of a term is a widely used relevance measure. While easy to compute and giving quite satisfactory results, this measure does not take (semantic) relations between words into account. In this paper we study some alternative relevance measures that do use relations between words. They are computed by defining co-occurrence distributions for words and comparing these distributions with the document and the corpus distribution. We then evaluate keyword extraction algorithms defined by selecting different relevance measures. For two corpora of abstracts with manually assigned keywords, we compare manually extracted keywords with different automatically extracted ones. The results show that using word co-occurrence information can improve precision and recall over tf.idf.
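
The tf.idf baseline that this abstract refers to can be sketched as follows. This is a minimal illustration under stated assumptions, not the method evaluated in the paper: the function name `tfidf_keywords`, the plain `log(N / df)` weighting, and the toy corpus are all hypothetical choices, and the co-occurrence based measures studied by the authors are not shown.

```python
import math
from collections import Counter


def tfidf_keywords(doc_tokens, corpus, top_n=3):
    """Rank the terms of one document by tf.idf and return the top_n terms.

    doc_tokens: list of tokens of the document to annotate
    corpus:     list of token lists, used to derive document frequencies
    """
    term_freq = Counter(doc_tokens)
    n_docs = len(corpus)
    scores = {}
    for term, freq in term_freq.items():
        # Document frequency: number of corpus documents containing the term
        doc_freq = sum(1 for doc in corpus if term in doc)
        idf = math.log(n_docs / doc_freq) if doc_freq else 0.0
        scores[term] = freq * idf
    # Highest score first; ties broken alphabetically for a stable result
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_n]


# Toy corpus (illustrative only); "the" occurs everywhere and gets idf 0
corpus = [
    ["topic", "detection", "keywords", "the"],
    ["keyword", "extraction", "the"],
    ["tag", "recommendation", "the"],
]
print(tfidf_keywords(corpus[0], corpus, top_n=2))
```

As the abstract notes, a score of this kind treats every term independently, which is the limitation the co-occurrence based measures are meant to address.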


intelligent systems design and applications | 2009

Using Tag Co-occurrence for Recommendation

Christian Wartena; Rogier Brussee; Martin Wibbels

Tagging with free form tags is becoming an increasingly important indexing mechanism. However, free form tags have characteristics that require special treatment when used for searching or recommendation because they show much more variation than controlled keywords. In this paper we present a method that puts this large variation to good use. We introduce second order co-occurrence and a related distance measure for tag similarities that is robust against the variation in tags. From this distance measure it is straightforward to derive methods to analyze user interest and compute recommendations. We evaluate the use of tag based recommendation on the Movielens dataset and a dataset of tagged books.


database and expert systems applications | 2010

Thesaurus Based Term Ranking for Keyword Extraction

Luit Gazendam; Christian Wartena; Rogier Brussee

In many cases keywords from a restricted set of possible keywords have to be assigned to texts. A common way to find the best keywords is to rank terms occurring in the text according to their tf.idf value. This requires a corpus of texts from which document frequencies can be derived. In this paper we show that we can obtain results of the same quality without the usage of a background corpus, using relations between terms provided in a thesaurus.


database and expert systems applications | 2007

Apolda: A Practical Tool for Semantic Annotation

Christian Wartena; Rogier Brussee; Luit Gazendam; Willem-Olaf Huijsen

In this paper we give an overview of methods to find representations of ontology defined concepts in texts. We distinguish two approaches: lexicon-based methods and approaches using lexicalized ontologies. We focus on the latter method and describe the problems and choices that have to be made if this approach is put to work. Finally we describe an open-source tool that implements the lexicalized ontology approach along with two examples of its applicability in a practical context.


Interdisciplinary Science Reviews | 2009

Automatic Annotation Suggestions for Audiovisual Archives: Evaluation Aspects

Luit Gazendam; Christian Wartena; Véronique Malaisé; Guus Schreiber; Annemieke de Jong; Hennie Brugman

In the context of large and ever growing archives, generating annotation suggestions automatically from textual resources related to the documents to be archived is an interesting option in theory. It could save a lot of work in the time consuming and expensive task of manual annotation and it could help cataloguers attain a higher inter-annotator agreement. However, some questions arise in practice: what is the quality of the automatically produced annotations? How do they compare with manual annotations and with the requirements for annotation that were defined in the archive? If different from the manual annotations, are the automatic annotations wrong? In the CHOICE project, partially hosted at the Netherlands Institute for Sound and Vision, the Dutch public archive for audiovisual broadcasts, we automatically generate annotation suggestions for cataloguers. In this paper, we define three types of evaluation of these annotation suggestions: (1) a classic and strict evaluation measure expressing the overlap between automatically generated keywords and the manual annotations, (2) a loosened evaluation measure for which semantically very similar annotations are also considered as relevant matches, and (3) an in-use evaluation of the usefulness of manual versus automatic annotations in the context of serendipitous browsing. During serendipitous browsing, the annotations (manual or automatic) are used to retrieve and visualize semantically related documents.


international semantic web conference | 2008

Instance-Based Mapping between Thesauri and Folksonomies

Christian Wartena; Rogier Brussee

The emergence of web based systems in which users can annotate items raises the question of the semantic interoperability between vocabularies originating from collaborative annotation processes, often called folksonomies, and keywords assigned in a more traditional way. If collections are annotated according to two systems, e.g. with tags and keywords, the annotated data can be used for instance based mapping between the vocabularies. The basis for this kind of matching is an appropriate similarity measure between concepts, based on their distribution as annotations. In this paper we propose a new similarity measure that can take advantage of some special properties of user generated metadata. We have evaluated this measure with a set of articles from Wikipedia which are both classified according to the topic structure of Wikipedia and annotated by users of the bookmarking service del.icio.us. The results using the new measure are significantly better than those obtained using standard similarity measures proposed for this task in the literature, i.e., it correlates better with human judgments. We argue that the measure also has benefits for instance based mapping of more traditionally developed vocabularies.


content based multimedia indexing | 2012

Comparing retrieval effectiveness of alternative content segmentation methods for Internet video search

Maria Eskevich; Gareth J. F. Jones; Christian Wartena; Martha Larson; Robin Aly; Thijs Verschoor; Roeland Ordelman

We present an exploratory study of the retrieval of semiprofessional user-generated Internet video. The study is based on the MediaEval 2011 Rich Speech Retrieval (RSR) task for which the dataset was taken from the Internet sharing platform blip.tv, and search queries associated with specific speech acts occurring in the video. We compare results from three participant groups using: automatic speech recognition system transcript (ASR), metadata manually assigned to each video by the user who uploaded it, and their combination. RSR 2011 was a known-item search for a single manually identified ideal jump-in point in the video for each query where playback should begin. Retrieval effectiveness is measured using the MRR and mGAP metrics. Using different transcript segmentation methods the participants tried to maximize the rank of the relevant item and to locate the nearest match to the ideal jump-in point. Results indicate that best overall results are obtained for topically homogeneous segments which have a strong overlap with the relevant region associated with the jump-in point, and that use of metadata can be beneficial when segments are unfocused or cover more than one topic.
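
Of the two metrics named in this abstract, mean reciprocal rank (MRR) has a standard definition for known-item search that can be sketched briefly. The function name, the data layout, and the example rankings below are assumptions for illustration; mGAP is not sketched because its definition is not given in the abstract.

```python
def mean_reciprocal_rank(ranked_results, relevant_items):
    """Mean reciprocal rank for known-item search.

    ranked_results: dict mapping query id -> ranked list of item ids
    relevant_items: dict mapping query id -> the single relevant item id
    A query whose relevant item is not retrieved contributes 0.
    """
    if not ranked_results:
        return 0.0
    total = 0.0
    for query, ranking in ranked_results.items():
        target = relevant_items[query]
        if target in ranking:
            # Ranks are 1-based, so the first position scores 1.0
            total += 1.0 / (ranking.index(target) + 1)
    return total / len(ranked_results)


# Hypothetical rankings of video segments for three queries
rankings = {
    "q1": ["seg_a", "seg_b", "seg_c"],
    "q2": ["seg_x", "seg_y"],
    "q3": ["seg_m"],
}
known_items = {"q1": "seg_b", "q2": "seg_x", "q3": "seg_z"}
print(mean_reciprocal_rank(rankings, known_items))
```

MRR only rewards ranking the relevant segment highly; it does not measure how close the returned segment start is to the ideal jump-in point.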


conference on information and knowledge management | 2010

Selecting keywords for content based recommendation

Christian Wartena; Wout Slakhorst; Martin Wibbels

The continued growth of online content makes personalized recommendation an increasingly important tool for media consumption. While collaborative filtering techniques have been shown to be very successful in stable collections, content based approaches are necessary for recommending new items. Content based recommendation uses the similarity between new items and consumed items to predict whether a new item is interesting for the user. The similarity is computed by comparing the content or the meta-data of the items. In this paper we consider recommendation of TV-broadcasts for which meta-data and synopses are available. We thereby concentrate on the new item problem. We investigate the value of different types of meta-data provided by the broadcaster or extracted from synopsis. We show that extracted keywords are better suited for recommendation than manually assigned keywords. Furthermore we show that the number of keywords used is of great importance. Using a rather small number of keywords to present an item yields the best results for recommendation.
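
The general idea of content based recommendation described in this abstract, scoring a new item by its similarity to items the user has already consumed, can be sketched as below. The abstract does not state which similarity measure or aggregation the authors use, so the Jaccard overlap of keyword sets and the averaging over consumed items are assumptions chosen for illustration, as are the function names and example keywords.

```python
def jaccard(keywords_a, keywords_b):
    """Jaccard overlap between two keyword collections (assumed measure)."""
    a, b = set(keywords_a), set(keywords_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def score_new_item(new_keywords, consumed_items):
    """Average similarity of a new item to the items a user has consumed.

    new_keywords:   keywords describing the new item
    consumed_items: list of keyword collections, one per consumed item
    """
    if not consumed_items:
        return 0.0
    return sum(jaccard(new_keywords, k) for k in consumed_items) / len(consumed_items)


# Hypothetical viewing history and two new broadcasts
history = [{"football", "match"}, {"news", "politics"}]
print(score_new_item({"football", "league"}, history))
print(score_new_item({"cooking", "recipe"}, history))
```

Because the score depends only on item keywords, it can be computed for a broadcast that no user has watched yet, which is the new item setting the abstract concentrates on.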

Collaboration


Dive into Christian Wartena's collaborations.

Top Co-Authors

Martha Larson

Delft University of Technology

Zeno Gantner

University of Hildesheim
