Stefanie Nowak
Fraunhofer Society
Publications
Featured research published by Stefanie Nowak.
multimedia information retrieval | 2010
Stefanie Nowak; Stefan M. Rüger
The creation of gold-standard datasets is a costly business. Ideally, more than one judgment per document is obtained to ensure high-quality annotations. In this context, we explore how much expert annotations differ from one another, how different sets of annotations influence the ranking of systems, and whether these annotations can be obtained with a crowdsourcing approach. This study is applied to the annotation of images with multiple concepts. A subset of the images employed in the latest ImageCLEF Photo Annotation competition was manually annotated by expert annotators and by non-experts on Mechanical Turk. The inter-annotator agreement is computed at an image-based and a concept-based level using majority vote, accuracy and kappa statistics. Further, Kendall τ and Kolmogorov-Smirnov correlation tests are used to compare the ranking of systems under different ground truths and different evaluation measures in a benchmark scenario. Results show that while the agreement between experts and non-experts varies depending on the measure used, its influence on the ranked lists of the systems is rather small. In sum, the majority vote applied to merge several opinions into one annotation set is able to filter noisy judgments of non-experts to some extent; the resulting annotation set is of comparable quality to the annotations of experts.
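The majority-vote and chance-corrected agreement computations described above can be sketched as follows; the three-annotator votes are made-up illustration data, not the paper's:

```python
from collections import Counter

def majority_vote(labels):
    """Collapse several annotators' binary judgments for one image/concept
    into a single label by majority vote (ties resolved as positive)."""
    counts = Counter(labels)
    return 1 if counts[1] >= counts[0] else 0

def cohens_kappa(a, b):
    """Chance-corrected agreement (Cohen's kappa) between two binary
    annotation sets of equal length."""
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n   # observed agreement
    p_a1 = sum(a) / n                              # P(annotator A says 1)
    p_b1 = sum(b) / n                              # P(annotator B says 1)
    p_e = p_a1 * p_b1 + (1 - p_a1) * (1 - p_b1)    # agreement by chance
    return (p_o - p_e) / (1 - p_e)

# Three annotators judging five images for one concept (toy data)
votes = [[1, 1, 0], [0, 0, 0], [1, 0, 1], [1, 1, 1], [0, 1, 0]]
consensus = [majority_vote(v) for v in votes]
```

Kappa is then computed pairwise between annotators, or between the merged consensus set and an expert set.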
cross language evaluation forum | 2009
Stefanie Nowak; Peter Dunker
The Large-Scale Visual Concept Detection and Annotation Task (LS-VCDT) in ImageCLEF 2009 aims at the detection of 53 concepts in consumer photos. These concepts are structured in an ontology that can be utilized during training and classification of the photos. The dataset consists of 18,000 Flickr photos manually annotated with the 53 concepts; 5,000 photos were used for training and 13,000 for testing. Two evaluation paradigms were applied: evaluation per concept and evaluation per photo. The evaluation per concept was performed by calculating the Equal Error Rate (EER) and the Area Under Curve (AUC). For the evaluation per photo, a recently proposed ontology-based measure was utilized that takes the hierarchy and the relations of the ontology into account and calculates a score per photo. Altogether, 19 research groups participated and submitted 73 runs. Across the concepts, an average AUC of 84% was achieved, with individual concepts reaching an AUC of 95%. The classification performance per photo ranged between 68.7% and 100%, with an average score of 89.6%.
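The per-concept EER and AUC evaluation can be illustrated with a minimal pure-Python sketch (score ties are ignored for simplicity; the scores and ground truth are invented):

```python
def auc_and_eer(scores, labels):
    """Per-concept Area Under the ROC Curve and Equal Error Rate, computed
    from per-photo confidence scores and binary ground truth."""
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    pos = sum(labels)
    neg = len(labels) - pos
    tp = fp = 0
    auc_steps = 0
    eer, best_gap = 1.0, 2.0
    for _, label in pairs:
        if label == 1:
            tp += 1
        else:
            fp += 1
            auc_steps += tp            # positives ranked above this negative
        fnr, fpr = 1 - tp / pos, fp / neg
        if abs(fnr - fpr) < best_gap:  # EER: point where FNR ≈ FPR
            best_gap, eer = abs(fnr - fpr), (fnr + fpr) / 2
    return auc_steps / (pos * neg), eer

# Confidence scores for one concept over four photos; here the ranking
# separates positives and negatives perfectly
auc, eer = auc_and_eer([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0])
```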
multimedia information retrieval | 2008
Peter Dunker; Stefanie Nowak; Andre Begau; Cornelia Lanz
Mood or emotion information is often used as a search term or navigation property within multimedia archives, retrieval systems and multimedia players. Most of these applications engage end users or experts to tag multimedia objects with mood annotations. Within the scientific community, different approaches for content-based music, photo or multimodal mood classification can be found, with a wide range of mood definitions and models and completely different test suites. The purpose of this paper is to review common mood models in order to assess their flexibility, to present a generic multi-modal mood classification framework that uses various audio-visual features and multiple classifiers, and to present a novel music and photo mood classification reference set for evaluation. The classification framework is the basis for different applications, e.g. automatic media tagging or music slideshow players. The novel reference set can be used to compare different algorithms from various research groups. Finally, the results of the introduced framework are presented and discussed, and conclusions for future steps are drawn.
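One commonly reviewed mood model in this line of work is the two-dimensional valence/arousal model, which can be sketched as a simple quadrant lookup; the quadrant labels below are illustrative, not the paper's mood taxonomy:

```python
def mood_quadrant(valence, arousal):
    """Map a (valence, arousal) point of the dimensional mood model to one
    of four quadrant labels. Labels are illustrative placeholders."""
    if valence >= 0:
        return "happy/excited" if arousal >= 0 else "calm/content"
    return "angry/tense" if arousal >= 0 else "sad/depressed"

# Positive valence, low arousal: e.g. a soft, pleasant music track
label = mood_quadrant(0.7, -0.4)
```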
multimedia information retrieval | 2010
Stefanie Nowak; Hanna M. Lukashevich; Peter Dunker; Stefan M. Rüger
With the steadily increasing amount of multimedia documents on the web and at home, the need for reliable semantic indexing methods that assign multiple keywords to a document grows. The performance of existing approaches is often measured with standard evaluation measures of the information retrieval community. In a case study on image annotation, we show the behaviour of 13 different evaluation measures and point out their strengths and weaknesses. For the analysis, data from 19 research groups that participated in the ImageCLEF Photo Annotation Task are utilized, together with several configurations based on random numbers. A recently proposed ontology-based measure that incorporates structure information, relationships from the ontology and the inter-annotator agreement per concept was investigated and compared to a hierarchical variant. The results for the hierarchical measure are not competitive. The ontology-based measure assigns good scores to the systems that also achieved good ranks under the other measures, such as the example-based F-measure. For concept-based evaluation, MAP yielded stable results with respect to random numbers and the number of annotated labels. The AUC measure shows good evaluation characteristics provided all annotations contain confidence values.
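The example-based F-measure mentioned above compares, per photo, the predicted label set against the ground-truth label set; a minimal sketch with made-up concept labels:

```python
def example_f1(predicted, truth):
    """Example-based F-measure for one photo: harmonic mean of precision
    and recall over the predicted vs. ground-truth label sets."""
    predicted, truth = set(predicted), set(truth)
    if not predicted and not truth:
        return 1.0                       # both empty: perfect agreement
    tp = len(predicted & truth)          # correctly predicted labels
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(truth)
    return 2 * precision * recall / (precision + recall)

# Two of three predicted labels match the ground truth
score = example_f1({"sky", "clouds", "day"}, {"sky", "day", "plants"})
```

The per-photo scores are then averaged over the test collection to rank the systems.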
international conference on multimedia and expo | 2009
Hanna M. Lukashevich; Stefanie Nowak; Peter Dunker
Supervised learning requires adequately labeled training data. In this paper, we present an approach for the automatic detection of outliers in image training sets using a one-class Support Vector Machine (SVM). The image sets were downloaded from photo communities based solely on their tags. We conducted four experiments to investigate whether the one-class SVM can automatically differentiate between target and outlier images. As test setup, we chose four image categories: Snow & Skiing, Family & Friends, Architecture & Buildings, and Beach. Our experiments show that in all tests there is a significant tendency to remove the outliers and retain the target images. This offers a promising way to gather large datasets from the web without the need for a manual review of the images.
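The outlier-filtering idea can be sketched with a simplified stand-in for the one-class SVM: rank images by distance to the centroid of the tag-collected set and keep only the nearest fraction. The two-dimensional feature vectors below are hypothetical toy data:

```python
import math

def filter_outliers(features, keep_fraction=0.8):
    """Keep the images closest to the set's feature centroid and drop the
    rest as likely outliers. A simplified stand-in for the one-class SVM
    used in the paper, shown here for illustration only."""
    dim = len(features[0])
    centroid = [sum(f[i] for f in features) / len(features)
                for i in range(dim)]
    ranked = sorted(range(len(features)),
                    key=lambda i: math.dist(features[i], centroid))
    keep = ranked[: max(1, int(keep_fraction * len(features)))]
    return sorted(keep)

# Four similar "snow" images plus one off-topic image (index 4)
feats = [[0.9, 0.1], [0.8, 0.2], [0.85, 0.15], [0.9, 0.2], [0.1, 0.9]]
kept = filter_outliers(feats, keep_fraction=0.8)
```

A one-class SVM learns a far more flexible decision boundary than this centroid rule, but the filtering workflow is the same.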
content based multimedia indexing | 2009
Peter Dunker; Christian Dittmar; Andre Begau; Stefanie Nowak; Matthias Gruhne
This paper describes a technical solution for automated slideshow generation that extracts a set of high-level features from music, such as beat grid, mood and genre, and intelligently combines this set with high-level image features, such as mood, daytime and scene classification. An advantage of this high-level concept is that it enables the user to incorporate his preferences regarding the semantic aspects of music and images. For example, the user might request the system to automatically create a slideshow which plays soft music and shows pictures with sunsets from the last 10 years of his own photo collection. The high-level feature extraction on both the audio and the visual information is based on the same underlying machine-learning core, which processes different audio and visual low- and mid-level features. This paper describes the technical realization and evaluation of the algorithms with suitable test databases.
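The matching of photos to music by shared high-level labels might be sketched as follows; the mood labels, track title and file names are invented for illustration:

```python
def build_slideshow(tracks, images):
    """Pair each music track with the images whose predicted mood label
    matches the track's mood label. Labels and items are illustrative."""
    return {t["title"]: [im["file"] for im in images
                         if im["mood"] == t["mood"]]
            for t in tracks}

# One soft track and two classified photos (toy data)
tracks = [{"title": "Evening Calm", "mood": "soft"}]
images = [{"file": "sunset1.jpg", "mood": "soft"},
          {"file": "party3.jpg", "mood": "energetic"}]
show = build_slideshow(tracks, images)
```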
conference on image and video retrieval | 2010
Stefanie Nowak; Ainhoa Llorente; Enrico Motta; Stefan M. Rüger
In this paper, we explore different ways of formulating new evaluation measures for multi-label image classification when the vocabulary of the collection adopts the hierarchical structure of an ontology. We apply several semantic relatedness measures based on web search engines, WordNet, Wikipedia and Flickr to the ontology-based score (OS) proposed in [22]. The final objective is to assess the benefit of integrating semantic distances into the OS measure. Hence, we have evaluated them in a real case scenario: the results (73 runs) provided by 19 research teams during their participation in the ImageCLEF 2009 Photo Annotation Task. Two experiments were conducted with a view to understanding which aspect of the annotation behaviour is captured most effectively by each measure. First, we establish a comparison of the system rankings brought about by different evaluation measures. This is done by computing the Kendall τ and Kolmogorov-Smirnov correlation between pairs of rankings. Second, we investigate how stably the different measures react to artificially introduced noise in the ground truth. We conclude that the distributional measures based on image information sources show a promising behaviour in terms of ranking and stability.
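The Kendall τ correlation between two system rankings, as used in the first experiment, can be computed directly from concordant and discordant pairs; the four systems and their ranks below are hypothetical:

```python
from itertools import combinations

def kendall_tau(rank_a, rank_b):
    """Kendall's tau rank correlation between two rankings of the same
    systems (each ranking maps system name -> rank position)."""
    systems = list(rank_a)
    concordant = discordant = 0
    for s, t in combinations(systems, 2):
        agree = (rank_a[s] - rank_a[t]) * (rank_b[s] - rank_b[t])
        if agree > 0:
            concordant += 1    # pair ordered the same way in both rankings
        elif agree < 0:
            discordant += 1    # pair ordered oppositely
    n = len(systems)
    return (concordant - discordant) / (n * (n - 1) / 2)

# Rankings of four hypothetical systems under two evaluation measures
by_os = {"sys1": 1, "sys2": 2, "sys3": 3, "sys4": 4}
by_f1 = {"sys1": 1, "sys2": 3, "sys3": 2, "sys4": 4}
tau = kendall_tau(by_os, by_f1)
```

τ = 1 means the two measures rank all systems identically; values near 0 mean the rankings are essentially unrelated.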
international conference on multimedia retrieval | 2011
Stefanie Nowak; Ronny Paduschek; Uwe Kühhirt
This work presents a showcase on automated selection of photos from a digital photo collection. The Photo Summary technology considers content-based information and photo metadata to determine the most relevant photos in a given collection. The key contribution is the rating scheme for relevance which is based on criteria such as the diversity of photos, the importance of the photo motifs, the technical quality and aesthetics of photos and the interdependence of photos concerning the represented events. The summarization system further considers user preferences and visualizes the selected photos as event staples.
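A relevance rating over several per-photo criteria might be sketched as a weighted sum; the criteria names, weights and scores below are illustrative and not the paper's actual rating scheme:

```python
def relevance(photo, weights):
    """Combine per-photo criterion scores (each in [0, 1]) into one
    relevance value via a weighted sum. Criteria are illustrative."""
    return sum(weights[c] * photo[c] for c in weights)

weights = {"quality": 0.3, "aesthetics": 0.3, "motif": 0.2, "diversity": 0.2}
photos = [
    {"quality": 0.9, "aesthetics": 0.7, "motif": 0.8, "diversity": 0.5},
    {"quality": 0.4, "aesthetics": 0.5, "motif": 0.3, "diversity": 0.9},
]
# Select the highest-rated photo for the summary
best = max(range(len(photos)), key=lambda i: relevance(photos[i], weights))
```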
Multimedia Tools and Applications | 2013
Tobias Schwarze; Thomas Riegel; Seunghan Han; Andreas Hutter; Stefanie Nowak; Sascha Ebel; Christian Petersohn; Patrick Ndjiki-Nya
Semantic queries involving image understanding aspects require the exploitation of multiple clues, namely the (inter-) relations between objects and events across multiple images, the situational context, and the application context. A prominent example of such queries is the identification of individuals in video sequences. Straightforward face recognition approaches require a model of the persons in question and tend to fail in ill-conditioned environments. An alternative approach is therefore to involve the contextual conditions of observations in order to determine the role a person plays in the current context. Due to the strong relation between roles, persons and their identities, knowing one often allows inferring the others. This paper presents a system that implements this approach: First, robust face detection localizes the actors in the video. By clustering similar face instances, the relative frequency of their appearance within a sequence is determined. In combination with a coarse textual annotation manually created by the broadcast station’s archivist, the roles and consequently the identities can be assigned and labeled in the video. Starting with unambiguous assignments and cascading from there, most of the persons can be identified and labeled successfully. The feasibility and performance of the role-based person identification is demonstrated on several programs of a popular German TV show, which consists of various elements such as interview scenes, games and musical show acts.
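The frequency-based part of the role assignment might be sketched as follows: the most frequently appearing face cluster receives the most prominent role. Cluster IDs and role names are hypothetical, and the real system additionally exploits the archivist's textual annotations:

```python
from collections import Counter

def assign_roles(face_clusters, roles_by_frequency):
    """Map face clusters to roles by relative appearance frequency: the
    most frequent cluster gets the first role, and so on. A strongly
    simplified sketch of the cascading role assignment."""
    freq = Counter(face_clusters)
    ranked = [cid for cid, _ in freq.most_common()]
    return {cid: role for cid, role in zip(ranked, roles_by_frequency)}

# Cluster IDs of detected faces across one show episode (toy data)
detections = ["c1", "c1", "c2", "c1", "c3", "c2", "c1"]
roles = assign_roles(detections, ["host", "guest", "musician"])
```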
european conference on interactive tv | 2010
Cornelia Lanz; Stefanie Nowak; Uwe Kühhirt
Tags are a useful tool for film search and recommendation because they contain valuable information in a compressed form. However, manual tagging is time-consuming and requires the involvement of many people. We are working on automated cross-modal video classification algorithms based on signal processing. These algorithms assign subcategories to film scenes; the subcategories are useful to cluster scenes into different groups and can therefore also be used as tags. In this paper, we point out the advantages of tags generated by automated video classification and present the determination of five possible categories with their associated subcategories. The categories were determined in five group studies to ensure that they fit the way humans classify. Because they are based on the audiovisual content and do not include additional external information, such as the director's name or the year of production, they are also suited for automated classification. The final categories are called Dynamic, Valence, Interaction, Suspense and Essential Features; they contain between two and nine subcategories each.