Pascal Kelm
Technical University of Berlin
Featured research published by Pascal Kelm.
workshop on image analysis for multimedia interactive services | 2009
Pascal Kelm; Sebastian Schmiedeke; Thomas Sikora
We present an approach to key frame extraction for structuring user-generated videos on video sharing websites (e.g. YouTube). Our approach is intended to link existing image search engines to video data. User-generated videos are, in contrast to professional material, unstructured, follow no fixed rules, and often suffer from poor camera work. Furthermore, their coding quality is poor due to low resolution and high compression. In a first step, we segment video sequences into shots by detecting gradual and abrupt cuts. Longer shots are further segmented into subshots based on location and camera motion features. One representative key frame is extracted per subshot using visual attention features such as lighting, camera motion, face, and text appearance. These key frames are useful for indexing and for searching similar video sequences using MPEG-7 descriptors [1].
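As a rough illustration of the shot segmentation step, the sketch below detects abrupt cuts from colour-histogram differences between consecutive frames using OpenCV. The histogram settings and the correlation threshold are illustrative assumptions, not the features or parameters used in the paper.

import cv2

def detect_abrupt_cuts(video_path, threshold=0.5):
    """Return frame indices where a hard cut is likely (illustrative only)."""
    cap = cv2.VideoCapture(video_path)
    cuts, prev_hist, idx = [], None, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
        hist = cv2.calcHist([hsv], [0, 1], None, [32, 32], [0, 180, 0, 256])
        cv2.normalize(hist, hist)
        if prev_hist is not None:
            # Low correlation between consecutive colour histograms suggests a cut.
            if cv2.compareHist(prev_hist, hist, cv2.HISTCMP_CORREL) < threshold:
                cuts.append(idx)
        prev_hist = hist
        idx += 1
    cap.release()
    return cuts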
international conference on multimedia retrieval | 2011
Pascal Kelm; Sebastian Schmiedeke; Thomas Sikora
We present three approaches for placing videos in Flickr on the world map. The toponym extraction and geo-lookup approach makes use of external resources to identify toponyms in the metadata and associate them with geo-coordinates. The metadata-based region model approach uses a k-nearest-neighbour classifier trained over geographical regions. Videos are represented using their metadata in a text space with reduced dimensionality. The visual region model approach uses a support vector machine also trained over geographical regions. Videos are represented using low-level feature vectors from multiple key frames. Voting methods are used to form a single decision for each video. We compare the approaches experimentally, highlighting the importance of using appropriate metadata features and suitable regions as the basis of the region model. The best performance is achieved by the geo-lookup approach used with fallback to the visual region model when the video metadata contains no toponym.
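The best-performing combination reported above can be sketched roughly as follows. The gazetteer, classifier and video objects are hypothetical placeholders for the components described in the abstract, not the authors' implementation.

from collections import Counter

def predict_location(video, gazetteer, visual_svm):
    # Toponym extraction and geo lookup on the metadata; returns coordinates
    # or None when no toponym is found (hypothetical interface).
    coords = gazetteer.lookup(video.metadata)
    if coords is not None:
        return coords
    # Fallback: visual region model over low-level features of multiple key
    # frames, with a majority vote over the per-frame region predictions.
    votes = [visual_svm.predict(features) for features in video.keyframe_features]
    region = Counter(votes).most_common(1)[0][0]
    return region.centroid  # e.g. the predicted region's centre point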
acm multimedia | 2012
Luke R. Gottlieb; Jaeyoung Choi; Pascal Kelm; Thomas Sikora; Gerald Friedland
In this article we review the methods we have developed for finding Mechanical Turk participants for the manual annotation of the geo-location of random videos from the web. We require high quality annotations for this project, as we are attempting to establish a human baseline for future comparison to machine systems. This task is different from a standard Mechanical Turk task in that it is difficult for both humans and machines, whereas a standard Mechanical Turk task is usually easy for humans and difficult or impossible for machines. This article discusses the varied difficulties we encountered while qualifying annotators and the steps that we took to select the individuals most likely to do well at our annotation task in the future.
Multimodal Location Estimation of Videos and Images | 2015
Martha Larson; Pascal Kelm; Adam Rae; Claudia Hauff; Bart Thomee; Michele Trevisiol; Jaeyoung Choi; Olivier Van Laere; Steven Schockaert; Gareth J. F. Jones; Pavel Serdyukov; Vanessa Murdock; Gerald Friedland
Benchmarks have the power to bring research communities together to focus on specific research challenges. They drive research forward by making it easier to systematically compare and contrast new solutions, and evaluate their performance with respect to the existing state of the art. In this chapter, we present a retrospective on the Placing Task, a yearly challenge offered by the MediaEval Multimedia Benchmark. The Placing Task, launched in 2010, is a benchmarking task that requires participants to develop algorithms that automatically predict the geolocation of social multimedia (videos and images). This chapter covers the editions of the Placing Task offered in 2010–2013, and also presents an outlook onto 2014. We present the formulation of the task and the task dataset for each year, tracing the design decisions that were made by the organizers, and how each year built on the previous year. Finally, we provide a summary of future directions and challenges for multimodal geolocation, and concluding remarks on how benchmarking has catalyzed research progress in the research area of geolocation prediction for social multimedia.
international conference on multimedia retrieval | 2012
Sebastian Schmiedeke; Pascal Kelm; Thomas Sikora
This paper describes the possibilities of cross-modal classification of multimedia documents in social media platforms. Our framework predicts the user-chosen category of consumer-produced video sequences based on their textual and visual features. The text resources, which include metadata and automatic speech recognition transcripts, are represented as bags of words, and the video content is represented as a bag of clustered local visual features. We investigate the contribution of the different modalities and how they should be combined when sequences lack certain resources. To this end, several classification methods are evaluated with varying resources. The paper shows an approach that achieves a mean average precision of 0.3977 using user-contributed metadata in combination with clustered SURF features.
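One possible realisation of the combination step, assuming the metadata bag-of-words vectors and the clustered-SURF histograms have already been computed, is a weighted late fusion of per-modality classifiers, as in the sketch below. The classifier choice and weighting are assumptions for illustration, not necessarily the configuration that reached the reported score.

import numpy as np
from sklearn.linear_model import LogisticRegression

def train_modalities(X_text, X_visual, y):
    # One classifier per modality, trained on the same category labels.
    text_clf = LogisticRegression(max_iter=1000).fit(X_text, y)
    visual_clf = LogisticRegression(max_iter=1000).fit(X_visual, y)
    return text_clf, visual_clf

def predict_fused(text_clf, visual_clf, X_text, X_visual, w_text=0.5):
    # Weighted average of the class posteriors; a missing modality can be
    # handled by setting its weight to zero.
    p = w_text * text_clf.predict_proba(X_text) \
        + (1.0 - w_text) * visual_clf.predict_proba(X_visual)
    return text_clf.classes_[np.argmax(p, axis=1)]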
international symposium on consumer electronics | 2008
Ronald Glasberg; Sebastian Schmiedeke; Pascal Kelm; Thomas Sikora
We present a new approach for classifying MPEG-2 video sequences as 'cartoon', 'commercial', 'music', 'news' or 'sport' by analyzing specific, high-level audio-visual features of consecutive frames in real-time. This is part of the well-known video-genre-classification problem, where popular TV-broadcast genres are studied. Such applications have also been discussed in the context of MPEG-7 [1]. In our method the extracted features are logically combined using a set of classifiers to produce a reliable recognition. The results demonstrate a high identification rate based on a large representative collection of 100 video sequences (20 sequences per genre) gathered from free digital TV-broadcasting in Europe.
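In its simplest form, the logical combination of per-feature classifiers could look like the majority vote sketched below; the detector interface is a hypothetical stand-in for the paper's actual classifiers and combination rules.

from collections import Counter

GENRES = ("cartoon", "commercial", "music", "news", "sport")

def classify_genre(detectors, frames):
    # Each detector maps a window of frames to one of GENRES or None.
    votes = [detector(frames) for detector in detectors]
    votes = [vote for vote in votes if vote in GENRES]
    return Counter(votes).most_common(1)[0][0] if votes else None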
IEEE Transactions on Multimedia | 2014
Luke R. Gottlieb; Gerald Friedland; Jaeyoung Choi; Pascal Kelm; Thomas Sikora
Crowdsourcing is currently used for a range of applications, either by exploiting unsolicited user-generated content, such as spontaneously annotated images, or by utilizing explicit crowdsourcing platforms such as Amazon Mechanical Turk to mass-outsource artificial-intelligence-type jobs. However, crowdsourcing is most often seen as the best option for tasks that do not require more of people than their uneducated intuition as human beings. This article describes our methods for identifying workers for crowdsourced tasks that are difficult for both machines and humans. It discusses the challenges we encountered in qualifying annotators and the steps we took to select the individuals most likely to do well at these tasks.
multimedia signal processing | 2013
Sebastian Schmiedeke; Pascal Kelm; Thomas Sikora
Sharing videos on social networks is very popular these days. Many social media websites, such as Flickr, Facebook and YouTube, allow users to manually label their uploaded videos with textual information. However, manual labelling of large sets of social media is tedious and error-prone. For this reason we present an algorithm for categorising videos on social media platforms without decoding them. The paper shows a data-driven approach which makes use of global and local features from the compressed domain and achieves a mean average precision of 0.2498 on the Blip10k dataset. In comparison with existing retrieval approaches from the MediaEval Tagging Task 2012, we show the effectiveness and high accuracy of our approach relative to state-of-the-art solutions.
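The data-driven idea can be sketched as follows, assuming the global and local compressed-domain descriptors (e.g. motion-vector and DCT statistics) have already been extracted from the bitstream by an external tool; the classifier choice is an illustrative assumption, not the method evaluated in the paper.

import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def train_compressed_domain_classifier(global_feats, local_feats, labels, k=5):
    # Concatenate global and local descriptors per video, then fit a simple
    # nearest-neighbour classifier over the labelled training videos.
    X = np.hstack([global_feats, local_feats])
    return KNeighborsClassifier(n_neighbors=k).fit(X, labels)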
Social media retrieval | 2013
Pascal Kelm; Vanessa Murdock; Sebastian Schmiedeke; Steven Schockaert; Pavel Serdyukov; Olivier Van Laere
Mobile devices are increasingly the primary way people access information on the Web. More and more users carry small computing devices with them wherever they go and can access the Internet regardless of their location. Whereas prior to the wide adoption of mobile devices a user's online life and offline life were partitioned by time and location, now there is less of a division. Users can carry their online friends with them and maintain constant contact in the form of status updates and uploaded pictures and video, in a running social commentary that blurs the distinction between online and offline interactions. This has led to a proliferation of data that connects a user's online social network with their offline activities. As a result, the data they create provides a solid link between users, what they are doing offline, and what they are contributing online. The question is how to harness this content.
international conference on digital information management | 2008
Ronald Glasberg; Sebastian Schmiedeke; Hüseyin Oguz; Pascal Kelm; Thomas Sikora
We present a new approach for classifying MPEG-2 video sequences as 'sport' or 'non-sport' by analyzing new high-level audiovisual features of consecutive frames in real-time. This is part of the well-known video-genre-classification problem, where popular TV-broadcast genres like cartoon, commercial, music video, news and sports are studied. Such applications have also been discussed in the context of MPEG-7. In our method the extracted features are logically combined by a support vector machine to produce a reliable detection. The results demonstrate a high identification rate of 98.5% based on a large balanced database of 100 representative video sequences gathered from free digital TV-broadcasting and the World Wide Web.
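A minimal sketch of the final decision stage, assuming the high-level audio-visual features have already been extracted and concatenated per sequence; the SVM kernel and parameters below are assumptions for illustration, not those used in the paper.

from sklearn.svm import SVC

def train_sport_detector(X, y):
    # y contains the labels "sport" or "non-sport" for the training sequences.
    return SVC(kernel="rbf", C=1.0, gamma="scale").fit(X, y)

def is_sport(detector, features):
    return detector.predict([features])[0] == "sport"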