Thorsten Hermes
University of Bremen
Publications
Featured research published by Thorsten Hermes.
Storage and Retrieval for Image and Video Databases | 1997
Jutta Kreyss; M. Roeper; Peter Alshuth; Thorsten Hermes; Otthein Herzog
The large amount of available multimedia information (e.g. videos, audio, images) requires efficient and effective annotation and retrieval methods. As videos play an increasingly important role in multimedia, we want to make them available for content-based retrieval. The ImageMiner system, developed in the AI group at the University of Bremen, is designed for content-based retrieval of single images through a new combination of techniques and methods from computer vision and artificial intelligence. Our approach to making videos available for retrieval in a large database of videos and images involves two necessary steps: first, the detection and extraction of shots from a video, which is done by a histogram-based method, and second, the composition of the separate frames of a shot into one single still image, which is performed by a mosaicing technique. The resulting mosaiced image gives a one-image visualization of the shot and can be analyzed by the ImageMiner system. ImageMiner has been tested on several domains (e.g. landscape images, technical drawings) covering a wide range of applications.
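The abstract does not spell out the histogram method, so the following is only a minimal sketch of the general idea: declare a cut wherever the grey-level histograms of successive frames differ by more than a threshold. The bin count and threshold are illustrative assumptions, not values from the paper.

```python
import numpy as np

def shot_boundaries(frames, threshold=0.35, bins=64):
    """Detect hard cuts by comparing grey-level histograms of successive
    frames. `frames` is an iterable of 2-D uint8 arrays; `threshold` is
    an assumed value, not the one used in the paper."""
    cuts = []
    prev_hist = None
    for i, frame in enumerate(frames):
        hist, _ = np.histogram(frame, bins=bins, range=(0, 256))
        hist = hist / hist.sum()                   # normalise to a distribution
        if prev_hist is not None:
            diff = np.abs(hist - prev_hist).sum()  # L1 distance, in [0, 2]
            if diff > threshold:
                cuts.append(i)                     # a new shot starts at frame i
        prev_hist = hist
    return cuts
```

Gradual transitions (fades, dissolves) would need a windowed variant of this comparison; the sketch only covers hard cuts.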
International Conference on Document Analysis and Recognition | 2001
Andrea Miene; Thorsten Hermes; G. Ioannidis; A. Christoffers
Textual inserts and closed captions superimposed on digital videos often contain important and exclusive information about the video content which cannot be found in other information channels. Therefore, it is very helpful to extract this information automatically and add it to a video index as generated by video archiving and retrieval systems such as ADViSOR, AVAnTA, DiVA, or Informedia. Since common OCR systems are restricted to binary images, the video frames have to be preprocessed in order to separate the textual inserts from the image background. In this paper we present our approach to the segmentation of textual inserts from digital videos or images, which consists of a region-growing method for color segmentation and a method for separating text regions from the background based on character size and alignment constraints. A new method of segmentation refinement that takes into account the results of the classification step leads to a significant enhancement of the quality of the resulting binary images. The main difficulties in extracting textual inserts from video are caused by the low resolution and quality of digital video material, the large amount of image data, the very complex structured and textured background, and the unknown color, size, and position of the text to be extracted from the image.
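As a rough illustration of the two stages named above, the sketch below grows colour-homogeneous regions by flood fill and then keeps only regions whose bounding boxes look character-like and align with neighbours. All tolerances are invented for the example; the paper's actual algorithm is more elaborate.

```python
import numpy as np
from collections import deque

def grow_regions(img, tol=20.0):
    """Naive colour region growing: flood-fill pixels whose RGB distance to
    the region's seed colour stays below `tol`. Returns an int label map."""
    h, w, _ = img.shape
    labels = np.zeros((h, w), dtype=np.int32)
    current = 0
    for sy in range(h):
        for sx in range(w):
            if labels[sy, sx]:
                continue
            current += 1
            seed = img[sy, sx].astype(float)
            labels[sy, sx] = current
            queue = deque([(sy, sx)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < h and 0 <= nx < w and not labels[ny, nx]
                            and np.linalg.norm(img[ny, nx] - seed) < tol):
                        labels[ny, nx] = current
                        queue.append((ny, nx))
    return labels

def text_candidates(boxes, min_h=8, max_h=40, align_tol=3):
    """Keep region bounding boxes (y0, x0, y1, x1) whose height is
    character-like and whose baseline aligns with at least two others
    (a crude stand-in for the size and alignment constraints)."""
    keep = []
    for i, (y0, x0, y1, x1) in enumerate(boxes):
        if not (min_h <= y1 - y0 <= max_h):
            continue
        aligned = sum(1 for j, (_, _, v1, _) in enumerate(boxes)
                      if j != i and abs(v1 - y1) <= align_tol)
        if aligned >= 2:
            keep.append(i)
    return keep
```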
International Conference on Multimedia and Expo | 2005
Jean-Pierre Schober; Thorsten Hermes; Otthein Herzog
Large amounts of images need an efficient way of being retrieved. The usual approach of manually annotating images and/or providing a syntactic retrieval capability lacks flexibility and comfort. The automatic annotation of images is a main target of the image retrieval community. These so-called content-based image retrieval (CBIR) systems focus on primitive features, as Eakins and Graham (1999) name them. Description logics (DL) offer a useful contribution to content-based image retrieval by allowing logical reasoning about the semantic content of the image and yielding consistent classification results. This is a main advantage over traditional classification algorithms. Another advantage is the possibility of using domain knowledge, formulated in DL, on the retrieval side, thus offering semantic retrieval. In this paper, we present an approach and the results of adopting a DL for classifying image regions.
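To make the idea concrete without claiming anything about the authors' actual knowledge base, here is a toy stand-in for DL-style region classification: concepts are defined by necessary feature constraints, and a region is assigned the most specific concept it satisfies. A real DL reasoner would additionally check consistency and handle a full subsumption hierarchy.

```python
# Hypothetical mini knowledge base; concept names and constraints are
# invented for illustration, not taken from the paper.
CONCEPTS = {
    "Region": lambda f: True,
    "Sky":    lambda f: f["position"] == "top" and f["hue"] in ("blue", "grey"),
    "Water":  lambda f: f["position"] == "bottom" and f["hue"] == "blue",
    "Forest": lambda f: f["hue"] == "green" and f["texture"] == "coarse",
}
SUBSUMED_BY = {"Sky": "Region", "Water": "Region", "Forest": "Region"}

def classify(features):
    """Return the most specific concepts the region instance satisfies."""
    matches = [c for c, holds in CONCEPTS.items() if holds(features)]
    # drop any concept that merely subsumes another matching concept
    specific = [c for c in matches
                if not any(SUBSUMED_BY.get(m) == c for m in matches)]
    return specific or matches

print(classify({"position": "top", "hue": "blue", "texture": "smooth"}))
# -> ['Sky']
```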
Storage and Retrieval for Image and Video Databases | 1997
Peter Alshuth; Thorsten Hermes; Lutz Voigt; Otthein Herzog
In this paper, videos are analyzed to obtain a content-based description of the video. The structure of a given video is useful for indexing long videos efficiently and automatically. A comparison between shots gives an overview of cut frequency, cut pattern, and scene bounds. After shot detection, the shots are grouped into clusters based on their visual similarity. A time-constrained clustering procedure is used to compare only those shots that are positioned inside a time range; shots from different areas of the video (e.g., beginning/end) are not compared. With this cluster information, which contains a list of shots and their clusters, it is possible to calculate scene bounds. A labeling of all clusters characterizes the cut pattern, which makes it easy to distinguish a dialogue from an action scene. The final content analysis is done by the ImageMiner™ system. The ImageMiner system, developed in the Image Processing Department of the Center for Computing Technology at the University of Bremen, realizes content-based image retrieval for still images through a novel combination of methods and techniques from computer vision and artificial intelligence. The ImageMiner system consists of three analysis modules for computer vision, namely for color, texture, and contour analysis. Additionally, there is a module for object recognition. The output of the object recognition module can be indexed by a text retrieval system. Thus, concepts like forestscene may be searched for. We combine the still image analysis with the results of the video analysis in order to retrieve shots or scenes.
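The abstract does not give the clustering details; the following greedy sketch shows one plausible reading of the time constraint: a shot may only join a cluster whose most recent member lies within a fixed temporal window. The thresholds are assumptions for the example.

```python
def time_constrained_clusters(shots, similarity, sim_thresh=0.8, max_gap=120.0):
    """Greedy time-constrained clustering of shots.

    shots      -- list of (start_time_seconds, feature_vector), in video order
    similarity -- function(feat_a, feat_b) -> value in [0, 1]
    Shots are only compared when their start times lie within `max_gap`
    seconds, so visually similar shots from distant parts of the video
    (e.g. the beginning and the end) never share a cluster."""
    clusters = []                                  # each cluster: shot indices
    for i, (t_i, f_i) in enumerate(shots):
        placed = False
        for cluster in clusters:
            t_j, f_j = shots[cluster[-1]]          # most recent member
            if t_i - t_j <= max_gap and similarity(f_i, f_j) >= sim_thresh:
                cluster.append(i)
                placed = True
                break
        if not placed:
            clusters.append([i])
    return clusters
```

Scene bounds then fall out of the cluster structure: a boundary can be placed wherever no cluster spans two neighbouring shots.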
Multimedia Tools and Applications | 2005
Thorsten Hermes; Andrea Miene; Otthein Herzog
The text searching paradigm still prevails even when users are looking for image data, for example on the Internet. Searching for images mostly means searching on the basis of annotations that have been made manually. When annotations are left empty, which is usually the case, searches on image file names are performed. This may lead to surprising retrieval results. The graphical search paradigm, searching image data by querying graphically, either with an image or with a sketch, currently seems not to be the preferred method, partly because of the complexity of designing the query.

In this paper we present our PictureFinder system, which currently supports “full image retrieval” in analogy to full text retrieval. PictureFinder allows graphical queries for the image the user has in mind by sketching colored and/or textured regions or by whole images (query by example). By adjusting the search tolerances for each region and image feature (i.e. hue, saturation, lightness, texture pattern, and coverage), the user can tune his query either to find images matching his sketch or images which differ from the specified colors and/or textures to a certain degree. To compare colors we propose a color distance measure that takes into account both that different colors spread differently in the color space and that the position of a region in an image may be important.

Furthermore, we show our query by example approach. Based on the example image chosen by the user, a graphical query is generated automatically and presented to the user. One major advantage of this approach is the possibility to change and adjust a query by example in the same way as a query which was sketched by the user. By deleting unimportant regions and by adjusting the tolerances of the remaining regions, the user may focus on image details which are important to him.
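The exact distance measure is not given in the abstract; as a minimal sketch of the tolerance idea, the snippet below compares HSL colours with hue treated circularly and per-channel weights standing in for the fact that different colours spread differently in the colour space. Weights and tolerance values are assumptions for the example.

```python
import math

def hsl_distance(a, b, w_hue=1.0, w_sat=1.0, w_light=1.0):
    """Weighted distance between HSL colours (hue in degrees, s/l in [0, 1]).
    Hue is compared on the colour circle; the weights are illustrative
    stand-ins for the paper's more refined, colour-dependent spread model."""
    dh = min(abs(a[0] - b[0]), 360 - abs(a[0] - b[0])) / 180.0   # in [0, 1]
    ds = abs(a[1] - b[1])
    dl = abs(a[2] - b[2])
    return math.sqrt(w_hue * dh**2 + w_sat * ds**2 + w_light * dl**2)

def region_matches(query_hsl, candidate_hsl, tolerance):
    """A candidate region matches when its colour distance to the sketched
    region stays within the user-adjusted tolerance for that region."""
    return hsl_distance(query_hsl, candidate_hsl) <= tolerance

# A loose tolerance accepts a greenish blue for a sketched pure blue:
print(region_matches((240, 0.8, 0.5), (210, 0.7, 0.55), tolerance=0.3))  # True
```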
Conference of the Centre for Advanced Studies on Collaborative Research | 1995
Thorsten Hermes; Christoph Klauck; Jutta Kreyß; Jianguo Zhang
In order to retrieve images, it is much more natural and usual for human beings to use natural-language concepts, e.g. mountainlake, than syntactical features, e.g. red region left up. This leads to content-based image retrieval. Furthermore, it is unreasonable for any human being to create the content descriptions for thousands of images manually. From this point of view, the project IRIS (Image Retrieval for Information Systems) combines well-known methods and techniques in computer vision and AI in a new way to generate content descriptions of images in a textual form automatically. The text retrieval is done by the IBM SearchManager for AIX. The system is implemented on an IBM RISC System/6000 using AIX. It has already been tested with 1200 images.
Computers & Graphics | 1998
Otthein Herzog; Andrea Miene; Thorsten Hermes; Peter Alshuth
The large amount and the ubiquitous availability of multimedia information (e.g., video, audio, image, and also text documents) require efficient, effective, and automatic annotation and retrieval methods. As videos start to play an even more important role in multimedia, content-based retrieval of videos becomes an issue, especially as there should be an integrated methodology for all types of multimedia documents. Our approach for the integrated retrieval of videos, images, and text comprises three necessary steps: first, the detection and extraction of shots from a video; second, the construction of a still image from the frames in a shot, achieved by an extraction of key frames or a mosaicing technique; and third, the analysis of the resulting single-image visualization of a shot by the ImageMiner™ system. The ImageMiner system was developed in cooperation with IBM at the University of Bremen in the Image Processing Department of the Center for Computing Technologies. It realizes the content-based retrieval of single images through a novel combination of techniques and methods from computer vision and artificial intelligence. Its output is a textual description of an image, and thus in our case, of the static elements of a video shot. In this way, the annotations of a video can be indexed with standard text retrieval systems, along with text documents or annotations of other multimedia documents, thus ensuring an integrated interface for all kinds of multimedia documents.
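The paper mentions key-frame extraction as one way to obtain the still image; one simple, commonly used selection rule (an assumption here, not necessarily the paper's) is to take the frame whose histogram is closest to the shot's mean histogram:

```python
import numpy as np

def key_frame(shot_frames, bins=64):
    """Pick the most representative frame of a shot: the one whose
    grey-level histogram is closest (L1) to the shot's mean histogram.
    `shot_frames` is a sequence of 2-D uint8 arrays."""
    hists = []
    for frame in shot_frames:
        h, _ = np.histogram(frame, bins=bins, range=(0, 256))
        hists.append(h / h.sum())
    mean_hist = np.mean(hists, axis=0)
    dists = [np.abs(h - mean_hist).sum() for h in hists]
    return int(np.argmin(dists))       # index of the key frame within the shot
```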
Archive | 2010
Adalbert F. X. Wilhelm; Arne Jacobs; Thorsten Hermes
The analysis of video sequences is of primary concern in the field of mass communication. One particular topic is the study of collective visual memories and omissions as they emerge in various cultures, with trans-cultural and global elements (Ludes P., Multimedia und Multi-Moderne: Schlüsselbilder, Fernsehnachrichten und World Wide Web – Medienzivilisierung in der Europäischen Währungsunion. Westdeutscher Verlag, Opladen 2001). The vast amount of visual data from television and web offerings makes comparative studies of visual material rather complex and very expensive. A standard task in this realm is to find images that are similar to each other. Similarity is typically aimed at a conceptual level comprising both syntactic and semantic similarity. The use of semi-automatic picture retrieval techniques would facilitate this task. An important aspect is to combine the syntactical analysis, which is usually performed automatically, with the semantic level obtained from annotations or the analysis of captions or closely related text. Association rules are particularly suited to extracting implicit knowledge from the database and making this knowledge accessible for further quantitative analysis.
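For illustration only, here is a deliberately tiny Apriori-style miner over images described as sets of syntactic features and semantic annotations; item names and thresholds are invented:

```python
from itertools import combinations

def association_rules(transactions, min_support=0.1, min_confidence=0.6):
    """Mine single-antecedent rules A -> B from item sets.

    transactions -- list of sets, e.g. {"dominant_blue", "sky", "outdoor"},
                    mixing syntactic features with semantic annotations.
    Returns (antecedent, consequent, support, confidence) tuples."""
    n = len(transactions)
    counts = {}
    for t in transactions:
        for item in t:
            counts[frozenset([item])] = counts.get(frozenset([item]), 0) + 1
        for pair in combinations(sorted(t), 2):
            counts[frozenset(pair)] = counts.get(frozenset(pair), 0) + 1
    rules = []
    for itemset, c in counts.items():
        if len(itemset) != 2 or c / n < min_support:
            continue
        a, b = tuple(itemset)
        for ant, cons in ((a, b), (b, a)):
            conf = c / counts[frozenset([ant])]
            if conf >= min_confidence:
                rules.append((ant, cons, c / n, conf))
    return rules

imgs = [{"dominant_blue", "sky"}, {"dominant_blue", "sky", "outdoor"},
        {"dominant_green", "forest"}, {"dominant_blue", "water"}]
print(association_rules(imgs))   # includes ('sky', 'dominant_blue', 0.5, 1.0)
```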
2017 International Conference on Research and Education in Mechatronics (REM) | 2017
André Dehne; Nantwin Möller; Thorsten Hermes
MARWIN is a mobile autonomous robot platform designed for performing maintenance and inspection tasks in a 4D environment at the European XFEL accelerator in Hamburg, Germany. MARWIN consists of a 4-wheel-driven chassis and a manipulator arm providing 3 degrees of freedom. It can be operated in a pre-configured autonomous mode as well as in a remotely controlled mode. The primary use case of MARWIN is measuring radiation fields. For this purpose, MARWIN is equipped with a mobile Geiger-Mueller tube mounted at the tip of the manipulator arm and a stationary multi-purpose radiation detector. We describe the mechanical and electrical setup, the architecture and implementation of the control routines, the strategy implemented to handle radiation-triggered malfunctions, the energy management, and first experimental results.
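The abstract only names the strategy for radiation-triggered malfunctions; as a purely hypothetical sketch of such a safeguard (the callbacks and the dose limit are invented, not MARWIN's real interfaces or values):

```python
import time

DOSE_LIMIT = 50.0   # assumed threshold in uSv/h, not MARWIN's actual limit

def safety_loop(read_dose_rate, halt_mission, drive_to_safe_position):
    """Toy watchdog: poll the dose rate and, once it exceeds the limit,
    abort the current mission and retreat to a safe position. The three
    callbacks stand in for the robot's real control interfaces."""
    while True:
        if read_dose_rate() > DOSE_LIMIT:
            halt_mission()
            drive_to_safe_position()
            break
        time.sleep(1.0)   # poll once per second
```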
Joint Pattern Recognition Symposium | 2004
Arne Jacobs; Thorsten Hermes; Otthein Herzog
The estimation of motion in videos yields information useful for video annotation, retrieval, and compression. Current approaches use iterative minimization techniques based on intensity gradients in order to estimate the parameters of a 2D transform between successive frames. These approaches rely on good initial guesses of the motion parameters. For single or dominant motions there exist hybrid algorithms that estimate such initial parameters prior to the iterative minimization. We propose a technique for the generation of a set of motion hypotheses using block matching that also works in the presence of multiple non-dominant motions. These hypotheses are then refined using iterative techniques.
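To illustrate the hypothesis-generation step (a bare-bones sketch, not the authors' exact procedure): exhaustive block matching yields one displacement per block, and the most frequent displacements become the initial motion hypotheses, which naturally lets several non-dominant motions surface. Block size, search radius, and hypothesis count are assumed values.

```python
import numpy as np

def motion_hypotheses(prev, curr, block=16, radius=8, n_hypotheses=3):
    """For every block in `prev` (2-D uint8), find the best SAD match in
    `curr` within a +/-radius window; return the most frequently voted
    displacement vectors as translational motion hypotheses."""
    h, w = prev.shape
    votes = {}
    for y in range(0, h - block, block):
        for x in range(0, w - block, block):
            ref = prev[y:y + block, x:x + block].astype(np.int32)
            best_sad, best_v = None, (0, 0)
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy <= h - block and 0 <= xx <= w - block:
                        cand = curr[yy:yy + block, xx:xx + block].astype(np.int32)
                        sad = np.abs(ref - cand).sum()
                        if best_sad is None or sad < best_sad:
                            best_sad, best_v = sad, (dy, dx)
            votes[best_v] = votes.get(best_v, 0) + 1
    # the most voted-for displacements become the motion hypotheses
    return sorted(votes, key=votes.get, reverse=True)[:n_hypotheses]
```

Each hypothesis would then seed one run of the iterative gradient-based refinement.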