
Publication


Featured research published by Jan Richarz.


Pattern Recognition | 2014

Semi-supervised learning for character recognition in historical archive documents

Jan Richarz; Szilárd Vajda; Rene Grzeszick; Gernot A. Fink

Training recognizers for handwritten characters is still a very time consuming task involving tremendous amounts of manual annotation by experts. In this paper we present semi-supervised labeling strategies that are able to considerably reduce the human effort. We propose two different methods to label and later recognize characters in collections of historical archive documents. The first one is based on clustering of different feature representations and the second one incorporates a simultaneous retrieval on different representations. Hence, both approaches are based on multi-view learning and later apply a voting procedure for reliably propagating annotations to unlabeled data. We evaluate our methods on the MNIST database of handwritten digits and introduce a realistic application in the form of a database of handwritten historical weather reports. The experiments show that our method is able to significantly reduce the human effort that is required to build a character recognizer for the data collection considered while still achieving recognition rates that are close to a supervised classification experiment.

Highlights:
- We present semi-supervised labeling strategies that are able to considerably reduce the human effort.
- Two different methods to label and later recognize characters in collections of historical archive documents are proposed.
- A realistic application dealing with handwritten historical weather reports is introduced.
- Both methods are evaluated on the MNIST database of handwritten digits and the historical weather reports.
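As a rough illustration of the cluster-then-vote labeling strategy described above, here is a minimal Python sketch. It is not the authors' implementation: the clustering routine, the toy feature views, and all names (`kmeans`, `propagate_labels`) are our own simplified stand-ins.

```python
import numpy as np
from collections import Counter

def kmeans(X, k, iters=20):
    """Plain k-means with deterministic farthest-point initialization."""
    centers = [X[0]]
    for _ in range(k - 1):
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers, dtype=float)
    for _ in range(iters):
        assign = np.linalg.norm(X[:, None] - centers[None], axis=2).argmin(axis=1)
        for c in range(k):
            if np.any(assign == c):
                centers[c] = X[assign == c].mean(axis=0)
    return assign

def propagate_labels(views, seed_idx, seed_labels, n_clusters):
    """Cluster every feature view, give each cluster the majority label of
    the annotated seeds it contains, then keep only the samples on which
    all views agree (the voting step)."""
    per_view = []
    for X in views:
        assign = kmeans(np.asarray(X, dtype=float), n_clusters)
        cluster_label = {}
        for c in range(n_clusters):
            seeds_in_c = [l for i, l in zip(seed_idx, seed_labels) if assign[i] == c]
            if seeds_in_c:
                cluster_label[c] = Counter(seeds_in_c).most_common(1)[0][0]
        per_view.append([cluster_label.get(c) for c in assign])
    labels = {}
    for i in range(len(per_view[0])):
        proposals = [v[i] for v in per_view]
        if None not in proposals and len(set(proposals)) == 1:
            labels[i] = proposals[0]
    return labels
```

Voting across views makes the propagation conservative: a sample stays unlabeled unless every feature representation proposes the same class, which is what keeps the derived annotations reliable.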


International Conference on Multimodal Interfaces | 2009

Multi-modal and multi-camera attention in smart environments

Boris Schauerte; Jan Richarz; Thomas Plötz; Christian Thurau; Gernot A. Fink

This paper considers the problem of multi-modal saliency and attention. Saliency is a cue that is often used for directing attention of a computer vision system, e.g., in smart environments or for robots. Unlike the majority of recent publications on visual/audio saliency, we aim at a well-grounded integration of several modalities. The proposed framework is based on fuzzy aggregations and offers a flexible, plausible, and efficient way for combining multi-modal saliency information. Besides incorporating different modalities, we extend classical 2D saliency maps to multi-camera and multi-modal 3D saliency spaces. For experimental validation we realized the proposed system within a smart environment. The evaluation took place for a demanding setup under real-life conditions, including focus of attention selection for multiple subjects and concurrently active modalities.
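Fuzzy aggregation of modality-wise saliency maps can be sketched as follows. The paper's exact operators are not reproduced here; the fuzzy union (pointwise maximum) and the algebraic sum are two standard fuzzy aggregation operators used purely for illustration, and all function names are ours.

```python
import numpy as np

def fuzzy_or(maps):
    """Fuzzy union: pointwise maximum of the normalized maps."""
    return np.maximum.reduce(maps)

def algebraic_sum(maps):
    """Algebraic (probabilistic) sum: 1 - prod_i (1 - m_i)."""
    out = np.ones_like(maps[0])
    for m in maps:
        out = out * (1.0 - m)
    return 1.0 - out

def combine_saliency(modality_maps, op=fuzzy_or):
    """Normalize each modality's saliency map to [0, 1], then aggregate."""
    norm = [(m - m.min()) / (np.ptp(m) + 1e-9) for m in modality_maps]
    return op(norm)
```

The choice of operator controls how modalities interact: the maximum lets the strongest modality dominate, while the algebraic sum rewards locations that several modalities find salient at once.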


Intelligent Robots and Systems | 2010

Saliency-based identification and recognition of pointed-at objects

Boris Schauerte; Jan Richarz; Gernot A. Fink

When persons interact, non-verbal cues are used to direct the attention of others towards objects of interest. Achieving joint attention this way is an important aspect of natural communication. Most importantly, it allows coupling verbal descriptions with the visual appearance of objects, if the referred-to object is indicated non-verbally. In this contribution, we present a system that utilizes bottom-up saliency and pointing gestures to efficiently identify pointed-at objects. Furthermore, the system focuses the visual attention by steering a pan-tilt-zoom camera towards the object of interest and thus provides a suitable model view for SIFT-based recognition and learning. We demonstrate the practical applicability of the proposed system through experimental evaluation in different environments with multiple pointers and objects.


Ambient Intelligence | 2011

Visual recognition of 3D emblematic gestures in an HMM framework

Jan Richarz; Gernot A. Fink

In human-machine interaction, gestures play an important role as an input modality for natural and intuitive interfaces. While the usage of sign languages or crafted command gestures typically requires special user training, the class of gestural actions called “emblems” represents more intuitive yet expressive signs that seem well suited for the task. Following this, an approach for the visual recognition of 3D emblematic arm gestures in a realistic smart room scenario is presented. Hand and head positions are extracted in multiple unsynchronized monocular camera streams, combined into spatiotemporal 3D gesture trajectories, and classified in a Hidden Markov Model (HMM) classification and detection framework. The contributions within this article are threefold: Firstly, a solution for the 3D combination of trajectories obtained from unsynchronized cameras with varying frame rates is proposed. Secondly, the suitability of different alternative feature representations derived from a hand trajectory is assessed, and it is shown that intuitive gestures can be represented by projection onto their principal plane of motion. Thirdly, it is demonstrated that a rejection model for gesture spotting and segmentation can be constructed using out-of-domain data. The approach is evaluated on a challenging realistic data set.
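The projection of a 3D trajectory onto its principal plane of motion, named as the second contribution, can be sketched with a small PCA step. This is a generic illustration, not the article's code, and the function name is ours.

```python
import numpy as np

def principal_plane_projection(traj):
    """Project a (T, 3) gesture trajectory onto its principal plane of
    motion, i.e. the span of the two leading eigenvectors of the
    trajectory's covariance matrix."""
    centered = traj - traj.mean(axis=0)
    cov = centered.T @ centered / len(traj)
    vals, vecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    plane = vecs[:, [2, 1]]            # the two largest principal directions
    return centered @ plane            # (T, 2) in-plane coordinates
```

For a gesture that is essentially planar (as most arm emblems are), this reduces the 3D trajectory to 2D coordinates with almost no loss of shape information.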


International Conference on Pattern Recognition | 2008

Real-time detection and interpretation of 3D deictic gestures for interaction with an intelligent environment

Jan Richarz; Thomas Plötz; Gernot A. Fink

We present a system that enables pointing-based unconstrained interaction with a smart conference room using an arbitrary multi-camera setup. For each individual camera stream, areas exhibiting strong motion are identified. In these areas, face and hand hypotheses are detected. The detections from multiple cameras are then combined into 3D hypotheses, from which deictic gestures are identified and a pointing direction is derived. This is then used to identify objects in the scene. Since we use a combination of simple yet effective techniques, the system runs in real time and is very responsive. We present evaluation results on realistic data that show the capabilities of the presented approach.
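A minimal sketch of resolving a pointed-at object from 3D head and hand hypotheses might look as follows. This is our simplification, assuming the pointing ray runs from the head through the hand; object names and positions are made up for illustration.

```python
import numpy as np

def pointed_object(head, hand, objects):
    """Identify the pointed-at object as the one closest to the ray
    that starts at the head and passes through the hand."""
    head, hand = np.asarray(head, float), np.asarray(hand, float)
    direction = (hand - head) / np.linalg.norm(hand - head)
    def distance_to_ray(p):
        v = np.asarray(p, float) - head
        along = v @ direction
        if along < 0.0:            # object lies behind the pointer
            return np.inf
        return np.linalg.norm(v - along * direction)
    return min(objects, key=lambda name: distance_to_ray(objects[name]))
```

Ranking objects by their perpendicular distance to the ray keeps the resolution robust to small errors in the estimated pointing direction.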


Document Analysis Systems | 2012

Towards Semi-supervised Transcription of Handwritten Historical Weather Reports

Jan Richarz; Szilárd Vajda; Gernot A. Fink

This paper addresses the automatic transcription of handwritten documents with a regular tabular structure. A method for extracting machine printed tables from images is proposed, using very little prior knowledge about the document layout. The detected table serves as a query for retrieving and fitting a structural template, which is then used to extract handwritten text fields. A semi-supervised learning approach is applied to these fields, aiming at minimizing the human labeling effort for recognizer training. The effectiveness of the proposed approach is demonstrated experimentally on a set of historical weather reports. Compared to using all labels, competitive recognition performance is achieved by labeling only a small fraction of the data, keeping the required human effort very low.


International Conference on Frontiers in Handwriting Recognition | 2012

Annotating handwritten characters with minimal human involvement in a semi-supervised learning strategy

Jan Richarz; Szilárd Vajda; Gernot A. Fink

One obstacle in the automatic analysis of handwritten documents is the huge amount of labeled data typically needed for classifier training. This is especially true when the document scans are of bad quality and different writers and writing styles have to be covered. Consequently, the considerable human effort required in the process currently prohibits the automatic transcription of large document collections. In this paper, two semi-supervised multiview learning approaches are presented, reducing the manual burden by robustly deriving a large number of labels from relatively few manual annotations. The first is based on cluster-level annotation followed by a majority decision, whereas the second casts the labeling process as a retrieval task and derives labels by voting among ranked lists. Both methods are thoroughly evaluated in a handwritten character recognition scenario using realistic document data. It is demonstrated that competitive recognition performance can be maintained by labeling only a fraction of the data.
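The second, retrieval-based approach can be sketched as a nearest-neighbor vote among ranked lists. This is again a simplified stand-in for the paper's method, with hypothetical names and toy data.

```python
import numpy as np
from collections import Counter

def label_by_retrieval(unlabeled_views, seed_views, seed_labels, k=3):
    """For each unlabeled sample, retrieve the k nearest annotated seeds
    in every feature view, pool the ranked lists, and accept a label only
    when it wins a strict majority of the pooled votes."""
    labels = {}
    for i in range(len(unlabeled_views[0])):
        votes = []
        for Xu, Xs in zip(unlabeled_views, seed_views):
            d = np.linalg.norm(np.asarray(Xs) - np.asarray(Xu)[i], axis=1)
            votes += [seed_labels[j] for j in np.argsort(d)[:k]]
        winner, count = Counter(votes).most_common(1)[0]
        if count > len(votes) // 2:
            labels[i] = winner
    return labels
```

Compared with cluster-level annotation, this formulation labels each sample individually, so it degrades more gracefully when the data does not form clean clusters.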


Pattern Recognition and Image Analysis | 2008

Robust hand detection in still video images using a combination of salient regions and color cues for interaction with an intelligent environment

Thomas Plötz; Jan Richarz; Gernot A. Fink

The “intelligence” of an intelligent environment is not only influenced by the functionality it offers, but also largely by the naturalness and intuitiveness of its interaction modes. A very important natural interaction mode is gestures, as long as the environment’s interface poses no strict constraints on how the gestures may be performed. Since gestures are generally defined by hand/arm poses and motions, an important prerequisite to the recognition of unconstrained gestures is the robust detection of hands in video images. However, due to the strongly articulated nature of hands and the challenges given by a realistic (i.e., not strictly controlled) environment, this is a very challenging task, because it means hands need to be found in almost arbitrary configurations and under strongly varying lighting conditions. In this article, we present an approach to hand detection in the context of an intelligent house using a fusion of structural cues and color information. We first describe our detection algorithm using scale-invariant salient region features, combined with an efficient region-based filtering approach to reduce the number of false positives. The results are fused with the output of a skin color classifier. A detailed experimental evaluation on realistic data, including different cue fusing schemes, is presented. By means of an experimental evaluation on a challenging task, we demonstrate that, although each of the two different feature types (image structure and color) has drawbacks, their combination yields promising results for robust hand detection.


International Conference on Pattern Recognition | 2010

Feature Representations for the Recognition of 3D Emblematic Gestures

Jan Richarz; Gernot A. Fink


HBU'10: Proceedings of the First International Conference on Human Behavior Understanding | 2010

Feature representations for the recognition of 3D emblematic gestures

Jan Richarz; Gernot A. Fink

Collaboration


Dive into Jan Richarz's collaborations.

Top Co-Authors


Gernot A. Fink

Technical University of Dortmund


Thomas Plötz

Georgia Institute of Technology


Boris Schauerte

Karlsruhe Institute of Technology


Szilárd Vajda

National Institutes of Health


Rene Grzeszick

Technical University of Dortmund
