Jan Richarz
Technical University of Dortmund
Publications
Featured research published by Jan Richarz.
Pattern Recognition | 2014
Jan Richarz; Szilárd Vajda; Rene Grzeszick; Gernot A. Fink
Training recognizers for handwritten characters is still a very time-consuming task involving tremendous amounts of manual annotation by experts. In this paper we present semi-supervised labeling strategies that considerably reduce the human effort. We propose two different methods to label and later recognize characters in collections of historical archive documents. The first is based on clustering of different feature representations and the second incorporates a simultaneous retrieval on different representations. Hence, both approaches are based on multi-view learning and later apply a voting procedure for reliably propagating annotations to unlabeled data. We evaluate our methods on the MNIST database of handwritten digits and introduce a realistic application in the form of a database of handwritten historical weather reports. The experiments show that our method significantly reduces the human effort required to build a character recognizer for the data collection considered while still achieving recognition rates that are close to a supervised classification experiment.

Highlights:
- We present semi-supervised labeling strategies that considerably reduce the human effort.
- Two different methods to label and later recognize characters in collections of historical archive documents are proposed.
- A realistic application dealing with handwritten historical weather reports is introduced.
- Both methods are evaluated on the MNIST database of handwritten digits and the historical weather reports.
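As a rough illustration of the cluster-and-vote idea, the sketch below labels whole clusters from a few seed annotations in each feature view and only accepts a propagated label when all views agree. The use of k-means, the cluster count, and the unanimity rule are placeholder assumptions, not the paper's exact configuration (scikit-learn assumed available).

```python
# Sketch of cluster-based label propagation with multi-view voting.
import numpy as np
from sklearn.cluster import KMeans

def propagate_labels(X_views, seed_idx, seed_labels, n_clusters=50):
    """Cluster each view, label clusters from seeds, vote across views."""
    votes = []
    for X in X_views:
        clusters = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(X)
        # Assign each cluster the majority label of its seed samples.
        cluster_label = {}
        for c in np.unique(clusters):
            seeds_in_c = [l for i, l in zip(seed_idx, seed_labels)
                          if clusters[i] == c]
            if seeds_in_c:
                cluster_label[c] = max(set(seeds_in_c), key=seeds_in_c.count)
        votes.append([cluster_label.get(c) for c in clusters])
    # Accept a propagated label only when all views agree on it.
    labels = []
    for per_sample in zip(*votes):
        agreed = set(per_sample)
        labels.append(per_sample[0]
                      if len(agreed) == 1 and None not in agreed else None)
    return labels
```

Samples on which the views disagree keep no label and remain candidates for manual annotation, which is where the remaining human effort concentrates.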
International Conference on Multimodal Interfaces | 2009
Boris Schauerte; Jan Richarz; Thomas Plötz; Christian Thurau; Gernot A. Fink
This paper considers the problem of multi-modal saliency and attention. Saliency is a cue that is often used for directing the attention of a computer vision system, e.g., in smart environments or for robots. Unlike the majority of recent publications on visual/audio saliency, we aim at a well-grounded integration of several modalities. The proposed framework is based on fuzzy aggregations and offers a flexible, plausible, and efficient way of combining multi-modal saliency information. Besides incorporating different modalities, we extend classical 2D saliency maps to multi-camera and multi-modal 3D saliency spaces. For experimental validation we realized the proposed system within a smart environment. The evaluation took place for a demanding setup under real-life conditions, including focus-of-attention selection for multiple subjects and concurrently active modalities.
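For intuition, fuzzy aggregation of per-modality saliency maps can be sketched with elementary fuzzy operators; the max/min operators and the shared normalized grid below are illustrative assumptions, not the specific aggregations evaluated in the paper.

```python
# Sketch of fuzzy aggregation of per-modality saliency maps.
# Assumptions: all maps share one grid and are normalized to [0, 1];
# fuzzy OR (max) and fuzzy AND (min) stand in for the paper's operators.
import numpy as np

def fuse_saliency(maps, mode="or"):
    """Combine normalized saliency maps with simple fuzzy operators."""
    stack = np.stack([m / (m.max() + 1e-9) for m in maps])
    if mode == "or":   # salient if at least one modality says so
        return stack.max(axis=0)
    if mode == "and":  # salient only where all modalities agree
        return stack.min(axis=0)
    raise ValueError(f"unknown mode: {mode}")

visual = np.random.rand(64, 64)  # placeholder visual saliency map
audio = np.random.rand(64, 64)   # placeholder acoustic saliency map
fused = fuse_saliency([visual, audio], mode="or")
focus = np.unravel_index(fused.argmax(), fused.shape)  # attended cell
```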
Intelligent Robots and Systems | 2010
Boris Schauerte; Jan Richarz; Gernot A. Fink
When persons interact, non-verbal cues are used to direct the attention of others towards objects of interest. Achieving joint attention this way is an important aspect of natural communication. Most importantly, it makes it possible to couple verbal descriptions with the visual appearance of objects if the referred-to object is indicated non-verbally. In this contribution, we present a system that utilizes bottom-up saliency and pointing gestures to efficiently identify pointed-at objects. Furthermore, the system focuses the visual attention by steering a pan-tilt-zoom camera towards the object of interest and thus provides a suitable model view for SIFT-based recognition and learning. We demonstrate the practical applicability of the proposed system through experimental evaluation in different environments with multiple pointers and objects.
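A hypothetical way to combine the two cues is to score candidate object positions by their saliency weighted by how well they align with the pointing ray; the geometry and the multiplicative weighting below are illustrative assumptions, not the paper's method.

```python
# Sketch: rank candidate 3D object positions by saliency weighted with
# their angular agreement with the pointing ray. Illustrative only.
import numpy as np

def pointed_object(hand, direction, positions, saliencies):
    """Return the index of the most likely pointed-at object."""
    d = direction / np.linalg.norm(direction)
    scores = []
    for pos, sal in zip(positions, saliencies):
        v = pos - hand
        cos_angle = (v @ d) / (np.linalg.norm(v) + 1e-9)
        scores.append(sal * max(cos_angle, 0.0))  # behind the hand: 0
    return int(np.argmax(scores))

hand = np.array([0.0, 1.2, 0.5])    # hand position (meters)
ray = np.array([1.0, -0.2, 0.1])    # pointing direction
objects = [np.array([2.0, 0.9, 0.6]), np.array([0.5, 2.0, 0.5])]
best = pointed_object(hand, ray, objects, saliencies=[0.8, 0.9])
```

The winning object would then define the pan-tilt-zoom target for the close-up model view.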
Ambient Intelligence | 2011
Jan Richarz; Gernot A. Fink
In human-machine interaction, gestures play an important role as an input modality for natural and intuitive interfaces. While the usage of sign languages or crafted command gestures typically requires special user training, the class of gestural actions called “emblems” represents more intuitive yet expressive signs that seem well suited for the task. Following this, an approach for the visual recognition of 3D emblematic arm gestures in a realistic smart-room scenario is presented. Hand and head positions are extracted in multiple unsynchronized monocular camera streams, combined into spatiotemporal 3D gesture trajectories, and classified in a Hidden Markov Model (HMM) classification and detection framework. The contributions within this article are threefold: Firstly, a solution for the 3D combination of trajectories obtained from unsynchronized cameras with varying frame rates is proposed. Secondly, the suitability of different alternative feature representations derived from a hand trajectory is assessed, and it is shown that intuitive gestures can be represented by projection onto their principal plane of motion. Thirdly, it is demonstrated that a rejection model for gesture spotting and segmentation can be constructed using out-of-domain data. The approach is evaluated on a challenging realistic data set.
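The principal-plane representation mentioned above can be pictured as a PCA-style projection of the 3D trajectory onto its two directions of largest variance; the SVD route below is a generic stand-in, not necessarily the paper's exact computation.

```python
# Sketch: project a 3D gesture trajectory onto its principal plane of
# motion to obtain a 2D feature sequence (e.g., as HMM input).
import numpy as np

def principal_plane_projection(trajectory):
    """trajectory: (T, 3) array of 3D hand positions over time."""
    centered = trajectory - trajectory.mean(axis=0)
    # Right singular vectors are the principal axes of the point cloud.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    plane_basis = vt[:2]              # two largest-variance directions
    return centered @ plane_basis.T   # (T, 2) in-plane coordinates

t = np.linspace(0, 2 * np.pi, 50)
# A roughly planar circular arm motion with slight depth noise.
traj = np.stack([np.cos(t), np.sin(t), 0.05 * np.random.randn(50)], axis=1)
coords_2d = principal_plane_projection(traj)
```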
International Conference on Pattern Recognition | 2008
Jan Richarz; Thomas Plötz; Gernot A. Fink
We present a system that enables pointing-based unconstrained interaction with a smart conference room using an arbitrary multi-camera setup. For each individual camera stream, areas exhibiting strong motion are identified. In these areas, face and hand hypotheses are detected. The detections of multiple cameras are then combined into 3D hypotheses, from which deictic gestures are identified and a pointing direction is derived. This is then used to identify objects in the scene. Since we use a combination of simple yet effective techniques, the system runs in real time and is very responsive. We present evaluation results on realistic data that show the capabilities of the presented approach.
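The multi-camera combination step can be pictured as triangulating the viewing rays of matching 2D detections; the closest-point construction and the head-to-hand pointing direction below are generic stand-ins under assumed calibrated cameras, not the paper's exact formulation.

```python
# Sketch: fuse two per-camera detections into a 3D point (midpoint of the
# shortest segment between their viewing rays), then derive a pointing
# direction from the head and hand positions. Values are illustrative.
import numpy as np

def triangulate(o1, d1, o2, d2):
    """Midpoint of the closest points between rays o1+s*d1 and o2+t*d2."""
    d1, d2 = d1 / np.linalg.norm(d1), d2 / np.linalg.norm(d2)
    w = o1 - o2
    a, b, c = d1 @ d1, d1 @ d2, d2 @ d2
    d, e = d1 @ w, d2 @ w
    denom = a * c - b * b + 1e-12
    s = (b * e - c * d) / denom
    t = (a * e - b * d) / denom
    return 0.5 * ((o1 + s * d1) + (o2 + t * d2))

# Rays towards the hand from two calibrated cameras (made-up geometry).
hand3d = triangulate(np.zeros(3), np.array([1.0, 0.2, 0.1]),
                     np.array([2.0, 0.0, 0.0]), np.array([-1.0, 0.25, 0.1]))
head3d = np.array([0.9, 0.1, 0.6])
pointing = hand3d - head3d            # head-hand line as pointing cue
pointing /= np.linalg.norm(pointing)
```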
Document Analysis Systems | 2012
Jan Richarz; Szilárd Vajda; Gernot A. Fink
This paper addresses the automatic transcription of handwritten documents with a regular tabular structure. A method for extracting machine-printed tables from images is proposed, using very little prior knowledge about the document layout. The detected table serves as a query for retrieving and fitting a structural template, which is then used to extract handwritten text fields. A semi-supervised learning approach is applied to these fields, aiming at minimizing the human labeling effort for recognizer training. The effectiveness of the proposed approach is demonstrated experimentally on a set of historical weather reports. Compared to using all labels, competitive recognition performance is achieved by labeling only a small fraction of the data, keeping the required human effort very low.
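As a loose illustration of table extraction with minimal layout knowledge, ruling lines can be located in a binarized page via projection profiles; the thresholds, the run merging, and the cell construction below are assumptions for the sketch, and the paper's template retrieval and fitting go well beyond this.

```python
# Sketch: find a table's ruling lines in a binarized image (ink = 1) via
# projection profiles, then form cell boxes from consecutive lines.
import numpy as np

def ruling_lines(binary_img, axis, min_fill=0.6, gap=3):
    """Row (axis=0) or column (axis=1) positions of likely ruling lines."""
    profile = binary_img.mean(axis=1 - axis)   # ink fraction per line
    hits = np.flatnonzero(profile >= min_fill)
    if hits.size == 0:
        return []
    groups, current = [], [hits[0]]
    for i in hits[1:]:                         # merge thick-line pixel runs
        if i - current[-1] <= gap:
            current.append(i)
        else:
            groups.append(current)
            current = [i]
    groups.append(current)
    return [int(np.mean(g)) for g in groups]

def cell_boxes(binary_img):
    """(top, left, bottom, right) boxes between adjacent ruling lines."""
    rows = ruling_lines(binary_img, axis=0)
    cols = ruling_lines(binary_img, axis=1)
    return [(t, l, b, r)
            for t, b in zip(rows[:-1], rows[1:])
            for l, r in zip(cols[:-1], cols[1:])]
```

Each extracted cell would then be passed on to the semi-supervised labeling stage.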
International Conference on Frontiers in Handwriting Recognition | 2012
Jan Richarz; Szilárd Vajda; Gernot A. Fink
One obstacle in the automatic analysis of handwritten documents is the huge amount of labeled data typically needed for classifier training. This is especially true when the document scans are of bad quality and different writers and writing styles have to be covered. Consequently, the considerable human effort required in the process currently prohibits the automatic transcription of large document collections. In this paper, two semi-supervised multiview learning approaches are presented, reducing the manual burden by robustly deriving a large number of labels from relatively few manual annotations. The first is based on cluster-level annotation followed by a majority decision, whereas the second casts the labeling process as a retrieval task and derives labels by voting among ranked lists. Both methods are thoroughly evaluated in a handwritten character recognition scenario using realistic document data. It is demonstrated that competitive recognition performance can be maintained by labeling only a fraction of the data.
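The second, retrieval-style strategy can be sketched as a nearest-neighbor query of each unlabeled sample against the labeled seeds in every view, with a vote among the ranked lists; k, the distance metric, and the all-views-agree rule are placeholder assumptions (scikit-learn assumed).

```python
# Sketch of retrieval-based label propagation across feature views.
import numpy as np
from collections import Counter
from sklearn.neighbors import NearestNeighbors

def retrieval_labels(views_unlabeled, views_seeds, seed_labels, k=5):
    """Per-sample label, or None when the views' ranked lists disagree."""
    votes_per_view = []
    for Xu, Xs in zip(views_unlabeled, views_seeds):
        nn = NearestNeighbors(n_neighbors=k).fit(Xs)
        _, idx = nn.kneighbors(Xu)
        votes_per_view.append(
            [Counter(seed_labels[i] for i in row).most_common(1)[0][0]
             for row in idx])
    return [v[0] if len(set(v)) == 1 else None
            for v in zip(*votes_per_view)]
```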
Pattern Recognition and Image Analysis | 2008
Thomas Plötz; Jan Richarz; Gernot A. Fink
The “intelligence” of an intelligent environment is influenced not only by the functionality it offers, but also largely by the naturalness and intuitiveness of its interaction modes. Gestures are a very important natural interaction mode, as long as the environment’s interface poses no strict constraints on how the gestures may be performed. Since gestures are generally defined by hand/arm poses and motions, an important prerequisite to the recognition of unconstrained gestures is the robust detection of hands in video images. Due to the strongly articulated nature of hands and the challenges posed by a realistic (i.e., not strictly controlled) environment, this is a very difficult task: hands need to be found in almost arbitrary configurations and under strongly varying lighting conditions. In this article, we present an approach to hand detection in the context of an intelligent house using a fusion of structural cues and color information. We first describe our detection algorithm using scale-invariant salient region features, combined with an efficient region-based filtering approach to reduce the number of false positives. The results are fused with the output of a skin color classifier. A detailed experimental evaluation on realistic data, including different cue fusion schemes, is presented. It demonstrates that, although each of the two feature types (image structure and color) has drawbacks, their combination yields promising results for robust hand detection.
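A crude version of the cue fusion can be sketched by rescoring structural hand candidates with the skin evidence inside each box; the fixed HSV skin range and the product-style fusion are placeholder assumptions (the paper compares several fusion schemes), with OpenCV assumed for color conversion.

```python
# Sketch: fuse structural detection scores with a simple skin color cue.
import numpy as np
import cv2  # OpenCV, assumed available

def skin_probability(bgr_img):
    """Crude per-pixel skin likelihood from a fixed HSV range check."""
    hsv = cv2.cvtColor(bgr_img, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, (0, 40, 60), (25, 180, 255))
    return mask.astype(np.float32) / 255.0

def fuse(candidates, structure_scores, skin_map):
    """Rescore candidate (x, y, w, h) boxes by their mean skin evidence."""
    fused = []
    for (x, y, w, h), s in zip(candidates, structure_scores):
        skin = skin_map[y:y + h, x:x + w].mean()
        fused.append(s * skin)  # both cues must support the hypothesis
    return fused
```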
International Conference on Pattern Recognition | 2010
Jan Richarz; Gernot A. Fink
HBU'10: Proceedings of the First International Conference on Human Behavior Understanding | 2010
Jan Richarz; Gernot A. Fink