Lambertus Schomaker
University of Groningen
Publication
Featured research published by Lambertus Schomaker.
IEEE Transactions on Pattern Analysis and Machine Intelligence | 2008
T. van der Zant; Lambertus Schomaker; K. Haak
For quick access to new handwritten collections, current handwriting recognition methods are too cumbersome. They cannot deal with the lack of labeled data and would require extensive laboratory training for each individual script, style, language, and collection. We propose a biologically inspired whole-word recognition method that is used to incrementally elicit word labels in a live Web-based annotation system, named Monk. Since human labor should be minimized given the massive amount of image data, it becomes important to rely on robust perceptual mechanisms in the machine. Recent computational models of the neurophysiology of vision are applied to isolated word classification. A primate cortex-like mechanism allows us to classify text images that have a low frequency of occurrence. Typically, these images are the most difficult to retrieve and often contain named entities and are regarded as the most important to people. Usually, standard pattern-recognition technology cannot deal with these text images if there are not enough labeled instances. The results of this retrieval system are compared to normalized word-image matching and appear to be very promising.
international conference on document analysis and recognition | 2007
Marius Bulacu; R. van Koert; Lambertus Schomaker; T. van der Zant
In this paper, we describe the structure and the performance of a layout analysis system developed for processing the handwritten documents contained in a large historical collection of very high importance in the Netherlands. We introduce a method based on contour tracing that generates curvilinear separation paths between text lines in order to preserve the ascenders and descenders. Our methods are relevant to research on digitization and retrieval of handwritten historical documents.
international conference on document analysis and recognition | 2013
Olarik Surinta; Lambertus Schomaker; Marco Wiering
We propose a novel handwritten character recognition method for isolated handwritten Bangla digits. A feature is introduced for such patterns, the contour angular technique. It is compared to other methods, such as the hotspot feature, the gray-level normalized character image and a basic low-resolution pixel-based method. One of the goals of this study is to explore performance differences between dedicated feature methods and the pixel-based methods. The four methods are compared with support vector machine (SVM) classifiers on the collection of handwritten Bangla digit images. The results show that the fast contour angular technique outperforms the other techniques when few training examples are used. The fast contour angular technique captures aspects of curvature of the handwritten image and results in much faster character classification than the gray pixel-based method. Still, this feature obtains a recognition rate similar to that of the gray pixel-based method when a large training set is used. In order to investigate further whether the different feature methods represent complementary aspects of shape, the effect of majority voting is explored. The results indicate that the majority voting method achieves the best recognition performance on this dataset.
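The majority voting step mentioned in this abstract can be illustrated with a minimal sketch. It assumes each classifier has already produced one label per image; the prediction lists and the tie-breaking rule (first label in classifier order wins) are hypothetical, since the abstract does not specify them.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label chosen by most classifiers for one sample.

    Ties go to the tied label that appears first in `predictions`
    (an assumption; the abstract does not describe tie-breaking).
    """
    counts = Counter(predictions)
    best = max(counts.values())
    for label in predictions:
        if counts[label] == best:
            return label


def vote_all(per_classifier_predictions):
    """Combine several classifiers' label lists sample by sample."""
    return [majority_vote(list(sample))
            for sample in zip(*per_classifier_predictions)]


# Hypothetical outputs of four classifiers on three digit images.
contour_angular = [3, 7, 1]
hotspot = [3, 1, 1]
gray_pixels = [8, 7, 1]
low_res_pixels = [3, 7, 4]

combined = vote_all([contour_angular, hotspot, gray_pixels, low_res_pixels])
print(combined)
```

Voting only helps when the individual classifiers make different errors, which is why the abstract frames it as a test of whether the features are complementary.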
international symposium on neural networks | 2015
Amirhosein Shantia; Rik Timmers; Lambertus Schomaker; Marco Wiering
Robotic mapping and localization methods are dominated by a combination of spatial alignment of sensory inputs, loop closure detection, and a global fine-tuning step. This requires either expensive depth sensing systems or fast computational hardware at run-time to produce a 2D or 3D map of the environment. In a similar context, deep neural networks are used extensively in scene recognition applications, but are not yet applied to localization and mapping problems. In this paper, we adopt a novel approach by using denoising autoencoders and image information for tackling robot localization problems. We use semi-supervised learning with location values that are provided by traditional mapping methods. After training, our method requires far less run-time computation, and therefore can perform real-time localization on normal processing units. We compare the effects of different feature vectors such as plain images, the scale invariant feature transform and histograms of oriented gradients on the localization precision. The best system can localize with an average positional error of ten centimeters and an angular error of four degrees in 3D simulation.
international conference on frontiers in handwriting recognition | 2014
Sheng He; Petros Samara; J.W.J. Burgers; Lambertus Schomaker
Estimating the date of undated medieval manuscripts by evaluating the script they contain, using document image analysis, is helpful for scholars of various disciplines studying the Middle Ages. However, there are, as yet, no systems to automatically and effectively infer the age of historical scripts using machine learning methods. Building a system to date medieval documents is challenging in several respects: (1) as yet, no suitable reference dataset of medieval handwriting exists, and (2) relatively little is known about the evolution of writing styles in the Middle Ages, especially in the later Middle Ages. Our Medieval Paleographic Scale (MPS) project aims at solving these problems. We have collected a corpus of charters from the Medieval Dutch language area, dating from the period 1300 to 1550. A global and local regression method is proposed for learning and estimating the year in which these documents were written, using several features which have been successfully used in writer identification. The proposed system can serve as a basic tool for the medievalist or paleographer. The experimental results of the proposed method demonstrate its effectiveness.
international conference on pattern recognition | 2014
Sheng He; Lambertus Schomaker
This paper presents a method for extracting rotation-invariant features from images of handwriting samples that can be used to perform writer identification. The proposed features are based on the Hinge feature [1], but incorporate the derivative between several points along the ink contours. Finally, we concatenate the proposed features into one feature vector to characterize the writing styles of the given handwritten text. The proposed method has been evaluated using the Firemaker and IAM datasets in writer identification, showing promising performance gains.
international conference on frontiers in handwriting recognition | 2014
Jean-Paul van Oosten; Lambertus Schomaker
Hidden Markov models are frequently used in handwriting-recognition applications. While a large number of methodological variants have been developed to accommodate different use cases, the core concepts have not been changed much. In this paper, we develop a number of datasets to benchmark our own implementation as well as various other tool kits. We introduce a gradual scale of difficulty that allows comparison of datasets in terms of separability of classes. Two experiments are performed to review the basic HMM functions, especially aimed at evaluating the role of the transition probability matrix. We found that the transition matrix may be far less important than the observation probabilities. Furthermore, the traditional training methods are not always able to find the proper (true) topology of the transition matrix. These findings support the view that the quality of the features may require more attention than the aspect of temporal modelling addressed by HMMs.
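To make the two ingredients compared in this abstract concrete, here is a minimal sketch of the standard HMM forward algorithm for discrete observations. It only shows where the transition matrix and the observation (emission) probabilities enter the likelihood; the toy parameters are hypothetical and the sketch does not reproduce the benchmark or experiments of the paper.

```python
def forward_likelihood(observations, initial, transition, emission):
    """Likelihood of a discrete observation sequence under an HMM.

    initial[s]        -- probability of starting in state s
    transition[p][s]  -- probability of moving from state p to state s
    emission[s][o]    -- probability of emitting symbol o in state s
    """
    n_states = len(initial)
    # Initialisation: start state weighted by the first observation.
    alpha = [initial[s] * emission[s][observations[0]]
             for s in range(n_states)]
    # Recursion: propagate through the transition matrix, then weight
    # each state by how well it explains the next observation.
    for obs in observations[1:]:
        alpha = [
            emission[s][obs]
            * sum(alpha[p] * transition[p][s] for p in range(n_states))
            for s in range(n_states)
        ]
    return sum(alpha)


# Hypothetical two-state model with two output symbols.
initial = [0.5, 0.5]
emission = [[0.8, 0.2], [0.2, 0.8]]
sticky = [[0.9, 0.1], [0.1, 0.9]]   # informative transition matrix
flat = [[0.5, 0.5], [0.5, 0.5]]     # uninformative transition matrix

sequence = [0, 0, 1]
print(forward_likelihood(sequence, initial, sticky, emission))
print(forward_likelihood(sequence, initial, flat, emission))
```

Swapping `sticky` for `flat` while keeping `emission` fixed is one simple way to probe how much a given model depends on its transition matrix.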
Pattern Recognition | 2017
Sheng He; Lambertus Schomaker
Feature engineering plays a very important role in writer identification, which has been widely studied in the literature. Previous works have shown that the joint feature distribution of two properties can improve the performance. The joint feature distribution makes feature relationships explicit instead of hoping that a trained classifier picks up a non-linear relation present in the data. In this paper, we propose two novel and curvature-free features: run-lengths of local binary pattern (LBPruns) and cloud of line distribution (COLD) features for writer identification. The LBPruns is the joint distribution of the traditional run-length and local binary pattern (LBP) methods, which computes the run-lengths of local binary patterns on both binarized and gray scale images. The COLD feature is the joint distribution of the relation between orientation and length of line segments obtained from writing contours in handwritten documents. Our proposed LBPruns and COLD are textural-based curvature-free features and capture the line information of handwritten texts instead of the curvature information. The combination of the LBPruns and COLD features provides a significant improvement on the CERUG data set, whose handwritten documents contain a large number of irregular-curvature strokes. The proposed features also show promising results on two other widely used data sets (Firemaker and IAM).
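The idea of a joint distribution over two properties of line segments can be sketched as a normalized two-dimensional histogram. This is a simplified illustration only, not the published COLD feature: the segment list, the bin counts and the `max_length` cut-off are all hypothetical choices.

```python
import math


def joint_orientation_length_histogram(segments, n_angle_bins=4,
                                       n_length_bins=3, max_length=30.0):
    """Normalized joint histogram of segment orientation and length.

    segments -- list of ((x1, y1), (x2, y2)) line segments
    Orientation is folded into [0, pi); lengths at or above
    `max_length` fall into the last length bin (an assumption).
    """
    hist = [[0.0] * n_length_bins for _ in range(n_angle_bins)]
    for (x1, y1), (x2, y2) in segments:
        angle = math.atan2(y2 - y1, x2 - x1) % math.pi
        length = math.hypot(x2 - x1, y2 - y1)
        a = min(int(angle / math.pi * n_angle_bins), n_angle_bins - 1)
        b = min(int(length / max_length * n_length_bins), n_length_bins - 1)
        hist[a][b] += 1.0
    total = sum(map(sum, hist))
    if total == 0:
        return hist
    return [[value / total for value in row] for row in hist]


# Hypothetical segments: two short horizontal ones, one long vertical one.
segments = [((0, 0), (5, 0)), ((0, 0), (0, 25)), ((0, 0), (4, 0))]
histogram = joint_orientation_length_histogram(segments)
for row in histogram:
    print(row)
```

Because each cell counts a specific pairing of orientation and length, the histogram keeps the relation between the two properties that two separate one-dimensional histograms would lose.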
Pattern Recognition | 2016
Sheng He; Petros Samara; J.W.J. Burgers; Lambertus Schomaker
Historical manuscript dating has always been an important challenge for historians but since countless manuscripts have become digitally available recently, the pattern recognition community has started addressing the dating problem as well. In this paper, we present a family of local contour fragments (kCF) and stroke fragments (kSF) features and study their application to historical document dating. kCF are formed by a number of k primary contour fragments segmented from the connected component contours of handwritten texts and kSF are formed by a segment of length k of a stroke fragment graph. The kCF and kSF are described by scale and rotation invariant descriptors and encoded into trained codebooks inspired by classical bag of words model. We evaluate our methods on the Medieval Paleographical Scale (MPS) data set and perform dating by writer identification and classification. As far as dating by writer identification is concerned, we arrive at the conclusion that features which perform well for writer identification are not necessarily suitable for historical document dating. Experimental results of dating by classification demonstrate that a combination of kCF and kSF achieves optimal results, with a mean absolute error of 14.9 years when excluding writer duplicates in training and 7.9 years when including writer duplicates in training. Highlights: A new image-based historical manuscript dating problem is proposed. We present a family of local contour fragments and stroke fragments features. Historical manuscript dating is performed by writer identification and classification.
international conference on pattern recognition applications and methods | 2012
Olarik Surinta; Lambertus Schomaker; Marco Wiering
Feature extraction techniques can be important in character recognition, because they can enhance the efficacy of recognition in comparison to featureless or pixel-based approaches. This study investigates a novel feature extraction technique, called the hotspot technique, for representing handwritten characters and digits. In the hotspot technique, the distance values between the closest black pixels and the hotspots in each direction are used as the representation of a character. The hotspot technique is applied to three data sets: Thai handwritten characters (65 classes), Bangla numeric (10 classes), and MNIST (10 classes). The hotspot technique has two parameters: the number of hotspots and the number of chain code directions. The data sets are then classified by the k-Nearest Neighbors algorithm, using the Euclidean distance as the function for computing distances between data points. In this study, the classification rates obtained from the hotspot, mark direction, and direction of chain code techniques are compared. The results show that the hotspot technique provides the highest average classification rates.
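The description of the hotspot technique above can be sketched roughly as follows. This is an illustrative reading of the abstract, not the authors' implementation: the even grid placement of hotspots, the four chain code directions, the value used when no black pixel is found, and the toy 4x4 images are all assumptions.

```python
import math

# Four chain code directions as (row step, column step): E, N, W, S.
DIRECTIONS_4 = [(0, 1), (-1, 0), (0, -1), (1, 0)]


def hotspot_features(image, grid=2, directions=DIRECTIONS_4):
    """Distances from each hotspot to the closest black pixel per direction.

    image -- list of rows, 1 for a black (ink) pixel, 0 for background
    grid  -- hotspots are placed on an evenly spaced grid x grid lattice
    If no black pixel lies in a direction, the larger image dimension
    is used as the distance (an assumption).
    """
    rows, cols = len(image), len(image[0])
    no_hit = float(max(rows, cols))
    features = []
    for i in range(grid):
        for j in range(grid):
            r0 = (2 * i + 1) * rows // (2 * grid)
            c0 = (2 * j + 1) * cols // (2 * grid)
            for dr, dc in directions:
                r, c, dist = r0, c0, no_hit
                # Walk from the hotspot until ink or the image border.
                while 0 <= r < rows and 0 <= c < cols:
                    if image[r][c]:
                        dist = math.hypot(r - r0, c - c0)
                        break
                    r, c = r + dr, c + dc
                features.append(dist)
    return features


def knn_classify(query, training, k=1):
    """k-Nearest Neighbors with Euclidean distance.

    training -- list of (feature_vector, label) pairs
    """
    nearest = sorted(training, key=lambda item: math.dist(query, item[0]))[:k]
    labels = [label for _, label in nearest]
    return max(set(labels), key=labels.count)


# Hypothetical toy "characters": a vertical and a horizontal stroke.
vertical = [[0, 1, 0, 0] for _ in range(4)]
horizontal = [[0, 0, 0, 0], [1, 1, 1, 1], [0, 0, 0, 0], [0, 0, 0, 0]]
training = [(hotspot_features(vertical), "vertical"),
            (hotspot_features(horizontal), "horizontal")]

# A vertical stroke with its top pixel missing.
query = [[0, 0, 0, 0], [0, 1, 0, 0], [0, 1, 0, 0], [0, 1, 0, 0]]
print(knn_classify(hotspot_features(query), training))
```

The two parameters named in the abstract correspond here to `grid` (which fixes the number of hotspots) and the length of `directions`.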