Muhammad Zeshan Afzal
Kaiserslautern University of Technology
Publications
Featured research published by Muhammad Zeshan Afzal.
Document Analysis Systems | 2016
Joan Pastor-Pellicer; Muhammad Zeshan Afzal; Marcus Liwicki; María José Castro-Bleda
We present a novel Convolutional Neural Network based method for the extraction of text lines, which consists of an initial layout analysis followed by the estimation of the Main Body Area (i.e., the text area between the baseline and the corpus line) for each text line. Finally, a region-based method using the watershed transform is applied to the Main Body Area map to extract the resulting lines. We have evaluated the new system on IAM-HisDB, a publicly available dataset of historical documents, outperforming existing learning-based text line extraction methods, which treat the problem as a pixel-labelling task into text and non-text regions.
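A minimal sketch of the final watershed step described above, assuming the network has already produced a per-pixel Main Body Area probability map; the array name, thresholds, and synthetic example are illustrative assumptions, not the authors' implementation:

```python
import numpy as np
from scipy import ndimage as ndi
from skimage.segmentation import watershed

def extract_lines(main_body_prob, threshold=0.5):
    """Split a main-body-area probability map into text-line regions."""
    mask = main_body_prob > threshold              # binary main body area
    # Each connected component of the main body area seeds one text line.
    markers, num_lines = ndi.label(mask)
    # Flood from the seeds over the inverted probability map so that line
    # regions grow outwards until they meet between neighbouring lines.
    labels = watershed(-main_body_prob, markers, mask=main_body_prob > 0.1)
    return labels, num_lines

# Example with a synthetic map containing two horizontal "lines".
prob = np.zeros((60, 200))
prob[10:20, :] = 0.9
prob[35:45, :] = 0.9
labels, n = extract_lines(prob)
print(n, np.unique(labels))
```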
International Conference on Document Analysis and Recognition | 2015
Muhammad Zeshan Afzal; Samuele Capobianco; Muhammad Imran Malik; Simone Marinai; Thomas M. Breuel; Andreas Dengel; Marcus Liwicki
This paper presents a deep Convolutional Neural Network (CNN) based approach for document image classification. One of the main requirements of deep CNN architectures is that they need a huge number of samples for training. To overcome this problem, we adopt a deep CNN that is pre-trained on a large image dataset containing millions of samples, i.e., ImageNet. The proposed work outperforms both the traditional structural similarity methods and the CNN-based approaches proposed earlier. With merely 20 images per class, the proposed approach surpasses the state of the art by achieving a classification accuracy of 68.25%. The best results on the Tobacco-3482 dataset show that our method outperforms the state-of-the-art method by a significant margin, achieving a median accuracy of 77.6% with 100 samples per class used for training and validation.
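A hedged sketch of the underlying transfer-learning idea (an ImageNet-pretrained CNN whose classifier is replaced by a document-class head); the AlexNet backbone, class count, and hyper-parameters are assumptions for illustration, not the exact configuration reported in the paper:

```python
import torch
import torch.nn as nn
from torchvision import models

NUM_CLASSES = 10  # document categories (assumed count)

# Requires torchvision >= 0.13 for the weights enum.
model = models.alexnet(weights=models.AlexNet_Weights.IMAGENET1K_V1)
# Replace the 1000-way ImageNet classifier with a document-class head.
model.classifier[6] = nn.Linear(4096, NUM_CLASSES)

# With very few samples per class, freezing the convolutional features and
# fine-tuning only the classifier head is a common choice.
for p in model.features.parameters():
    p.requires_grad = False

optimizer = torch.optim.SGD(
    (p for p in model.parameters() if p.requires_grad), lr=1e-3, momentum=0.9)
criterion = nn.CrossEntropyLoss()
```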
International Conference on Document Analysis and Recognition | 2015
Riaz Ahmad; Muhammad Zeshan Afzal; Sheikh Faisal Rashid; Marcus Liwicki; Thomas M. Breuel
Optical Character Recognition (OCR) of cursive scripts like Pashto and Urdu is difficult due to the presence of complex ligatures and connected writing styles. In this paper, we evaluate and compare different approaches for the recognition of such complex ligatures. The approaches include the Hidden Markov Model (HMM), the Long Short-Term Memory (LSTM) network, and the Scale Invariant Feature Transform (SIFT). The current state of the art in cursive script recognition assumes constant scale without any rotation, while real-world data contain rotation and scale variations. This research aims to evaluate the performance of sequence classifiers like HMM and LSTM and to compare them with a descriptor-based classifier like SIFT. In addition, we assess the performance of these methods against scale and rotation variations in cursive-script ligatures. Moreover, we introduce a database of 480,000 images containing 1,000 unique ligatures (sub-words) of Pashto, in which each ligature has 40 scale and 12 rotation variations. The evaluation results show a significantly improved performance of LSTM over HMM and over traditional feature extraction techniques such as SIFT.
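As a hedged illustration of how an LSTM sequence classifier can consume a ligature image column by column, here is a minimal PyTorch sketch; the layer sizes and the column-wise feature choice are assumptions, not the paper's configuration:

```python
import torch
import torch.nn as nn

class LigatureLSTM(nn.Module):
    """Classify a ligature image read as a left-to-right sequence of columns."""
    def __init__(self, img_height=48, hidden=128, num_ligatures=1000):
        super().__init__()
        self.lstm = nn.LSTM(img_height, hidden, batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * hidden, num_ligatures)

    def forward(self, x):            # x: (batch, width, height) column sequence
        out, _ = self.lstm(x)
        return self.fc(out[:, -1])   # classify from the final time step

model = LigatureLSTM()
dummy = torch.rand(4, 120, 48)       # 4 ligature images, 120 columns, height 48
print(model(dummy).shape)            # torch.Size([4, 1000])
```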
Proceedings of the 2nd International Workshop on Historical Document Imaging and Processing | 2013
Mayce Ibrahim Ali Al Azawi; Muhammad Zeshan Afzal; Thomas M. Breuel
Historical text presents numerous challenges for contemporary techniques such as information retrieval, OCR, and POS tagging. In particular, the absence of consistent orthographic conventions in historical text poses difficulties for any system that requires reference to a fixed lexicon accessed by orthographic form, for example a language model or retrieval engine for OCR-produced historical text, where the spelling of words often differs in various ways, e.g., one word might have several spellings that evolved over time. It is therefore important to aid such techniques with rules for the automatic mapping of historical wordforms. In this paper, we propose a new technique that models the target modern language by means of a recurrent neural network with a long short-term memory (LSTM) architecture. Because the network is recurrent, the considered context is not limited to a fixed size, in particular thanks to memory cells that are designed to deal with long-term dependencies. In our experiments on the Luther Bible database, we transform wordforms from Early New High German (ENHG, 14th-16th centuries) to the corresponding modern wordforms in New High German (NHG). We compare our proposed supervised LSTM model to various methods for computing word alignments using statistical and heuristic models, and it outperforms the three state-of-the-art methods. The evaluation shows that the accuracy of our model is 93.90% on known wordforms and 87.95% on unknown wordforms, while the accuracy of the existing state-of-the-art combined approach of wordlist-based and rule-based normalization models is 92.93% for known and 76.88% for unknown tokens. Our LSTM model also performs well in the reverse direction, normalizing modern wordforms to historical wordforms, with 93.4% accuracy on seen tokens and 89.17% on unknown tokens.
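A hedged sketch of a character-level LSTM normalizer in the spirit of the described model; the per-position tagging formulation (which assumes aligned character positions), vocabulary size, and layer sizes are illustrative assumptions only:

```python
import torch
import torch.nn as nn

class WordformNormalizer(nn.Module):
    """Map character indices of a historical wordform to modern characters."""
    def __init__(self, vocab_size, emb=64, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb)
        self.lstm = nn.LSTM(emb, hidden, batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, vocab_size)

    def forward(self, chars):                # chars: (batch, seq_len) indices
        h, _ = self.lstm(self.embed(chars))
        return self.out(h)                   # per-position character logits

# e.g. an ENHG spelling such as "vnnd" would be mapped towards NHG "und"
model = WordformNormalizer(vocab_size=60)
logits = model(torch.randint(0, 60, (2, 12)))
print(logits.shape)                          # torch.Size([2, 12, 60])
```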
Proceedings of the 3rd International Workshop on Historical Document Imaging and Processing | 2015
Muhammad Zeshan Afzal; Joan Pastor-Pellicer; Faisal Shafait; Thomas M. Breuel; Andreas Dengel; Marcus Liwicki
We propose to address the problem of Document Image Binarization (DIB) using Long Short-Term Memory (LSTM), which is specialized in processing very long sequences. The image is considered as a 2D sequence of pixels, and accordingly a 2D LSTM is employed to classify each pixel as text or background. The proposed approach processes the information using local context and then propagates it globally in order to achieve better visual coherence. The method is robust against most document artifacts. We show that, with a very simple network, without any feature extraction and with a limited amount of data, the proposed approach works reasonably well on the DIBCO 2013 dataset. Furthermore, a synthetic dataset is used to measure the performance of the proposed approach with both binarization and OCR ground truth. Given enough training samples, the proposed approach significantly outperforms standard binarization approaches in both F-measure and OCR accuracy.
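The full 2D (multi-directional) LSTM used in the paper is not part of standard deep-learning libraries; as a loose, simplified illustration of the "image as a sequence of pixels" idea only, the sketch below classifies each pixel with a bidirectional LSTM scanning single rows:

```python
import torch
import torch.nn as nn

class RowLSTMBinarizer(nn.Module):
    def __init__(self, hidden=32):
        super().__init__()
        self.lstm = nn.LSTM(1, hidden, batch_first=True, bidirectional=True)
        self.pixel = nn.Linear(2 * hidden, 1)

    def forward(self, img):                  # img: (batch, H, W), grayscale in [0, 1]
        b, h, w = img.shape
        rows = img.reshape(b * h, w, 1)      # treat every row as a pixel sequence
        feats, _ = self.lstm(rows)
        logits = self.pixel(feats).reshape(b, h, w)
        return torch.sigmoid(logits)         # per-pixel probability of "text"

out = RowLSTMBinarizer()(torch.rand(1, 64, 256))
print(out.shape)                             # torch.Size([1, 64, 256])
```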
Revised Selected Papers of the International Workshop on Camera-Based Document Analysis and Recognition - Volume 8357 | 2013
Muhammad Zeshan Afzal; Martin Krämer; Syed Saqib Bukhari; Mohammad Reza Yousefi; Faisal Shafait; Thomas M. Breuel
Camera-captured documents can be a difficult case for standard binarization algorithms. These algorithms are specifically tailored to the requirements of scanned documents, which in general have uniform illumination and high resolution with negligible geometric artifacts. In contrast, camera-captured images are generally low resolution, contain non-uniform illumination, and also possess geometric artifacts. The most important artifact is defocused or blurred text, which results from the limited depth of field of general-purpose hand-held capturing devices. These artifacts can be reduced through controlled capture with a single camera, but they are inevitable for stereo document images, even with an ortho-parallel camera setup. Existing methods for binarization require tuning the parameters separately for the left and the right images of a stereo pair. In this paper, an approach for binarization based on local adaptive background estimation using a percentile filter is presented. The presented approach works reasonably well under the same set of parameters for both left and right images. It also shows competitive results for monocular images in comparison with standard binarization methods.
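A minimal sketch of local adaptive background estimation with a percentile filter followed by thresholding; the window size, percentile, and offset below are illustrative assumptions, not the parameters used in the paper:

```python
import numpy as np
from scipy.ndimage import percentile_filter

def binarize(gray, window=31, percentile=80, offset=0.15):
    """gray: 2D float array in [0, 1]; returns a boolean text mask."""
    # A high local percentile approximates the bright paper background,
    # even under non-uniform illumination and mild blur.
    background = percentile_filter(gray, percentile=percentile,
                                   size=(window, window))
    return gray < (background - offset)      # darker than background => text

page = np.random.rand(100, 100)              # stand-in for a camera-captured page
mask = binarize(page)
print(mask.dtype, mask.mean())
```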
Document Analysis Systems | 2012
Muhammad Zeshan Afzal; Martin Krämer; Syed Saqib Bukhari; Faisal Shafait; Thomas M. Breuel
Document images prove to be a difficult case for standard stereo correspondence approaches. One of the major problems is that document images are highly self-similar. Most algorithms try to tackle this problem by incorporating a global optimization scheme, which tends to be computationally expensive. In this paper, we show that incorporating layout information into the matching paradigm, as a grouping entity for features, leads to better results in terms of robustness and efficiency, and ultimately to a better 3D model of the captured document that can be used in various document restoration systems. This can be seen as a divide-and-conquer approach that partitions the search space into portions given by each grouping entity and then solves each of them independently. As a grouping entity, text lines are preferred over individual character blobs because it is easier to establish correspondences between them. Text-line extraction works reasonably well on stereo image pairs in the presence of perspective distortions. The proposed approach is highly efficient, and the matches obtained are more reliable. These claims are backed up by demonstrating practical applicability through experimental evaluations.
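A hedged sketch of the divide-and-conquer idea: restrict feature matching to paired text-line regions of the stereo images instead of matching over the whole, highly self-similar page. The use of ORB features and the assumption that corresponding line boxes are already known are illustrative simplifications, not the paper's pipeline:

```python
import cv2

def match_within_lines(left, right, line_boxes_left, line_boxes_right):
    """Match features only inside corresponding text-line crops (x, y, w, h)."""
    orb = cv2.ORB_create()
    bf = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    all_matches = []
    for (lx, ly, lw, lh), (rx, ry, rw, rh) in zip(line_boxes_left, line_boxes_right):
        # Restricting the search to one line pair shrinks the search space
        # and removes most of the ambiguity caused by self-similarity.
        kl, dl = orb.detectAndCompute(left[ly:ly + lh, lx:lx + lw], None)
        kr, dr = orb.detectAndCompute(right[ry:ry + rh, rx:rx + rw], None)
        if dl is not None and dr is not None:
            all_matches.append(bf.match(dl, dr))
    return all_matches
```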
International Conference on Document Analysis and Recognition | 2015
Riaz Ahmad; Muhammad Zeshan Afzal; Sheikh Faisal Rashid; Marcus Liwicki; Andreas Dengel; Thomas M. Breuel
Atomic segmentation of cursive scripts into constituent characters is one of the most challenging problems in pattern recognition. To avoid segmentation in cursive script, concrete shapes are considered as recognizable units. The objective of this work is therefore to find alternative recognizable units in Pashto cursive script, namely ligatures and primary ligatures. However, sound statistical analysis is needed to determine the appropriate number of ligatures and primary ligatures in Pashto script. In this work, a corpus of 2,313,736 Pashto words is extracted from large-scale, diversified web sources, and a total of 19,268 unique ligatures are identified in Pashto cursive script. The analysis shows that only 7,000 ligatures cover 91% of the overall corpus of unique Pashto words. Similarly, about 7,681 primary ligatures are identified, which represent the basic shapes of all the ligatures.
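A small sketch of the coverage analysis behind the 91% figure: count ligature occurrences over the corpus and find how many of the most frequent unique ligatures are needed to reach a target coverage. The word-to-ligature segmentation is assumed to be given:

```python
from collections import Counter

def ligatures_needed(ligature_occurrences, target=0.91):
    """Return how many top-frequency ligatures cover `target` of all occurrences."""
    counts = Counter(ligature_occurrences)
    total = sum(counts.values())
    covered = 0
    for needed, (_, c) in enumerate(counts.most_common(), start=1):
        covered += c
        if covered / total >= target:
            return needed
    return len(counts)

# toy example; the paper reports roughly 7,000 ligatures covering 91% of the corpus
print(ligatures_needed(["la", "ba", "la", "ra", "la", "ba"], target=0.8))  # -> 2
```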
Proceedings of the 4th International Workshop on Historical Document Imaging and Processing | 2017
Felix Trier; Muhammad Zeshan Afzal; Markus Ebbecke; Marcus Liwicki
In this paper, we present a novel approach based on convolutional neural networks (CNNs) to estimate the paper format (in pixels per inch) of digitized document images. This format information is often required by commercial document analysis software, and a correct estimate helps high-level tasks such as OCR and layout analysis. The contribution of this work is two-fold: first, it presents an algorithm for the estimation of paper formats; second, it provides the first publicly available collection of documents (aggregated from public datasets) useful as a research benchmark. The collection is a mixture of modern and historical documents with pixels-per-inch (PPI) values ranging from 177 up to 711. The task is modeled as a regression task, leading to more flexible results than a classification task (one class per format, e.g., A3, A4): if an unknown format is presented to the network, it still returns a useful output, and more categories can easily be learned by curriculum learning without modifying the network structure itself. On the proposed dataset, the network is able to estimate the PPI values with an average deviation from the ground truth of only 14.8 PPI. On a private dataset, stemming from health insurance companies, an average deviation of 6.8 PPI has been measured.
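A hedged sketch of casting PPI estimation as regression: a pretrained CNN backbone with a single continuous output trained with a robust regression loss. The ResNet-18 backbone, loss, and dummy batch are assumptions for illustration, not necessarily the paper's configuration:

```python
import torch
import torch.nn as nn
from torchvision import models

# Requires torchvision >= 0.13 for the weights enum.
backbone = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
backbone.fc = nn.Linear(backbone.fc.in_features, 1)   # one continuous PPI value

criterion = nn.SmoothL1Loss()                         # robust regression loss
optimizer = torch.optim.Adam(backbone.parameters(), lr=1e-4)

# One illustrative training step on a dummy batch of document crops.
images = torch.rand(4, 3, 224, 224)
ppi = torch.tensor([[300.], [177.], [711.], [240.]])  # targets inside the 177-711 range
optimizer.zero_grad()
loss = criterion(backbone(images), ppi)
loss.backward()
optimizer.step()
```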
arXiv: Computer Vision and Pattern Recognition | 2015
Peter Burkert; Felix Trier; Muhammad Zeshan Afzal; Andreas Dengel; Marcus Liwicki