Publication


Featured research published by Ashok C. Popat.


Proceedings of the 3rd International Workshop on Historical Document Imaging and Processing | 2015

Publication Date Estimation for Printed Historical Documents using Convolutional Neural Networks

Yuanpeng Li; Dmitriy Genzel; Yasuhisa Fujii; Ashok C. Popat

This paper describes an approach to estimating the unknown publication date of printed historical documents from their scanned page images using Convolutional Neural Networks (CNNs). The method primarily harnesses visual features from small image patches. Optionally, we augment the feature set with textual features derived from Optical Character Recognition (OCR) results, which improves accuracy at a greater preprocessing cost. To support a variety of tasks, we develop both classification and regression models. As an example application, we show that OCR can be improved by using the estimated publication date to select an appropriate OCR model, and that the resulting improvement in OCR accuracy is close to what could be achieved if the true publication date were known. We are not aware of previous work on estimating publication dates for printed historical documents from visual features.
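The abstract reports that choosing an OCR model by estimated publication date improves OCR accuracy. A minimal sketch of only that selection step follows, assuming a regression model has already produced one year estimate per image patch. The median aggregation, the era boundaries, and the model names are hypothetical illustrations, not details taken from the paper.

```python
from statistics import median

# Hypothetical era-specific OCR models as half-open year ranges
# [start_year, end_year). Boundaries and names are illustrative only.
OCR_MODELS = [
    (1500, 1700, "ocr_1500_1699"),
    (1700, 1850, "ocr_1700_1849"),
    (1850, 2025, "ocr_1850_onward"),
]


def estimate_page_year(patch_years):
    """Combine per-patch year predictions into one page-level estimate.

    The median is an assumed aggregation rule; it tolerates a few
    outlier patches (e.g. blank margins or decorative initials).
    """
    if not patch_years:
        raise ValueError("need at least one patch prediction")
    return median(patch_years)


def select_ocr_model(year):
    """Return the model whose era contains `year`, else the nearest era."""
    for start, end, name in OCR_MODELS:
        if start <= year < end:
            return name
    # Outside every range: fall back to the era with the closest boundary.
    nearest = min(OCR_MODELS, key=lambda m: min(abs(year - m[0]), abs(year - m[1])))
    return nearest[2]


# Three patches predicted 1712, 1698 and 1735, so the median is 1712.
print(select_ocr_model(estimate_page_year([1712, 1698, 1735])))  # prints ocr_1700_1849
```

A classification model could feed the same selection step by mapping each predicted date class to a representative year; that mapping is likewise an assumption here.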


International Conference on Document Analysis and Recognition | 2011

Translation-Inspired OCR

Dmitriy Genzel; Ashok C. Popat; Nemanja L. Spasojevic; Michael Jahr; Andrew W. Senior; Eugene Ie; Frank Yung-Fong Tang

Optical character recognition is carried out using techniques borrowed from statistical machine translation. In particular, the use of multiple simple feature functions in linear combination, along with minimum-error-rate training, integrated decoding, and N-gram language modeling, is found to be remarkably effective across several scripts and languages. Results are presented using both synthetic and real data in five languages.


Empirical Methods in Natural Language Processing | 2007

Large Language Models in Machine Translation

Thorsten Brants; Ashok C. Popat; Peng Xu; Franz Josef Och; Jeffrey Dean


International Conference on Computational Linguistics | 2010

Large Scale Parallel Document Mining for Machine Translation

Jakob Uszkoreit; Jay Ponte; Ashok C. Popat; Moshe Dubiner


International Conference on Document Analysis and Recognition | 2015

Label transition and selection pruning and automatic decoding parameter optimization for time-synchronous Viterbi decoding

Yasuhisa Fujii; Dmitriy Genzel; Ashok C. Popat; Remco Teunen

Hidden Markov Model (HMM)-based classifiers have been used successfully for decades in sequential labeling problems such as speech recognition and optical character recognition. They have been especially successful in domains where the segmentation is unknown or difficult to obtain, since, in principle, all possible segmentation points can be taken into account. However, this benefit comes at a non-negligible computational cost. In this paper, we propose simple yet effective new pruning algorithms that speed up decoding with HMM-based classifiers by up to 95% relative to a baseline. As the number of tunable decoding parameters increases, it becomes more difficult to optimize them for each configuration. We also propose a novel technique that estimates these parameters from a loss value without relying on a grid search.


Document Engineering | 2009

A panlingual anomalous text detector

Ashok C. Popat

In a large-scale book scanning operation, material can vary widely in language, script, genre, domain, print quality, and other factors, giving rise to a corresponding variability in the OCRed text. It is often desirable to automatically detect errorful and otherwise anomalous text segments, so that they can be filtered out or appropriately flagged, for such applications as indexing, mining, analyzing, displaying, and selectively re-processing such data. Moreover, it is advantageous to require that the automated detector be independent of the underlying OCR engine (or engines), that it work over a broad range of languages, that it seamlessly handle mixed-language material, and that it accommodate documents that contain domain-specific and otherwise rare terminology. A technique is presented that satisfies these requirements, using an adaptive mixture of character-level N-gram language models. Its design, training, implementation, and evaluation are described within the context of high-volume book scanning.
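The anomalous-text detector described in the abstract above is built on character-level N-gram language models. The sketch below shows the basic scoring idea under simplifying assumptions: each model is a single character trigram model with add-one smoothing, and a segment is flagged when even its best-scoring model assigns it a low average log-probability per character. This best-of-models rule is a plain stand-in for the adaptive mixture the abstract describes, and the names `CharNgramLM`, `is_anomalous`, and the threshold value are hypothetical.

```python
import math
from collections import Counter

PAD = "\x02"  # start-of-text padding symbol (an arbitrary choice)


class CharNgramLM:
    """Character N-gram model with add-one (Laplace) smoothing.

    A deliberately simple stand-in: the abstract does not specify the
    model order or smoothing method used in the paper.
    """

    def __init__(self, text, n=3):
        self.n = n
        padded = PAD * (n - 1) + text
        grams = [padded[i:i + n] for i in range(len(padded) - n + 1)]
        self.gram_counts = Counter(grams)
        # Count each (n-1)-character context at the same positions.
        self.context_counts = Counter(g[:-1] for g in grams)
        # +1 lumps all characters unseen in training into one extra symbol.
        self.vocab_size = len(set(text)) + 1

    def logprob_per_char(self, text):
        """Average natural-log probability per character of `text`."""
        padded = PAD * (self.n - 1) + text
        total = 0.0
        for i in range(len(text)):
            gram = padded[i:i + self.n]  # the n-gram ending at text[i]
            numerator = self.gram_counts[gram] + 1
            denominator = self.context_counts[gram[:-1]] + self.vocab_size
            total += math.log(numerator / denominator)
        return total / max(len(text), 1)


def is_anomalous(segment, models, threshold):
    """Flag `segment` if even its best-scoring model finds it improbable."""
    best = max(model.logprob_per_char(segment) for model in models.values())
    return best < threshold


models = {"en": CharNgramLM("the cat sat on the mat and the dog sat on the log " * 20)}
print(is_anomalous("the cat sat on the mat", models, threshold=-2.0))  # False
print(is_anomalous("zq#x!!vk&&", models, threshold=-2.0))  # True
```

To handle several languages or scripts, `models` would hold one model per language, and the threshold would need tuning on held-out data; both are assumptions of this sketch rather than details from the paper.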


Archive | 2011

Identifying matching canonical documents in response to a visual query and in accordance with geographic information

David Petrou; Ashok C. Popat; Matthew R. Casey


Archive | 2010

Identifying Matching Canonical Documents in Response to a Visual Query

David Petrou; Ashok C. Popat; Matthew R. Casey


Archive | 2014

Identifying Matching Canonical Documents Consistent with Visual Query Structural Information

David Petrou; Ashok C. Popat; Matthew R. Casey


Meeting of the Association for Computational Linguistics | 2011

Language-independent compound splitting with morphological operations

Klaus Macherey; Andrew M. Dai; David Talbot; Ashok C. Popat; Franz Josef Och
