Kengo Terasawa | Researchain

Publication

Featured research published by Kengo Terasawa.

international conference on document analysis and recognition | 2009

Slit Style HOG Feature for Document Image Word Spotting

Kengo Terasawa; Yuzuru Tanaka

This paper presents a word spotting method based on line segmentation, a sliding window, continuous dynamic programming, and the slit style HOG feature. Our method is applicable regardless of the language in which the manuscript is written because it does not require any language-dependent preprocessing. The slit style HOG feature is a gradient-distribution-based feature with overlapping normalization and redundant expression, and using it improved word spotting performance. We compared our method with several previously developed word spotting methods and confirmed that it outperforms them on both English and Japanese manuscripts.

workshop on algorithms and data structures | 2007

Spherical LSH for approximate nearest neighbor search on unit hypersphere

Kengo Terasawa; Yuzuru Tanaka

LSH (Locality Sensitive Hashing) is one of the best known methods for solving the c-approximate nearest neighbor problem in high dimensional spaces. This paper presents a variant of the LSH algorithm, focusing on the special case where all points in the dataset lie on the surface of the unit hypersphere in a d-dimensional Euclidean space. The LSH scheme is based on a family of hash functions that preserves the locality of points. This paper points out that when all points are constrained to lie on the surface of the unit hypersphere, there exist hash functions that partition the space more efficiently than the previously proposed methods. The design of these hash functions uses randomly rotated regular polytopes, which partition the surface of the unit hypersphere like a Voronoi diagram. Our new scheme improves the exponent ρ, the main indicator of the performance of the LSH algorithm.
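
The idea of hashing with a randomly rotated regular polytope can be illustrated with a minimal sketch. It assumes the orthoplex (cross-polytope, with vertices ±e_i) as the polytope, which is a choice made here for illustration and not necessarily the configuration evaluated in the paper. The function names `random_rotation` and `orthoplex_hash` are hypothetical, and the random matrix is orthogonal (it may include a reflection).

```python
import math
import random


def random_rotation(d, rng):
    """Return a d x d random orthogonal matrix (as a list of rows).

    Built by Gram-Schmidt orthonormalization of Gaussian vectors.
    """
    rows = []
    while len(rows) < d:
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        # Remove the components along the rows found so far.
        for r in rows:
            dot = sum(a * b for a, b in zip(v, r))
            v = [a - dot * b for a, b in zip(v, r)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm > 1e-9:  # skip numerically degenerate draws
            rows.append([a / norm for a in v])
    return rows


def orthoplex_hash(x, rotation):
    """Hash x to the index (0 .. 2d-1) of the nearest rotated orthoplex vertex.

    After applying the transform, the nearest vertex among {+e_i, -e_i} is
    the coordinate with the largest absolute value, together with its sign.
    """
    y = [sum(r_k * x_k for r_k, x_k in zip(row, x)) for row in rotation]
    i = max(range(len(y)), key=lambda k: abs(y[k]))
    return 2 * i + (0 if y[i] >= 0 else 1)


rng = random.Random(0)
R = random_rotation(4, rng)
print(orthoplex_hash([0.5, 0.5, 0.5, 0.5], R))
```

Each hash value identifies one Voronoi cell of the polytope's vertices on the sphere, so nearby unit vectors tend to receive the same value. In practice several independent transforms would be drawn and their hash values combined.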

international conference on document analysis and recognition | 2005

Eigenspace method for text retrieval in historical document images

Kengo Terasawa; Takeshi Nagasaki; Toshio Kawashima

A new method for text retrieval that does not need segmentation is described. Segmenting the images in historical documents into individual characters is difficult. Therefore, the conventional OCR method, which uses segmentation, does not work well. Our method instead divides the text image into a sequence of small slits. The image region that corresponds to the query image region is retrieved by solving the matching problem of these sequences. Applying the eigenspace method to the slit images enables us to solve the matching problem efficiently. Moreover, using dynamic time warping (DTW) further improves the results. Our method is more accurate than the simple template matching method and has a far lower computational cost.
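
Matching two slit sequences with DTW can be sketched as follows. This is the textbook DTW recurrence with a Euclidean local distance, not the paper's exact formulation. It assumes each slit has already been reduced to a short feature vector (the eigenspace projection itself is not shown), and the name `dtw_distance` is hypothetical.

```python
import math


def dtw_distance(seq_a, seq_b):
    """Return the classic DTW cost between two sequences of feature vectors.

    cost[i][j] is the cheapest alignment of the first i items of seq_a
    with the first j items of seq_b.
    """
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(seq_a[i - 1], seq_b[j - 1])  # local slit distance
            cost[i][j] = d + min(
                cost[i - 1][j],      # seq_a advances, seq_b repeats
                cost[i][j - 1],      # seq_b advances, seq_a repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


# The second sequence is a stretched copy of the first, so the cost is zero.
query = [[0.0], [1.0], [2.0]]
target = [[0.0], [1.0], [1.0], [2.0]]
print(dtw_distance(query, target))  # 0.0
```

Because DTW lets one slit align with several slits of the other sequence, it tolerates the horizontal stretching that varies between instances of the same handwritten text.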

international conference on document analysis and recognition | 2007

Locality Sensitive Pseudo-Code for Document Images

Kengo Terasawa; Yuzuru Tanaka

In this paper, we propose a novel scheme for representing character string images in scanned documents. We convert conventional multi-dimensional descriptors into pseudo-codes with the following property: if two vectors are near each other in the original space, their pseudo-codes are semi-equivalent with high probability. For this conversion, we combined locality sensitive hashing (LSH) indices, and we also developed a new family of LSH functions that is superior to earlier ones when all vectors are constrained to lie on the surface of the unit sphere. Word spotting based on our pseudo-code is faster than the multi-dimensional descriptor-based method while scarcely degrading accuracy.

document analysis systems | 2006

Automatic keyword extraction from historical document images

Kengo Terasawa; Takeshi Nagasaki; Toshio Kawashima

This paper presents an automatic keyword extraction method for historical document images. The proposed method is language independent because it is purely appearance based: it requires neither lexical information nor any statistical language model. Moreover, since it does not need word segmentation, it can be applied to Eastern languages that do not put clear spaces between words. The first half of the paper describes the algorithm for retrieving document image regions that have a similar appearance to the given query image. The algorithm was evaluated in terms of recall and precision, and achieved an average precision of over 80–90%. The second half of the paper describes the keyword extraction method, which works even if no query word is explicitly specified. Since the computational cost was reduced by efficient pruning techniques, the system could extract keywords successfully from relatively large documents.
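
For readers unfamiliar with the metric, average precision over a ranked retrieval list can be computed as in the sketch below. It uses the common non-interpolated definition; the abstract does not state which variant the authors used, so this is an assumption, and the name `average_precision` is hypothetical.

```python
def average_precision(ranked_relevance, total_relevant=None):
    """Non-interpolated average precision of one ranked result list.

    ranked_relevance: booleans, best-ranked result first.
    total_relevant: number of relevant items in the whole collection;
        defaults to the number of relevant items in the list.
    """
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(ranked_relevance, start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this recall point
    denom = total_relevant if total_relevant is not None else hits
    return precision_sum / denom if denom else 0.0


# Relevant results at ranks 1 and 3: (1/1 + 2/3) / 2
print(average_precision([True, False, True]))
```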

asia information retrieval symposium | 2005

Robust matching method for scale and rotation invariant local descriptors and its application to image indexing

Kengo Terasawa; Takeshi Nagasaki; Toshio Kawashima

Interest point matching is widely used for image indexing. In this paper we introduce a new distance measure between two local descriptors, replacing the conventional Mahalanobis distance, to improve matching accuracy. Experiments with synthetic images show that the error distribution of the local jet is Gaussian but the distribution of the descriptors derived from the local jet is not. Based on this observation, we design a new distance measure between two local descriptors and improve the accuracy of point matching. We also reduce the number of candidate points, and hence the computational cost, by taking into account the characteristic scale ratio. Experimental results confirm the validity of our method.

international conference on document analysis and recognition | 2009

Automatic Evaluation Framework for Word Spotting

Kengo Terasawa; Hajime Imura; Yuzuru Tanaka

Word spotting is the task of retrieving a text region that has a similar appearance to a query image specified by the user. This paper proposes an automatic evaluation framework for word spotting methods. In order to make our framework available to researchers around the world, we discuss some standard definitions and representations that are suitable for most word spotting methods, regardless of the assumptions and settings on which the individual methods depend. We also design a protocol for interprocess communication between a parent process and a word spotting engine. This protocol can modularize individual spotting methods so that they become interchangeable parts in a larger application. Our framework will promote the development of word spotting methods and improve their usability.

international conference on document analysis and recognition | 2011

A Fast Appearance-Based Full-Text Search Method for Historical Newspaper Images

Kengo Terasawa; Takahiro Shima; Toshio Kawashima

This paper presents a fast appearance-based full-text search method for historical newspaper images. Since historical newspapers differ from recent newspapers in image quality, typefaces, and language usage, optical character recognition (OCR) does not provide sufficient quality. Instead of an OCR approach, we adopted an appearance-based approach; that is, we matched characters to one another by their shapes. Assuming proper character segmentation and proper feature description, the full-text search problem is reduced to a sequence matching problem over feature vectors. To increase computational efficiency, we adopted a pseudo-code expression called LSPC, which is a compact sketch of a feature vector that retains a good deal of its information. Experimental results showed that our method can retrieve a query string from a text of over eight million characters within a second. In addition, we predict that more sophisticated algorithms could be designed for LSPC. As an example, we established the Extended Boyer-Moore-Horspool algorithm, which can reduce the computational cost further, especially when the query string becomes longer.
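
As background, the standard Boyer-Moore-Horspool search over a sequence of symbols can be sketched as below. This is the textbook algorithm with exact symbol equality, not the paper's Extended variant, whose details are not given in the abstract. Representing each character image as a single integer code is an assumption made for illustration, and `horspool_search` is a hypothetical name.

```python
def horspool_search(text, pattern):
    """Return the start indices of all exact occurrences of pattern in text.

    text and pattern are lists of hashable symbols (for example, integer codes).
    """
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []
    # For each symbol except the last one in the pattern, store the distance
    # from its rightmost occurrence to the end of the pattern.
    shift = {sym: m - 1 - i for i, sym in enumerate(pattern[:-1])}
    hits = []
    pos = 0
    while pos <= n - m:
        if text[pos:pos + m] == pattern:
            hits.append(pos)
        # Skip ahead based on the text symbol aligned with the pattern's end;
        # symbols absent from the table allow a full-length jump.
        pos += shift.get(text[pos + m - 1], m)
    return hits


codes = [1, 2, 3, 1, 2, 3, 4]
print(horspool_search(codes, [1, 2, 3]))  # [0, 3]
```

The skip is at most the pattern length, which is consistent with the abstract's remark that the savings grow as the query string becomes longer.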

Proceedings of the 2011 Workshop on Historical Document Imaging and Processing | 2011

Image processing for historical newspaper archives

Takahiro Shima; Kengo Terasawa; Toshio Kawashima

This paper presents image processing methods that produce accurate character segmentation results for historical newspaper archives. Full-text search using a word spotting technique is a promising approach to facilitating the use of digital archives. Some word spotting techniques require the target images to be segmented into character images in advance; however, character segmentation is difficult, especially for old and degraded document images. This paper identifies the causes that make character segmentation difficult and removes them in order to improve segmentation accuracy. We first detect the ruled lines using the Hough transform in order to segment a whole newspaper image into column-separated images. Then we remove the ruled lines as well as ruby characters and noise. The proposed system was tested on 20 column-separated images of historical newspapers, and the accuracy of character segmentation improved to 96.3%.

asian conference on pattern recognition | 2015

Character recognition of medieval English manuscripts supported by a word frequency table

Kei Tanaka; Kengo Terasawa

This paper proposes a method to reduce the effort involved in transcribing historical documents. The method consists of preprocessing, line and word segmentation, and word clustering stages. In the line segmentation process, we determine the borders around lines using dynamic programming to avoid the influence of letter ascenders and descenders. In the word clustering process, we propose a novel method, essentially a hierarchical cluster analysis, that uses a word frequency table as supplementary information. The effectiveness of the proposed method is evaluated experimentally by comparison with a baseline method that does not use a word frequency table. The experiments confirmed that the proposed method outperforms the baseline method.
