Proceedings of the 2019 3rd International Conference on Computer Science and Artificial Intelligence | 2019
An Attention-based Sequence Learning Model for Scene Text Recognition with Text Correction
Abstract
Recognizing text in images of natural scenes is a challenging task and an active research topic in computer vision. Unlike the text handled by traditional optical character recognition (OCR), words in natural images often exhibit irregularities (e.g., arbitrary orientation, blurring, perspective distortion) that make them difficult to recognize. In this paper, we develop a novel method, consisting of a text recognition network and a text correction component, that is more robust to irregular text. The text correction component rectifies the text in an input image into a more readable form. The text recognition network is a more location-aware attention-based sequence learning model that takes the rectified image as input and recognizes the text. The entire network is trained jointly using only images and word-level annotations. The standard Softmax loss function considers only the separability between classes and does not constrain how tightly features cluster within each class. Therefore, we adopt a new loss function based on the Softmax loss to enable the model to learn more discriminative features, reduce misclassifications, and improve accuracy. Extensive experiments on seven popular standard benchmarks demonstrate that the proposed method achieves performance comparable to the state of the art.
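The point about the Softmax loss can be illustrated with a small sketch. The abstract does not say which modified loss is used, so the intra-class term below (a center-loss-style penalty) is only an assumption chosen for illustration, and the names `softmax_cross_entropy`, `intra_class_penalty`, `combined_loss`, and the `weight` parameter are hypothetical. The sketch shows that two samples with identical logits receive the same Softmax loss regardless of how far their features lie from the class center, whereas an added compactness term separates them.

```python
import math

def softmax_cross_entropy(logits, label):
    """Softmax cross-entropy for one sample: -log p(label)."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]

def intra_class_penalty(feature, center):
    """Half the squared Euclidean distance from a feature to its class center.

    Assumption: a center-loss-style term, not necessarily the paper's loss.
    """
    return 0.5 * sum((f - c) ** 2 for f, c in zip(feature, center))

def combined_loss(logits, feature, label, centers, weight=0.1):
    """Softmax loss plus a weighted intra-class compactness term (illustrative)."""
    penalty = intra_class_penalty(feature, centers[label])
    return softmax_cross_entropy(logits, label) + weight * penalty

# Two samples of class 0 with identical logits but different feature spread.
centers = [[0.0, 0.0], [5.0, 5.0]]   # hypothetical class centers
logits = [2.0, 0.5]
near_feature = [0.1, -0.1]           # close to the class-0 center
far_feature = [2.0, 2.0]             # far from the class-0 center

plain_near = softmax_cross_entropy(logits, 0)
plain_far = softmax_cross_entropy(logits, 0)
combined_near = combined_loss(logits, near_feature, 0, centers)
combined_far = combined_loss(logits, far_feature, 0, centers)

print(plain_near == plain_far)       # the Softmax loss ignores feature spread
print(combined_near < combined_far)  # the combined loss penalizes the far sample
```

In a real recognizer the class centers would be learned alongside the network and the loss applied per decoded character; both details are left out here for brevity.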