2019 International Joint Conference on Neural Networks (IJCNN) | 2019

Ensemble Attention For Text Recognition In Natural Images


Abstract


Recognizing text in natural images is an active and challenging research topic in computer vision that is not yet completely solved. Recent methods treat the task as a sequence labeling problem. In this task, there is a strong correspondence between positions in the input sequence of image patches and positions in the output character sequence. However, most recent recognition systems rarely exploit this local information when recognizing the current character. In contrast, we present a Local Restricted Attention (LRA) mechanism that encodes the current vector by considering adjacent vectors of the input sequence. We propose an ensemble decoder block that combines the LRA mechanism with a regular decoder mechanism. This block not only significantly improves recognition results with shorter training time but can also be easily embedded in other recognition frameworks. In addition, we propose a scene text recognition network based on the ensemble decoder. Experimental results show that the proposed model achieves state-of-the-art performance on several benchmark datasets, including IIIT-5K, SVT, CUTE80, SVT-Perspective, and the ICDAR datasets.
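
The idea of encoding the current vector from its adjacent vectors can be illustrated with a minimal sketch. This is not the paper's formulation: the abstract does not give the scoring function, the window size, or how the result feeds the decoder, so the dot-product scores, the symmetric window, and the names local_restricted_attention, vectors, and window are all assumptions made for illustration.

```python
import math


def local_restricted_attention(vectors, window=1):
    """Re-encode each vector as a softmax-weighted sum of its neighbours.

    Hypothetical illustration: position i attends only to positions
    i - window .. i + window, using dot-product scores. The paper's
    actual scoring function and window size are not stated in the abstract.
    """
    encoded = []
    for i, query in enumerate(vectors):
        lo = max(0, i - window)
        hi = min(len(vectors), i + window + 1)
        neighbours = vectors[lo:hi]
        # Dot-product similarity between the current vector and each neighbour.
        scores = [sum(q * k for q, k in zip(query, key)) for key in neighbours]
        # Numerically stable softmax over the local window only.
        peak = max(scores)
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Weighted sum of the neighbours gives the locally encoded vector.
        dim = len(query)
        encoded.append([
            sum(w * key[d] for w, key in zip(weights, neighbours))
            for d in range(dim)
        ])
    return encoded


# Toy input: four 2-dimensional feature vectors standing in for image patches.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
local = local_restricted_attention(features, window=1)
```

With window=0 each position sees only itself, so the function returns the input unchanged; widening the window lets each encoded vector mix in more of its neighbourhood.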

Pages 1-8
DOI 10.1109/IJCNN.2019.8852010
Language English
Journal 2019 International Joint Conference on Neural Networks (IJCNN)
