Proceedings of the 2021 International Conference on Multimedia Retrieval | 2021

Look Back Again: Dual Parallel Attention Network for Accurate and Robust Scene Text Recognition

 
 
 
 

Abstract


Nowadays, it is a trend that using a parallel-decoupled encoder-decoder (PDED) framework in scene text recognition for its flexibility and efficiency. However, due to the inconsistent information content between queries and keys in the parallel positional attention module (PPAM) used in this kind of framework(queries: position information, keys: context and position information), visual misalignment tends to appear when confronting hard samples(e.g., blurred texts, irregular texts, or low-quality images). To tackle this issue, in this paper, we propose a dual parallel attention network (DPAN), in which a newly designed parallel context attention module (PCAM) is cascaded with the original PPAM, using linguistic contextual information to compensate for the information inconsistency between queries and keys. Specifically, in PCAM, we take the visual features from PPAM as inputs and present a bidirectional language model to enhance them with linguistic contexts to produce queries. In this way, we make the information content of the queries and keys consistent in PCAM, which helps to generate more precise visual glimpses to improve the entire PDED framework s accuracy and robustness. Experimental results verify the effectiveness of the proposed PCAM, showing the necessity of keeping the information consistency between queries and keys in the attention mechanism. On six benchmarks, including regular text and irregular text, the performance of DPAN surpasses the existing leading methods by large margins, achieving new state-of-the-art performance. The code is available on \\urlhttps://github.com/Jackandrome/DPAN.

Volume None
Pages None
DOI 10.1145/3460426.3463674
Language English
Journal Proceedings of the 2021 International Conference on Multimedia Retrieval

Full Text