2019 International Conference on Document Analysis and Recognition (ICDAR) | 2019

A Handwritten Chinese Text Recognizer Applying Multi-level Multimodal Fusion Network

Abstract


Handwritten Chinese text recognition (HCTR) has received extensive attention from the pattern recognition community over the past decades. Most existing deep learning methods consist of two stages: training a text recognition network on the basis of visual information, then incorporating language constraints with various language models. As a result, the inherent linguistic semantic information is often neglected when designing the recognition network. To tackle this problem, we propose a novel multi-level multimodal fusion network and embed it into an attention-based LSTM so that both the visual information and the linguistic semantic information can be fully leveraged when predicting sequential outputs from the feature vectors. Experimental results on the ICDAR-2013 competition dataset show performance comparable to state-of-the-art approaches.

Pages 1464-1469
DOI 10.1109/ICDAR.2019.00235
Language English
Journal 2019 International Conference on Document Analysis and Recognition (ICDAR)
