2019 International Conference on Document Analysis and Recognition (ICDAR) | 2019

A Handwritten Chinese Text Recognizer Applying Multi-level Multimodal Fusion Network

Abstract

Handwritten Chinese text recognition (HCTR) has received extensive attention from the pattern recognition community over the past decades. Most existing deep learning methods consist of two stages, i.e., training a text recognition network on the basis of visual information, followed by incorporating language constraints with various language models. As a result, the inherent linguistic semantic information is often neglected when designing the recognition network. To tackle this problem, we propose a novel multi-level multimodal fusion network and embed it into an attention-based LSTM so that both the visual information and the linguistic semantic information can be fully leveraged when predicting sequential outputs from the feature vectors. Experimental results on the ICDAR-2013 competition dataset show performance comparable to state-of-the-art approaches.

Pages 1464-1469

DOI 10.1109/ICDAR.2019.00235

Language English

Journal 2019 International Conference on Document Analysis and Recognition (ICDAR)
