IEEE Transactions on Multimedia | 2019

Track, Attend, and Parse (TAP): An End-to-End Framework for Online Handwritten Mathematical Expression Recognition

 
 
 

Abstract


In this paper, we introduce Track, Attend, and Parse (TAP), an end-to-end approach based on neural networks for online handwritten mathematical expression recognition (OHMER). The architecture of TAP consists of a tracker and a parser. The tracker employs a stack of bidirectional recurrent neural networks with gated recurrent units (GRU) to model the input handwritten traces, which can fully utilize the dynamic trajectory information in OHMER. Followed by the tracker, the parser adopts a GRU equipped with guided hybrid attention (GHA) to generate notations. The proposed GHA is composed of a coverage-based spatial attention, a temporal attention, and an attention guider. Moreover, we demonstrate the strong complementarity between offline information with static-image input and online information with ink-trajectory input by blending a fully convolutional networks-based watcher into TAP. Inherently, unlike traditional methods, this end-to-end framework does not require the explicit symbol segmentation and a predefined expression grammar for parsing. Validated on a benchmark published by the CROHME competition, the proposed approach outperforms the state-of-the-art methods and achieves the best reported results with an expression recognition accuracy of 61.16% on CROHME 2014 and 57.02% on CROHME 2016, using only official training dataset.

Volume 21
Pages 221-233
DOI 10.1109/TMM.2018.2844689
Language English
Journal IEEE Transactions on Multimedia

Full Text