2019 International Conference on Document Analysis and Recognition (ICDAR) | 2019
A Study of Script Language Effects in Deep Neural-Network-Based Scene Text Detection
Abstract
Unlike most recent text detection work, which focuses on building a robust text detector, this study examines how script languages affect a text detector's performance, using a multi-language synthetic dataset, the Synthetic Octa-Language (SOL) dataset. The effect of script languages remains largely unexplored. Previously, this kind of experiment was infeasible because too many factors influence the performance of a text detector, so it was not possible to tell whether any single factor had a positive or a negative effect. To overcome these difficulties, we used controlled synthesized data, which allows us to explicitly control factors such as base image, script language, text content, text color, font face, and font size. With the SOL dataset, we were able to investigate the effect that script languages have on deep neural-network (DNN)-based methods under different scenarios. Moreover, this dataset can be used in other script-language-related text detection research as well.
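The idea of controlling every factor except the script language can be illustrated with a minimal sketch. It is not the SOL generation pipeline: the `RenderSpec` class, the `vary_script_only` helper, and all field values are hypothetical names introduced here, and no image is rendered. The sketch assumes that the text content and font face must change together with the script (a font has to cover the script's glyphs), while base image, text color, and font size are held fixed.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class RenderSpec:
    """Hypothetical description of one synthetic text instance."""
    base_image: str
    script: str
    text: str
    color: tuple
    font_face: str
    font_size: int


def vary_script_only(template, per_script):
    """Return one spec per script, copying all other factors from the template.

    per_script maps a script name to a (text, font_face) pair, since those
    two factors cannot stay identical across scripts (an assumption of this
    sketch, not a statement about the SOL dataset).
    """
    return [
        replace(template, script=script, text=text, font_face=font)
        for script, (text, font) in per_script.items()
    ]


# Placeholder values only; they do not name real SOL scripts or fonts.
template = RenderSpec("bg_001.jpg", "script_a", "text_a", (255, 255, 255), "font_a", 32)
specs = vary_script_only(
    template,
    {"script_a": ("text_a", "font_a"), "script_b": ("text_b", "font_b")},
)
```

Because the specs differ only in the script-dependent fields, any difference in detector output between them can be attributed to the script rather than to the background, color, or size.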