IEEE Transactions on Intelligent Transportation Systems | 2021

Street View Text Recognition With Deep Learning for Urban Scene Understanding in Intelligent Transportation Systems


Abstract


Understanding the surrounding scene is a fundamental task in intelligent transportation systems (ITS), especially in unpredictable driving conditions or in developing regions and cities that lack digital maps. Street views are the most common scenes encountered during driving. Because streets are often lined with shops bearing signboards, recognizing scene text in shop sign images from street views is of great significance and utility to urban scene understanding in ITS. To advance research in this field, we make two contributions. (1) We build ShopSign, a large-scale scene text dataset of Chinese shop signs in street views. It contains 25,770 natural scene images and 267,049 text instances. The images were captured in diverse settings, from downtown to developing regions, across 8 provinces and 20 cities in China, using more than 50 different mobile phones. The dataset is very sparse and imbalanced in nature. (2) We carry out a comprehensive empirical study of the performance of state-of-the-art deep learning (DL) based scene text reading algorithms on ShopSign and three other Chinese scene text datasets, a comparison that has not been addressed in the literature before. Through comparative analysis, we demonstrate that language has a critical influence on scene text detection. Moreover, by comparing the accuracy of four scene text recognition algorithms, we show that there is considerable room for further improvement before street view text recognition fits real-world ITS applications.

Volume 22
Pages 4727-4743
DOI 10.1109/tits.2020.3017632
Language English
Journal IEEE Transactions on Intelligent Transportation Systems
