Cuong Tuan Nguyen
Tokyo University of Agriculture and Technology
Publication
Featured research published by Cuong Tuan Nguyen.
Pattern Recognition | 2018
Hung Tuan Nguyen; Cuong Tuan Nguyen; Masaki Nakagawa
A Vietnamese online handwriting database is created and analyzed. Vietnamese online handwritten text poses a challenge due to many delayed strokes. Long Short-Term Memory neural networks are effective at processing delayed strokes. We present our efforts to create a database of unconstrained Vietnamese online handwritten text sampled from pen-based devices. The database stores handwritten text for paragraphs, lines, words, and characters, with the ground truth associated with every paragraph and line. We show a detailed statistical analysis of the handwritten text in this database and describe recognition experiments using several recent methods including the Bidirectional Long Short-Term Memory (BLSTM) network. Overall, our database contains over 480,000 strokes from more than 380,000 characters, which, at present, is the largest database of Vietnamese online handwritten text. Although Vietnamese script is based on a fixed set of alphabet letters, the recognition of Vietnamese online handwritten text poses a difficult challenge because of its many diacritical marks, which usually result in delayed strokes during writing. We designed and implemented an online handwriting-collection tool to gather data, as well as a line-segmentation tool and a delayed-stroke-detection tool to analyze the collected handwritten text. We also conducted a statistical analysis based on the writer profiles. We applied a number of state-of-the-art recognition methods to unconstrained Vietnamese handwriting to evaluate their performance, including the BLSTM network, which is an efficient architecture derived from the Recurrent Neural Network (RNN) and is often applied to sequence labeling problems. The BLSTM network achieved 90% character recognition accuracy, despite many long sequences with several delayed strokes. Our database is open for research use to stimulate the development of handwriting recognition technology.
Asian Conference on Pattern Recognition | 2015
Khanh Minh Phan; Cuong Tuan Nguyen; Anh Duc Le; Masaki Nakagawa
This paper presents an incremental recognition method for online handwritten mathematical expressions (MEs), intended for a busy recognition interface (recognition while writing) without long waiting times. We employ a local processing strategy that focuses on recent strokes. For the latest stroke, we perform segmentation and recognition and update the Cocke-Younger-Kasami (CYK) table. We also reuse the segmentation and recognition candidates from previous processing steps. Moreover, multi-threading reduces the waiting time. Experiments on our data set show that the incremental method not only shortens the waiting time but also keeps almost the same recognition rate as the batch recognition method. We also propose a combination of the two methods that inherits the advantages of both.
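The incremental CYK update mentioned in the abstract can be illustrated with a minimal sketch. It makes simplifying assumptions that are not from the paper: a one-dimensional grammar in Chomsky normal form over already recognized tokens, instead of the two-dimensional structure of real mathematical expressions, and a hypothetical class name `IncrementalCYK`. The point it shows is that when a new token arrives, only the table cells ending at that token are computed, and all earlier cells are reused.

```python
class IncrementalCYK:
    """Illustrative 1D CYK parser that is extended one token at a time."""

    def __init__(self, terminal_rules, binary_rules):
        # terminal_rules: token -> set of nonterminals producing it
        # binary_rules: (B, C) -> set of nonterminals A with rule A -> B C
        self.terminal_rules = terminal_rules
        self.binary_rules = binary_rules
        self.table = {}  # (i, j) -> nonterminals spanning tokens i..j inclusive
        self.n = 0       # number of tokens seen so far

    def add_token(self, token):
        """Add one token and fill only the new column of the table."""
        j = self.n
        self.table[(j, j)] = set(self.terminal_rules.get(token, ()))
        # Go from short spans to long spans; every cell needed here is either
        # from an earlier call (reused) or was just computed in this loop.
        for i in range(j - 1, -1, -1):
            cell = set()
            for k in range(i, j):
                for left in self.table[(i, k)]:
                    for right in self.table[(k + 1, j)]:
                        cell |= self.binary_rules.get((left, right), set())
            self.table[(i, j)] = cell
        self.n += 1

    def accepts(self, start_symbol):
        """True if all tokens seen so far form a complete expression."""
        return self.n > 0 and start_symbol in self.table[(0, self.n - 1)]


# Toy grammar (an assumption for illustration): E -> E T | n, T -> P E, P -> +
terminal_rules = {"n": {"E"}, "+": {"P"}}
binary_rules = {("E", "T"): {"E"}, ("P", "E"): {"T"}}

parser = IncrementalCYK(terminal_rules, binary_rules)
history = []
for token in ["n", "+", "n", "+", "n"]:
    parser.add_token(token)
    history.append(parser.accepts("E"))
```

Each call to `add_token` does work proportional to the new column only, which is the property that keeps the waiting time after the last stroke short in this kind of scheme.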
International Conference on Frontiers in Handwriting Recognition | 2016
Hung Tuan Nguyen; Cuong Tuan Nguyen; Masaki Nakagawa
This paper presents our attempts to collect and analyze unconstrained Vietnamese online handwritten text patterns using pen-based computers. In total, our database contains over 120,000 strokes from more than 140,000 characters, making it one of the largest Vietnamese online handwriting pattern databases at present. To build and analyze the database, we made a collection tool, a line segmentation tool, and a delayed stroke detection tool. Moreover, we derived some statistics from the writers' personal information. To address the unconstrained handwriting recognition problem, we conducted experiments using Bidirectional Long Short-Term Memory (BLSTM) networks. The BLSTM network is a Recurrent Neural Network (RNN) architecture that has recently been applied to many related problems. The BLSTM network achieves nearly 80% accuracy on our database even though the database contains many delayed strokes. In the near future, we will make our database available for research purposes, as it would be a foundation for handwriting recognition research.
Pattern Recognition Letters | 2018
Hung Tuan Nguyen; Cuong Tuan Nguyen; Takeya Ino; Bipin Indurkhya; Masaki Nakagawa
The text-independent approach to writer identification does not require the writer to write some predetermined text. Previous research on text-independent writer identification has been based on identifying writer-specific features designed by experts. However, in the last decade, deep learning methods have been successfully applied to learn features from data automatically. We propose here an end-to-end deep-learning method for text-independent writer identification that does not require prior identification of features. A Convolutional Neural Network (CNN) is trained initially to extract local features, which represent characteristics of individual handwriting in the whole character images and their sub-regions. Randomly sampled tuples of images from the training set are used to train the CNN and to aggregate the local features extracted from the images in each tuple into global features. For every training epoch, the process of randomly sampling tuples is repeated, which is equivalent to preparing a large number of training patterns for training the CNN for text-independent writer identification. We conducted experiments on the JEITA-HP database of offline handwritten Japanese character patterns. With 200 characters, our method achieved an accuracy of 99.97% in classifying 100 writers. Even when using 50 characters for 100 writers or 100 characters for 400 writers, our method achieved accuracy levels of 92.80% or 93.82%, respectively. We conducted further experiments on the Firemaker and IAM databases of offline handwritten English text. Using only one page per writer for training, our method achieved over 91.81% accuracy in classifying 900 writers. Overall, we achieved better performance than the previously published best result based on handcrafted features and clustering algorithms, which demonstrates the effectiveness of our method for handwritten English text as well.
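The per-epoch tuple sampling described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions: each tuple is drawn from a single writer, local features are aggregated by element-wise averaging (the abstract does not say which aggregation is used), and the names `sample_tuples` and `aggregate` are hypothetical. The CNN itself is omitted; images are stood in for by integer IDs.

```python
import random


def sample_tuples(images_by_writer, tuple_size, tuples_per_writer, rng):
    """Draw fresh random tuples of images for one training epoch.

    Assumption: every tuple contains images of one writer only.
    """
    batch = []
    for writer, images in images_by_writer.items():
        for _ in range(tuples_per_writer):
            batch.append((writer, rng.sample(images, tuple_size)))
    rng.shuffle(batch)
    return batch


def aggregate(local_features):
    """Combine per-image feature vectors into one global vector.

    Assumption: element-wise mean; other poolings are equally plausible.
    """
    count = len(local_features)
    return [sum(column) / count for column in zip(*local_features)]


rng = random.Random(0)
images_by_writer = {"w1": list(range(10)), "w2": list(range(10, 20))}

# Re-sampling every epoch yields different tuples from the same images,
# which acts like a much larger set of training patterns.
epoch_1 = sample_tuples(images_by_writer, tuple_size=3, tuples_per_writer=4, rng=rng)
epoch_2 = sample_tuples(images_by_writer, tuple_size=3, tuples_per_writer=4, rng=rng)

global_feature = aggregate([[1.0, 2.0], [3.0, 4.0]])
```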
Proceedings of the 4th International Workshop on Historical Document Imaging and Processing | 2017
Cong Kha Nguyen; Cuong Tuan Nguyen; Masaki Nakagawa
This paper proposes a method to recognize a large set of 32,695 Nom characters, which were used in Vietnam from the tenth century to the twentieth century before the Latin-based Vietnamese alphabet became common. So far, the largest category sets for which character recognition methods have been studied, including the latest deep neural networks, contain about 10,000 categories for the current sets of Chinese, Japanese and Korean, but ancient languages of Chinese origin require much larger numbers of categories. Moreover, the lack of training patterns makes the development of Optical Character Recognition (OCR) for Nom a major challenge. On the other hand, the demand to archive Nom historical documents is very high, since a large number of documents are left uninterpreted and the number of scholars who can comprehend Nom is decreasing. Therefore, we propose a method to recognize a very large set of Nom categories with deep Convolutional Neural Networks (CNNs). The proposed method introduces coarse categories, which are prepared by K-means beforehand. We construct deep CNNs composed of a coarse category feature extractor, a coarse category classifier and a fine category classifier including some inception modules. We pre-train the coarse category feature extractor and the coarse category classifier with the coarse categories, freeze them, and then perform fine-tuning to recognize characters across all Nom categories. Unlike the conventional cascade of coarse classification and fine classification, the coarse and fine category classifiers are applied in parallel to the feature maps generated by the feature extractor, and their likelihoods are multiplied. The experiment shows that this architecture provides a better recognition rate than the earlier cascade of GLVQ and MQDF.
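The step in which coarse and fine likelihoods are multiplied can be illustrated with a small sketch. It assumes that each fine category belongs to exactly one coarse category (as a K-means clustering would give) and that both classifiers output raw scores that are turned into likelihoods with a softmax; the function names and the toy numbers are hypothetical, not taken from the paper.

```python
import math


def softmax(scores):
    """Convert raw scores into likelihoods that sum to one."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def combine_coarse_fine(coarse_scores, fine_scores, fine_to_coarse):
    """Multiply each fine likelihood by the likelihood of its coarse category.

    fine_to_coarse[i] is the index of the coarse category of fine category i.
    Returns the index of the best fine category and all combined scores.
    """
    coarse_p = softmax(coarse_scores)
    fine_p = softmax(fine_scores)
    combined = [p * coarse_p[fine_to_coarse[i]] for i, p in enumerate(fine_p)]
    best = max(range(len(combined)), key=combined.__getitem__)
    return best, combined


# Toy case: 4 fine categories grouped into 2 coarse categories.
fine_to_coarse = [0, 0, 1, 1]
coarse_scores = [0.0, 2.0]             # coarse classifier prefers group 1
fine_scores = [1.0, 0.9, 0.95, 0.2]    # fine classifier slightly prefers category 0

best, combined = combine_coarse_fine(coarse_scores, fine_scores, fine_to_coarse)
```

In this toy case the fine classifier alone would pick category 0, but the product favors category 2 because its coarse group is much more likely. Unlike a cascade, no fine category is discarded outright by the coarse decision.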
Proceedings of the 4th International Workshop on Historical Document Imaging and Processing | 2017
Hung Tuan Nguyen; Nam Tuan Ly; Kha Cong Nguyen; Cuong Tuan Nguyen; Masaki Nakagawa
This paper presents methods for three different tasks of recognizing anomalously deformed Kana in Japanese historical documents, which were set as a contest by IEICE PRMU in 2017. The tasks have three levels: single character recognition, three-Kana-character sequence recognition and unrestricted Kana recognition. We compare several methods for each task. For level 1, we evaluate CNN-based methods and BLSTM-based methods. For level 2, we consider several variations of a combined architecture of CNN and BLSTM. For level 3, we compare an extension of the method for level 2 and a segmentation-based method. We achieve a single character recognition accuracy of 96.8%, a three-Kana-character sequence recognition accuracy of 87.12% and an unrestricted Kana recognition accuracy of 73.3%. These results demonstrate the performance of CNN and BLSTM on these tasks.
International Conference on Frontiers in Handwriting Recognition | 2016
Cuong Tuan Nguyen; Masaki Nakagawa
This paper presents a Finite State Machine (FSM) that reduces the time users wait for the recognition result after they finish writing, in the recognition of online handwritten English text. The lexicon is modeled by an FSM, and then determinization and minimization are applied to reduce the number of states. The reduction of states in the FSM shortens the waiting time without degrading the recognition accuracy. Moreover, by merging incoming paths to each state, the recognition rate is improved. The N-best states decoding method also reduces the waiting time significantly with a small degradation in recognition accuracy. Experiments on IAM-OnDB and IBM_UB_1 show the effectiveness of the method in both reducing waiting time and improving recognition accuracy.
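The state reduction obtained by modeling a lexicon as a minimized FSM can be sketched as follows. This is an illustration under assumptions, not the paper's implementation: the lexicon is first stored as a trie, which is already deterministic, and is then minimized by merging states that accept exactly the same set of suffixes, which is valid because the automaton is acyclic. The function names and the four-word lexicon are hypothetical.

```python
def build_trie(words):
    """Build a deterministic acyclic automaton (a trie) for the lexicon."""
    root = {"final": False, "next": {}}
    for word in words:
        node = root
        for ch in word:
            node = node["next"].setdefault(ch, {"final": False, "next": {}})
        node["final"] = True
    return root


def count_trie_states(node):
    """Count the states of the unminimized trie."""
    return 1 + sum(count_trie_states(child) for child in node["next"].values())


def minimize(node, registry):
    """Return the ID of the merged state that is equivalent to this node.

    Two states are merged when they agree on finality and on the merged
    targets of all outgoing transitions, i.e. they accept the same suffixes.
    registry maps each distinct state signature to its state ID.
    """
    transitions = tuple(sorted(
        (ch, minimize(child, registry)) for ch, child in node["next"].items()
    ))
    signature = (node["final"], transitions)
    if signature not in registry:
        registry[signature] = len(registry)
    return registry[signature]


lexicon = ["walked", "walks", "talked", "talks"]
trie = build_trie(lexicon)

registry = {}
minimize(trie, registry)

states_before = count_trie_states(trie)
states_after = len(registry)
```

Here the shared endings "alked" and "alks" collapse into common states, so the decoder has fewer states to keep alive and expand per stroke.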
Asian Conference on Pattern Recognition | 2015
Cuong Tuan Nguyen; Masaki Nakagawa
Segmentation in online handwritten text recognition benefits from the context of the strokes written before and after each stroke. This paper shows an application of Bidirectional Long Short-Term Memory recurrent neural networks to the segmentation of online handwritten English text. The networks incorporate long-range context from both the forward and backward directions to improve segmentation confidence under uncertainty. We show that applying the method in the semi-incremental recognition of online handwritten English text reduces up to 62% of the waiting time and 50% of the processing time. Moreover, the recognition rate of the system also improves notably, by 3 points from 71.7%.
International Conference on Document Analysis and Recognition | 2013
Cuong Tuan Nguyen; Bilan Zhu; Masaki Nakagawa
IEICE Transactions on Information and Systems | 2016
Cuong Tuan Nguyen; Bilan Zhu; Masaki Nakagawa