Publication


Featured research published by Da-Han Wang.


International Conference on Document Analysis and Recognition | 2011

CASIA Online and Offline Chinese Handwriting Databases

Cheng-Lin Liu; Fei Yin; Da-Han Wang; Qiu-Feng Wang

This paper introduces a pair of online and offline Chinese handwriting databases, containing samples of isolated characters and handwritten texts. The samples were produced by 1,020 writers using Anoto pens on paper, yielding both online trajectory data and offline images. The online and offline samples are each divided into six datasets, three for isolated characters (DB1.0-1.2) and three for handwritten texts (DB2.0-2.2). The isolated-character datasets (online or offline) contain about 3.9 million samples of 7,356 classes (7,185 Chinese characters and 171 symbols), and the handwritten-text datasets contain about 5,090 pages and 1.35 million character samples. Each dataset is segmented and annotated at the character level and partitioned into standard training and test subsets. The online and offline databases can be used for research on various handwritten document analysis tasks.


Pattern Recognition | 2013

Online and offline handwritten Chinese character recognition: Benchmarking on new databases

Cheng-Lin Liu; Fei Yin; Da-Han Wang; Qiu-Feng Wang

Recently, the Institute of Automation of Chinese Academy of Sciences (CASIA) released the unconstrained online and offline Chinese handwriting databases CASIA-OLHWDB and CASIA-HWDB, which contain isolated character samples and handwritten texts produced by 1,020 writers. This paper presents our benchmarking results using state-of-the-art methods on the isolated character datasets OLHWDB1.0 and HWDB1.0 (called DB1.0 in general), and OLHWDB1.1 and HWDB1.1 (called DB1.1 in general). DB1.1 covers 3,755 Chinese character classes, as in the level-1 set of GB2312-80. The evaluated methods include 1D and pseudo 2D normalization methods, gradient direction feature extraction from binary images and from gray-scale images, online stroke direction feature extraction from pen-down trajectories and from pen lifts, classification using the modified quadratic discriminant function (MQDF), discriminative feature extraction (DFE), and the discriminative learning quadratic discriminant function (DLQDF). Our experiments yielded highest test accuracies of 89.55% and 93.22% on HWDB1.1 (offline) and OLHWDB1.1 (online), respectively, when using the MQDF classifier trained on DB1.1. When training on both DB1.0 and DB1.1, the test accuracies on HWDB1.1 and OLHWDB1.1 improve to 90.71% and 93.95%, respectively. Using DFE and DLQDF, the best results on HWDB1.1 and OLHWDB1.1 are 92.08% and 94.85%, respectively. Our results are comparable to the best results of the ICDAR 2011 Chinese Handwriting Recognition Competition, even though we used fewer training samples.
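The MQDF classifier benchmarked above assigns a sample to the class with the smallest modified quadratic distance. The following is a minimal plain-Python sketch of that scoring rule only, assuming each class's mean, its k leading eigenvectors and eigenvalues, and the constant delta that replaces the minor eigenvalues have already been estimated. The class and function names are hypothetical, and training (covariance estimation, choice of k and delta) is omitted.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class MQDFClass:
    # Hypothetical container; parameters are assumed to be estimated elsewhere.
    label: str
    mean: List[float]            # class mean vector
    eigvecs: List[List[float]]   # k leading orthonormal eigenvectors of the class covariance
    eigvals: List[float]         # the k corresponding (largest) eigenvalues
    delta: float                 # constant substituted for the d - k minor eigenvalues


def mqdf_distance(x: Sequence[float], c: MQDFClass) -> float:
    """Modified quadratic distance of x to class c (smaller means more likely)."""
    d, k = len(x), len(c.eigvals)
    diff = [xi - mi for xi, mi in zip(x, c.mean)]
    sq_norm = sum(v * v for v in diff)
    # Squared projections of (x - mean) onto the principal axes.
    proj_sq = [sum(p * v for p, v in zip(phi, diff)) ** 2 for phi in c.eigvecs]
    major = sum(p / lam for p, lam in zip(proj_sq, c.eigvals))
    # Residual energy outside the principal subspace, scaled by 1 / delta.
    minor = (sq_norm - sum(proj_sq)) / c.delta
    log_det = sum(math.log(lam) for lam in c.eigvals) + (d - k) * math.log(c.delta)
    return major + minor + log_det


def classify(x: Sequence[float], classes: List[MQDFClass]) -> str:
    """Return the label of the class with minimum MQDF distance."""
    return min(classes, key=lambda c: mqdf_distance(x, c)).label


if __name__ == "__main__":
    a = MQDFClass("A", [0.0, 0.0], [[1.0, 0.0]], [4.0], 1.0)
    b = MQDFClass("B", [5.0, 5.0], [[1.0, 0.0]], [4.0], 1.0)
    print(classify([0.5, 0.2], [a, b]))  # prints "A"
```

Replacing the minor eigenvalues with a single constant is what keeps the storage and computation per class proportional to k rather than to the full feature dimension.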


Pattern Recognition | 2012

An approach for real-time recognition of online Chinese handwritten sentences

Da-Han Wang; Cheng-Lin Liu; Xiang-Dong Zhou

With advances in handwriting capture devices and in the computing power of mobile computers, pen-based Chinese text input is moving from character-based input to sentence-based input. This paper proposes a real-time recognition approach for sentence-based input of Chinese handwriting. The main feature of the approach is a dynamically maintained segmentation-recognition candidate lattice that integrates multiple contexts, including character classification, linguistic context and geometric context. Whenever a new stroke is produced, dynamic text line segmentation and character over-segmentation are performed to locate the stroke within the text lines and update the primitive segment sequence of the page. Candidate characters are then generated and recognized to assign candidate classes, and the linguistic and geometric contexts involving the newly generated candidate characters are computed. The candidate lattice is updated while writing continues. When the pen lift time exceeds a threshold, the system searches the candidate lattice for the sentence recognition result. Since the computation of multiple contexts accounts for most of the computation and is performed during the writing process, the recognition result is obtained immediately after the writing of a sentence is finished. Experiments on a large database of unconstrained online Chinese handwriting, CASIA-OLHWDB, demonstrate the robustness and effectiveness of the proposed approach.
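Once the pen lift time passes the threshold, the result is read off the candidate lattice by a best-path search. A minimal dynamic-programming sketch follows, assuming the lattice is given as edges between primitive-segment boundaries, each edge carrying one candidate class with a classifier score and a geometric score (both in the log domain), plus a bigram table for linguistic context. The weights, scores and all names are hypothetical, and the paper's incremental lattice maintenance is not reproduced.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical edge layout: (start_boundary, end_boundary, label, char_score, geo_score).
Edge = Tuple[int, int, str, float, float]


def best_path(n_boundaries: int, edges: List[Edge],
              bigram: Dict[Tuple[str, str], float],
              w_char: float = 1.0, w_geo: float = 1.0, w_lm: float = 1.0,
              unseen_bigram: float = -10.0) -> Tuple[float, List[str]]:
    """Viterbi search from boundary 0 to the last boundary; edges must go forward."""
    out = defaultdict(list)
    for e in edges:
        out[e[0]].append(e)
    # State (boundary, previous label) -> (best score, best label sequence).
    best = {(0, "<s>"): (0.0, [])}
    for pos in range(n_boundaries):
        # All states at `pos` are final here, since every edge into `pos` starts earlier.
        for (p, prev), (score, seq) in list(best.items()):
            if p != pos:
                continue
            for _, end, label, cs, gs in out[pos]:
                lm = bigram.get((prev, label), unseen_bigram)
                new = score + w_char * cs + w_geo * gs + w_lm * lm
                key = (end, label)
                if key not in best or new > best[key][0]:
                    best[key] = (new, seq + [label])
    final = [v for (p, _), v in best.items() if p == n_boundaries - 1]
    return max(final, key=lambda v: v[0]) if final else (float("-inf"), [])


if __name__ == "__main__":
    # Two segments can be read as 木 + 木 or merged into 林 (made-up scores).
    edges = [(0, 1, "木", -1.0, 0.0), (1, 2, "木", -1.0, 0.0), (0, 2, "林", -1.5, 0.0)]
    bigram = {("<s>", "木"): -1.0, ("木", "木"): -3.0, ("<s>", "林"): -1.0}
    print(best_path(3, edges, bigram))  # (-2.5, ['林'])
```

Keeping the previous label in the search state is what lets the bigram score take part in the path comparison; a trigram model would need the last two labels instead.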


International Conference on Document Analysis and Recognition | 2009

CASIA-OLHWDB1: A Database of Online Handwritten Chinese Characters

Da-Han Wang; Cheng-Lin Liu; Jinlun Yu; Xiang-Dong Zhou

This paper describes a publicly available database, CASIA-OLHWDB1, for research on online handwritten Chinese character recognition. This database is the first in our series of online/offline handwritten characters and texts, collected using Anoto pens on paper. It contains unconstrained handwritten characters of 4,037 categories (3,866 Chinese characters and 171 symbols) produced by 420 writers, with 1,694,741 samples in total. It can be used for the design and evaluation of character recognition algorithms and of classifiers for handwritten text recognition systems. We have partitioned the samples into three grades and into training and test sets. Preliminary experiments on the database using a state-of-the-art recognizer show that recognition on it is challenging.


Pattern Recognition | 2009

A robust approach to text line grouping in online handwritten Japanese documents

Xiang-Dong Zhou; Da-Han Wang; Cheng-Lin Liu

In this paper, we present an effective approach for grouping text lines in online handwritten Japanese documents by combining temporal and spatial information. With decision functions optimized by supervised learning, the approach has few hand-tuned parameters and uses little prior knowledge. First, the strokes in the document are grouped into text line strings according to off-stroke distances. Each text line string, which may contain multiple lines, is segmented by optimizing a cost function trained by the minimum classification error (MCE) method. At the temporal merge stage, over-segmented text lines (caused by stroke classification errors) are merged using a support vector machine (SVM) classifier that makes merge/non-merge decisions. Finally, a spatial merge module corrects the segmentation errors caused by delayed strokes. Misclassified text/non-text strokes (stroke type classification precedes text line grouping) can be corrected at the temporal merge stage. To evaluate text line grouping, we provide a set of performance metrics that assess it from multiple aspects. In experiments on a large number of free-form documents in the Tokyo University of Agriculture and Technology (TUAT) Kondate database, the proposed approach achieves an entity detection metric (EDM) rate of 0.8992 and an edit-distance rate (EDR) of 0.1114. For grouping pure text strokes, the performance reaches an EDM of 0.9591 and an EDR of 0.0669.
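The first step above, grouping strokes into text line strings by off-stroke distance, can be pictured with the following sketch. It assumes each stroke is a list of (x, y) points in writing order and uses a fixed, hypothetical distance threshold, whereas the paper optimizes its decision functions by supervised learning. The later stages (MCE-trained segmentation, SVM temporal merge, spatial merge) are not shown, and the function names are illustrative.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]
Stroke = List[Point]


def off_stroke_distance(prev: Stroke, nxt: Stroke) -> float:
    """Pen-up distance from the end of one stroke to the start of the next."""
    (x0, y0), (x1, y1) = prev[-1], nxt[0]
    return math.hypot(x1 - x0, y1 - y0)


def group_line_strings(strokes: List[Stroke], threshold: float) -> List[List[int]]:
    """Split the temporal stroke sequence wherever the off-stroke distance exceeds threshold.

    Returns lists of stroke indices; a group may still contain several text lines,
    which a later segmentation stage would have to separate."""
    if not strokes:
        return []
    groups = [[0]]
    for i in range(1, len(strokes)):
        if off_stroke_distance(strokes[i - 1], strokes[i]) > threshold:
            groups.append([i])  # large pen jump: start a new line string
        else:
            groups[-1].append(i)
    return groups
```

For example, two short horizontal strokes followed by a jump of about 60 units to two more strokes would, with a threshold of 10, produce the groups [[0, 1], [2, 3]].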


IEEE Transactions on Pattern Analysis and Machine Intelligence | 2013

Handwritten Chinese/Japanese Text Recognition Using Semi-Markov Conditional Random Fields

Xiang-Dong Zhou; Da-Han Wang; Feng Tian; Cheng-Lin Liu; Masaki Nakagawa

This paper proposes a method for handwritten Chinese/Japanese text (character string) recognition based on semi-Markov conditional random fields (semi-CRFs). The high-order semi-CRF model is defined on a lattice containing all possible segmentation-recognition hypotheses of a string, and it fuses the scores of candidate character recognition with the compatibilities of geometric and linguistic contexts by representing them in the feature functions. Given models of character recognition and compatibilities, the fusion parameters are optimized by minimizing the negative log-likelihood loss with a margin term on a training set of string samples. A forward-backward lattice pruning algorithm is proposed to reduce the training computation when trigram language models are used, and beam search techniques are investigated to accelerate decoding. We evaluate the proposed method on unconstrained online handwritten text lines from three databases. On the test sets of CASIA-OLHWDB (Chinese) and TUAT Kondate (Japanese), the character-level correct rates are 95.20 and 95.44 percent, and the accurate rates are 94.54 and 94.55 percent, respectively. On the test set (online handwritten texts) of the ICDAR 2011 Chinese handwriting recognition competition, the proposed method outperforms the best system in the competition.
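Minimizing the negative log-likelihood of a semi-CRF requires the log partition function, a log-sum-exp over every segmentation-recognition path in the lattice. The sketch below computes it with a forward recursion and derives the negative log-likelihood of a reference path. For brevity it assumes each candidate edge already carries a single fused score (the weighted sum of its feature functions) and ignores context linking neighboring characters, so it is a zero-order simplification rather than the paper's high-order model; the margin term and lattice pruning are also left out. All names are hypothetical.

```python
import math
from collections import defaultdict
from typing import List, Tuple

# Hypothetical edge layout: (start_boundary, end_boundary, label, fused_score); start < end.
Edge = Tuple[int, int, str, float]


def logsumexp(values: List[float]) -> float:
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_partition(n_boundaries: int, edges: List[Edge]) -> float:
    """Forward recursion: alpha[j] = logsumexp over edges (i, j) of alpha[i] + score."""
    incoming = defaultdict(list)
    for start, end, _, score in edges:
        incoming[end].append((start, score))
    alpha = [float("-inf")] * n_boundaries
    alpha[0] = 0.0
    for j in range(1, n_boundaries):
        terms = [alpha[i] + s for i, s in incoming[j] if alpha[i] != float("-inf")]
        if terms:
            alpha[j] = logsumexp(terms)
    return alpha[-1]


def path_nll(n_boundaries: int, edges: List[Edge], path: List[Edge]) -> float:
    """Negative log-likelihood of one path: log Z minus the path's total score."""
    return log_partition(n_boundaries, edges) - sum(e[3] for e in path)
```

With edges a (0 to 1, score 1), b (1 to 2, score 1) and c (0 to 2, score 2), both paths score 2, so log Z = 2 + log 2 and the path a, b has negative log-likelihood log 2.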


International Conference on Frontiers in Handwriting Recognition | 2010

Keyword Spotting from Online Chinese Handwritten Documents Using One-vs-All Trained Character Classifier

Heng Zhang; Da-Han Wang; Cheng-Lin Liu

This paper presents a text query-based method for keyword spotting in online Chinese handwritten documents. The similarity between a text word and handwriting is obtained by combining the character similarity scores given by a character classifier. To overcome the ambiguity of character segmentation, multiple candidate character patterns are generated by over-segmentation, and sequences of candidate characters are matched with the query word by beam search. The character classifier is trained with a one-vs-all strategy so that it assigns high similarity to the target class and low similarity to the others. In particular, we use a one-vs-all trained prototype classifier and a support vector machine (SVM) classifier for similarity scoring. The method yielded promising performance in experiments on a database of 550 pages written by 110 writers. For four-character words, the recall, precision and F-measure are 87.25%, 94.84% and 90.88%, respectively.
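The matching step can be sketched as follows: from every segment boundary, partial matches of the query are extended along candidate characters, only the best few are kept at each step (the beam), and the word similarity is taken as the average of the character similarities. The lattice format, the averaging rule, the beam width, the acceptance threshold and all names are assumptions for illustration; in the paper the character scores come from a one-vs-all trained prototype or SVM classifier, which is not reproduced here.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical candidate layout: (start_boundary, end_boundary, {class: similarity in [0, 1]}).
Candidate = Tuple[int, int, Dict[str, float]]


def spot_keyword(query: str, candidates: List[Candidate], n_boundaries: int,
                 beam_width: int = 5, threshold: float = 0.5) -> List[Tuple[int, int, float]]:
    """Return (start, end, word_similarity) for matches scoring at least `threshold`."""
    if not query:
        return []
    out = defaultdict(list)
    for start, end, scores in candidates:
        out[start].append((end, scores))
    hits = []
    for origin in range(n_boundaries):
        # A hypothesis is (current boundary, summed character similarity so far).
        beam = [(origin, 0.0)]
        for ch in query:
            extended = []
            for pos, total in beam:
                for end, scores in out[pos]:
                    if ch in scores:
                        extended.append((end, total + scores[ch]))
            # Beam pruning: keep only the best partial matches.
            beam = sorted(extended, key=lambda h: h[1], reverse=True)[:beam_width]
            if not beam:
                break
        for end, total in beam:
            sim = total / len(query)  # average character similarity
            if sim >= threshold:
                hits.append((origin, end, sim))
    return hits
```

Averaging rather than summing keeps scores comparable across queries of different lengths; the paper's own combination rule may differ.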


International Conference on Document Analysis and Recognition | 2013

Learning-Based Candidate Segmentation Scoring for Real-Time Recognition of Online Overlaid Chinese Handwriting

Yan-Fei Lv; Lin-Lin Huang; Da-Han Wang; Cheng-Lin Liu

In overlaid handwriting, multiple characters are written sequentially in the same area, which requires special handling when segmenting the stroke sequence into characters. We propose a learning-based model for scoring candidate stroke cuts and segments in online overlaid Chinese handwriting recognition. Based on stroke cut classification using a support vector machine (SVM), strokes are grouped into segments, and consecutive segments are concatenated into candidate characters. The likelihood of candidate characters (unary geometry) and the compatibility between adjacent characters (binary geometry) are measured by combining the stroke cut score and the between-segment geometric score, and are integrated with the character classification score and linguistic context for character string recognition. Experiments on a large database of online Chinese handwriting demonstrate the effectiveness of the proposed method.
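The segmentation side of this pipeline can be sketched as follows, assuming each gap between consecutive strokes already has a cut score from a classifier (the paper uses an SVM; here the scores are simply given) and that a score above a threshold marks a cut. Strokes between cuts form segments, and up to max_concat consecutive segments are concatenated into candidate characters. The threshold, max_concat and the function names are hypothetical, and the geometric scoring that the paper combines with these candidates is omitted.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # half-open stroke range [start, end)


def segments_from_cuts(n_strokes: int, cut_scores: List[float],
                       threshold: float = 0.0) -> List[Span]:
    """Group strokes into segments; cut_scores[i] scores the gap between strokes i and i + 1.

    A score above threshold (e.g. a positive SVM decision value) is treated as a cut."""
    if n_strokes == 0:
        return []
    assert len(cut_scores) == n_strokes - 1
    segments, start = [], 0
    for i, score in enumerate(cut_scores):
        if score > threshold:
            segments.append((start, i + 1))
            start = i + 1
    segments.append((start, n_strokes))
    return segments


def candidate_characters(segments: List[Span], max_concat: int = 3) -> List[Span]:
    """Concatenate 1..max_concat consecutive segments into candidate characters."""
    cands = []
    for i in range(len(segments)):
        for j in range(i, min(i + max_concat, len(segments))):
            cands.append((segments[i][0], segments[j][1]))
    return cands
```

For five strokes with gap scores [-1.0, 0.8, -0.5, 1.2], the cuts fall after strokes 1 and 3, giving segments (0, 2), (2, 4) and (4, 5); with max_concat=2 the candidates are (0, 2), (0, 4), (2, 4), (2, 5) and (4, 5).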


Pattern Recognition | 2014

Character confidence based on N-best list for keyword spotting in online Chinese handwritten documents

Heng Zhang; Da-Han Wang; Cheng-Lin Liu

In keyword spotting from handwritten documents by text query, the word similarity is usually computed by combining character similarities, which ideally approximate the logarithms of the character probabilities. In this paper, we propose to directly estimate the posterior probability (also called confidence) of candidate characters from the N-best paths of the candidate segmentation-recognition lattice. After the candidate segmentation-recognition paths are evaluated by combining multiple contexts, the scores of the N-best paths are transformed into posterior probabilities using soft-max. The parameter of the soft-max (the confidence parameter) is estimated from a character confusion network, which is constructed by aligning different paths with a string matching algorithm. The posterior probability of a candidate character is the sum of the probabilities of the paths that pass through it. We compare the proposed posterior probability estimation method with reference methods including the word confidence measure and the text line recognition method. Experimental results of keyword spotting on CASIA-OLHWDB, a large database of unconstrained online Chinese handwriting, demonstrate the effectiveness of the proposed method.
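The posterior estimate described above can be sketched in a few lines: N-best path scores are turned into probabilities by a soft-max scaled by the confidence parameter, and a candidate character's confidence is the total probability of the paths that contain it. Each path is assumed here to be a list of candidate identifiers such as (start boundary, end boundary, class); the confidence parameter is passed in rather than estimated from a confusion network, and all names are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence


def path_posteriors(scores: Sequence[float], confidence_param: float) -> List[float]:
    """Soft-max over N-best path scores: P(path k) is proportional to exp(param * score_k)."""
    scaled = [confidence_param * s for s in scores]
    m = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    z = sum(exps)
    return [e / z for e in exps]


def character_confidence(paths: List[List[Hashable]], scores: Sequence[float],
                         confidence_param: float) -> Dict[Hashable, float]:
    """Posterior of each candidate character: sum of the probabilities of the paths through it."""
    probs = path_posteriors(scores, confidence_param)
    conf: Dict[Hashable, float] = defaultdict(float)
    for path, p in zip(paths, probs):
        for cand in set(path):
            conf[cand] += p
    return dict(conf)
```

If two paths share the candidate (0, 1, "中") but differ in the second character, that shared candidate gets confidence 1.0 while each alternative gets only its own path's probability; this is how agreement among the N-best paths raises confidence.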


International Conference on Frontiers in Handwriting Recognition | 2010

Error Reduction by Confusing Characters Discrimination for Online Handwritten Japanese Character Recognition

Xiang-Dong Zhou; Da-Han Wang; Masaki Nakagawa; Cheng-Lin Liu

To reduce the classification errors of online handwritten Japanese character recognition, we propose a method for discriminating confusing characters at little additional cost. After confusing sets are built by cross validation using a baseline quadratic classifier, a logistic regression (LR) classifier is trained to discriminate the characters in each set. The LR classifier uses subspace features selected from existing vectors of the baseline classifier, and thus has no extra parameters other than the weights, which require little storage compared with the baseline classifier. In experiments on the TUAT HANDS databases with the modified quadratic discriminant function (MQDF) as the baseline classifier, the proposed method substantially reduced the confusion caused by non-Kanji characters.
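The second-stage decision can be pictured as a re-ranking step: when the baseline classifier's top choice belongs to a confusing set, a logistic-regression model restricted to that set picks the final class from subspace features. The sketch below uses a multinomial (soft-max) logistic regression with hand-set weights. The data structures, the choice of a multinomial rather than pairwise model, and all names are assumptions; in practice the weights would be learned, and the feature vector would be projections onto the selected baseline vectors.

```python
import math
from typing import Dict, FrozenSet, List, Sequence, Tuple

# Hypothetical model layout: (per-class weight vectors, per-class biases).
LRModel = Tuple[Dict[str, List[float]], Dict[str, float]]


def softmax_lr(z: Sequence[float], weights: Dict[str, List[float]],
               bias: Dict[str, float]) -> Dict[str, float]:
    """Multinomial logistic regression over the classes of one confusing set."""
    logits = {c: sum(w * v for w, v in zip(weights[c], z)) + bias[c] for c in weights}
    m = max(logits.values())
    exps = {c: math.exp(l - m) for c, l in logits.items()}
    total = sum(exps.values())
    return {c: e / total for c, e in exps.items()}


def rerank(baseline_top1: str, z: Sequence[float],
           confusing_sets: List[FrozenSet[str]],
           lr_models: Dict[FrozenSet[str], LRModel]) -> str:
    """Keep the baseline decision unless its class lies in a confusing set with an LR model.

    z: subspace features of the sample (assumed already computed)."""
    for cset in confusing_sets:
        if baseline_top1 in cset and cset in lr_models:
            weights, bias = lr_models[cset]
            probs = softmax_lr(z, weights, bias)
            return max(probs, key=probs.get)
    return baseline_top1
```

As an illustration, katakana ロ and kanji 口 form a natural confusing set; samples whose baseline answer is outside every set skip the second stage entirely, which keeps the added cost small.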

Collaboration


Top Co-Authors

Cheng-Lin Liu, Chinese Academy of Sciences
Xiang-Dong Zhou, Chinese Academy of Sciences
Fei Yin, Chinese Academy of Sciences
Qiu-Feng Wang, Chinese Academy of Sciences
Heng Zhang, Chinese Academy of Sciences
Masaki Nakagawa, Tokyo University of Agriculture and Technology
Jinlun Yu, Chinese Academy of Sciences
Liang Xu, Chinese Academy of Sciences
Feng Tian, Chinese Academy of Sciences
Lin-Lin Huang, Beijing Jiaotong University