Hongwei Hao | Researchain

Publication

Featured research published by Hongwei Hao.

IEEE Transactions on Pattern Analysis and Machine Intelligence | 2014

Robust Text Detection in Natural Scene Images

Xu-Cheng Yin; Xuwang Yin; Kaizhu Huang; Hongwei Hao

Text detection in natural scene images is an important prerequisite for many content-based image analysis tasks. In this paper, we propose an accurate and robust method for detecting texts in natural scene images. A fast and effective pruning algorithm is designed to extract Maximally Stable Extremal Regions (MSERs) as character candidates using the strategy of minimizing regularized variations. Character candidates are grouped into text candidates by the single-link clustering algorithm, where distance weights and clustering threshold are learned automatically by a novel self-training distance metric learning algorithm. The posterior probabilities of text candidates corresponding to non-text are estimated with a character classifier; text candidates with high non-text probabilities are eliminated and texts are identified with a text classifier. The proposed system is evaluated on the ICDAR 2011 Robust Reading Competition database; the f-measure is over 76%, much better than the state-of-the-art performance of 71%. Experiments on multilingual, street view, multi-orientation and even born-digital databases also demonstrate the effectiveness of the proposed method.
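
The grouping step described in this abstract can be illustrated with a minimal sketch of threshold-based single-link clustering. It is not the authors' implementation: plain Euclidean distance between candidate centres and a hand-picked threshold stand in for the learned distance weights and learned threshold, and the function name `single_link_clusters` and the sample coordinates are hypothetical.

```python
from itertools import combinations
from math import hypot


def single_link_clusters(points, threshold):
    """Group points so that any two points within `threshold` share a cluster.

    Cutting a single-link dendrogram at `threshold` is equivalent to taking the
    connected components of the graph whose edges join pairs within `threshold`.
    """
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(points)), 2):
        (x1, y1), (x2, y2) = points[i], points[j]
        # Assumption: plain Euclidean distance instead of a learned metric.
        if hypot(x1 - x2, y1 - y2) <= threshold:
            parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(points)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


# Hypothetical character-candidate centres: two text lines far apart.
centres = [(0, 0), (10, 0), (20, 1), (200, 300), (211, 300)]
groups = single_link_clusters(centres, threshold=15)
```

With these toy values, the first three candidates chain together into one group even though the first and third are farther apart than the threshold, which is the characteristic behaviour of single-link clustering.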

Meeting of the Association for Computational Linguistics | 2016

Attention-Based Bidirectional Long Short-Term Memory Networks for Relation Classification

Peng Zhou; Wei Shi; Jun Tian; Zhenyu Qi; Bingchen Li; Hongwei Hao; Bo Xu

Relation classification is an important semantic processing task in the field of natural language processing (NLP). State-of-the-art systems still rely on lexical resources such as WordNet or NLP systems like dependency parsers and named entity recognizers (NER) to get high-level features. Another challenge is that important information can appear at any position in the sentence. To tackle these problems, we propose Attention-Based Bidirectional Long Short-Term Memory Networks (AttBLSTM) to capture the most important semantic information in a sentence. The experimental results on the SemEval-2010 relation classification task show that our method outperforms most of the existing methods, with only word vectors.
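
How an attention layer can pick out information "at any position in the sentence" may be easier to see in code. The sketch below shows one common formulation of attention pooling over per-token hidden states (score each position, apply softmax, take the weighted sum). The abstract does not spell out the formula, so the scoring function, the name `attention_pool`, and the toy numbers are assumptions; in a real model the hidden states would come from a trained bidirectional LSTM and `w` would be learned.

```python
from math import exp, tanh


def attention_pool(hidden_states, w):
    """Weighted sum of per-token hidden states with softmax attention weights.

    hidden_states: list of T vectors (each of length d), e.g. BLSTM outputs.
    w: attention parameter vector of length d (learned in a real model).
    """
    # Score each position as w . tanh(h_t) (assumed scoring function).
    scores = [sum(wi * tanh(hi) for wi, hi in zip(w, h)) for h in hidden_states]
    # Numerically stable softmax over positions.
    top = max(scores)
    exps = [exp(s - top) for s in scores]
    total = sum(exps)
    alphas = [e / total for e in exps]
    # Sentence vector: attention-weighted sum of the hidden states.
    d = len(w)
    pooled = [sum(a * h[k] for a, h in zip(alphas, hidden_states)) for k in range(d)]
    return alphas, pooled


# Toy values standing in for BLSTM outputs of a three-token sentence.
H = [[0.1, -0.2], [2.0, 1.5], [0.0, 0.3]]
alphas, sentence_vec = attention_pool(H, w=[1.0, 1.0])
```

The weights in `alphas` sum to one, and the second token receives the largest weight here because its hidden state scores highest under the toy parameter vector.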

International Joint Conference on Natural Language Processing | 2015

Semantic Clustering and Convolutional Neural Network for Short Text Categorization

Peng Wang; Jiaming Xu; Bo Xu; Cheng-Lin Liu; Heng Zhang; Fangyuan Wang; Hongwei Hao

Short texts usually suffer from data sparsity and ambiguity in their representations because they lack context. In this paper, we propose a novel method to model short texts based on semantic clustering and a convolutional neural network. In particular, we first discover semantic cliques in embedding spaces by a fast clustering algorithm. Then, multi-scale semantic units are detected under the supervision of semantic cliques, which introduce useful external knowledge for short texts. These meaningful semantic units are combined and fed into a convolutional layer, followed by a max-pooling operation. Experimental results on two open benchmarks validate the effectiveness of the proposed method.

Pattern Recognition | 1997

Handwritten Chinese character recognition by metasynthetic approach

Hongwei Hao; Xu-Hong Xiao; Ruwei Dai

Inspired by the idea of metasynthesis, two integration approaches for handwritten Chinese character recognition are proposed in this paper. The first one is Integration based on a Linear Model and the second one is Network Integration based on Supervised Learning. Compared with previous integration approaches, the proposed methods succeed in automatically acquiring the parameters of the integrated systems by supervised learning, which is very important for pattern recognition problems with a large number of classes. The experimental results show that the performance of the synthesized systems is much better than that of any of the individual classifiers.

Neurocomputing | 2014

A novel classifier ensemble method with sparsity and diversity

Xu-Cheng Yin; Kaizhu Huang; Hongwei Hao; Khalid Iqbal; Zhi-Bin Wang

We consider the classifier ensemble problem in this paper. Due to its superior performance to individual classifiers, classifier ensemble has been intensively studied in the literature. Generally speaking, there are two prevalent research directions on this, i.e., to diversely generate classifier components, and to sparsely combine multiple classifiers. While most current approaches emphasize either sparsity or diversity only, we investigate the classifier ensemble by learning both sparsity and diversity simultaneously. We formulate the classifier ensemble problem with sparsity and/or diversity learning in a general framework. In particular, the classifier ensemble with sparsity and diversity can be represented as a mathematical optimization problem. We then propose a heuristic algorithm, capable of obtaining ensemble classifiers with consideration of both sparsity and diversity. We exploit the genetic algorithm, and optimize sparsity and diversity for classifier selection and combination heuristically and iteratively. As one major contribution, we introduce the concept of the diversity contribution ability so as to select proper classifier components and evolve classifier weights eventually. Finally, we compare our proposed novel method with other conventional classifier ensemble methods such as Bagging, least squares combination, sparsity learning, and AdaBoost, extensively on UCI benchmark data sets and the Pascal Large Scale Learning Challenge 2008 webspam data. The experimental results confirm that our approach leads to better performance in many aspects.

Information Fusion | 2014

Convex ensemble learning with sparsity and diversity

Xu-Cheng Yin; Kaizhu Huang; Chun Yang; Hongwei Hao

Classifier ensemble has been broadly studied in two prevalent directions, i.e., to diversely generate classifier components, and to sparsely combine multiple classifiers. While most current approaches emphasize either sparsity or diversity only, we investigate classifier ensemble focused on both in this paper. We formulate the classifier ensemble problem with sparsity and diversity learning in a general mathematical framework, which proves beneficial for grouping classifiers. In particular, derived from the error-ambiguity decomposition, we design a convex ensemble diversity measure. Consequently, accuracy loss, sparseness regularization, and diversity measure can be balanced and combined in a convex quadratic programming problem. We prove that the final convex optimization leads to a closed-form solution, making it very appealing for real ensemble learning problems. We compare our proposed novel method with other conventional ensemble methods such as Bagging, least squares combination, sparsity learning, and AdaBoost, extensively on a variety of UCI benchmark data sets and the Pascal Large Scale Learning Challenge 2008 webspam data. Experimental results confirm that our approach has very promising performance.
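
The error-ambiguity decomposition mentioned in this abstract is a standard identity for squared error: with convex combination weights, the ensemble error equals the weighted average member error minus the weighted spread of the members around the ensemble output. The sketch below checks that identity numerically for a single sample. It does not reproduce the paper's convex diversity measure or its quadratic program; the function name and the sample values are hypothetical.

```python
def ambiguity_decomposition(predictions, weights, target):
    """Return (ensemble_error, weighted_member_error, ambiguity) for one sample.

    For convex weights (non-negative, summing to 1) and squared error:
        ensemble_error = weighted_member_error - ambiguity
    """
    ensemble = sum(w * p for w, p in zip(weights, predictions))
    ensemble_error = (ensemble - target) ** 2
    member_error = sum(w * (p - target) ** 2 for w, p in zip(weights, predictions))
    ambiguity = sum(w * (p - ensemble) ** 2 for w, p in zip(weights, predictions))
    return ensemble_error, member_error, ambiguity


# Hypothetical outputs of three regressors for a single sample with target 1.0.
err, avg_err, amb = ambiguity_decomposition([0.8, 1.3, 1.1], [0.5, 0.3, 0.2], 1.0)
```

Because the ambiguity term is never negative, the ensemble error cannot exceed the weighted average member error, which is why disagreement among members (diversity) can help.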

North American Chapter of the Association for Computational Linguistics | 2015

Short Text Clustering via Convolutional Neural Networks

Jiaming Xu; Peng Wang; Guanhua Tian; Bo Xu; Jun Zhao; Fangyuan Wang; Hongwei Hao

Short text clustering has become an increasingly important task with the popularity of social media, and it is a challenging problem due to the sparseness of text representation. In this paper, we propose Short Text Clustering via Convolutional neural networks (STCC), which is more beneficial for clustering because it imposes one constraint on the learned features through a self-taught learning framework, without using any external tags or labels. First, we embed the original keyword features into compact binary codes with a locality-preserving constraint. Then, word embeddings are explored and fed into convolutional neural networks to learn deep feature representations, with the output units fitting the pre-trained binary codes in the training process. After obtaining the learned representations, we use K-means to cluster them. Our extensive experimental study on two public short text datasets shows that the deep feature representation learned by our approach can achieve significantly better clustering performance than some other existing features, such as term frequency-inverse document frequency, Laplacian eigenvectors and average embedding.

International Conference on Document Analysis and Recognition | 2003

Comparison of genetic algorithm and sequential search methods for classifier subset selection

Hongwei Hao; Cheng-Lin Liu; Hiroshi Sako

Classifier subset selection (CSS) from a large ensemble is an effective way to design multiple classifier systems (MCSs). Given a validation dataset and a selection criterion, the task of CSS is reduced to searching the space of classifier subsets to find the optimal subset. This study investigates the search efficiency of genetic algorithm (GA) and sequential search methods for CSS. In experiments of handwritten digit recognition, we select a subset from 32 candidate classifiers with the aim of achieving high accuracy of combination. The results show that in respect of optimality, no method outperforms the others in all cases. All the methods are very fast except the generalized plus l and take away r (GPTA) method.
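
As an illustration of the sequential search family compared in this abstract, the sketch below implements a plain sequential forward search: starting from an empty subset, it repeatedly adds the classifier that gives the best combined accuracy on a validation set. Plurality voting as the combination rule, validation accuracy as the selection criterion, the function names, and the toy predictions are all assumptions made for the example; the paper's actual combination rules and criteria are not given in the abstract.

```python
from collections import Counter


def vote_accuracy(subset, predictions, labels):
    """Accuracy of plurality voting over the chosen classifiers.

    Ties are resolved in favour of the label voted first in `subset` order.
    """
    correct = 0
    for i, label in enumerate(labels):
        votes = Counter(predictions[c][i] for c in subset)
        if votes.most_common(1)[0][0] == label:
            correct += 1
    return correct / len(labels)


def forward_select(predictions, labels, max_size):
    """Greedy sequential forward search; returns the best subset seen and its score."""
    chosen = []
    best_subset, best_score = [], -1.0
    while len(chosen) < min(max_size, len(predictions)):
        step_score, step_pick = -1.0, None
        for c in range(len(predictions)):
            if c in chosen:
                continue
            score = vote_accuracy(chosen + [c], predictions, labels)
            if score > step_score:  # ties keep the lowest-index candidate
                step_score, step_pick = score, c
        chosen.append(step_pick)
        if step_score > best_score:
            best_subset, best_score = list(chosen), step_score
    return best_subset, best_score


# Hypothetical validation labels and the outputs of four candidate classifiers.
labels = [0, 1, 1, 0, 1, 0]
predictions = [
    [0, 1, 1, 0, 0, 1],
    [0, 1, 0, 0, 1, 0],
    [1, 1, 1, 0, 1, 0],
    [0, 0, 1, 1, 1, 0],
]
subset, score = forward_select(predictions, labels, max_size=3)
```

In this toy case no single classifier is perfect, yet a three-member subset votes correctly on every validation sample, which shows why subset search can beat picking the best individual classifier.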

Systems, Man and Cybernetics | 2011

An improved topic relevance algorithm for focused crawling

Hongwei Hao; Cui-Xia Mu; Xu-Cheng Yin; Shen Li; Zhi-Bin Wang

Topic relevance of pages and hyperlinks is the key issue in focused crawling. In this paper, an improved topic relevance algorithm for focused crawling is proposed. First, we implement a prototype focused crawler, a topic-specific news gathering system, which is prepared for comparative experiments on different similarity measures with the anchor text. Second, experiments on a Chinese text corpus show that using LSI (Latent Semantic Indexing) outperforms using TF-IDF (term frequency-inverse document frequency) for hyperlink topic relevance prediction and page topic relevance calculation. Third, in real crawling experiments on the prototype system, the crawler using TF-IDF has high performance, with the accumulated topic relevance increasing quickly at the beginning of crawling; however, the crawler using LSI can find more related pages and tunnel through. Fourth, combining the advantages of LSI and TF-IDF, we propose a TFIDF+LSI algorithm to guide the crawling. Last, the crawler using TFIDF+LSI performs the same crawl task and demonstrates the combined advantage of TF-IDF and LSI. The experiment suggests that the crawler's performance using TFIDF+LSI is greatly superior to that using either TF-IDF or LSI alone.
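
To make the TF-IDF side of this comparison concrete, the sketch below scores anchor texts against a topic description using TF-IDF vectors and cosine similarity. It covers only the TF-IDF half: the LSI step (a truncated singular value decomposition of the term-document matrix) and the way the paper combines the two scores are not shown, because the abstract does not specify them. The weighting variant, the function names, and the English sample texts are assumptions.

```python
from collections import Counter
from math import log, sqrt


def tfidf_vectors(docs):
    """Map each tokenised document to a {term: tf * idf} dictionary."""
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    return [
        {t: count / len(doc) * log(n / df[t]) for t, count in Counter(doc).items()}
        for doc in docs
    ]


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dictionaries."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


# Hypothetical topic description and two anchor texts found on a page.
topic = "stock market news".split()
anchors = ["stock prices fall in market".split(), "football match report".split()]
vecs = tfidf_vectors([topic] + anchors)
relevance = [cosine(vecs[0], v) for v in vecs[1:]]
```

The second anchor shares no term with the topic and therefore scores zero. Exact term matching of this kind is the limitation that a latent-space method such as LSI is meant to soften, which is consistent with the abstract's observation that the LSI crawler finds more related pages.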

International Conference on Document Analysis and Recognition | 2011

Robust Vanishing Point Detection for MobileCam-Based Documents

Xu-Cheng Yin; Hongwei Hao; Jun Sun; Satoshi Naoi

Document images captured by a mobile phone camera often have perspective distortions. In this paper, fast and robust vanishing point detection methods for such perspective documents are presented. Most previous methods are either slow or unstable. Based on robust detection of text baselines and character tilt orientations, our proposed technology is fast and robust with the following features: (1) quick detection of vanishing point candidates by clustering and voting on the Gaussian sphere space, and (2) precise and efficient detection of the final vanishing points using a hybrid approach, which combines the results from clustering and projection analysis. The rectified image acceptance rate for MobileCam-based documents, signboards and posters is more than 98% with an average speed of about 100 ms.
