Publication


Featured research published by Wen-Hoar Hsaio.


IEEE Transactions on Systems, Man, and Cybernetics | 2012

Movie Rating and Review Summarization in Mobile Environment

Chien-Liang Liu; Wen-Hoar Hsaio; Chia-Hoang Lee; Gen-Chi Lu; Emery Jou

In this paper, we design and develop a movie-rating and review-summarization system for a mobile environment. The movie-rating information is based on the sentiment-classification result, and the condensed descriptions of movie reviews are generated by feature-based summarization. We propose a novel approach based on latent semantic analysis (LSA) to identify product features, and use the features obtained from LSA to reduce the size of the summary. Both sentiment-classification accuracy and system response time are considered in the system design. The rating and review-summarization system can easily be extended to other product-review domains.
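The paper's exact LSA procedure is not given here, but the general technique (truncated SVD of a term-document matrix, then reading off high-loading terms as candidate product features) can be sketched as follows; the toy matrix and term list are hypothetical:

```python
import numpy as np

def top_feature_terms(X, terms, k=2, n=2):
    """Rank terms by their loading on the strongest LSA dimension.

    X is a term-document matrix (rows = terms, columns = reviews);
    truncated SVD keeps the k strongest latent dimensions, and the
    terms with the largest absolute loading on the first dimension
    are candidate product features."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    U_k = U[:, :k] * s[:k]                 # term coordinates in latent space
    top = np.argsort(-np.abs(U_k[:, 0]))[:n]
    return [terms[i] for i in top]

# Hypothetical counts of five terms across four movie reviews.
terms = ["plot", "acting", "music", "screen", "battery"]
X = np.array([
    [3, 2, 0, 1],
    [2, 3, 1, 0],
    [0, 1, 3, 2],
    [1, 0, 2, 3],
    [0, 0, 1, 1],
], dtype=float)
print(top_feature_terms(X, terms))
```

Selecting terms by latent-dimension loading is what would let a summary shrink to the sentences mentioning the strongest feature terms.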


IEEE Transactions on Systems, Man, and Cybernetics | 2014

Semi-Supervised Linear Discriminant Clustering

Chien-Liang Liu; Wen-Hoar Hsaio; Chia-Hoang Lee; Fu-Sheng Gou

This paper devises a semi-supervised learning method called semi-supervised linear discriminant clustering (Semi-LDC). The proposed algorithm considers clustering and dimensionality reduction simultaneously by connecting K-means and linear discriminant analysis (LDA); the goal is to find a feature space in which K-means performs well. To exploit the information carried by unlabeled examples, this paper proposes to use soft labels to denote the labels of unlabeled examples, estimated with an algorithm called constrained-PLSA. Soft LDA with the hard labels of labeled examples and the soft labels of unlabeled examples is then used to find a projection matrix, and clustering is performed in the new feature space. We conduct experiments on three data sets. The experimental results indicate that the proposed method generally outperforms other semi-supervised methods. We further discuss and analyze the influence of soft labels on classification performance by conducting experiments with different percentages of labeled examples. The findings show that using soft labels improves performance particularly when the number of available labeled examples is insufficient to train a robust and accurate model. Additionally, the proposed method can be viewed as a framework, since different soft-label estimation methods can be plugged in according to application requirements.
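The soft-LDA step described above can be sketched with weighted scatter matrices: each example contributes to every class in proportion to its hard or soft label. This is a minimal illustration under assumed synthetic data, not the paper's implementation:

```python
import numpy as np

def soft_lda_projection(X, R, dim=1):
    """Soft LDA: R[i, c] is the probability that sample i belongs to
    class c (one-hot rows for labeled data, soft labels for unlabeled
    data). Returns a projection maximizing between-class scatter
    relative to within-class scatter, as in Fisher's criterion."""
    mu = X.mean(axis=0)
    d = X.shape[1]
    Sb = np.zeros((d, d))
    Sw = np.zeros((d, d))
    for c in range(R.shape[1]):
        w = R[:, c]
        n_c = w.sum()
        mu_c = (w[:, None] * X).sum(axis=0) / n_c
        Sb += n_c * np.outer(mu_c - mu, mu_c - mu)
        D = X - mu_c
        Sw += (w[:, None] * D).T @ D
    # Leading eigenvectors of pinv(Sw) @ Sb give the projection.
    vals, vecs = np.linalg.eig(np.linalg.pinv(Sw) @ Sb)
    order = np.argsort(-vals.real)
    return vecs[:, order[:dim]].real

# Two well-separated synthetic classes with one-hot label matrix R.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(5, 1, (20, 2))])
R = np.zeros((40, 2))
R[:20, 0] = 1.0
R[20:, 1] = 1.0
W = soft_lda_projection(X, R, dim=1)
```

K-means would then run on the projected data `X @ W`; with one-hot rows for labeled examples and constrained-PLSA posteriors for unlabeled ones, R carries both kinds of supervision.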


IEEE Transactions on Systems, Man, and Cybernetics | 2016

Semi-Supervised Text Classification With Universum Learning

Chien-Liang Liu; Wen-Hoar Hsaio; Chia-Hoang Lee; Tao-Hsing Chang; Tsung-Hsun Kuo

Universum, a collection of nonexamples that do not belong to any class of interest, has become a new research topic in machine learning. This paper devises a semi-supervised learning with Universum algorithm based on the boosting technique, and focuses on situations where only a few labeled examples are available. We also show that the training error of AdaBoost with Universum is bounded by the product of the normalization factors, and that the training error drops exponentially fast when each weak classifier is slightly better than random guessing. Finally, the experiments use four data sets with several combinations of settings. Experimental results indicate that the proposed algorithm benefits from Universum examples and outperforms several alternative methods, particularly when insufficient labeled examples are available. When the number of labeled examples is too small to estimate the parameters of classification functions, the Universum can be used to approximate their prior distribution. The experimental results can be explained using the concept of Universum introduced by Vapnik: Universum examples implicitly specify a prior distribution on the set of classification functions.
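Without the Universum term, the bound mentioned above specializes to the classical AdaBoost result that training error is at most prod(Z_t), the product of the per-round normalizers. A plain AdaBoost sketch (decision stumps, synthetic data) makes the bound checkable numerically; the Universum extension itself is not reproduced here:

```python
import numpy as np

def adaboost_train_error_bound(X, y, T=10):
    """Plain AdaBoost with threshold stumps. Returns the ensemble
    training error and prod(Z_t), the product of per-round
    normalizers that upper-bounds it."""
    n = len(y)
    w = np.full(n, 1.0 / n)
    F = np.zeros(n)
    z_prod = 1.0
    for _ in range(T):
        best = None
        # Exhaustive search over axis-aligned threshold stumps.
        for j in range(X.shape[1]):
            for thr in np.unique(X[:, j]):
                for sgn in (1, -1):
                    h = sgn * np.where(X[:, j] <= thr, 1, -1)
                    err = w[h != y].sum()
                    if best is None or err < best[0]:
                        best = (err, h)
        err, h = best
        err = min(max(err, 1e-12), 1 - 1e-12)
        alpha = 0.5 * np.log((1 - err) / err)
        w = w * np.exp(-alpha * y * h)
        Z = w.sum()          # normalizer Z_t of this round
        w /= Z
        z_prod *= Z
        F += alpha * h
    train_err = np.mean(np.where(F >= 0, 1, -1) != y)
    return train_err, z_prod

# Synthetic linearly separable labels for a quick check of the bound.
rng = np.random.default_rng(1)
X = rng.normal(0, 1, (30, 2))
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)
train_err, z_prod = adaboost_train_error_bound(X, y, T=8)
```

The inequality train_err <= prod(Z_t) holds because each 0/1 mistake is dominated by the exponential margin loss, which the weight updates track exactly.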


IEEE Transactions on Systems, Man, and Cybernetics | 2013

An HMM-Based Algorithm for Content Ranking and Coherence-Feature Extraction

Chien-Liang Liu; Wen-Hoar Hsaio; Chia-Hoang Lee; Hsiao-Cheng Chi

In this paper, we propose an algorithm called coherence hidden Markov model (HMM) to extract coherence features and rank content. Coherence HMM is a variant of the HMM used to model the stochastic process of essay writing and to identify topics as hidden states, given sequenced clauses as observations. This study uses probabilistic latent semantic analysis for parameter estimation of the coherence HMM. In coherence-feature extraction, support vector regression (SVR) with surface features and coherence features is used for essay grading. The experimental results indicate that SVR benefits from coherence features: the adjacent agreement rate and the exact agreement rate are 95.24% and 59.80%, respectively. Moreover, this study submits high-scoring essays to the same experiment and finds adjacent and exact agreement rates of 98.33% and 64.50%, respectively. In content ranking, we design and implement an intelligent assisted blog-writing system based on the coherence-HMM ranking model, employing several corpora to help users compose blog articles efficiently. When a user finishes composing a clause or sentence, the system provides candidate texts for reference based on the current clause or sentence content. The experimental results demonstrate that all participants benefit from the system and save considerable time when writing articles.


Pattern Recognition | 2017

Locality-constrained max-margin sparse coding

Wen-Hoar Hsaio; Chien-Liang Liu; Wei-Liang Wu

This work devises a locality-constrained max-margin sparse coding (LC-MMSC) framework, which jointly considers reconstruction loss and hinge loss. Traditional sparse coding algorithms use an ℓ1 constraint to force the representation to be sparse, leading to a computationally expensive optimization of the objective function. This work instead uses a locality constraint to preserve data locality and avoid the ℓ1 optimization; the obtained representation achieves both data locality and sparsity. Additionally, this work optimizes coefficients, dictionaries and classification parameters simultaneously, using block coordinate descent to learn all the components of the proposed model. A semi-supervised learning approach is adopted, with the goal of using both labeled and unlabeled data to achieve accurate classification performance and improve the generalization of the model. We provide theoretical analysis of the convergence of the proposed LC-MMSC algorithm based on Zangwill's global convergence theorem. This work conducts experiments on three real datasets: the Extended YaleB dataset, the AR face dataset and the Caltech101 dataset. The experimental results indicate that the proposed algorithm outperforms other comparison algorithms.

Highlights:
- Devise a locality-constrained max-margin sparse coding (LC-MMSC) framework.
- Use both labeled and unlabeled data to construct the classification model.
- Provide theoretical analysis of the convergence of the proposed LC-MMSC.
- The proposed LC-MMSC outperforms other comparison algorithms on three datasets.


Information Processing and Management | 2013

Clustering tagged documents with labeled and unlabeled documents

Chien-Liang Liu; Wen-Hoar Hsaio; Chia-Hoang Lee; Chun-Hsien Chen

This study employs our proposed semi-supervised clustering method, called Constrained-PLSA, to cluster tagged documents with a small amount of labeled documents, and uses two data sets for performance evaluation. The first data set has unclear boundaries among its clusters, while the second has clear ones. This study employs paper abstracts and the tags annotated by users to cluster documents, with four combinations of tags and words used for feature representation. The experimental results indicate that almost all of the methods benefit from tags. However, unsupervised learning methods fail to function properly on the data set with noisy information, whereas Constrained-PLSA functions properly. In many real applications background knowledge is readily available, making it appropriate to employ it in the clustering process to make learning faster and more effective.


Neurocomputing | 2017

Maximum-margin sparse coding

Chien-Liang Liu; Wen-Hoar Hsaio; Bin Xiao; Chun-Yu Chen; Wei-Liang Wu

This work devises a maximum-margin sparse coding algorithm that jointly considers reconstruction loss and hinge loss in the model. The sparse representation along with the maximum-margin constraint is analogous to the kernel trick and maximum-margin properties of the support vector machine (SVM), providing a basis for the proposed algorithm to perform well in classification tasks. The key idea behind the proposed method is to use labeled and unlabeled data to learn discriminative representations and model parameters simultaneously, making it easier to classify data in the new space. We propose to use block coordinate descent to learn all the components of the proposed model and give detailed derivations for the update rules of the model variables. Theoretical analysis of the convergence of the proposed MMSC algorithm is provided based on Zangwill's global convergence theorem. Additionally, most previous research on dictionary learning suggests using an overcomplete dictionary to improve classification performance, but this is computationally intensive when the dimension of the input data is huge. We conduct experiments on several real data sets, including the Extended YaleB, AR face, and Caltech101 data sets. The experimental results indicate that the proposed algorithm outperforms other comparison algorithms without an overcomplete dictionary, providing flexibility to deal with high-dimensional data sets.
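Block coordinate descent, which both sparse-coding papers use, fixes all but one block of variables, solves that block exactly, and cycles. A toy two-variable quadratic (a hypothetical objective, not the MMSC loss) shows the pattern:

```python
def block_coordinate_descent(steps=50):
    """Minimize f(a, b) = (a - 1)^2 + (b + 2)^2 + a*b by alternately
    solving each block exactly while the other is held fixed.

    Setting df/da = 2(a - 1) + b = 0 gives a = (2 - b) / 2;
    setting df/db = 2(b + 2) + a = 0 gives b = (-4 - a) / 2.
    The fixed point is (a, b) = (8/3, -10/3)."""
    a, b = 0.0, 0.0
    for _ in range(steps):
        a = (2 - b) / 2      # exact minimizer over a with b fixed
        b = (-4 - a) / 2     # exact minimizer over b with a fixed
    return a, b

a, b = block_coordinate_descent()
```

In MMSC the blocks are the sparse codes, the dictionary and the classifier parameters; Zangwill's global convergence theorem is what guarantees such alternating schemes converge under suitable conditions.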


Intelligent Data Analysis | 2018

Bayesian exploratory clustering with entropy Chinese restaurant process

Chien-Liang Liu; Wen-Hoar Hsaio; Che-Yuan Lin

Data exploration is essential to data analytics, especially when one is confronted with massive datasets. Clustering is a commonly used technique in data exploration, since it can automatically group data instances into a list of meaningful categories and capture the natural structure of data. Traditional finite mixture models require the number of clusters to be specified in advance of analyzing the data, and this parameter is crucial to clustering performance. The Chinese restaurant process (CRP) mixture model provides an alternative, allowing the model complexity to grow as more data instances are observed. Although the CRP provides the flexibility to create a new cluster for subsequent data instances, one still has to determine the hyperparameter of the prior and the parameters of the base distribution in the likelihood. This work proposes a non-parametric clustering algorithm based on the CRP with two main differences. First, we propose to create a new cluster based on the entropy of the posterior, whereas the CRP uses a hyperparameter to control the probability of creating a new cluster. Second, we propose to dynamically adjust the parameters of the base distribution according to the mean of the observed data, based on Chebyshev's inequality. Additionally, detailed derivations and update rules are provided to perform posterior inference with the proposed collapsed Gibbs sampling algorithm. The experimental results indicate that the proposed algorithm avoids having to specify the number of clusters and works well on several datasets.
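A minimal sketch of the entropy idea: open a new cluster when the posterior over existing clusters is nearly uniform, i.e. when its normalized entropy is high. The threshold and the exact rule below are assumptions for illustration, not the paper's criterion:

```python
import numpy as np

def should_open_new_cluster(posterior, threshold=0.9):
    """Return True when the posterior over existing clusters is close
    to uniform: normalized entropy H(p) / log(K) above threshold means
    no existing cluster explains the instance decisively, so a new
    cluster is warranted (hypothetical rule illustrating the idea)."""
    p = np.asarray(posterior, dtype=float)
    p = p / p.sum()
    H = -(p * np.log(p + 1e-12)).sum()    # entropy of the posterior
    H_max = np.log(len(p))                # entropy of the uniform case
    return bool(H / H_max > threshold)
```

A near-uniform posterior such as [0.25, 0.25, 0.25, 0.25] triggers a new cluster, while a confident one such as [0.97, 0.01, 0.01, 0.01] does not.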


Intelligent Data Analysis | 2017

Nonparametric multi-assignment clustering

Chien-Liang Liu; Wen-Hoar Hsaio; Tao-Hsing Chang; Tzai-Min Jou

Multi-label learning has attracted significant attention from machine learning and data mining over the last decade. Although many multi-label classification algorithms have been devised, few research studies focus on multi-assignment clustering (MAC), in which a data instance can be assigned to multiple clusters. The MAC problem is practical in many application domains, such as document clustering, customer segmentation and image clustering. Additionally, specifying the number of clusters is a difficult but critical problem for a certain class of clustering algorithms. Hence, this work proposes a nonparametric multi-assignment clustering algorithm called multi-assignment Chinese restaurant process (MACRP), which allows the model complexity to grow as more data instances are observed. The proposed algorithm determines the number of clusters from data, so it provides a practical model for processing massive data sets. We devise a novel prior distribution based on the similarity graph to achieve the goal of multi-assignment, and propose a collapsed Gibbs sampling algorithm to carry out posterior inference, comparing against several methods. Additionally, evaluation metrics previously used for multi-label classification are inappropriate for MAC, since label information is unavailable; this work therefore devises an evaluation metric for MAC based on the characteristics of clustering and multi-assignment problems. We conduct experiments on two real data sets, and the experimental results indicate that the proposed method is competitive and outperforms the alternatives in most cases.


Journal of Information Science and Engineering | 2018

Locality Sensitive K-means Clustering

Chien-Liang Liu; Wen-Hoar Hsaio; Tao-Hsing Chang

Collaboration


Dive into Wen-Hoar Hsaio's collaborations.

Top Co-Authors

Chien-Liang Liu
National Chiao Tung University

Chia-Hoang Lee
National Chiao Tung University

Tao-Hsing Chang
National Kaohsiung University of Applied Sciences

Wei-Liang Wu
National Chiao Tung University

Bin Xiao
National Chiao Tung University

Che-Yuan Lin
National Chiao Tung University

Chun-Yu Chen
National Chiao Tung University

Hsuan-Hsun Li
National Chiao Tung University