Yu-Chieh Wu
Ming Chuan University
Publication
Featured research published by Yu-Chieh Wu.
Pattern Recognition | 2008
Yu-Chieh Wu; Yue-Shi Lee; Jie-Chi Yang
Phrase pattern recognition (phrase chunking) refers to automatic approaches for identifying predefined phrase structures in a stream of text. Support vector machine (SVM)-based methods have shown excellent performance in many sequential text pattern recognition tasks, such as protein name finding and noun phrase (NP) chunking. Even though they yield very accurate results, they are not efficient enough for online applications, which need to handle hundreds of thousands of words in a limited time. In this paper, we first re-examine five typical multiclass SVM methods and their adaptation to phrase chunking. Most of them, however, become inefficient as the number of phrase types scales. We therefore introduce two new multiclass SVM models that make the system substantially faster in training and testing while keeping the SVM accurate. The two methods can also be applied to similar tasks such as named entity recognition and Chinese word segmentation. Experiments on the CoNLL-2000 chunking and Chinese base-chunking tasks show that our method achieves very competitive accuracy and is at least 100 times faster than the state-of-the-art SVM-based phrase chunking method. We also analyze the computational time complexity and the time cost of our methods.
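Chunkers of this kind emit per-word phrase tags in the usual BIO scheme, and the CoNLL-style F-rates are computed over whole phrase spans. A minimal sketch (our own illustration, not the paper's code) of decoding BIO tags into spans:

```python
def bio_to_spans(tags):
    """Convert a BIO tag sequence (e.g. from an SVM chunker) into
    (start, end, type) phrase spans; end is exclusive."""
    spans, start, label = [], None, None
    for i, tag in enumerate(tags):
        if tag.startswith("B-") or (tag.startswith("I-") and label != tag[2:]):
            # A B- tag, or an I- tag with a new type, starts a fresh span.
            if start is not None:
                spans.append((start, i, label))
            start, label = i, tag[2:]
        elif tag == "O":
            # O closes any open span.
            if start is not None:
                spans.append((start, i, label))
            start, label = None, None
    if start is not None:
        spans.append((start, len(tags), label))
    return spans
```

Chunking precision and recall are then exact matches between predicted and gold span lists.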
Information Sciences | 2013
Show-Jane Yen; Yu-Chieh Wu; Jie-Chi Yang; Yue-Shi Lee; Chung-Jung Lee; Jui-Jung Liu
Modern information technologies and Internet services suffer from the problem of selecting and managing a growing amount of textual information, to which access is often critical. Machine learning techniques have recently shown excellent performance and flexibility in many applications, such as artificial intelligence and pattern recognition. Question answering (QA) is a method of locating exact answer sentences in vast document collections. This paper presents a machine learning-based question-answering framework, which integrates a question classifier, simple document/passage retrievers, and the proposed context-ranking models. The question classifier is trained to categorize the answer type of the given question and instructs the context-ranking model to re-rank the passages retrieved by the initial retrievers. This method provides flexible features to the learners, such as word forms, syntactic features, and semantic word features. The proposed context-ranking model, which casts passage ranking as a sequential labeling task, combines rich features to predict whether an input passage is relevant to the question type. We employ TREC-QA tracks and question classification benchmarks to evaluate the proposed method. The experimental results show that the question classifier achieves 85.60% accuracy without any additional semantic or syntactic taggers, and reaches 88.60% after we employ the proposed term expansion techniques and a predefined related-word set. In the TREC-10 QA task, using the gold TREC-provided relevant document set, the QA model achieves a 0.563 mean reciprocal rank (MRR) score, and a 0.342 MRR score when using the simple document and passage retrieval algorithms.
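The MRR scores quoted above average, over all questions, the reciprocal of the rank at which the first correct passage appears. A small illustrative implementation (ours, not the paper's):

```python
def mean_reciprocal_rank(first_correct_ranks):
    """MRR over a set of questions. Each entry is the 1-based rank of
    the first correct passage, or None if nothing correct was retrieved
    (contributing 0 to the sum)."""
    total = 0.0
    for rank in first_correct_ranks:
        if rank is not None:
            total += 1.0 / rank
    return total / len(first_correct_ranks)
```

For example, four questions answered at ranks 1, 2, never, and 4 give MRR = (1 + 1/2 + 0 + 1/4) / 4 = 0.4375.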
pacific-asia conference on knowledge discovery and data mining | 2006
Yu-Chieh Wu; Teng-Kai Fan; Yue-Shi Lee; Show-Jane Yen
Identifying proper names, such as gene, DNA, or protein names, helps researchers mine textual information. Learning to extract proper names from natural language text is a named entity recognition (NER) task. Previous studies focused on combining abundant hand-crafted rules and trigger words to enhance system performance. However, these methods require domain experts to build the rules and word sets, which demands considerable human effort. In this paper, we present a robust named entity recognition system based on support vector machines (SVM). By integrating a rich feature set with the proposed mask method, the system performs well on the MUC-7 and biology named entity recognition tasks, outperforming well-known machine learning methods such as the hidden Markov model (HMM) and the maximum entropy model (MEM). We compare our method with previous systems evaluated on the same data sets. The experiments show that when trained on the MUC-7 data set, our system achieves an F(β=1) rate of 86.4, and 81.57 on the biology corpus. Moreover, our named entity system can handle real-time processing applications; the turnaround time on a 63K-word document set is less than 30 seconds.
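The F(β=1) rate reported here is the standard harmonic combination of precision and recall; a one-function sketch for reference:

```python
def f_beta(precision, recall, beta=1.0):
    """F(beta) rate as used in NER/chunking evaluation.
    beta = 1 weights precision and recall equally."""
    if precision + recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)
```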
international conference on computational linguistics | 2006
Yu-Chieh Wu; Chia-Hui Chang; Yue-Shi Lee
Several phrase chunkers have been proposed over the past few years. Some state-of-the-art chunkers achieve better performance by integrating external resources, e.g., parsers and additional training data, or by combining multiple learners. However, in many languages and domains such external materials are not easily available, and combining multiple learners increases the cost of training and testing. In this paper, we propose a mask method to improve chunking accuracy. The experimental results show that our chunker achieves better performance than other deep parsers and chunkers. On the CoNLL-2000 data set, our system achieves an F rate of 94.12. For the base-chunking task, our system reaches an F rate of 92.95. When ported to Chinese, the base-chunking F rate is 92.36. Our chunker is also quite efficient: the complete chunking time for a 50K-word document is about 50 seconds.
IEEE Transactions on Circuits and Systems for Video Technology | 2008
Yu-Chieh Wu; Jie-Chi Yang
In this paper, we present a robust passage retrieval algorithm that extends conventional text question answering (Q/A) to videos. Users interact with our video Q/A system through natural language queries, and the top-ranked passage fragments with their associated video clips are returned as answers. We compare our method with five high-performance ranking algorithms that are portable across languages and domains. The experiments were evaluated on 75.3 h of Chinese videos and 253 questions. The experimental results show that our method outperformed the second-best retrieval model (language models) by a relative 1.43% in mean reciprocal rank (MRR) score, and by 11.36% when employing a Chinese word segmentation tool. By adopting the initial retrieval results from the retrieval models, our method yields an improvement of at least 5.94% in MRR score. This makes it very attractive for Asian languages, since a well-developed word tokenizer is unnecessary.
meeting of the association for computational linguistics | 2007
Yu-Chieh Wu; Jie-Chi Yang; Yue-Shi Lee
Kernel methods such as support vector machines (SVMs) have attracted a great deal of attention in the machine learning and natural language processing (NLP) communities. Polynomial kernel SVMs show very competitive accuracy on many NLP problems, such as part-of-speech tagging and chunking. However, these methods are usually too inefficient to be applied to large data sets and real-time applications. In this paper, we propose an approximate method that emulates the polynomial kernel with efficient data mining approaches. To avoid exponentially scaled testing time complexity, we also present a new method for speeding up SVM classification that is independent of the polynomial degree d. The experimental results show that our method is 16.94 and 450 times faster than the traditional polynomial kernel in training and testing, respectively.
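The idea of replacing an implicit polynomial kernel with explicit, efficiently computable features can be illustrated for degree d = 2. The identity below is standard; the paper's actual approximation algorithm may differ:

```python
import math

def poly2_kernel(x, y):
    """Degree-2 polynomial kernel (x . y + 1)^2."""
    return (sum(a * b for a, b in zip(x, y)) + 1) ** 2

def poly2_features(x):
    """Explicit feature map whose inner product equals poly2_kernel:
    (x.y + 1)^2 = 1 + 2*sum_i x_i y_i + sum_{i,j} x_i x_j y_i y_j,
    so the map is [1, sqrt(2)*x_i, x_i^2, sqrt(2)*x_i*x_j (i < j)]."""
    feats = [1.0]
    feats += [math.sqrt(2) * v for v in x]
    feats += [x[i] * x[j] * (math.sqrt(2) if i < j else 1.0)
              for i in range(len(x)) for j in range(i, len(x))]
    return feats
```

Classifying with a precomputed weight vector over these explicit features costs one dot product, independent of the number of support vectors, which is the kind of speedup the abstract describes.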
conference on computational natural language learning | 2006
Yu-Chieh Wu; Yue-Shi Lee; Jie-Chi Yang
In this paper, we propose a three-step multilingual dependency parser, which applies an efficient parsing algorithm in the first phase, followed by a root parser and a post-processor in the second and third stages. The main focus of our work is to provide an efficient parser that is practical to use, combining only lexical and part-of-speech features toward language-independent parsing. The experimental results show that our method outperforms Maltparser on 13 languages. We expect that such an efficient model is applicable to most languages.
Expert Systems With Applications | 2011
Show-Jane Yen; Yue-Shi Lee; Jia-Ching Ying; Yu-Chieh Wu
Automatic Chinese text classification is an important and well-known technology in the field of machine learning. The first step in solving Chinese text categorization problems is to tokenize Chinese words from a sequence of non-segmented sentences. Previous work often employs a Chinese word tokenizer trained on different sources and then applies conventional text classification approaches. However, these taggers are not perfect and often provide incorrect word boundary information. In this paper, we propose an N-gram-based language model that takes word relations into account for Chinese text categorization without a Chinese word tokenizer. To mitigate the out-of-vocabulary problem, we also propose a novel smoothing approach based on logistic regression to improve accuracy. The experimental results show that our approach outperforms traditional methods by at least 11% in micro-averaged F-measure.
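A tokenizer-free classifier of the kind described can be sketched with one character-bigram language model per class. This toy version uses add-one smoothing as a stand-in for the paper's logistic-regression-based smoothing; the class labels and data in the usage test are invented:

```python
import math
from collections import Counter

class CharBigramClassifier:
    """Tokenizer-free text classifier: scores a text under a
    character-bigram language model for each class and picks the
    class with the highest log-probability."""

    def __init__(self):
        self.bigrams = {}   # class -> Counter of character bigrams
        self.totals = {}    # class -> total bigram count
        self.vocab = set()  # all bigrams seen in training

    def fit(self, labeled_texts):
        for text, label in labeled_texts:
            counts = self.bigrams.setdefault(label, Counter())
            for i in range(len(text) - 1):
                bg = text[i:i + 2]
                counts[bg] += 1
                self.vocab.add(bg)
        self.totals = {c: sum(v.values()) for c, v in self.bigrams.items()}

    def predict(self, text):
        V = len(self.vocab) + 1  # +1 for unseen bigrams (add-one smoothing)

        def log_prob(label):
            counts, n = self.bigrams[label], self.totals[label]
            return sum(math.log((counts[text[i:i + 2]] + 1) / (n + V))
                       for i in range(len(text) - 1))

        return max(self.bigrams, key=log_prob)
```

Because the model works directly on character n-grams, no word segmenter is needed, which mirrors the motivation of the paper.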
Information Sciences | 2014
Yu-Chieh Wu
Semi-supervised machine learning methods integrate both labeled and unlabeled training data. In most structural problems, such as natural language processing and image processing, developing labeled data for a specific domain requires a considerable amount of human resources. In this paper, we present a cluster-based method to fuse labeled training data and unlabeled raw data. We design a top-down divisive clustering algorithm that ensures maximal information gain in the use of unlabeled data by clustering similar words. To implement this idea, we design a top-down iterative K-means clustering algorithm to merge word clusters. The derived term groups are then encoded as new features for the supervised learners in order to improve the coverage of lexical information. Without additional training data or external materials, this approach yields state-of-the-art performance on the shallow parsing and base-chunking benchmark datasets (F(β) rates of 94.50 and 93.12).
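The K-means step that groups distributionally similar words can be sketched as follows. This is a plain iterative K-means with deterministic initialization; the input vectors stand in for hypothetical word co-occurrence statistics, and the paper's top-down divisive variant adds further machinery on top:

```python
def kmeans(vectors, k, iters=20):
    """Plain K-means: alternate assigning points to the nearest center
    (squared Euclidean distance) and recomputing centers as means.
    Returns the cluster index of each input vector."""
    centers = [list(v) for v in vectors[:k]]  # deterministic init: first k points
    assign = [0] * len(vectors)
    for _ in range(iters):
        for i, v in enumerate(vectors):       # assignment step
            assign[i] = min(range(k), key=lambda c: sum(
                (a - b) ** 2 for a, b in zip(v, centers[c])))
        for c in range(k):                    # update step
            members = [vectors[i] for i in range(len(vectors)) if assign[i] == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign
```

Words that land in the same cluster can then share a cluster-ID feature, widening lexical coverage for the supervised learner.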
international conference on advanced learning technologies | 2011
Wen-Hsuan Chang; Jie-Chi Yang; Yu-Chieh Wu
In general, video-based learning contains rich media information, but displaying an entire video linearly is time-consuming. As an alternative, video summarization techniques extract important content and provide short but informative fragments. In this paper, a video learning platform (KVSUM: Keyword-based Video Summarization) is presented that integrates image processing, text summarization, and keyword extraction techniques. Without human annotators, the platform can process an input video and transform it into online learning materials automatically. Video frames are first split from a given video, while the transcription is used to generate the text summary and keywords. In the current study, the video surrogates are composed of extracted keywords, text and video summaries, and video frames. In other words, KVSUM provides both visual and verbal surrogates. To validate the effect of the surrogates in KVSUM, we compared it with another video surrogate, fast forward (FF), to evaluate learners' comprehension of video content. Sixty undergraduate students took part in examining the two video surrogates. The experimental results show that KVSUM had a more positive effect than FF on comprehension of videos. In terms of system usage and satisfaction, KVSUM is significantly more attractive to learners than FF.