Publication


Featured research published by Cheng-Wei Shih.


International Journal of Computational Linguistics & Chinese Language Processing (中文計算語言學期刊) | 2004

Mencius: A Chinese Named Entity Recognizer Using the Maximum Entropy-based Hybrid Model

Tzong-Han Tsai; Shih-Hung Wu; Cheng-Wei Lee; Cheng-Wei Shih; Wen-Lian Hsu

This paper presents a Chinese named entity recognizer (NER): Mencius. It aims to address Chinese NER problems by combining the advantages of rule-based and machine learning (ML) based NER systems. Rule-based NER systems can explicitly encode human comprehension and can be tuned conveniently, while ML-based systems are robust, portable and inexpensive to develop. Our hybrid system incorporates a rule-based knowledge representation and template-matching tool, called InfoMap [Wu et al. 2002], into a maximum entropy (ME) framework. Named entities are represented in InfoMap as templates, which serve as ME features in Mencius. These features are edited manually, and their weights are estimated by the ME framework according to the training data. To understand how word segmentation might influence Chinese NER and the differences between a pure template-based method and our hybrid method, we configure Mencius using four distinct settings. The F-measures for person names (PER), location names (LOC) and organization names (ORG) of the best configuration in our experiment were 94.3%, 77.8% and 75.3%, respectively. Comparing the experiment results obtained using these configurations reveals that hybrid NER systems always perform better in identifying person names. On the other hand, they have a little difficulty identifying location and organization names. Furthermore, using a word segmentation module improves the performance of pure template-based NER systems, but it has little effect on hybrid NER systems.


ACM Transactions on Asian Language Information Processing | 2012

Validating Contradiction in Texts Using Online Co-Mention Pattern Checking

Cheng-Wei Shih; Cheng-Wei Lee; Richard Tzong-Han Tsai; Wen-Lian Hsu

Detecting contradictory statements is a foundational and challenging task for text understanding applications such as textual entailment. In this article, we address the shortage of specific background knowledge in contradiction detection. To tackle the problem, we propose a novel contradiction detection approach based on the distribution on the Internet of queries composed of critical mismatch combinations. By measuring the availability of mismatch conjunction phrases (MCPs), the background knowledge about two target statements can be implicitly obtained for identifying contradictions. Experiments on three different configurations show that the MCP-based approach achieves remarkable improvement on contradiction detection and can significantly improve the performance of textual entailment recognition.


ACM Transactions on Asian Language Information Processing | 2008

Boosting Chinese Question Answering with Two Lightweight Methods: ABSPs and SCO-QAT

Cheng-Wei Lee; Min-Yuh Day; Cheng-Lung Sung; Yi-Hsun Lee; Tian-Jian Jiang; Chia-Wei Wu; Cheng-Wei Shih; Yu-Ren Chen; Wen-Lian Hsu

Question Answering (QA) research has been conducted in many languages. Nearly all the top-performing systems use heavy methods that require sophisticated techniques, such as parsers or logic provers. However, such techniques are usually unavailable or unaffordable for under-resourced languages or in resource-limited situations. In this article, we describe how a top-performing Chinese QA system can be designed by using lightweight methods effectively. We propose two lightweight methods, namely the Sum of Co-occurrences of Question and Answer Terms (SCO-QAT) and Alignment-based Surface Patterns (ABSPs). SCO-QAT is a co-occurrence-based answer-ranking method that does not need extra knowledge, word-ignoring heuristic rules, or tools. It calculates co-occurrence scores based on the passage retrieval results. ABSPs are syntactic patterns trained from question-answer pairs with a multiple alignment algorithm. They capture the relations between terms, and these relations are then used to filter answers. We attribute the success of the ABSPs and SCO-QAT methods to the effective use of local syntactic information and global co-occurrence information. By using SCO-QAT and ABSPs, we improved the RU-Accuracy of our testbed QA system, ASQA, from 0.445 to 0.535 on the NTCIR-5 dataset. ASQA also achieved the top RU-Accuracy of 0.5 on the NTCIR-6 dataset. The results show that lightweight methods are not only cheaper to implement, but also have the potential to achieve state-of-the-art performance.
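
The idea of ranking answers by co-occurrence with question terms in retrieved passages can be illustrated with a minimal Python sketch. This is a simplified illustration only, not the SCO-QAT formula from the article: the scoring rule (count, for each passage containing the candidate, how many question terms also appear) and the names `cooccurrence_scores` and `rank_answers` are assumptions made for this example.

```python
from collections import defaultdict


def cooccurrence_scores(question_terms, candidates, passages):
    """Simplified co-occurrence score (an assumption, not the published formula).

    For every retrieved passage that contains a candidate answer, add the
    number of question terms that also appear in that passage.
    Plain substring matching is used, so no word segmentation is required.
    """
    scores = defaultdict(int)
    for passage in passages:
        # Question terms that occur in this passage.
        present_terms = [t for t in question_terms if t in passage]
        for candidate in candidates:
            if candidate in passage:
                scores[candidate] += len(present_terms)
    return dict(scores)


def rank_answers(question_terms, candidates, passages):
    """Return the candidates sorted by descending co-occurrence score."""
    scores = cooccurrence_scores(question_terms, candidates, passages)
    return sorted(candidates, key=lambda c: scores.get(c, 0), reverse=True)
```

Calling `rank_answers` with the question terms, a list of distinct candidate answers, and the passages returned by a retrieval step yields the candidates ordered by how strongly they co-occur with the question. No parser or external knowledge source is involved, which is the sense in which such a method is lightweight.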


International Journal of Computational Linguistics & Chinese Language Processing, Volume 17, Number 3, September 2012 | 2012

Enhancement of Feature Engineering for Conditional Random Field Learning in Chinese Word Segmentation Using Unlabeled Data

Mike Tian-Jian Jiang; Cheng-Wei Shih; Ting-Hao Yang; Chan-Hung Kuo; Richard Tzong-Han Tsai; Wen-Lian Hsu

This work proposes a unified view of several features based on frequent strings extracted from unlabeled data that improve the conditional random fields (CRF) model for Chinese word segmentation (CWS). These features include character-based n-gram (CNG), accessor variety based string (AVS) and its variant, the left-right co-existed feature (LRAVS), term-contributed frequency (TCF), and term-contributed boundary (TCB) with a specific manner of boundary overlapping. For the experiments, the baseline is the 6-tag scheme, a state-of-the-art labeling scheme for CRF-based CWS, and the data sets are taken from the 2005 CWS Bakeoff of the Special Interest Group on Chinese Language Processing (SIGHAN) of the Association for Computational Linguistics (ACL) and from the SIGHAN CWS Bakeoff 2010. The experimental results show that all of these features improve the performance of the baseline system in terms of recall, precision, and their harmonic mean (the F1 score), for both overall accuracy (F) and out-of-vocabulary recognition (FOOV). In particular, this work presents compound features involving LRAVS/AVS and TCF/TCB that are competitive with other types of features for CRF-based CWS in terms of F and FOOV, respectively.
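
As a rough illustration of the accessor variety idea behind the AVS feature, the Python sketch below counts the distinct characters that appear immediately before and after a candidate string in unlabeled text and takes the smaller of the two counts. It follows the commonly used definition of accessor variety, with one simplification that is an assumption here: all sentence starts are treated as a single left context and all sentence ends as a single right context. How the paper converts such scores into CRF features is not shown, and the function name `accessor_variety` is hypothetical.

```python
def accessor_variety(candidate, sentences):
    """Return min(#distinct left neighbours, #distinct right neighbours).

    `sentences` is an iterable of unsegmented strings (the unlabeled data).
    Sentence boundaries are represented by the markers "<s>" and "</s>",
    which cannot collide with single-character neighbours.
    """
    left_contexts, right_contexts = set(), set()
    for sentence in sentences:
        start = sentence.find(candidate)
        while start != -1:
            end = start + len(candidate)
            # Character just before the occurrence, or the start marker.
            left_contexts.add(sentence[start - 1] if start > 0 else "<s>")
            # Character just after the occurrence, or the end marker.
            right_contexts.add(sentence[end] if end < len(sentence) else "</s>")
            # Continue searching, allowing overlapping occurrences.
            start = sentence.find(candidate, start + 1)
    return min(len(left_contexts), len(right_contexts))
```

A string that appears in many different left and right contexts receives a high score and is more likely to be an independent word, which is why such statistics can be gathered from unlabeled data.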


NTCIR | 2005

ASQA: Academia Sinica Question Answering System for NTCIR-5 CLQA

Cheng-Wei Lee; Cheng-Wei Shih; Min-Yuh Day; Tzong-Han Tsai; Mike Tian-Jian Jiang; Chia-Wei Wu; Cheng-Lung Sung; Yu-Ren Chen; Shih-Hung Wu; Wen-Lian Hsu


NTCIR | 2008

Complex Question Answering with ASQA at NTCIR 7 ACLIA

Yi-Hsun Lee; Cheng-Wei Lee; Cheng-Lung Sung; Mon-Ting Tzou; Chih-Chien Wang; Shih-Hung Liu; Cheng-Wei Shih; Pei-Yin Yang; Wen-Lian Hsu


NTCIR | 2007

Chinese-Chinese and English-Chinese Question Answering with ASQA at NTCIR-6 CLQA

Cheng-Wei Lee; Min-Yuh Day; Cheng-Lung Sung; Yi-Hsun Lee; Tian-Jian Jiang; Chia-Wei Wu; Cheng-Wei Shih; Yu-Ren Chen; Wen-Lian Hsu


ROCLING | 2005

Applying Maximum Entropy to Robust Chinese Shallow Parsing

Shih-Hung Wu; Cheng-Wei Shih; Chia-Wei Wu; Tzong-Han Tsai; Wen-Lian Hsu


NTCIR | 2011

IASL RITE System at NTCIR-10

Cheng-Wei Shih; Cheng-Wei Lee; Ting-Hao Yang; Wen-Lian Hsu


Proceedings of the Seventh SIGHAN Workshop on Chinese Language Processing | 2013

Sinica-IASL Chinese spelling check system at Sighan-7

Ting-Hao Yang; Yu-Lun Hsieh; Yu-Hsuan Chen; Michael Tsang; Cheng-Wei Shih; Wen-Lian Hsu

Collaboration


Dive into Cheng-Wei Shih's collaborations.

Top Co-Authors


Shih-Hung Wu

Chaoyang University of Technology


Mike Tian-Jian Jiang

National Tsing Hua University
