Chia-Wei Wu
Academia Sinica
Publication
Featured research published by Chia-Wei Wu.
Asia Information Retrieval Symposium | 2005
Chia-Wei Wu; Tzong-Han Tsai; Wen-Lian Hsu
As web taxonomy integration is an emerging issue on the Internet, many research topics, such as personalization, web searches, and electronic markets, would benefit from further development of taxonomy integration techniques. The integration task is to transfer documents from a source web taxonomy to a target web taxonomy. In most current techniques, integration performance is enhanced by referring to the relations between corresponding categories in the source and target taxonomies. However, these techniques may not be effective, since the concepts of corresponding categories may only partially overlap. In this paper we present an effective approach for integrating taxonomies and alleviating the partial overlap problem by considering fine-grained relations using a Maximum Entropy Model. The experimental results show that the proposed approach improves the classification accuracy of taxonomies over previous approaches.
ACM Transactions on Asian Language Information Processing | 2008
Cheng-Wei Lee; Min-Yuh Day; Cheng-Lung Sung; Yi-Hsun Lee; Tian-Jian Jiang; Chia-Wei Wu; Cheng-Wei Shih; Yu-Ren Chen; Wen-Lian Hsu
Question Answering (QA) research has been conducted in many languages. Nearly all the top-performing systems use heavy methods that require sophisticated techniques, such as parsers or logic provers. However, such techniques are usually unavailable or unaffordable for under-resourced languages or in resource-limited situations. In this article, we describe how a top-performing Chinese QA system can be designed by using lightweight methods effectively. We propose two lightweight methods, namely the Sum of Co-occurrences of Question and Answer Terms (SCO-QAT) and Alignment-based Surface Patterns (ABSPs). SCO-QAT is a co-occurrence-based answer-ranking method that does not need extra knowledge, word-ignoring heuristic rules, or tools. It calculates co-occurrence scores based on the passage retrieval results. ABSPs are syntactic patterns trained from question-answer pairs with a multiple alignment algorithm. They capture the relations between terms, and these relations are then used to filter answers. We attribute the success of the ABSPs and SCO-QAT methods to the effective use of local syntactic information and global co-occurrence information. By using SCO-QAT and ABSPs, we improved the RU-Accuracy of our testbed QA system, ASQA, from 0.445 to 0.535 on the NTCIR-5 dataset. ASQA also achieved the top RU-Accuracy of 0.5 on the NTCIR-6 dataset. The results show that lightweight methods are not only cheaper to implement, but also have the potential to achieve state-of-the-art performance.
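The general idea of ranking answers by co-occurrence with question terms in retrieved passages can be illustrated with a minimal sketch. This is not the SCO-QAT formula, which the abstract does not give; the scoring rule, the function names `cooccurrence_scores` and `rank_answers`, and the assumption that passages arrive already tokenized are all hypothetical choices made for illustration.

```python
from collections import Counter


def cooccurrence_scores(question_terms, candidates, passages):
    """Score candidates by co-occurrence with question terms.

    Assumption (not from the abstract): each passage is a list of tokens,
    and a candidate earns one point per question term that appears in the
    same passage as the candidate.
    """
    scores = Counter()
    for passage in passages:
        tokens = set(passage)
        # Number of distinct question terms present in this passage.
        matched = sum(1 for term in set(question_terms) if term in tokens)
        for candidate in candidates:
            if candidate in tokens:
                scores[candidate] += matched
    return scores


def rank_answers(question_terms, candidates, passages):
    """Return candidates sorted from highest to lowest co-occurrence score."""
    scores = cooccurrence_scores(question_terms, candidates, passages)
    # sorted() is stable, so ties keep their original candidate order.
    return sorted(candidates, key=lambda c: scores[c], reverse=True)


if __name__ == "__main__":
    passages = [
        ["taipei", "is", "the", "capital", "of", "taiwan"],
        ["kaohsiung", "is", "a", "port", "city", "in", "taiwan"],
        ["taipei", "capital", "city"],
    ]
    print(rank_answers(["capital", "taiwan"], ["kaohsiung", "taipei"], passages))
```

The sketch needs no parser or external knowledge base, only the passage retrieval output, which is the sense in which such a method is lightweight.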
Expert Systems With Applications | 2008
Chia-Wei Wu; Richard Tzong-Han Tsai; Cheng-Wei Lee; Wen-Lian Hsu
We address the problem of integrating web taxonomies from different real Internet applications. Integrating web taxonomies means transferring instances from a source taxonomy to a target taxonomy. Unlike the conventional text categorization problem, in taxonomy integration, the source taxonomy contains extra information that can be used to improve the categorization. The major existing methods can be divided into two types: those that use neighboring categories to smooth the document term vector and those that consider the semantic relationship between corresponding categories of the target and source taxonomies to facilitate categorization. In contrast to the first type of approach, which only uses a flattened hierarchy for smoothing, we apply a hierarchy shrinkage algorithm to smooth child documents by their parents. We also discuss the effect of using different hierarchical levels for smoothing. To extend the second type of approach, we extract fine-grained semantic relationships, which consider the relationships between lower-level categories. In addition, we use the cosine similarity to measure the semantic relationships, which achieves better performance than existing methods. Finally, we integrate the existing approaches and the proposed methods into one machine learning model to find the best feature configuration. The results of experiments on real Internet data demonstrate that our system outperforms standard text classifiers by about 10%.
Asia Information Retrieval Symposium | 2008
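As an illustration of measuring the relationship between two categories with cosine similarity, the following is a minimal sketch. The abstract does not say how categories are represented, so the raw term-count vectors, the whitespace tokenization, and the helper names `term_vector` and `cosine_similarity` are assumptions rather than the authors' implementation.

```python
import math
from collections import Counter


def term_vector(documents):
    """Build a term-count vector for a category from its documents.

    Assumption: whitespace tokenization and raw counts; the paper may
    use a different weighting scheme.
    """
    counts = Counter()
    for doc in documents:
        counts.update(doc.lower().split())
    return counts


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # An empty category is treated as unrelated to everything.
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    source_category = term_vector(["digital camera lens", "camera tripod"])
    target_category = term_vector(["camera lens filter"])
    print(cosine_similarity(source_category, target_category))
```

A value near 1 suggests the two categories share most of their vocabulary, while a value near 0 suggests little overlap, which is one way partially overlapping categories could be told apart from closely matching ones.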
Chia-Wei Wu; Richard Tzong-Han Tsai; Wen-Lian Hsu
Named entity recognition (NER) is an essential component of text mining applications. In Chinese sentences, words do not have delimiters; thus, incorporating word segmentation information into an NER model can improve its performance. Based on the framework of dynamic conditional random fields, we propose a novel labeling format, called semi-joint labeling, which partially integrates word segmentation information and named entity tags for NER. The model enhances the interaction between segmentation tags and NER tags beyond what traditional approaches achieve. Moreover, it allows us to consider interactions between multiple chains in a linear-chain model. We use data from the SIGHAN 2006 NER bakeoff to evaluate the proposed model. The experimental results demonstrate that our approach outperforms state-of-the-art systems.
NTCIR | 2005
Cheng-Wei Lee; Cheng-Wei Shih; Min-Yuh Day; Tzong-Han Tsai; Mike Tian-Jian Jiang; Chia-Wei Wu; Cheng-Lung Sung; Yu-Ren Chen; Shih-Hung Wu; Wen-Lian Hsu
Meeting of the Association for Computational Linguistics | 2006
Chia-Wei Wu; Shyh-Yi Jan; Richard Tzong-Han Tsai; Wen-Lian Hsu
IJCNLP (companion) | 2005
Tzong-Han Tsai; Chia-Wei Wu; Wen-Lian Hsu
NTCIR | 2007
Cheng-Wei Lee; Min-Yuh Day; Cheng-Lung Sung; Yi-Hsun Lee; Tian-Jian Jiang; Chia-Wei Wu; Cheng-Wei Shih; Yu-Ren Chen; Wen-Lian Hsu
ROCLING | 2005
Shih-Hung Wu; Cheng-Wei Shih; Chia-Wei Wu; Tzong-Han Tsai; Wen-Lian Hsu
Text Retrieval Conference | 2005
Tzong-Han Tsai; Chia-Wei Wu; Hsieh-Chuan Hung; Yu-Chun Wang; Ding He; Yi-Feng Lin; Cheng-Wei Lee; Ting-Yi Sung; Wen-Lian Hsu