Cheng-Wei Lee | Researchain

Publication

Featured researches published by Cheng-Wei Lee.

Decision Support Systems | 2007

Reference metadata extraction using a hierarchical knowledge representation framework

Min-Yuh Day; Richard Tzong-Han Tsai; Cheng-Lung Sung; Chiu-Chen Hsieh; Cheng-Wei Lee; Shih-Hung Wu; Kuen-Pin Wu; Chorng-Shyong Ong; Wen-Lian Hsu

The integration of bibliographical information on scholarly publications available on the Internet is an important task in the academic community. Accurate reference metadata extraction from such publications is essential for the integration of metadata from heterogeneous reference sources. In this paper, we propose a hierarchical template-based reference metadata extraction method for scholarly publications. We adopt a hierarchical knowledge representation framework called INFOMAP, which automatically extracts metadata. The experimental results show that, by using INFOMAP, we can extract author, title, journal, volume, number (issue), year, and page information from different kinds of reference styles with a high degree of precision. The overall average accuracy is 92.39% for the six major reference styles compared in this study.
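
As a rough illustration of what template-based extraction of reference fields can look like, the sketch below matches a single reference style with a regular expression. It is not INFOMAP and does not reproduce its hierarchical knowledge representation; the pattern `APA_LIKE`, the function `extract_reference`, and the sample reference are all hypothetical and made up for this example.

```python
import re

# Hypothetical template for ONE reference style:
#   Authors (Year). Title. Journal, Volume(Issue), Pages.
# Illustrative only; this is not how INFOMAP represents templates.
APA_LIKE = re.compile(
    r"^(?P<author>.+?)\s+\((?P<year>\d{4})\)\.\s+"
    r"(?P<title>.+?)\.\s+"
    r"(?P<journal>.+?),\s+"
    r"(?P<volume>\d+)(?:\((?P<number>\d+)\))?,\s+"
    r"(?P<pages>\d+-\d+)\.?$"
)


def extract_reference(reference):
    """Return a dict of metadata fields, or None if the template does not match."""
    match = APA_LIKE.match(reference.strip())
    return match.groupdict() if match else None


# Made-up reference string, used only to exercise the template.
example = "Doe, J., & Roe, R. (2001). A sample title. Journal of Examples, 12(3), 45-67."
fields = extract_reference(example)
```

A single flat pattern like this only covers one style; handling several reference styles would need one template per style or a richer representation, which is the problem the abstract says the hierarchical framework addresses.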

Information Reuse and Integration | 2005

A knowledge-based approach to citation extraction

Min-Yuh Day; Tzong-Han Tsai; Cheng-Lung Sung; Cheng-Wei Lee; Shih-Hung Wu; Chorng-Shyong Ong; Wen-Lian Hsu

Integration of the bibliographical information of scholarly publications available on the Internet is an important task in academic research. To accomplish this task, accurate reference metadata extraction for scholarly publications is essential for the integration of information from heterogeneous reference sources. In this paper, we propose a knowledge-based approach to literature mining and focus on reference metadata extraction methods for scholarly publications. We adopt an ontological knowledge representation framework called INFOMAP to automatically extract the reference metadata. The experimental results show that, by using INFOMAP, we can extract author, title, journal, volume, number (issue), year, and page information from different reference styles with a high degree of accuracy. The overall average field accuracy of citation extraction for a bioinformatics dataset is 97.87% for six reference styles.

International Journal of Computational Linguistics and Chinese Language Processing (中文計算語言學期刊) | 2004

Mencius: A Chinese Named Entity Recognizer Using the Maximum Entropy-based Hybrid Model

Tzong-Han Tsai; Shih-Hung Wu; Cheng-Wei Lee; Cheng-Wei Shih; Wen-Lian Hsu

This paper presents a Chinese named entity recognizer (NER): Mencius. It aims to address Chinese NER problems by combining the advantages of rule-based and machine learning (ML) based NER systems. Rule-based NER systems can explicitly encode human comprehension and can be tuned conveniently, while ML-based systems are robust, portable and inexpensive to develop. Our hybrid system incorporates a rule-based knowledge representation and template-matching tool, called InfoMap [Wu et al. 2002], into a maximum entropy (ME) framework. Named entities are represented in InfoMap as templates, which serve as ME features in Mencius. These features are edited manually, and their weights are estimated by the ME framework according to the training data. To understand how word segmentation might influence Chinese NER and the differences between a pure template-based method and our hybrid method, we configure Mencius using four distinct settings. The F-Measures of person names (PER), location names (LOC) and organization names (ORG) of the best configuration in our experiment were respectively 94.3%, 77.8% and 75.3%. Comparing the experiment results obtained using these configurations reveals that hybrid NER systems always perform better in identifying person names. On the other hand, they have a little difficulty identifying location and organization names. Furthermore, using a word segmentation module improves the performance of pure template-based NER systems, but it has little effect on hybrid NER systems.

International Conference Natural Language Processing | 2005

An integrated knowledge-based and machine learning approach for Chinese question classification

Min-Yuh Day; Cheng-Wei Lee; Shih-Hung Wu; Chorng-Shyong Ong; Wen-Lian Hsu

Question classification plays an important role in question-answering systems. Chinese question classification is the process that analyzes a question and labels it based on its question type and expected answer type. In this paper, we propose an integrated knowledge-based and machine learning approach for Chinese question classification that focuses on factoid question answering. We develop a Chinese question classification scheme for CLQA C-C (cross language question answering Chinese to Chinese) factoid question answering, and define a coarse-grained and fine-grained classification taxonomy for a Chinese question-answering system. We adopt the INFOMAP inference engine to support the knowledge-based approach for Chinese questions that can be formulated as templates, and use SVM (support vector machines) as the machine learning approach for large collections of labeled Chinese questions. Our experimental results show that the accuracy of Chinese question classification is 88% using INFOMAP alone and 73.5% using SVM alone. In contrast, classification based on a hybrid approach that incorporates SVM and INFOMAP yields an accuracy rate of 92%.

Information Reuse and Integration | 2008

An alignment-based surface pattern for a question answering system

Cheng-Lung Sung; Cheng-Wei Lee; Hsu Chun Yen; Wen-Lian Hsu

In this paper, we propose an alignment-based surface pattern approach, called ABSP, which integrates semantic information into syntactic patterns for question answering (QA). ABSP uses surface patterns to extract important terms from questions, and constructs the terms’ relations from sentences in the corpus. The relations are then used to filter appropriate answer candidates. Experiments show that ABSP can achieve high accuracy and can be incorporated into other QA systems that have high coverage. It can also be used in cross-lingual QA systems. The approach is both robust and portable to other domains.

ACM Transactions on Asian Language Information Processing | 2012

Validating Contradiction in Texts Using Online Co-Mention Pattern Checking

Cheng-Wei Shih; Cheng-Wei Lee; Richard Tzong-Han Tsai; Wen-Lian Hsu

Detecting contradictory statements is a foundational and challenging task for text understanding applications such as textual entailment. In this article, we aim to address the problem of the shortage of specific background knowledge in contradiction detection. To tackle the problem, we propose a novel contradiction detection approach based on the distribution on the Internet of queries composed of critical mismatch combinations. By measuring the availability of mismatch conjunction phrases (MCPs), the background knowledge about two target statements can be implicitly obtained for identifying contradictions. Experiments on three different configurations show that the MCP-based approach achieves remarkable improvement on contradiction detection and can significantly improve the performance of textual entailment recognition.

ACM Transactions on Asian Language Information Processing | 2008

Boosting Chinese Question Answering with Two Lightweight Methods: ABSPs and SCO-QAT

Cheng-Wei Lee; Min-Yuh Day; Cheng-Lung Sung; Yi-Hsun Lee; Tian-Jian Jiang; Chia-Wei Wu; Cheng-Wei Shih; Yu-Ren Chen; Wen-Lian Hsu

Question Answering (QA) research has been conducted in many languages. Nearly all the top performing systems use heavy methods that require sophisticated techniques, such as parsers or logic provers. However, such techniques are usually unavailable or unaffordable for under-resourced languages or in resource-limited situations. In this article, we describe how a top-performing Chinese QA system can be designed by using lightweight methods effectively. We propose two lightweight methods, namely the Sum of Co-occurrences of Question and Answer Terms (SCO-QAT) and Alignment-based Surface Patterns (ABSPs). SCO-QAT is a co-occurrence-based answer-ranking method that does not need extra knowledge, word-ignoring heuristic rules, or tools. It calculates co-occurrence scores based on the passage retrieval results. ABSPs are syntactic patterns trained from question-answer pairs with a multiple alignment algorithm. They capture the relations between terms, and these relations are then used to filter answers. We attribute the success of the ABSPs and SCO-QAT methods to the effective use of local syntactic information and global co-occurrence information. By using SCO-QAT and ABSPs, we improved the RU-Accuracy of our testbed QA system, ASQA, from 0.445 to 0.535 on the NTCIR-5 dataset. It also achieved the top 0.5 RU-Accuracy on the NTCIR-6 dataset. The result shows that lightweight methods are not only cheaper to implement, but also have the potential to achieve state-of-the-art performance.
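
To give a feel for co-occurrence-based answer ranking over passage retrieval results, here is a deliberately simplified sketch. It is not the SCO-QAT formula from the article, which this page does not give; it only counts, for each candidate answer, how many retrieved passages contain both the answer and each question term. The function name `cooccurrence_scores`, the substring matching, and the toy data are all assumptions made for illustration.

```python
def cooccurrence_scores(question_terms, candidates, passages):
    """Score each candidate by summed passage-level co-occurrence with question terms.

    Simplified stand-in for a co-occurrence ranker; NOT the published SCO-QAT formula.
    Matching is plain substring containment, which is an assumption of this sketch.
    """
    scores = {}
    for answer in candidates:
        total = 0
        for term in question_terms:
            # Count retrieved passages in which the term and the answer both appear.
            total += sum(1 for p in passages if term in p and answer in p)
        scores[answer] = total
    return scores


# Toy passages standing in for passage retrieval output (made up).
passages = [
    "Taipei is the capital of Taiwan",
    "Taipei 101 is in Taipei Taiwan",
    "Kaohsiung is a port city in Taiwan",
]
scores = cooccurrence_scores(["capital", "Taiwan"], ["Taipei", "Kaohsiung"], passages)
ranked = sorted(scores, key=scores.get, reverse=True)
```

The sketch needs only the retrieved passages, with no parser or external knowledge base, which is the sense in which the abstract calls such methods lightweight.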

Expert Systems With Applications | 2008

Web taxonomy integration with hierarchical shrinkage algorithm and fine-grained relations

Chia-Wei Wu; Richard Tzong-Han Tsai; Cheng-Wei Lee; Wen-Lian Hsu

We address the problem of integrating web taxonomies from different real Internet applications. Integrating web taxonomies means transferring instances from a source taxonomy to a target taxonomy. Unlike the conventional text categorization problem, in taxonomy integration, the source taxonomy contains extra information that can be used to improve the categorization. The major existing methods can be divided into two types: those that use neighboring categories to smooth the document term vector and those that consider the semantic relationship between corresponding categories of the target and source taxonomies to facilitate categorization. In contrast to the first type of approach, which only uses a flattened hierarchy for smoothing, we apply a hierarchical shrinkage algorithm to smooth child documents by their parents. We also discuss the effect of using different hierarchical levels for smoothing. To extend the second type of approach, we extract fine-grained semantic relationships, which consider the relationships between lower-level categories. In addition, we use the cosine similarity to measure the semantic relationships, which achieves better performance than existing methods. Finally, we integrate the existing approaches and the proposed methods into one machine learning model to find the best feature configuration. The results of experiments on real Internet data demonstrate that our system outperforms standard text classifiers by about 10%.
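
The two ingredients named in this abstract, cosine similarity between term vectors and smoothing a child node by its parent, can be sketched as follows. The cosine formula is standard; the smoothing function is only a minimal linear interpolation with a fixed, hypothetical `weight`, and is not claimed to be the paper's hierarchical shrinkage algorithm, where such weights would more plausibly be estimated from data. Term vectors are represented as plain dicts, and all names are illustrative.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two sparse term vectors given as {term: weight} dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # Convention chosen here for empty vectors.
    return dot / (norm_u * norm_v)


def shrink_toward_parent(child, parent, weight=0.5):
    """Interpolate a child's term vector with its parent's.

    `weight` is a hypothetical fixed mixing coefficient (an assumption of this
    sketch), with 0.0 keeping the child unchanged and 1.0 replacing it by the parent.
    """
    terms = set(child) | set(parent)
    return {
        t: (1.0 - weight) * child.get(t, 0.0) + weight * parent.get(t, 0.0)
        for t in terms
    }


# Toy vectors (made up) for a child category and its parent category.
child = {"laptop": 1.0}
parent = {"computer": 1.0}
smoothed = shrink_toward_parent(child, parent, weight=0.5)
similarity_before = cosine_similarity(child, parent)
similarity_after = cosine_similarity(smoothed, parent)
```

In this toy case the child and parent share no terms before smoothing, so their similarity is zero; after interpolation the child picks up the parent's vocabulary, which is the intuition behind smoothing sparse lower-level categories.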

Journal of Organometallic Chemistry | 1988

Cycloaddition reactions between tetrafluorodisilicyclobutene and cyclic dienes mediated by transition metal carbonyls

Chih-Yuan Lin; Cheng-Wei Lee; T.T. Jzang; Chien-Fu Lin; Chih-Hao Liu

Under the mediation of transition metal carbonyls, cycloaddition of 3-t-butyl-1,1,2,2-tetrafluoro-1,2-disilacyclobutene (1) to various cyclic dienes has been studied under photochemical conditions. For conjugated dienes, Ni and Co mediated reactions give 1,4-addition products, whereas Fe, Mo and W mediated reactions result in 1,2-addition products. The hard metal Cr mediates the reactions to give products due to F migration. The reaction of non-conjugated norbornadiene with 1 gives 1,2- and 1,4-addition products under Cr mediation but gives only 1,2-addition products in the case of Mo mediation. Correlations between the structures of the intermediates and reaction pathways are discussed.

International Journal of Computational Linguistics and Chinese Language Processing (中文計算語言學期刊) | 2008

Exploring Shallow Answer Ranking Features in Cross-Lingual and Monolingual Factoid Question Answering

Cheng-Wei Lee; Yi-Hsun Lee; Wen-Lian Hsu

Answer ranking is critical to a QA (Question Answering) system because it determines the final system performance. In this paper, we explore the behavior of shallow ranking features under different conditions. The features are easy to implement and are also suitable when complex NLP techniques or resources are not available for monolingual or cross-lingual tasks. We analyze six shallow ranking features, namely, SCO-QAT, keyword overlap, density, IR score, mutual information score, and answer frequency. SCO-QAT (Sum of Co-occurrence of Question and Answer Terms) is a new feature proposed by us that performed well in NTCIR CLQA. It is a co-occurrence based feature that does not need extra knowledge, word-ignoring heuristic rules, or special tools. Instead, for the whole corpus, SCO-QAT calculates co-occurrence scores based solely on the passage retrieval results. Our experiments show that there is no perfect shallow ranking feature for every condition. SCO-QAT performs the best in C-C (Chinese-Chinese) QA, but it is not a good choice in E-C (English-Chinese) QA. Overall, Frequency is the best choice for E-C QA, but its performance is impaired when translation noise is present. We also found that passage depth has little impact on shallow ranking features, and that a proper answer filter with fine-grained answer types is important for E-C QA. We measured the performance of answer ranking in terms of a newly proposed metric, EAA (Expected Answer Accuracy), to cope with cases of answers that have the same score after ranking.
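
The abstract does not define EAA, so the sketch below shows only one plausible reading of an accuracy measure that copes with tied scores: for each question, take the fraction of correct answers among those tied at the top score, then average over questions. This interpretation, the function name `expected_answer_accuracy`, and the toy data are assumptions, not the paper's definition.

```python
def expected_answer_accuracy(questions):
    """Average, over questions, of the share of correct answers among top-score ties.

    `questions` is a list; each item is a list of (score, is_correct) pairs for the
    candidate answers of one question. This is an assumed reading of a tie-aware
    accuracy, not a definition taken from the paper.
    """
    if not questions:
        return 0.0
    total = 0.0
    for candidates in questions:
        if not candidates:
            continue  # A question with no candidates contributes zero.
        top_score = max(score for score, _ in candidates)
        tied = [is_correct for score, is_correct in candidates if score == top_score]
        # Picking uniformly at random among the tied answers is correct with this probability.
        total += sum(tied) / len(tied)
    return total / len(questions)


# Toy ranking output for three questions (made up).
toy_questions = [
    [(0.9, True), (0.9, False), (0.1, False)],  # two-way tie at the top, one correct
    [(0.8, True), (0.2, False)],                # unique correct top answer
    [(0.7, False)],                             # unique incorrect top answer
]
eaa = expected_answer_accuracy(toy_questions)
```

With no ties this reduces to ordinary top-1 accuracy, so the tie handling only matters for features such as answer frequency that often assign identical scores.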