Tzong-Han Tsai | Researchain

Publication

Featured researches published by Tzong-Han Tsai.

Expert Systems With Applications | 2006

Integrating linguistic knowledge into a conditional random field framework to identify biomedical named entities

Tzong-Han Tsai; Wen-Chi Chou; Shih-Hung Wu; Ting-Yi Sung; Jieh Hsiang; Wen-Lian Hsu

As new high-throughput technologies have created an explosion of biomedical literature, there arises a pressing need for automatic information extraction from the literature bank. To this end, biomedical named entity recognition (NER) from natural language text is indispensable. Current NER approaches are dictionary based, rule based, or machine learning based. Since there is no consolidated nomenclature for most biomedical NEs, any NER system relying on limited dictionaries or rules does not seem to perform satisfactorily. In this paper, we consider a machine learning model, the conditional random field (CRF), for the construction of our NER framework. CRF is a well-known model for solving other sequence tagging problems. In our framework, we do our best to utilize available resources including dictionaries, web corpora, and lexical analyzers, and represent them as linguistic features in the CRF model. In the experiment on the JNLPBA 2004 data, with minimal post-processing, our system achieves an F-score of 70.2%, which is better than most state-of-the-art systems. On the GENIA 3.02 corpus, our system achieves an F-score of 78.4% for protein names, which is 2.8% higher than the next-best system. In addition, we also examine the usefulness of each feature in our CRF model. Our experience could be valuable to other researchers working on machine learning based NER.
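The abstract describes representing dictionaries and lexical cues as linguistic features for a sequence tagger. The following is a minimal sketch of what per-token feature extraction of that kind might look like; the feature names, the `word_shape` helper, and the tiny dictionary are assumptions made for illustration and are not taken from the paper.

```python
import re

def word_shape(token: str) -> str:
    """Collapse a token into a coarse shape, e.g. 'IL-2' -> 'A-0'."""
    shape = re.sub(r"[A-Z]+", "A", token)
    shape = re.sub(r"[a-z]+", "a", shape)
    shape = re.sub(r"[0-9]+", "0", shape)
    return shape

def token_features(tokens, i, dictionary):
    """Build a feature dict for tokens[i] (hypothetical feature set)."""
    tok = tokens[i]
    return {
        "lower": tok.lower(),                  # surface form
        "shape": word_shape(tok),              # orthographic pattern
        "suffix3": tok[-3:].lower(),           # crude morphological cue
        "in_dict": tok.lower() in dictionary,  # dictionary lookup feature
        "prev": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
    }

# Illustrative input only; the dictionary is a stand-in for a real lexicon.
sentence = ["IL-2", "gene", "expression"]
features = [token_features(sentence, i, {"il-2"}) for i in range(len(sentence))]
```

Feature dictionaries of this form could then be passed to any CRF toolkit that accepts per-token feature maps; which toolkit the authors used is not stated here.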

Information Reuse and Integration | 2005

A knowledge-based approach to citation extraction

Min-Yuh Day; Tzong-Han Tsai; Cheng-Lung Sung; Cheng-Wei Lee; Shih-Hung Wu; Chorng-Shyong Ong; Wen-Lian Hsu

Integration of the bibliographical information of scholarly publications available on the Internet is an important task in academic research. To accomplish this task, accurate reference metadata extraction for scholarly publications is essential for the integration of information from heterogeneous reference sources. In this paper, we propose a knowledge-based approach to literature mining and focus on reference metadata extraction methods for scholarly publications. We adopt an ontological knowledge representation framework called INFOMAP to automatically extract the reference metadata. The experimental results show that, by using INFOMAP, we can extract author, title, journal, volume, number (issue), year, and page information from different reference styles with a high degree of accuracy. The overall average field accuracy of citation extraction for a bioinformatics dataset is 97.87% for six reference styles.
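To make the notion of field accuracy concrete, here is a small sketch that extracts the seven fields named above from one reference style and scores the result against a gold record. It uses a regular expression in place of INFOMAP, which is a deliberate simplification; the pattern, the reference style, and the sample reference are all hypothetical.

```python
import re

# Hypothetical pattern for a single style:
# "Authors (Year) Title. Journal Volume(Number):Pages"
PATTERN = re.compile(
    r"^(?P<author>.+?) \((?P<year>\d{4})\) (?P<title>.+?)\. "
    r"(?P<journal>.+?) (?P<volume>\d+)\((?P<number>\d+)\):(?P<pages>\d+-\d+)$"
)

def extract(reference: str) -> dict:
    """Return the extracted fields, or an empty dict if the style does not match."""
    match = PATTERN.match(reference)
    return match.groupdict() if match else {}

def field_accuracy(extracted: dict, gold: dict) -> float:
    """Fraction of gold fields whose extracted value matches exactly."""
    correct = sum(1 for field in gold if extracted.get(field) == gold[field])
    return correct / len(gold)

# Made-up reference used purely as test input.
reference = "Doe J, Roe R (2004) A sample title. Journal of Examples 12(3):45-67"
gold = {
    "author": "Doe J, Roe R", "year": "2004", "title": "A sample title",
    "journal": "Journal of Examples", "volume": "12", "number": "3",
    "pages": "45-67",
}
accuracy = field_accuracy(extract(reference), gold)
```

A single regular expression handles only one style; the point of a knowledge-based representation is to cover several styles without writing one pattern per style.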

Conference on Computational Natural Language Learning | 2005

Exploiting Full Parsing Information to Label Semantic Roles Using an Ensemble of ME and SVM via Integer Linear Programming

Tzong-Han Tsai; Chia Wei Wu; Yu-Chun Lin; Wen-Lian Hsu

In this paper, we propose a method that exploits full parsing information by representing it as features of argument classification models and as constraints in integer linear learning programs. In addition, to take advantage of SVM-based and Maximum Entropy-based argument classification models, we incorporate their scoring matrices, and use the combined matrix in the above-mentioned integer linear programs. The experimental results show that full parsing information not only increases the F-score of argument classification models by 0.7%, but also effectively removes all labeling inconsistencies, which increases the F-score by 0.64%. The ensemble of SVM and ME also boosts the F-score by 0.77%. Our system achieves an F-score of 76.53% in the development set and 76.38% in Test WSJ.
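The combination of two classifiers' scoring matrices followed by a constrained global choice can be sketched as below. This is an illustration under stated assumptions, not the authors' system: the equal weighting, the scores, and the single constraint (a core argument label may be used at most once) are invented for the example, and exhaustive enumeration stands in for an integer linear programming solver.

```python
from itertools import product

def combine(svm_scores, me_scores, weight=0.5):
    """Weighted average of two score matrices (rows: candidates, cols: labels)."""
    return [
        [weight * s + (1 - weight) * m for s, m in zip(svm_row, me_row)]
        for svm_row, me_row in zip(svm_scores, me_scores)
    ]

def best_consistent_labeling(scores, labels, unique_labels):
    """Pick the highest-scoring labeling in which no unique label repeats.

    Brute force over all labelings; an ILP solver would be used at scale.
    """
    best, best_score = None, float("-inf")
    for assignment in product(range(len(labels)), repeat=len(scores)):
        chosen = [labels[j] for j in assignment]
        if any(chosen.count(u) > 1 for u in unique_labels):
            continue  # violates the assumed consistency constraint
        total = sum(scores[i][j] for i, j in enumerate(assignment))
        if total > best_score:
            best, best_score = chosen, total
    return best, best_score

labels = ["A0", "A1", "NULL"]
svm = [[0.6, 0.3, 0.1], [0.5, 0.4, 0.1]]   # made-up scores
me = [[0.8, 0.1, 0.1], [0.5, 0.3, 0.2]]    # made-up scores
combined = combine(svm, me)
labeling, total = best_consistent_labeling(combined, labels, {"A0", "A1"})
```

In this toy input both candidates prefer A0 when scored independently, so a per-candidate argmax would yield an inconsistent labeling; the global search assigns A0 to the first candidate and A1 to the second instead.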

中文計算語言學期刊 (International Journal of Computational Linguistics and Chinese Language Processing) | 2004

Mencius: A Chinese Named Entity Recognizer Using the Maximum Entropy-based Hybrid Model

Tzong-Han Tsai; Shih-Hung Wu; Cheng-Wei Lee; Cheng-Wei Shih; Wen-Lian Hsu

This paper presents a Chinese named entity recognizer (NER): Mencius. It aims to address Chinese NER problems by combining the advantages of rule-based and machine learning (ML) based NER systems. Rule-based NER systems can explicitly encode human comprehension and can be tuned conveniently, while ML-based systems are robust, portable and inexpensive to develop. Our hybrid system incorporates a rule-based knowledge representation and template-matching tool, called InfoMap [Wu et al. 2002], into a maximum entropy (ME) framework. Named entities are represented in InfoMap as templates, which serve as ME features in Mencius. These features are edited manually, and their weights are estimated by the ME framework according to the training data. To understand how word segmentation might influence Chinese NER and the differences between a pure template-based method and our hybrid method, we configure Mencius using four distinct settings. The F-Measures of person names (PER), location names (LOC) and organization names (ORG) of the best configuration in our experiment were respectively 94.3%, 77.8% and 75.3%. Comparing the experiment results obtained using these configurations reveals that hybrid NER systems always perform better in identifying person names. On the other hand, they have a little difficulty identifying location and organization names. Furthermore, using a word segmentation module improves the performance of pure template-based NER systems, but it has little effect on hybrid NER systems.

Proceedings of the Sixth International Workshop on Information Retrieval with Asian Languages | 2003

Text Categorization Using Automatically Acquired Domain Ontology

Shih-Hung Wu; Tzong-Han Tsai; Wen-Lian Hsu

In this paper, we describe ontology-based text categorization in which the domain ontologies are automatically acquired through morphological rules and statistical methods. The ontology-based approach is a promising way for general information retrieval applications such as knowledge management or knowledge discovery. As a way to evaluate the quality of domain ontologies, we test our method through several experiments. Automatically acquired domain ontologies, with or without manual editing, have been used for text categorization. The results are quite satisfactory. Furthermore, we have developed an automatic method to evaluate the quality of our domain ontology.

Archive | 2002

FAQ-Centered Organizational Memory

Shih-Hung Wu; Min-Yuh Day; Tzong-Han Tsai; Wen-Lian Hsu

The value of a piece of information in an organization is related to its retrieval (or requested) frequency. Therefore, collecting the answers to the frequently asked questions (FAQs) and constructing a good retrieval mechanism is a useful way to maintain organizational memory (OM). Since natural language is the easiest way for people to communicate, we have designed a natural language dialogue system for sharing the valuable knowledge of an organization. The system receives a natural language query from the user and matches it with a FAQ. Either an appropriate answer will be returned according to the user profile or the system will ask the user another question in return so that a more detailed query can be formed. This dialogue will continue until the user is satisfied or a detailed answer is obtained. In this paper, we apply natural language processing techniques to build a computer system that can help achieve the goal of OM.
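The match-or-ask loop described in this abstract can be illustrated with a toy sketch. Token overlap (Jaccard similarity), the 0.5 threshold, and the sample FAQ entries are assumptions chosen for brevity; the paper's actual matching technique and its use of user profiles are not modeled here.

```python
def tokenize(text: str) -> set:
    """Lowercase, drop question marks, and split on whitespace."""
    return set(text.lower().replace("?", "").split())

def jaccard(a: set, b: set) -> float:
    """Size of the intersection over size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def respond(query: str, faqs: dict, threshold: float = 0.5):
    """Return ('answer', text) on a confident match, else ('ask_back', question)."""
    q = tokenize(query)
    question, answer = max(
        faqs.items(), key=lambda item: jaccard(q, tokenize(item[0]))
    )
    if jaccard(q, tokenize(question)) >= threshold:
        return ("answer", answer)
    # Not confident enough: ask the user to confirm or refine the query.
    return ("ask_back", f"Did you mean: {question}")

# Made-up FAQ entries used purely as test input.
faqs = {
    "How do I reset my password?": "Use the reset form.",
    "Where is the library?": "Second floor.",
}
confident = respond("how do I reset my password", faqs)
vague = respond("password", faqs)
```

A fuller system would feed the user's reply to the follow-up question back into `respond` until a match clears the threshold.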

Asia Information Retrieval Symposium | 2005

Learning to integrate web taxonomies with fine-grained relations: a case study using maximum entropy model

Chia-Wei Wu; Tzong-Han Tsai; Wen-Lian Hsu

As web taxonomy integration is an emerging issue on the Internet, many research topics, such as personalization, web searches, and electronic markets, would benefit from further development of taxonomy integration techniques. The integration task is to transfer documents from a source web taxonomy to a target web taxonomy. In most current techniques, integration performance is enhanced by referring to the relations between corresponding categories in the source and target taxonomies. However, the techniques may not be effective, since the concepts of the corresponding categories may overlap partially. In this paper we present an effective approach for integrating taxonomies and alleviating the partial overlap problem by considering fine-grained relations using a Maximum Entropy Model. The experiment results show that the proposed approach improves the classification accuracy of taxonomies over previous approaches.

International Conference on Data Mining | 2004

A maximum entropy approach to biomedical named entity recognition

Yi-Feng Lin; Tzong-Han Tsai; Wen-Chi Chou; Kuen-Pin Wu; Ting-Yi Sung; Wen-Lian Hsu

NTCIR | 2005

ASQA: Academia Sinica Question Answering System for NTCIR-5 CLQA

Cheng-Wei Lee; Cheng-Wei Shih; Min-Yuh Day; Tzong-Han Tsai; Mike Tian-Jian Jiang; Chia-Wei Wu; Cheng-Lung Sung; Yu-Ren Chen; Shih-Hung Wu; Wen-Lian Hsu

International Joint Conference on Artificial Intelligence | 2003