
Publication


Featured research published by Shih-Hung Wu.


BMC Bioinformatics | 2006

Various criteria in the evaluation of biomedical named entity recognition

Richard Tzong-Han Tsai; Shih-Hung Wu; Wen-Chi Chou; Yu-Chun Lin; Ding He; Jieh Hsiang; Ting-Yi Sung; Wen-Lian Hsu

Background: Text mining in the biomedical domain is receiving increasing attention. A key component of this process is named entity recognition (NER). Generally speaking, two annotated corpora, GENIA and GENETAG, are most frequently used for training and testing biomedical named entity recognition (Bio-NER) systems. JNLPBA and BioCreAtIvE are two major Bio-NER tasks using these corpora. The two tasks take different approaches to corpus annotation and use different matching criteria to evaluate system performance. This paper details these differences and describes alternative criteria. We then examine the impact of different criteria and annotation schemes on system performance by retesting systems that participated in the above two tasks. Results: To analyze the difference between JNLPBA's and BioCreAtIvE's evaluation, we conduct Experiment 1 to evaluate the top four JNLPBA systems using BioCreAtIvE's classification scheme. We then compare them with the top four BioCreAtIvE systems. Among them, three systems participated in both tasks, and each has a lower F-score on JNLPBA than on BioCreAtIvE. In Experiment 2, we apply hypothesis testing and correlation coefficients to find alternatives to BioCreAtIvE's evaluation scheme. The results show that the right-match and left-match criteria do not differ significantly from BioCreAtIvE's. In Experiment 3, we propose a customized relaxed-match criterion that uses right match and merges JNLPBA's five NE classes into two, which achieves an F-score of 81.5%. In Experiment 4, we evaluate a range of five matching criteria from loose to strict on the top JNLPBA system and examine the percentage of false negatives. Our experiment gives the relative change in precision, recall and F-score as matching criteria are relaxed. Conclusion: In many applications, biomedical NEs could have several acceptable tags, which might differ only in their left or right boundaries. However, most corpora annotate only one of them. In our experiment, we found that right match and left match can be appropriate alternatives to JNLPBA's and BioCreAtIvE's matching criteria. In addition, our relaxed-match criterion demonstrates that users can define their own relaxed criteria that correspond more realistically to their application requirements.
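As an illustration of how exact, left-match, and right-match criteria change a score, here is a minimal Python sketch. It is not the evaluation code used in the paper: the `(start, end, label)` span format, the function names, and the greedy one-to-one matching are all assumptions made for this example.

```python
from typing import List, Tuple

# Hypothetical span format: (start offset, end offset, entity label).
Entity = Tuple[int, int, str]


def is_match(pred: Entity, gold: Entity, criterion: str) -> bool:
    """Return True if pred counts as correct under the given criterion."""
    if pred[2] != gold[2]:  # labels must agree under every criterion here
        return False
    if criterion == "exact":
        return pred[0] == gold[0] and pred[1] == gold[1]
    if criterion == "left":  # only the left boundary must agree
        return pred[0] == gold[0]
    if criterion == "right":  # only the right boundary must agree
        return pred[1] == gold[1]
    raise ValueError(f"unknown criterion: {criterion}")


def evaluate(preds: List[Entity], golds: List[Entity], criterion: str):
    """Compute (precision, recall, F-score) with greedy one-to-one matching."""
    used = set()  # indices of gold entities already matched
    true_positives = 0
    for pred in preds:
        for i, gold in enumerate(golds):
            if i not in used and is_match(pred, gold, criterion):
                used.add(i)
                true_positives += 1
                break
    precision = true_positives / len(preds) if preds else 0.0
    recall = true_positives / len(golds) if golds else 0.0
    total = precision + recall
    f_score = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_score


golds = [(0, 3, "protein"), (5, 7, "DNA")]
preds = [(1, 3, "protein"), (5, 7, "DNA")]  # first span misses the left boundary
results = {c: evaluate(preds, golds, c) for c in ("exact", "left", "right")}
```

In this toy case the first prediction is rejected under exact and left match but accepted under right match, so the F-score rises as the criterion is relaxed.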


Decision Support Systems | 2007

Reference metadata extraction using a hierarchical knowledge representation framework

Min-Yuh Day; Richard Tzong-Han Tsai; Cheng-Lung Sung; Chiu-Chen Hsieh; Cheng-Wei Lee; Shih-Hung Wu; Kuen-Pin Wu; Chorng-Shyong Ong; Wen-Lian Hsu

The integration of bibliographical information on scholarly publications available on the Internet is an important task in the academic community. Accurate reference metadata extraction from such publications is essential for the integration of metadata from heterogeneous reference sources. In this paper, we propose a hierarchical template-based reference metadata extraction method for scholarly publications. We adopt a hierarchical knowledge representation framework called INFOMAP, which automatically extracts metadata. The experimental results show that, by using INFOMAP, we can extract author, title, journal, volume, number (issue), year, and page information from different kinds of reference styles with a high degree of precision. The overall average accuracy is 92.39% for the six major reference styles compared in this study.


Expert Systems With Applications | 2006

Integrating linguistic knowledge into a conditional random field framework to identify biomedical named entities

Tzong-Han Tsai; Wen-Chi Chou; Shih-Hung Wu; Ting-Yi Sung; Jieh Hsiang; Wen-Lian Hsu

As new high-throughput technologies have created an explosion of biomedical literature, there is a pressing need for automatic information extraction from the literature bank. To this end, biomedical named entity recognition (NER) from natural language text is indispensable. Current NER approaches are dictionary-based, rule-based, or machine learning-based. Since there is no consolidated nomenclature for most biomedical NEs, any NER system relying on limited dictionaries or rules does not seem to perform satisfactorily. In this paper, we consider a machine learning model, CRF, for the construction of our NER framework. CRF is a well-known model for solving other sequence tagging problems. In our framework, we do our best to utilize available resources, including dictionaries, web corpora, and lexical analyzers, and represent them as linguistic features in the CRF model. In the experiment on the JNLPBA 2004 data, with minimal post-processing, our system achieves an F-score of 70.2%, which is better than most state-of-the-art systems. On the GENIA 3.02 corpus, our system achieves an F-score of 78.4% for protein names, which is 2.8% higher than the next-best system. In addition, we examine the usefulness of each feature in our CRF model. Our experience could be valuable to other researchers working on machine learning-based NER.


Conference on Information and Knowledge Management | 2000

Semantic search on Internet tabular information extraction for answering queries

Huei-Long Wang; Shih-Hung Wu; I. C. Wang; Cheng-Lung Sung; Wen-Lian Hsu; Wei-Kuan Shih

Although extracting information from tables is essential for Internet information agents, most tables are designed for human eyes and their layout and semantic meanings are not well defined. In practice, encoding the layout of each information source is impossible. This work presents a novel semantic search approach capable of extracting information from general tables. Semantic ontology allows our agents to read tables in the same knowledge domain with different layouts. In addition, a system of layout syntax and a set of transformation rules are defined to transform tables into databases without losing their semantic meanings.


Systems, Man, and Cybernetics | 2001

Event identification based on the information map-INFOMAP

Wen-Lian Hsu; Shih-Hung Wu; Yi-Shiou Chen

We present a knowledge representation scheme, INFOMAP, together with a mechanism that matches the event of a natural language sentence with part of the domain ontology in INFOMAP. This scheme is designed to facilitate both human browsing and computer processing of the domain ontology. INFOMAP is also a knowledge framework designed to facilitate knowledge sharing by different application systems. We constructed a question answering system to demonstrate the power of INFOMAP. When the QA system receives a user's query, it extracts the corresponding events or scripts based on the ontology in INFOMAP. Understanding a question involves extracting such information as the question type, the question subject, the question condition and the question context. A dialogue on the question is triggered at the same time to guide the user to retrieve more relevant information.


Information Reuse and Integration | 2005

A knowledge-based approach to citation extraction

Min-Yuh Day; Tzong-Han Tsai; Cheng-Lung Sung; Cheng-Wei Lee; Shih-Hung Wu; Chorng-Shyong Ong; Wen-Lian Hsu

Integration of the bibliographical information of scholarly publications available on the Internet is an important task in academic research. To accomplish this task, accurate reference metadata extraction for scholarly publications is essential for the integration of information from heterogeneous reference sources. In this paper, we propose a knowledge-based approach to literature mining and focus on reference metadata extraction methods for scholarly publications. We adopt an ontological knowledge representation framework called INFOMAP to automatically extract the reference metadata. The experimental results show that, by using INFOMAP, we can extract author, title, journal, volume, number (issue), year, and page information from different reference styles with a high degree of accuracy. The overall average field accuracy of citation extraction for a bioinformatics dataset is 97.87% for six reference styles.


中文計算語言學期刊 | 2004

Mencius: A Chinese Named Entity Recognizer Using the Maximum Entropy-based Hybrid Model

Tzong-Han Tsai; Shih-Hung Wu; Cheng-Wei Lee; Cheng-Wei Shih; Wen-Lian Hsu

This paper presents a Chinese named entity recognizer (NER): Mencius. It aims to address Chinese NER problems by combining the advantages of rule-based and machine learning (ML) based NER systems. Rule-based NER systems can explicitly encode human comprehension and can be tuned conveniently, while ML-based systems are robust, portable and inexpensive to develop. Our hybrid system incorporates a rule-based knowledge representation and template-matching tool, called InfoMap [Wu et al. 2002], into a maximum entropy (ME) framework. Named entities are represented in InfoMap as templates, which serve as ME features in Mencius. These features are edited manually, and their weights are estimated by the ME framework according to the training data. To understand how word segmentation might influence Chinese NER and the differences between a pure template-based method and our hybrid method, we configure Mencius using four distinct settings. The F-measures of person names (PER), location names (LOC) and organization names (ORG) of the best configuration in our experiment were 94.3%, 77.8% and 75.3%, respectively. Comparing the experimental results obtained using these configurations reveals that hybrid NER systems always perform better in identifying person names. On the other hand, they have a little difficulty identifying location and organization names. Furthermore, using a word segmentation module improves the performance of pure template-based NER systems, but it has little effect on hybrid NER systems.


International journal of continuing engineering education and life-long learning | 2011

Improve the detection of improperly used Chinese characters in students’ essays with error model

Yong Zhi Chen; Shih-Hung Wu; Ping Che Yang; Tsun Ku; Gwo-Dong Chen

In this research, we propose a Chinese essay error detection system that can be used in an online writing tutorial environment. The system consists of a word segmentation module, a template module, and a language model module. We build the system with a dictionary for word segmentation and generate detection templates from a news corpus. We also gather character error probabilities from students’ essays to build an error model. Error types include pronunciation-related errors and composition-related errors. Our system provides two operating modes for users with different goals. For students, we provide a high precision correction mode that helps them learn to write effectively; for teachers, we provide a high correction detection mode that helps them check students’ essays.


Proceedings of the Sixth International Workshop on Information Retrieval with Asian Languages | 2003

Text Categorization Using Automatically Acquired Domain Ontology

Shih-Hung Wu; Tzong-Han Tsai; Wen-Lian Hsu

In this paper, we describe ontology-based text categorization in which the domain ontologies are automatically acquired through morphological rules and statistical methods. The ontology-based approach is a promising way for general information retrieval applications such as knowledge management or knowledge discovery. As a way to evaluate the quality of domain ontologies, we test our method through several experiments. Automatically acquired domain ontologies, with or without manual editing, have been used for text categorization. The results are quite satisfactory. Furthermore, we have developed an automatic method to evaluate the quality of our domain ontology.


Pacific Rim International Conference on Multi-Agents | 1998

A Fuzzy Game Theoretic Approach to Multi-Agent Coordination

Shih-Hung Wu; Von-Wun Soo

Game theoretic decision making is a practical approach to multi-agent coordination. Rational agents may make decisions based on different principles of rationality assumptions that usually involve knowledge of how other agents might move. After formulating a game matrix of utility entries for the possible combinations of moves from both agents, agents can reason about which combination is the equilibrium. Most previous game theoretic works treat the utility values qualitatively (i.e., consider only the order of the utility values). This is not practical since the utility values are usually approximate and the differences between utility values are somewhat vague. In this paper, we present a fuzzy game theoretic decision making mechanism that can deal with uncertain utilities. We thus construct a fuzzy-theoretic game framework under both fuzzy theory and game theory. The notions of fuzzy dominant relations, fuzzy Nash equilibrium, and fuzzy strategies are defined, and fuzzy reasoning is carried out in agent decision making. We show that a fuzzy strategy can perform better than a mixed strategy in traditional game theory when dealing with games that have more than one Nash equilibrium.
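To make the "more than one Nash equilibrium" situation concrete, the following Python sketch enumerates the pure-strategy Nash equilibria of a two-player game matrix. It shows only the classical (crisp) equilibrium check, not the fuzzy mechanism proposed in the paper; the function name and the example utility values are assumptions chosen for illustration.

```python
from typing import List, Tuple


def pure_nash_equilibria(
    row_utils: List[List[float]], col_utils: List[List[float]]
) -> List[Tuple[int, int]]:
    """List the cells (i, j) where neither agent gains by deviating alone.

    row_utils[i][j] is the row agent's utility and col_utils[i][j] the
    column agent's utility when the row agent plays i and the column
    agent plays j.
    """
    n_rows, n_cols = len(row_utils), len(row_utils[0])
    equilibria = []
    for i in range(n_rows):
        for j in range(n_cols):
            # Row agent cannot do better by switching rows in column j.
            row_best = all(row_utils[i][j] >= row_utils[k][j] for k in range(n_rows))
            # Column agent cannot do better by switching columns in row i.
            col_best = all(col_utils[i][j] >= col_utils[i][l] for l in range(n_cols))
            if row_best and col_best:
                equilibria.append((i, j))
    return equilibria


# Hypothetical coordination game: both agents prefer to pick the same move,
# but each prefers a different one of the two coordinated outcomes.
row_utils = [[2, 0], [0, 1]]
col_utils = [[1, 0], [0, 2]]
equilibria = pure_nash_equilibria(row_utils, col_utils)
```

Here both coordinated outcomes, (0, 0) and (1, 1), are equilibria, so the crisp analysis alone does not tell the agents which one to choose. This is the kind of game the abstract refers to.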

Collaboration


Top Co-Authors

Tsun Ku

National Central University

Shan-Shun Yang

Chaoyang University of Technology

Von-Wun Soo

National Tsing Hua University

Chorng-Shyong Ong

National Taiwan University
