Hsin-Hsi Chen | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Hsin-Hsi Chen is active.

Explore More

Publication

Featured researches published by Hsin-Hsi Chen.

international acm sigir conference on research and development in information retrieval | 2007

FRank: a ranking method with fidelity loss

Ming-Feng Tsai; Tie-Yan Liu; Tao Qin; Hsin-Hsi Chen; Wei-Ying Ma

Ranking problem is becoming important in many fields, especially in information retrieval (IR). Many machine learning techniques have been proposed for ranking problem, such as RankSVM, RankBoost, and RankNet. Among them, RankNet, which is based on a probabilistic ranking framework, is leading to promising results and has been applied to a commercial Web search engine. In this paper we conduct further study on the probabilistic ranking framework and provide a novel loss function named fidelity loss for measuring loss of ranking. The fidelity loss notonly inherits effective properties of the probabilistic ranking framework in RankNet, but possesses new properties that are helpful for ranking. This includes the fidelity loss obtaining zero for each document pair, and having a finite upper bound that is necessary for conducting query-level normalization. We also propose an algorithm named FRank based on a generalized additive model for the sake of minimizing the fedelity loss and learning an effective ranking function. We evaluated the proposed algorithm for two datasets: TREC dataset and real Web search dataset. The experimental results show that the proposed FRank algorithm outperforms other learning-based ranking methods on both conventional IR problem and Web search.

international conference on computational linguistics | 2000

Mining tables from large scale HTML texts

Hsin-Hsi Chen; Shih-Chung Tsai; Jin-He Tsai

Table is a very common presentation scheme, but few papers touch on table extraction in text data mining. This paper focuses on mining tables from large-scale HTML texts. Table filtering, recognition, interpretation, and presentation are discussed. Heuristic rules and cell similarities are employed to identify tables. The F-measure of table recognition is 86.50%. We also propose an algorithm to capture attribute-value relationships among table cells. Finally, more structured data is extracted and presented.

meeting of the association for computational linguistics | 2006

Novel Association Measures Using Web Search with Double Checking

Hsin-Hsi Chen; Ming-Shun Lin; Yu-Chuan Wei

A web search with double checking model is proposed to explore the web as a live corpus. Five association measures including variants of Dice, Overlap Ratio, Jaccard, and Cosine, as well as Co-Occurrence Double Check (CODC), are presented. In the experiments on Rubenstein-Goodenoughs benchmark data set, the CODC measure achieves correlation coefficient 0.8492, which competes with the performance (0.8914) of the model using WordNet. The experiments on link detection of named entities using the strategies of direct association, association matrix and scalar association matrix verify that the double-check frequencies are reliable. Further study on named entity clustering shows that the five measures are quite useful. In particular, CODC measure is very stable on word-word and name-name experiments. The application of CODC measure to expand community chains for personal name disambiguation achieves 9.65% and 14.22% increase compared to the system without community expansion. All the experiments illustrate that the novel model of web search with double checking is feasible for mining associations from the web.

web intelligence | 2007

Emotion Classification Using Web Blog Corpora

Changhua Yang; Kevin Lin; Hsin-Hsi Chen

In this paper, we investigate the emotion classification of web blog corpora using support vector machine (SVM) and conditional random field (CRF) machine learning techniques. The emotion classifiers are trained at the sentence level and applied to the document level. Our methods also determine an emotion category by taking the context of a sentence into account. Experiments show that CRF classifiers outperform SVM classifiers. When applying emotion classification to a blog at the document level, the emotion of the last sentence in a document plays an important role in determining the overall emotion.This paper is based on the theory of Finite State Automata (FSAs), models a web service as a FSA, extends WSDL for conceptually describing the behaviors of Web services, and introduces the concept of Temporal Logic of Actions (short for TLA) to describe and specify the behavior of a service in a formal way.

meeting of the association for computational linguistics | 2007

Building Emotion Lexicon from Weblog Corpora

Changhua Yang; Kevin Lin; Hsin-Hsi Chen

An emotion lexicon is an indispensable resource for emotion analysis. This paper aims to mine the relationships between words and emotions using weblog corpora. A collocation model is proposed to learn emotion lexicons from weblog articles. Emotion classification at sentence level is experimented by using the mined lexicons to demonstrate their usefulness.

international conference on computational linguistics | 2002

Backward machine transliteration by learning phonetic similarity

Wei-Hao Lin; Hsin-Hsi Chen

In many cross-lingual applications we need to convert a transliterated word into its original word. In this paper, we present a similarity-based framework to model the task of backward transliteration, and provide a learning algorithm to automatically acquire phonetic similarities from a corpus. The learning algorithm is based on Widrow-Hoff rule with some modifications. The experiment results show that the learning algorithm converges quickly, and the method using acquired phonetic similarities remarkably outperforms previous methods using pre-defined phonetic similarities or graphic similarities in a corpus of 1574 pairs of English names and transliterated Chinese names. The learning algorithm does not assume any underlying phonological structures or rules, and can be extended to other language pairs once a training corpus and a pronouncing dictionary are available.

BMC Complementary and Alternative Medicine | 2008

TCMGeneDIT: a database for associated traditional Chinese medicine, gene and disease information using text mining

Yu-Ching Fang; H.-C. Huang; Hsin-Hsi Chen; Hsueh-Fen Juan

BackgroundTraditional Chinese Medicine (TCM), a complementary and alternative medical system in Western countries, has been used to treat various diseases over thousands of years in East Asian countries. In recent years, many herbal medicines were found to exhibit a variety of effects through regulating a wide range of gene expressions or protein activities. As available TCM data continue to accumulate rapidly, an urgent need for exploring these resources systematically is imperative, so as to effectively utilize the large volume of literature.MethodsTCM, gene, disease, biological pathway and protein-protein interaction information were collected from public databases. For association discovery, the TCM names, gene names, disease names, TCM ingredients and effects were used to annotate the literature corpus obtained from PubMed. The concept to mine entity associations was based on hypothesis testing and collocation analysis. The annotated corpus was processed with natural language processing tools and rule-based approaches were applied to the sentences for extracting the relations between TCM effecters and effects.ResultsWe developed a database, TCMGeneDIT, to provide association information about TCMs, genes, diseases, TCM effects and TCM ingredients mined from vast amount of biomedical literature. Integrated protein-protein interaction and biological pathways information are also available for exploring the regulations of genes associated with TCM curative effects. In addition, the transitive relationships among genes, TCMs and diseases could be inferred through the shared intermediates. Furthermore, TCMGeneDIT is useful in understanding the possible therapeutic mechanisms of TCMs via gene regulations and deducing synergistic or antagonistic contributions of the prescription components to the overall therapeutic effects. The database is now available at http://tcm.lifescience.ntu.edu.tw/.ConclusionTCMGeneDIT is a unique database that offers diverse association information on TCMs. This database integrates TCMs with biomedical studies that would facilitate clinical research and elucidate the possible therapeutic mechanisms of TCMs and gene regulations.

web intelligence | 2008

Emotion Classification of Online News Articles from the Reader's Perspective

Kevin Lin; Changhua Yang; Hsin-Hsi Chen

Past studies on emotion classification focus on the writerpsilas emotional state. This research addresses the reader aspect instead. The classification of documents into reader-emotion categories has several applications. One of them is to integrate reader-emotion classification into a Web search engine to allow users to retrieve documents that contain relevant contents and at the same time instill proper emotions. In this paper, we automatically classify documents into reader-emotion categories, and examine classification performance under different feature settings. Experiments show that certain feature combinations achieve good accuracy. We also compare the best classifierpsilas classification results with the emotional distributions of documents to determine how closely the classifier models the underlying reader behavior. Finally, we investigate the feasibility of emotion ranking.

meeting of the association for computational linguistics | 1998

Proper Name Translation in Cross-Language Information Retrieval

Hsin-Hsi Chen; Sheng-Jie Huang; Yung-Wei Ding; Shih-Chung Tsai

Recently, language barrier becomes the major problem for people to search, retrieve, and understand WWW documents in different languages. This paper deals with query translation issue in cross-language information retrieval, proper names in particular. Models for name identification, name translation and name searching are presented. The recall rates and the precision rates for the identification of Chinese organization names, person names and location names under MET data are (76.67%, 79.33%), (87.33%, 82.33%) and (77.00%, 82.00%), respectively. In name translation, only 0.79% and 1.11% of candidates for English person names and location names, respectively, have to be proposed. The name searching facility is implemented on an MT sever for information retrieval on the WWW. Under this system, user can issue queries and read documents with his familiar language.

asia information retrieval symposium | 2006

Query expansion with conceptnet and wordnet: an intrinsic comparison

Ming-Hung Hsu; Ming-Feng Tsai; Hsin-Hsi Chen

This paper compares the utilization of ConceptNet and WordNet in query expansion. Spreading activation selects candidate terms for query expansion from these two resources. Three measures including discrimination ability, concept diversity, and retrieval performance are used for comparisons. The topics and document collections in the ad hoc track of TREC-6, TREC-7 and TREC-8 are adopted in the experiments. The results show that ConceptNet and WordNet are complementary. Queries expanded with WordNet have higher discrimination ability. In contrast, queries expanded with ConceptNet have higher concept diversity. The performance of queries expanded by selecting the candidate terms from ConceptNet and WordNet outperforms that of queries without expansion, and queries expanded with a single resource.

Explore More