Hen-Hsen Huang
National Taiwan University
Publication
Featured research published by Hen-Hsen Huang.
international health informatics symposium | 2012
Han-Bin Chen; Hen-Hsen Huang; Ching-Ting Tan; Jeng-Wei Tjiu; Hsin-Hsi Chen
In a hospital, a medical summary is indispensable for both clinicians and patients. However, in some countries where English is not the native language, the summary is written in English, which makes it hard for patients to read. In this paper we propose a framework for rapid acquisition of bilingual medical summaries using machine translation (MT) techniques. We describe a medical summary corpus and some terminological databases prepared for the framework. We then touch on the challenges of adapting MT from a generic domain to a specific one, and propose a pattern translation scheme that achieves domain adaptation on top of a background statistical MT system. We identify the significant patterns that capture the specific writing styles in a medical summary. The patterns are then translated with the involvement of doctors. Our major concern is to reduce the cost of translation and better allocate the effort of the domain experts. The experimental results show that the proposed methods are effective in terms of the significance and diversity of the patterns. Approaches to integrating the mined patterns into the background MT system are also discussed.
european conference on information retrieval | 2017
Yu-Hsiang Huang; Hen-Hsen Huang; Hsin-Hsi Chen
Automatic irony detection refers to making computers understand the real intentions of humans behind ironic language. Much work has been done using classic machine learning techniques applied to various features. In contrast to sophisticated feature engineering, this paper investigates how deep learning can be applied to the task with the help of word embeddings. Three deep learning models, Convolutional Neural Network (CNN), Recurrent Neural Network (RNN), and Attentive RNN, are explored. The results show that the Attentive RNN achieves state-of-the-art performance on Twitter datasets. Furthermore, a closer look at the attention vectors generated by the Attentive RNN provides insight into how the attention mechanism helps find the linguistic clues of ironic utterances.
international world wide web conferences | 2016
Yang-Yin Lee; Hao Ke; Hen-Hsen Huang; Hsin-Hsi Chen
While many traditional studies on semantic relatedness utilize lexical databases such as WordNet or Wiktionary, recent word embedding learning approaches demonstrate their ability to capture syntactic and semantic information, and outperform the lexicon-based methods. However, word senses are not disambiguated in the training phase of either Word2Vec or GloVe, two well-known word embedding algorithms, and the path length between any two senses of words in lexical databases cannot reflect their true semantic relatedness. In this paper, a novel approach that linearly combines Word2Vec and GloVe with the lexical database WordNet is proposed for measuring semantic relatedness. The experiments show that this simple method outperforms the state-of-the-art model SensEmbed.
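As a rough illustration of what a linear combination of relatedness signals can look like, here is a minimal sketch. It is not the authors' implementation: the function names, the choice of cosine similarity for the embedding scores, the precomputed `lexicon_score`, and the weights are all assumptions made for the example.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def combined_relatedness(w2v_pair, glove_pair, lexicon_score, weights=(0.4, 0.4, 0.2)):
    """Weighted sum of two embedding similarities and one lexicon-based score.

    The weights are hypothetical; in practice they would be tuned on held-out data.
    """
    w_w2v, w_glove, w_lex = weights
    return (w_w2v * cosine(*w2v_pair)
            + w_glove * cosine(*glove_pair)
            + w_lex * lexicon_score)


# Toy vectors for one word pair (made up for illustration).
score = combined_relatedness(
    w2v_pair=([1.0, 0.0], [1.0, 0.0]),    # identical direction, cosine 1.0
    glove_pair=([0.0, 1.0], [1.0, 0.0]),  # orthogonal, cosine 0.0
    lexicon_score=0.5,                    # e.g. a normalized WordNet-based score
)
```

With these toy inputs the three signals contribute 0.4, 0.0, and 0.1 respectively.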
meeting of the association for computational linguistics | 2017
Yow-Ting Shiue; Hen-Hsen Huang; Hsin-Hsi Chen
Selecting appropriate words to compose a sentence is a common problem faced by non-native Chinese learners. In this paper, we propose (bidirectional) LSTM sequence labeling models and explore various features to detect word usage errors in Chinese sentences. By combining CWINDOW word embedding features and POS information, the best bidirectional LSTM model achieves an accuracy of 0.5138 and an MRR of 0.6789 on the HSK dataset. For 80.79% of the test data, the model ranks the ground truth within the top two at the position level.
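For readers unfamiliar with the ranking metrics quoted above, the following sketch shows how mean reciprocal rank (MRR) and a top-k hit rate are usually computed from the rank at which the ground truth appears. The ranks below are invented toy values, not data from the paper, and the function names are hypothetical.

```python
def mean_reciprocal_rank(ranks):
    """Average of 1/rank, where rank is the 1-based position of the ground truth."""
    return sum(1.0 / r for r in ranks) / len(ranks)


def hit_rate_at_k(ranks, k):
    """Fraction of cases whose ground truth is ranked within the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)


# Toy example: rank of the true error position in four sentences.
ranks = [1, 2, 4, 1]
mrr = mean_reciprocal_rank(ranks)   # (1 + 1/2 + 1/4 + 1) / 4
top2 = hit_rate_at_k(ranks, k=2)    # three of the four ranks are <= 2
```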
international world wide web conferences | 2017
Hen-Hsen Huang; Yu-Wei Wen; Hsin-Hsi Chen
In addition to opinion spam, overstated or unproven information in false advertisements can also mislead customers when they make purchasing decisions. A false-advertisement judgement system aims at recognizing and explaining illegal false advertisements. In this paper, we incorporate a convolutional neural network (CNN) with word embeddings and syntactic features in the system. The recognition experiments show that the Dependency-based CNN (DCNN) achieves F-scores of 86.77%, 93.18%, and 87.46% on the cosmetics, food, and drug datasets, respectively. Moreover, the experiments on explaining illegality show F-scores of 56.19%, 50.36%, and 62.06% on the three datasets. Our judgement system can serve different roles in online advertising.
Computer Speech & Language | 2017
Han-Bin Chen; Hen-Hsen Huang; An-Chang Hsieh; Hsin-Hsi Chen
Integration of in-domain knowledge into an out-of-domain statistical machine translation (SMT) system poses challenges due to the lack of resources. Lack of in-domain bilingual corpora is one such issue. In this paper, we propose a simplification–translation–restoration (STR) framework for domain adaptation in SMT systems. An SMT system to translate medical records from English to Chinese is taken as a case study. We identify the critical segments in a medical sentence and simplify them to alleviate the data sparseness problem in the out-of-domain SMT system. After translating the simplified sentence, the translations of these critical segments are restored to their proper positions. Besides the simplification pre-processing step and the restoration post-processing step, we also enhance the translation and language models in the STR framework by using pseudo bilingual corpora generated by the background MT system. In the experiments, we adapt an SMT system from a government document domain to a medical record domain. The results show the effectiveness of the STR framework.
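The simplify, translate, restore flow can be pictured with a toy sketch. Everything here is an assumption made for illustration: the abstract does not say how segments are simplified, so this example simply swaps each known segment for a placeholder token, and `term_table`, `simplify`, `restore`, and the stand-in `translate` callable are hypothetical names rather than parts of the authors' system.

```python
def simplify(sentence, term_table):
    """Replace each known critical segment with a placeholder token.

    Returns the simplified sentence and a map from placeholder to the
    segment's target-language translation.
    """
    slots = {}
    for i, (source_term, target_term) in enumerate(term_table.items()):
        if source_term in sentence:
            token = f"__SEG{i}__"
            sentence = sentence.replace(source_term, token)
            slots[token] = target_term
    return sentence, slots


def restore(translated, slots):
    """Put the segment translations back where the placeholders ended up."""
    for token, target_term in slots.items():
        translated = translated.replace(token, target_term)
    return translated


def str_translate(sentence, term_table, translate):
    """Simplify, translate with a background system, then restore."""
    simplified, slots = simplify(sentence, term_table)
    return restore(translate(simplified), slots)


# Hypothetical one-entry terminology table (English -> Chinese).
term_table = {"acute myocardial infarction": "急性心肌梗塞"}

# Stand-in for a background MT system: it returns its input unchanged,
# which is enough to show that the placeholder survives "translation".
result = str_translate(
    "history of acute myocardial infarction",
    term_table,
    translate=lambda text: text,
)
```

A real background system would reorder and translate the surrounding words; the sketch only shows where the restoration step fits.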
international world wide web conferences | 2016
Yang-Yin Lee; Hao Ke; Hen-Hsen Huang; Hsin-Hsi Chen
GloVe, global vectors for word representation, performs well in some word analogy and semantic relatedness tasks. However, we find that some dimensions of the trained word embedding are abnormal. We verify our conjecture by removing these abnormal dimensions using the Kolmogorov-Smirnov test and experimenting on several benchmark datasets for semantic relatedness measurement. The experimental results confirm our finding. Interestingly, on some of the tasks, simply removing these abnormal dimensions outperforms the state-of-the-art model SensEmbed. This novel rule-of-thumb technique, which leads to better performance, is expected to be useful in practice.
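One way to picture Kolmogorov-Smirnov-based filtering is sketched below. The abstract does not state the reference distribution or the decision rule, so the choices here are assumptions: each dimension is standardized and compared against a standard normal, and a dimension is flagged when its KS statistic exceeds a hypothetical threshold of 0.2. All function names and the toy data are invented for the example.

```python
from statistics import NormalDist, fmean, pstdev


def ks_statistic_vs_normal(values):
    """One-sample KS statistic of the standardized values against N(0, 1)."""
    n = len(values)
    mu, sigma = fmean(values), pstdev(values)
    if sigma == 0.0:
        return 1.0  # assumption: treat a constant dimension as maximally non-normal
    std_normal = NormalDist()
    d = 0.0
    for i, x in enumerate(sorted(values), start=1):
        cdf = std_normal.cdf((x - mu) / sigma)
        # Largest gap between the empirical CDF (a step function) and the normal CDF.
        d = max(d, i / n - cdf, cdf - (i - 1) / n)
    return d


def abnormal_dimensions(embeddings, threshold=0.2):
    """Indices of dimensions whose KS statistic exceeds the (hypothetical) threshold."""
    return [j for j in range(len(embeddings[0]))
            if ks_statistic_vs_normal([row[j] for row in embeddings]) > threshold]


def drop_dimensions(embeddings, dims):
    """Copy of the embedding matrix without the listed dimensions."""
    dropped = set(dims)
    return [[x for j, x in enumerate(row) if j not in dropped] for row in embeddings]


# Toy matrix: dimension 0 is bell-shaped, dimension 1 is mostly zeros with a few spikes.
n = 200
bell = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]
spiky = [10.0 if i < 10 else 0.0 for i in range(n)]
embeddings = [[bell[i], spiky[i]] for i in range(n)]

bad = abnormal_dimensions(embeddings)
cleaned = drop_dimensions(embeddings, bad)
```

On this toy matrix only the spiky dimension is flagged, so `cleaned` keeps a single column per word.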
asia information retrieval symposium | 2012
Hen-Hsen Huang; Chia-Chun Lee; Hsin-Hsi Chen
Family medicine in some regions is not as popular as it is in the United States. Most patients choose an outpatient department without professional advice. In this work, we propose a health care aiding system that recommends the outpatient department for a patient according to his/her chief complaint and personal attributes. The recommendation is based on the past medical summaries of a hospital. Three methods, a language model, a support vector machine (SVM), and the k-nearest neighbor algorithm, are explored along with different features. The experimental results show that the SVM classifier with features selected from the chief complaint, as well as personal attributes such as age, gender, and disease information, achieves an F-measure of 79.35%.
international joint conference on artificial intelligence | 2018
Hen-Hsen Huang; Hsin-Hsi Chen
This work demonstrates a writing assistant system that provides high-level advice for Chinese scientific writing. Cross-lingual approaches are investigated to analyze the information structure of a given Chinese abstract and retrieve useful knowledge from related work written in both English and Chinese. To the best of our knowledge, this is the first study on Chinese information structure identification. Without the need for labeled Chinese data, our novel model is capable of dealing with Chinese instances by acquiring language-invariant knowledge from the labeled English data. Adversarial learning is employed to enhance the cross-lingual sentence representation.
Companion of The Web Conference 2018 (WWW '18) | 2018
Ting-Yu Yen; Yang-Yin Lee; Hen-Hsen Huang; Hsin-Hsi Chen
While recent word embedding models demonstrate their ability to capture syntactic and semantic information, the demand for sense-level embeddings is growing. In this study, we propose a novel joint sense embedding learning model that retrofits the word representation into a sense representation using contextual and ontological information. The experiments show the effectiveness and robustness of our model, which outperforms previous approaches on four publicly available benchmark datasets.