Wen Juan Hou
National Taiwan University
Publications
Featured research published by Wen Juan Hou.
JNLPBA '04 Proceedings of the International Joint Workshop on Natural Language Processing in Biomedicine and its Applications | 2004
Chih Lee; Wen Juan Hou; Hsin-Hsi Chen
Named entity recognition is a fundamental task in biomedical data mining. Multiple-class annotation is more challenging than single-class annotation. In this paper, we took a single-word classification approach to dealing with the multiple-class annotation problem using Support Vector Machines (SVMs). Word attributes, results of existing gene/protein name taggers, context, and other information are important features for classification. During training, the size of training data and the distribution of named entities are considered. The preliminary results showed that the approach might be feasible when more training data is used to alleviate the data imbalance problem.
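A minimal sketch of how the single-word feature set described above might be assembled, assuming dictionary-style sparse features. The feature names, the window size, and the inverse-frequency class weighting used to counter data imbalance are illustrative assumptions, not the paper's exact design.

```python
from collections import Counter

def word_features(tokens, i, tagger_labels=None, window=2):
    """Sparse feature dict for token i; feature names are illustrative only."""
    w = tokens[i]
    feats = {
        "word=" + w.lower(): 1.0,                      # the word itself
        "init_cap": float(w[:1].isupper()),            # word attributes
        "all_caps": float(w.isupper()),
        "has_digit": float(any(c.isdigit() for c in w)),
        "has_hyphen": float("-" in w),
    }
    # Label assigned by an existing gene/protein tagger, when one is available.
    if tagger_labels is not None:
        feats["tagger=" + tagger_labels[i]] = 1.0
    # Surrounding words within +/- window positions (padded at sentence edges).
    for off in range(-window, window + 1):
        if off == 0:
            continue
        j = i + off
        ctx = tokens[j].lower() if 0 <= j < len(tokens) else "<pad>"
        feats[f"ctx[{off}]={ctx}"] = 1.0
    return feats

def class_weights(labels):
    """Inverse-frequency weights so rare entity classes are not swamped by the O class."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}
```

Such dictionaries could then be vectorized (for example with scikit-learn's `DictVectorizer`) and passed to a linear SVM that accepts per-class weights.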
Meeting of the Association for Computational Linguistics | 2003
Wen Juan Hou; Hsin-Hsi Chen
Named entity recognition is a fundamental task in biological relationship mining. This paper employs protein collocates extracted from a biological corpus to enhance the performance of protein name recognizers, taking Yapex and KeX as examples. When collocates are incorporated, the precision of Yapex increases from 70.90% to 81.94% at a low cost in recall (a decrease of only 2.39%). We also integrate the results proposed by Yapex and KeX, and employ collocates to filter the merged results. Because the candidates suggested by the two systems may be inconsistent, i.e., partially overlapping, one of them is taken as the basis. The experiments show that Yapex-based integration is better than KeX-based integration.
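The abstract does not specify how overlapping candidates are merged or how collocates are matched. A minimal sketch, assuming candidates are (start, end) token spans with an exclusive end, that spans from the basis system always win over overlapping spans from the other system, and that a candidate survives only if a collocate appears within a hypothetical `window` of tokens on either side:

```python
def overlaps(a, b):
    """True if half-open spans a and b share at least one token."""
    return a[0] < b[1] and b[0] < a[1]

def integrate(basis, other):
    """Keep every basis span; add spans from the other system that overlap none of them."""
    merged = list(basis)
    for span in other:
        if not any(overlaps(span, b) for b in basis):
            merged.append(span)
    return sorted(merged)

def filter_by_collocates(spans, tokens, collocates, window=3):
    """Keep a candidate only if a known collocate occurs within `window` tokens of it."""
    kept = []
    for start, end in spans:
        left = tokens[max(0, start - window):start]
        right = tokens[end:end + window]
        if any(t.lower() in collocates for t in left + right):
            kept.append((start, end))
    return kept
```

Swapping the arguments of `integrate` switches between a Yapex-based and a KeX-based integration.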
International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems | 2015
Wen Juan Hou; Bamfa Ceesay
A gene regulation network (GRN) is a graphical representation of the relationship between molecular mechanisms and cellular behavior in systems biology. This paper examines the extraction of GRNs from the biological literature using text mining techniques. The study first proposes two independent text-mining methods, a syntactic method and a semantic method, to extract biological events from unstructured text. The paper presents the performance of the two methods and then experiments with a combined strategy to construct a gene regulation network from texts. The results show that the graph-based approach obtains better event extraction results and produces a much better regulation network than the semantic analysis method. Combining the two approaches yields a slightly better result than either individual approach. This encourages us to explore further directions in biological event extraction research.
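How the outputs of the two methods are combined is not described. The sketch below assumes each method returns (regulator, target, event type) triples and merges them by simple union into an adjacency map; the union rule and the example gene names are illustrative assumptions.

```python
from collections import defaultdict

def build_grn(*event_sets):
    """Merge events from several extractors (by union, an assumption) into an adjacency map."""
    graph = defaultdict(set)
    for events in event_sets:
        for regulator, target, event_type in events:
            graph[regulator].add((target, event_type))  # directed edge regulator -> target
    # Sort edges so the result is deterministic.
    return {node: sorted(edges) for node, edges in graph.items()}
```

For example, `build_grn(syntactic_events, semantic_events)` yields one node per regulator with its labeled outgoing edges; duplicate events found by both methods collapse into a single edge.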
Gene | 2013
Wen Juan Hou; Hsiao Yuan Chen
BACKGROUND Biomedical data available to researchers and clinicians have increased dramatically over the past years because of the exponential growth of knowledge in medical biology. It is difficult for curators to go through all of the unstructured documents and curate the information into databases. Associating genes with diseases is important because it is a fundamental challenge in human health, with applications to understanding disease properties and developing new techniques for prevention, diagnosis and therapy. METHODS Our study uses an automatic rule-learning approach to gene-disease relationship extraction. We first prepare the experimental corpus from MEDLINE and OMIM. A parser is applied to produce grammatical information. We then learn all possible rules that discriminate relevant from irrelevant sentences. After that, we compute scores for the learned rules in order to select rules of interest. As a result, a set of rules is generated. RESULTS The rules are learned automatically from 1000 positive and 1000 negative sentences. The test set includes 400 sentences: 200 positives and 200 negatives. Precision, recall and F-score serve as our evaluation metrics. The results reveal that the maximal precision is 77.8% and the maximal recall is 63.5%. The maximal F-score is 66.9%, where the precision is 70.6% and the recall is 63.5%. CONCLUSIONS We employ a rule-learning approach to extract gene-disease relationships. Our main contributions are building rules automatically and supporting a more complete set of rules than a manually generated one. The experiments show encouraging results, and further improvements will be made in future work.
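The abstract does not give the rule representation or scoring function. A minimal sketch, assuming a rule is a set of words that must all appear in a sentence and that a rule's score is its precision on the training sentences; both choices, the `threshold`, and `min_support` are hypothetical. The `f_score` helper reproduces the reported arithmetic: the harmonic mean of 70.6% precision and 63.5% recall is about 66.9%.

```python
def rule_matches(rule, sentence):
    """A rule is a set of lower-cased words that must all occur (hypothetical representation)."""
    return rule <= set(sentence.lower().split())

def score_rule(rule, positives, negatives, min_support=1):
    """Precision of a rule on labeled training sentences; 0 if it fires on too few positives."""
    tp = sum(rule_matches(rule, s) for s in positives)
    fp = sum(rule_matches(rule, s) for s in negatives)
    if tp < min_support:
        return 0.0
    return tp / (tp + fp)

def select_rules(rules, positives, negatives, threshold=0.7):
    """Keep the rules whose score reaches the threshold."""
    return [r for r in rules if score_rule(r, positives, negatives) >= threshold]

def f_score(precision, recall):
    """Balanced F-score: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)
```

Here `f_score(0.706, 0.635)` evaluates to roughly 0.6686, which rounds to the 66.9% reported above.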
Biomedical Engineering and Informatics | 2011
Wen Juan Hou; Li Che Chen; Chieh Shiang Lu
Associating genes with diseases is an active area of research because it benefits human health through applications to clinical diagnosis and therapy. This paper proposes two methods for identifying associations between genes and diseases: (1) making use of the proximity relationship between genes and diseases and (2) utilizing GO terms shared by genes and diseases for similarity comparison. The experiments show that associations based on GO terms outperform those based on word proximity, indicating that GO terms are a good feature for gene-disease association.
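To make the two signals concrete, here is a minimal sketch of each, assuming mentions are already tokenized and each gene and disease has a set of GO term IDs. Token distance as the proximity measure and Jaccard overlap as the similarity measure are illustrative choices; the paper's exact formulas are not given.

```python
def proximity(tokens, gene, disease):
    """Smallest token distance between a gene mention and a disease mention, or None."""
    g = [i for i, t in enumerate(tokens) if t == gene]
    d = [i for i, t in enumerate(tokens) if t == disease]
    if not g or not d:
        return None
    return min(abs(i - j) for i in g for j in d)

def go_similarity(gene_terms, disease_terms):
    """Jaccard overlap of two GO term sets (one possible similarity measure)."""
    union = gene_terms | disease_terms
    if not union:
        return 0.0
    return len(gene_terms & disease_terms) / len(union)
```

With illustrative IDs, `go_similarity({"GO:0000001", "GO:0000002"}, {"GO:0000001"})` gives 0.5 because one of the two distinct terms is shared.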
Conference of the European Chapter of the Association for Computational Linguistics | 2006
Wen Juan Hou; Chih Lee; Hsin-Hsi Chen
In this paper, we propose an approach for identifying curatable articles from a large document set. The system treats three parts of an article (title and abstract, MeSH terms, and captions) as three individual representations and uses two domain-specific resources (UMLS and a tumor name list) to capture the deep knowledge contained in the article. An SVM classifier is trained, and cross-validation is employed to find the best combination of representations. The experimental results show high overall performance.
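The search over representation combinations can be sketched as follows: every non-empty subset of the three representations is scored by k-fold cross-validation and the best mean score wins. The classifier itself is left to a caller-supplied `train_and_score` function (hypothetical), which would train the SVM on the chosen representations; the fold scheme and representation names are assumptions.

```python
from itertools import combinations

REPRESENTATIONS = ("title_abstract", "mesh", "captions")

def k_fold_indices(n, k):
    """Split range(n) into k contiguous folds; yields (train_idx, test_idx) pairs."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        yield train, test
        start += size

def best_combination(n_docs, train_and_score, k=5):
    """Return the subset of representations with the highest mean cross-validation score.

    train_and_score(combo, train_idx, test_idx) is supplied by the caller (hypothetical):
    it should train a classifier on `combo` features and return a held-out score.
    """
    best, best_score = None, float("-inf")
    for r in range(1, len(REPRESENTATIONS) + 1):
        for combo in combinations(REPRESENTATIONS, r):
            scores = [train_and_score(combo, tr, te) for tr, te in k_fold_indices(n_docs, k)]
            mean = sum(scores) / len(scores)
            if mean > best_score:
                best, best_score = combo, mean
    return best, best_score
```

With three representations this evaluates seven combinations, so the cost is seven times that of a single cross-validation run.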
North American Chapter of the Association for Computational Linguistics | 2015
Bamfa Ceesay; Wen Juan Hou
Taxonomy structures are important tools in the science of classification of things or concepts, including the principles that underlie such classification. This paper presents an approach to the problem of taxonomy construction from texts, focusing on the hyponym-hypernym relation between two terms. Given a set of terms in a particular domain, the approach uses Wikipedia and WordNet as knowledge sources and applies information extraction methods to analyze and establish the hyponym-hypernym relationship between two terms. Our system ranked fourth among the participating systems in SemEval-2015 Task 17.
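One common way to mine hypernym candidates from encyclopedic text is to match "X is a Y" patterns in a term's definition sentence. The sketch below does only that, using a hypothetical regular expression and a naive last-word head-noun heuristic; it is not the paper's extraction method, and the WordNet lookup is not shown.

```python
import re

# "... is/are a/an/the [kind/type/form of] <phrase>" where the phrase ends before a
# function word or punctuation (a hypothetical, deliberately simple pattern).
DEF_PATTERN = re.compile(
    r"\b(?:is|are)\s+(?:a|an|the)\s+(?:(?:kind|type|form)\s+of\s+)?"
    r"([a-z][a-z\- ]*?)(?=\s+(?:that|which|of|in|from|with|used|made)\b|[.,;:]|$)",
    re.IGNORECASE,
)

def hypernym_candidate(term, definition):
    """Return a candidate hypernym for `term` from a definition sentence, or None."""
    m = DEF_PATTERN.search(definition)
    if not m:
        return None
    phrase = m.group(1).strip().lower()
    head = phrase.split()[-1]  # naive head noun: last word of the matched phrase
    return head if head != term.lower() else None
```

For instance, the definition "The pear is a sweet fruit that grows on trees." yields the candidate hypernym "fruit" for "pear".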
JNLPBA '04 Proceedings of the International Joint Workshop on Natural Language Processing in Biomedicine and its Applications | 2004
Chih Lee; Wen Juan Hou; Hsin-Hsi Chen
In the biological domain, extracting newly discovered functional features from the massive literature is a major challenge. The main goal of this paper is to automatically annotate Gene References into Function (GeneRIFs) in new literature. We tried to find GRIF words in a training corpus and then applied these informative words to annotate the GeneRIFs in abstracts with several different weighting schemes. The experiments showed that the Classic Dice score is at most 50.18% when the weighting schemes proposed in Hou et al. (2003) were adopted. In contrast, after employing Support Vector Machines (SVMs) and the class definitions proposed by Jelier et al. (2003), the Classic Dice (CD) score improved substantially to 56.86%. Using the same features, SVMs demonstrated an advantage over the Naive Bayes classifier. Finally, the combination of the former two models attained a CD score of 59.51%.
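The Classic Dice score compares a system's GeneRIF with the reference text by word overlap. A minimal sketch of the Dice coefficient over sets of lower-cased word tokens follows; the official evaluation's tokenization, stemming, and stopword handling may differ, so the optional `stopwords` argument is an assumption.

```python
import re

def _words(text, stopwords):
    """Set of lower-cased alphanumeric tokens, minus any stopwords."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in stopwords}

def classic_dice(candidate, reference, stopwords=frozenset()):
    """Dice coefficient 2|X & Y| / (|X| + |Y|) over the two word sets."""
    x, y = _words(candidate, stopwords), _words(reference, stopwords)
    if not x and not y:
        return 0.0
    return 2 * len(x & y) / (len(x) + len(y))
```

Comparing "p53 induces apoptosis" with "p53 induces cell apoptosis" shares three words out of 3 + 4, giving 6/7, about 0.857.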
International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems | 2016
Bamfa Ceesay; Wen Juan Hou
To understand and automatically extract information about events presented in a text, semantically meaningful units expressing these events are important. Extracting events and classifying them into event types and subtypes using Natural Language Processing techniques is a challenging research problem. There is no clear-cut definition of what constitutes an event in a text, or of the optimal representation of semantic units within a given text. In addition, events in a text can be classified into types and subtypes, and a single event can have multiple mentions in a given sentence. In this paper, we propose a model that identifies events within a given text and classifies them into event types or subtypes and REALIS values using distributional semantic role labeling and neural embedding techniques. For the task of event nugget detection, we trained a three-layer network to identify event mentions in texts, achieving F1-scores of 77.37% (macro average) and 71.10% (micro average).
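Macro and micro averages can differ substantially, as the two reported scores do. As a generic illustration (not the official event nugget scorer), macro averaging takes the mean of per-group F1 scores, while micro averaging computes a single F1 from pooled true-positive, false-positive and false-negative counts. Whether the groups are event types or documents is an assumption left to the caller.

```python
def f1(tp, fp, fn):
    """F1 from raw counts, returning 0.0 when undefined."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    return 2 * p * r / (p + r) if p + r else 0.0

def macro_micro_f1(counts):
    """counts maps each group (e.g. event type or document) to a (tp, fp, fn) tuple."""
    macro = sum(f1(*c) for c in counts.values()) / len(counts)
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    return macro, f1(tp, fp, fn)
```

Because macro averaging weights every group equally, a few small groups with high F1 can raise it above the micro average, which is dominated by the largest groups.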
International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems | 2016
Wen Juan Hou; Bamfa Ceesay
A gene regulation network (GRN) is a graphical representation of the relationships within a collection of regulators that interact with each other and with other substances in the cell to govern the expression levels of mRNA and proteins. In this study, we examine the extraction of GRNs from the literature using a statistical method. Markovian methods have been used extensively in natural language processing, for example in speech recognition. This paper presents an event extraction approach using a Markov logic model with logical predicates. The event extraction task is modeled using logical predicates and a set of weighted first-order formulae that defines a distribution of events over the ground atoms of the predicates, specified using the training and development data. The experimental results show an F-score comparable to the state of the art on the BioNLP 2013 shared task and 81% precision in forming the gene regulation network, indicating that the approach performs well on this problem.
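The "weighted first-order formulae over ground atoms" description corresponds to the standard Markov logic semantics, in which a world x has probability exp(sum_i w_i * n_i(x)) / Z, where n_i(x) counts the true groundings of formula i. The toy sketch below enumerates every world for two hypothetical ground atoms with hand-set weights. The predicates and weights are illustrative, not the paper's model; real systems learn weights from data and use approximate inference instead of enumeration.

```python
import itertools
import math

# Hypothetical ground atoms for one sentence: is token t1 a trigger, is gene g1 its theme?
ATOMS = ("Trigger(t1)", "Theme(t1,g1)")

# (weight, n_i) pairs; n_i(world) counts true groundings of formula i in that world.
FORMULAS = [
    (1.5, lambda w: int(w["Trigger(t1)"])),                              # t1 tends to be a trigger
    (2.0, lambda w: int((not w["Theme(t1,g1)"]) or w["Trigger(t1)"])),  # Theme(t1,g1) => Trigger(t1)
]

def world_distribution(atoms, formulas):
    """P(x) = exp(sum_i w_i * n_i(x)) / Z over all truth assignments x to the atoms."""
    worlds = [dict(zip(atoms, vals))
              for vals in itertools.product([False, True], repeat=len(atoms))]
    scores = [math.exp(sum(w * n(x) for w, n in formulas)) for x in worlds]
    z = sum(scores)  # partition function
    return [(x, s / z) for x, s in zip(worlds, scores)]

def marginal(dist, atom):
    """Probability that a given ground atom is true."""
    return sum(p for x, p in dist if x[atom])
```

The world in which Theme(t1,g1) holds but Trigger(t1) does not violates the implication, so it receives the lowest probability.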