Tae-Gil Noh
Kyungpook National University
Publication
Featured research published by Tae-Gil Noh.
Meeting of the Association for Computational Linguistics | 2014
Bernardo Magnini; Roberto Zanoli; Ido Dagan; Kathrin Eichler; Guenter Neumann; Tae-Gil Noh; Sebastian Padó; Asher Stern; Omer Levy
This paper presents the Excitement Open Platform (EOP), a generic architecture and a comprehensive implementation for textual inference in multiple languages. The platform includes state-of-the-art algorithms, a large number of knowledge resources, and facilities for experimenting and testing innovative approaches. The EOP is distributed as open source software.
Natural Language Engineering | 2015
Sebastian Padó; Tae-Gil Noh; Asher Stern; Rui Wang; Roberto Zanoli
A key challenge at the core of many Natural Language Processing (NLP) tasks is the ability to determine which conclusions can be inferred from a given natural language text. This problem, called the Recognition of Textual Entailment (RTE), has initiated the development of a range of algorithms, methods, and technologies. Unfortunately, research on Textual Entailment (TE), like semantics research more generally, is fragmented into studies focussing on various aspects of semantics such as world knowledge, lexical and syntactic relations, or more specialized kinds of inference. This fragmentation has problematic practical consequences. Notably, interoperability among the existing RTE systems is poor, and reuse of resources and algorithms is mostly infeasible. This also makes systematic evaluations very difficult to carry out. Finally, textual entailment presents a wide array of approaches to potential end users with little guidance on which to pick. Our contribution to this situation is the novel EXCITEMENT architecture, which was developed to enable and encourage the consolidation of methods and resources in the textual entailment area. It decomposes RTE into components with strongly typed interfaces. We specify (a) a modular linguistic analysis pipeline and (b) a decomposition of the ‘core’ RTE methods into top-level algorithms and subcomponents. We identify four major subcomponent types, including knowledge bases and alignment methods. The architecture was developed with a focus on generality, supporting all major approaches to RTE and encouraging language independence. We illustrate the feasibility of the architecture by constructing mappings of major existing systems onto the architecture. The practical implementation of this architecture forms the EXCITEMENT open platform. It is a suite of textual entailment algorithms and components which contains the three systems named above, including linguistic-analysis pipelines for three languages (English, German, and Italian), and comprises a number of linguistic resources. By addressing the problems outlined above, the platform provides a comprehensive and flexible basis for research and experimentation in textual entailment and is available as open source software under the GNU General Public License.
International ACM SIGIR Conference on Research and Development in Information Retrieval | 2009
Tae-Gil Noh; Seong-Bae Park; Hee-Geun Yoon; Sang-Jo Lee; Se-Young Park
This paper proposes a novel method to translate tags attached to multimedia content for cross-language retrieval. The main issue in this problem is the sense disambiguation of tags given with little textual context. In order to solve this problem, the proposed method represents both tags and their translation candidates as networks of co-occurring tags, since a network allows richer expression of contexts than other representations such as co-occurrence vectors. The method translates a tag by selecting the optimal one from possible candidates based on a network similarity, even when neither the textual contexts nor sophisticated language resources are available. The experiments on the MIR Flickr-2008 test set show that the proposed method achieves 90.44% accuracy in translating tags from English into German, which is significantly higher than the baseline methods of a frequency-based translation and a co-occurrence-based translation.
Intelligent User Interfaces | 2010
Yong-Jin Han; Tae-Gil Noh; Seong-Bae Park; Se Young Park; Sang-Jo Lee
One of the critical problems in natural language interfaces is the discordance between the expressions covered by the interface and those covered by the knowledge base. In a graph-based knowledge base such as an ontology, all possible queries can be prepared in advance. As a solution to the discordance problem in natural language interfaces, this paper proposes a method that translates a natural language query into a formal language query such as SPARQL. A user query is translated into the formal language by choosing the most appropriate query from the prepared queries. The experimental results show a high accuracy and coverage for the given knowledge base.
Journal of Web Semantics | 2010
Tae-Gil Noh; Seong-Bae Park; Se-Young Park; Sang-Jo Lee
Emergent knowledge does not come from a particular document or a particular knowledge source, but comes from a collection of documents or knowledge sources. This paper proposes a system which combines social web content and semantic web technology to process the emergent knowledge from the blogosphere. The proposed system regards blog postings as experiences of people on particular topics. By annotating postings in the selected domains with ontology vocabularies, the system collects experiences from various people into an ontology about people and experiences. The system processes this ontology with semantic rules to find the emergent knowledge. Users can access previously unavailable facts, concepts and trends which are emerging from social web content by using the proposed system.
Joint Conference on Lexical and Computational Semantics | 2015
Tae-Gil Noh; Sebastian Padó; Vered Shwartz; Ido Dagan; Vivi Nastase; Kathrin Eichler; Lili Kotlerman; Meni Adler
A major problem in research on Textual Entailment (TE) is the high implementation effort for TE systems. Recently, interoperable standards for annotation and preprocessing have been proposed. In contrast, the algorithmic level remains unstandardized, which makes component re-use in this area very difficult in practice. In this paper, we introduce multi-level alignments as a central, powerful representation for TE algorithms that encourages modular, reusable, multilingual algorithm development. We demonstrate that a pilot open-source implementation of multi-level alignment with minimal features competes with state-of-the-art open-source TE engines in three languages.
Engineering Applications of Artificial Intelligence | 2013
Jeong Woo Son; Tae-Gil Noh; Hyun-Je Song; Seong-Bae Park
Program plagiarism detection is the task of detecting plagiarized code pairs among a set of source codes. In this paper, we propose a code plagiarism detection system that uses a parse tree kernel. Our parse tree kernel calculates a similarity value between two source codes in terms of their parse tree similarity. Since parse trees contain the essential syntactic structure of source codes, the system effectively handles structural information. The contributions of this paper are two-fold. First, we propose a parse tree kernel that is optimized for program source code. The evaluation shows that our system based on this kernel outperforms well-known baseline systems. Second, we collected a large number of real-world Java source codes from a university programming class. This test set was manually analyzed and tagged by two independent human annotators to mark plagiarized codes. It can be used to evaluate the performance of various detection systems in real-world environments. The experiments with the test set show that the performance of our plagiarism detection system reaches 93% of the level of human annotators.
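The idea of comparing programs by parse tree similarity can be illustrated with a minimal sketch. It is not the optimized kernel from the paper: it assumes Python source parsed with the standard `ast` module instead of Java, and it substitutes a simple normalized count of shared one-level productions (a node type together with its child node types) for the paper's kernel. The function names `productions` and `tree_similarity` are hypothetical.

```python
import ast
from collections import Counter
from math import sqrt


def productions(source):
    """Count one-level productions (node type plus child node types) in the parse tree."""
    counts = Counter()
    for node in ast.walk(ast.parse(source)):
        children = tuple(type(c).__name__ for c in ast.iter_child_nodes(node))
        counts[(type(node).__name__, children)] += 1
    return counts


def tree_similarity(src_a, src_b):
    """Normalized dot product of production counts (a simplified stand-in kernel)."""
    a, b = productions(src_a), productions(src_b)
    dot = sum(n * b[p] for p, n in a.items() if p in b)
    norm = sqrt(sum(n * n for n in a.values())) * sqrt(sum(n * n for n in b.values()))
    return dot / norm if norm else 0.0


original = "def total(xs):\n    s = 0\n    for x in xs:\n        s += x\n    return s\n"
renamed = "def add_up(items):\n    acc = 0\n    for it in items:\n        acc += it\n    return acc\n"
unrelated = "print('hello')\n"

print(tree_similarity(original, renamed))
print(tree_similarity(original, unrelated))
```

Because identifiers do not appear in the productions, renaming variables leaves the score unchanged, which is the kind of structural robustness the abstract attributes to parse trees.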
Computational Science and Engineering | 2009
Tae-Gil Noh; Yong-Jin Han; Jeong-Woo Son; Hyun-Jae Song; Hee-Geun Yoon; Jae-Ahn Lee; Sang-Do Lee; Kye-Sung Kim; Young-Hwa Lee; Seong-Bae Park; Se-Young Park; Sang-Jo Lee
Emergent knowledge does not come from a particular document or a particular knowledge source, but comes from a collection of documents or knowledge sources. This paper proposes a system which combines social web content and semantic web technology to process the emergent knowledge from the blogosphere. The proposed system regards blog postings as experiences of people on particular topics. By annotating postings in the selected domains with ontology vocabularies, the system collects experiences from various people into an ontology about people and experiences. The system processes this ontology with semantic rules to find the emergent knowledge. Users can access previously unavailable facts, concepts and trends which are emerging from the system.
Proceedings of the ACM fourth international workshop on Data and text mining in biomedical informatics | 2010
Tae-Gil Noh; Seong-Bae Park; Sang-Jo Lee
This paper proposes an unsupervised word sense disambiguation method for the biomedical domain. A network representation of co-occurrence data is first defined to represent both word senses and word contexts. The representation expresses the textual context observed around a certain term as a network, where nodes are terms and edges are weighted by the number of co-occurrences between connected terms. A graph kernel is adopted as a similarity measure between terms and senses represented as networks. Candidate senses and ambiguous contexts are then compared directly in the representation space to resolve the word sense. The method needs only the sense definitions and a large amount of unlabeled text. The experiments in the biomedical domain show that the method outperforms a baseline method of vector representation. The performance of the proposed method is comparable to the state-of-the-art unsupervised word sense disambiguation methods.
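The network representation described above can be sketched in a few lines. This is an illustration under stated assumptions, not the paper's method: co-occurrence is counted per tokenized sentence, the graph kernel is replaced by a plain normalized dot product over shared edge weights, and the toy sentences and the names `cooccurrence_network`, `edge_kernel`, and `disambiguate` are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def cooccurrence_network(sentences):
    """Map each term pair to the number of sentences in which both terms occur."""
    edges = Counter()
    for tokens in sentences:
        for a, b in combinations(sorted(set(tokens)), 2):
            edges[(a, b)] += 1
    return edges


def edge_kernel(g1, g2):
    """Normalized dot product over shared edge weights (a simple stand-in kernel)."""
    dot = sum(w * g2[e] for e, w in g1.items() if e in g2)
    norm = sqrt(sum(w * w for w in g1.values())) * sqrt(sum(w * w for w in g2.values()))
    return dot / norm if norm else 0.0


def disambiguate(context_sentences, sense_corpora):
    """Pick the sense whose network is most similar to the context network."""
    context = cooccurrence_network(context_sentences)
    scores = {sense: edge_kernel(context, cooccurrence_network(sents))
              for sense, sents in sense_corpora.items()}
    return max(scores, key=scores.get), scores


# Toy data for the ambiguous term "cold" (invented for illustration).
senses = {
    "illness": [["cold", "virus", "fever"], ["virus", "fever", "cough"]],
    "temperature": [["cold", "ice", "winter"], ["ice", "winter", "snow"]],
}
context = [["cold", "fever", "cough"], ["fever", "virus"]]
best, scores = disambiguate(context, senses)
print(best, scores)
```

Comparing edges instead of single terms is what separates this from a co-occurrence vector baseline: two networks match only where the same pair of terms is connected in both.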
International Conference on the Computer Processing of Oriental Languages | 2009
Tae-Gil Noh; Yong-Jin Han; Seong-Bae Park; Se-Young Park
For casual web users, a natural language is more accessible than formal query languages. However, understanding a natural language query is not trivial for computer systems. This paper proposes a method to parse and understand Korean natural language queries with local grammars. A local grammar is a formalism that can model syntactic structures and synonymous phrases. With local grammars, the system directly extracts users' intentions from natural language queries. With 163 hand-crafted local grammar graphs, the system could attain a good level of accuracy and meaningful coverage over the IT company/people domain.