Sin-Jae Kang

Pohang University of Science and Technology

Publication

Featured research published by Sin-Jae Kang.

human language technology | 2001

Semi-automatic practical ontology construction by using a thesaurus, computational dictionaries, and large corpora

Sin-Jae Kang; Jong-Hyeok Lee

This paper presents a semi-automatic method for constructing a practical ontology from various resources. To acquire a reasonably practical ontology in limited time and with less manpower, we extend the Kadokawa thesaurus by inserting additional semantic relations into its hierarchy, which are classified as case relations and other semantic relations. The former can be obtained by converting valency information and case frames from previously built computational dictionaries used in machine translation. The latter can be acquired from concept co-occurrence information, which is extracted automatically from large corpora. The ontology stores rich semantic constraints among 1,110 concepts, and enables a natural language processing system to resolve semantic ambiguities by making inferences with the concept network of the ontology. In our practical machine translation system, our ontology-based word sense disambiguation method achieved an 8.7% improvement for Korean translation over methods that do not use an ontology.

International Conference on U- and E-Service, Science and Technology | 2011

A Comparison of Classifiers for Detecting Hedges

Sin-Jae Kang; In-Su Kang; Seung-Hoon Na

A hedge is a linguistic device used to avoid making a categorical statement. Hedges can be used to determine whether a sentence is factual, simply by treating any sentence that contains a hedge as non-factual. In this paper, we perform a comparative experiment of various classification methods for hedge detection. Among four different classification methods, we observe that SVM shows the best performance and that the SVM-based method outperforms the best system in the CoNLL2010-ST task.

asia information retrieval symposium | 2004

Estimation of query model from parsimonious translation model

Seung-Hoon Na; In-Su Kang; Sin-Jae Kang; Jong-Hyeok Lee

The KL divergence framework, an extension of the language modeling approach, has a critical problem in estimating the query model, which is the probabilistic model that encodes the user's information need. At initial retrieval, it is difficult to expand the query model using co-occurrence, because two-dimensional matrix information such as term co-occurrence must be constructed offline. In large collections especially, constructing such a large matrix of term co-occurrences prohibitively increases time and space complexity. This paper proposes an effective method for constructing co-occurrence statistics by employing a parsimonious translation model. A parsimonious translation model is a compact version of a translation model that contains only a very small number of parameters with non-zero probabilities. It enables us to greatly reduce the number of terms remaining in each document, so that co-occurrence statistics can be calculated in tractable time. In experiments, the query model derived from the parsimonious translation model significantly improves on baseline language modeling performance.

international conference on computational linguistics | 2002

Word sense disambiguation in a Korean-to-Japanese MT system using neural networks

You-Jin Chung; Sin-Jae Kang; Kyonghi Moon; Jong-Hyeok Lee

This paper presents a method to resolve word sense ambiguity in a Korean-to-Japanese machine translation system using neural networks. The execution of our neural network model is based on the concept codes of a thesaurus. Most previous word sense disambiguation approaches based on neural networks have limitations due to their huge feature set size. By contrast, we reduce the number of features of the network to a practical size by using concept codes as features rather than the lexical words themselves.

international conference on distributed computing systems | 2002

XExplainer: a tool for generating descriptive text from database

Ji-Eun Roh; Sin-Jae Kang; Jong-Hyeok Lee

We focus on how to generate well-written texts that describe an object from a database, and propose several strategies needed in the generation stages. To build reliable generation rules, we performed corpus analysis by annotating descriptive texts. This paper also describes an implemented text generation system called XExplainer, which can dynamically produce a description of an object in Korean. XExplainer was applied to two domains, a home shopping database and a business administration database, to show that it can be applied to any domain as long as the information is provided in the required format. The generated texts were evaluated by humans using several criteria, such as content completeness, structural coherence, expression conciseness, and text layout.

Literary and Linguistic Computing | 2002

Language Independent and Practical Ontology in Korean–Japanese Machine Translation Systems

Sin-Jae Kang; You-Jin Chung; Jong-Hyeok Lee

This paper presents a semi-automatic ontology construction method using various resources, and an ontology-based word sense disambiguation method in machine translation. To acquire a reasonably practical ontology in limited time and with less manpower, we extend the Kadokawa thesaurus by inserting additional semantic relations into its hierarchy, which are classified as case relations and other semantic relations. The former can be obtained by converting valency information and case frames from previously built computational dictionaries used in machine translation. The latter can be acquired from concept co-occurrence information, which is extracted automatically from large corpora. The ontology stores rich semantic constraints among 1110 concepts, and enables a natural language processing system to resolve semantic ambiguities by making inferences with the concept network of the ontology. In practical machine translation systems, our word sense disambiguation method achieved improvements of 6.0 per cent and 7.9 per cent for Japanese and Korean translation, respectively, over methods that do not use an ontology.

NLPRS | 2001