Ho-Seop Choe

Information Technology University

Publication

Featured research published by Ho-Seop Choe.

Journal of Information Processing Systems | 2009

Automatic In-Text Keyword Tagging based on Information Retrieval

Jinsuk Kim; Du-Seok Jin; Kwang-Young Kim; Ho-Seop Choe

As Wikipedia shows, tagging or cross-linking major keywords in a document collection improves both the readability of documents and responsive, adaptive navigation among related documents. In recent years, the Semantic Web has increased the importance of social tagging as a key feature of Web 2.0, and the tag cloud, as its crucial manifestation, has reached the public. In this paper we provide an efficient method for automated in-text keyword tagging based on a large-scale controlled term collection or keyword dictionary. The computational complexity of O(mN) with a pattern matching algorithm can be reduced to O(m log N) with an Information Retrieval technique, where m is the length of the target document and N is the total number of candidate terms to be tagged. The results show that automatic in-text tagging with keywords filtered by Information Retrieval is about 6 to 40 times faster than the fastest pattern matching algorithm.
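The idea of looking terms up in an index instead of scanning the document once per dictionary term can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `[[...]]` tag markers, and the use of a hash-based index keyed on each term's first word are assumptions made for illustration, and punctuation handling is omitted.

```python
from collections import defaultdict


def build_index(terms):
    """Map the first word of each dictionary term to its full word tuples."""
    index = defaultdict(list)
    for term in terms:
        tokens = tuple(term.lower().split())
        if tokens:
            index[tokens[0]].append(tokens)
    # Try longer terms first so the longest match wins.
    for candidates in index.values():
        candidates.sort(key=len, reverse=True)
    return index


def tag_keywords(text, index):
    """Wrap dictionary terms found in text with [[...]] markers (hypothetical format)."""
    words = text.split()
    lowered = [w.lower() for w in words]
    out = []
    i = 0
    while i < len(words):
        match_len = 0
        # Only terms starting with the current word are examined,
        # not the whole dictionary.
        for cand in index.get(lowered[i], ()):
            if tuple(lowered[i:i + len(cand)]) == cand:
                match_len = len(cand)
                break
        if match_len:
            out.append("[[" + " ".join(words[i:i + match_len]) + "]]")
            i += match_len
        else:
            out.append(words[i])
            i += 1
    return " ".join(out)


index = build_index(["tag cloud", "semantic web", "tag"])
print(tag_keywords("The Semantic Web made the tag cloud popular", index))
```

Each document position triggers one index lookup plus a check of the few terms sharing that first word, which is what keeps the cost from growing linearly with the dictionary size.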

Journal of Computing Science and Engineering | 2009

HKIB-20000 & HKIB-40075: Hangul Benchmark Collections for Text Categorization Research

Jinsuk Kim; Ho-Seop Choe; Beom-Jong You; Jeong-Hyun Seo; Suk-Hoon Lee; Dong-Yul Ra

The HKIB, or Hankookilbo, test collections are two archives of Korean newswire stories manually categorized with semi-hierarchical or hierarchical category taxonomies. The base newswire stories were made available by the Hankook Ilbo (The Korea Daily) for research purposes. First, Chungnam National University and KISTI collaborated to manually tag 40,075 news stories with categories using a semi-hierarchical, balanced three-level classification scheme, in which each news story has only one level-3 category (single-labeling). We refer to this original data set as the HKIB-40075 test collection. Then Yonsei University and KISTI collaborated to select 20,000 newswire stories from the HKIB-40075 test collection, to rearrange the classification scheme to be fully hierarchical but unbalanced, and to assign one or more categories to each news story (multi-labeling). We refer to this modified data set as the HKIB-20000 test collection. We benchmark a k-NN categorization algorithm on both HKIB-20000 and HKIB-40075, illustrating properties of the collections, providing baseline results for future studies, and suggesting new directions for further research on the Korean text categorization problem.
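For readers unfamiliar with k-NN categorization, the following is a minimal single-label sketch. It assumes whitespace tokenization, raw term counts, cosine similarity, and similarity-weighted voting; the abstract does not state which of these the benchmark used, and the toy English training data is purely illustrative.

```python
import math
from collections import Counter


def vectorize(text):
    """Bag-of-words term counts from whitespace tokens."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_classify(query, training, k=3):
    """Return the label with the largest similarity-weighted vote among
    the k training documents nearest to the query.

    training is a list of (text, label) pairs.
    """
    q = vectorize(query)
    scored = sorted(
        ((cosine(q, vectorize(text)), label) for text, label in training),
        key=lambda pair: pair[0],
        reverse=True,
    )
    votes = Counter()
    for score, label in scored[:k]:
        votes[label] += score
    return votes.most_common(1)[0][0]


training = [
    ("stock market prices rise", "economy"),
    ("bank raises interest rates", "economy"),
    ("team wins the final match", "sports"),
    ("player scores in the match", "sports"),
]
print(knn_classify("interest rates and stock prices", training))
```

A multi-label setting such as HKIB-20000 would need a different decision rule, for example keeping every label whose vote exceeds a threshold, rather than returning only the top label.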

Text, Speech and Dialogue | 2007

On the evaluation of Korean wordnet

Altangerel Chagnaa; Ho-Seop Choe; Cheol-Young Ock; Hwa-Mook Yoon

WordNet has become an important and useful resource for the natural language processing field. Recently, many countries have been developing their own WordNets. In this paper we present an evaluation of the Korean WordNet (U-WIN). The purpose of the work is to study how well the manually created lexical taxonomy U-WIN is built. The evaluation is done level by level: words are selected for each level so that the levels can be compared and relations between them found. The words at one particular level (level 6) give the best score, from which we conclude that the words at this level are better organized than those at other levels. The score decreases as the level goes up or down from this level.

International Conference on Advanced Language Processing and Web Information Technology | 2007

Toward DB-IR Integration: Per-Document Basis Transactional Index Maintenance

Jinsuk Kim; Du-Seok Jin; Yun-Soo Choi; Chang-Hoo Jeong; Kwang-Young Kim; Sung-Pil Choi; Min-Ho Lee; Min-Hee Cho; Ho-Seop Choe; Hwa-Mook Yoon; Jeong-Hyun Seo

While information retrieval (IR) and databases (DB) have been developed independently, there are emerging requirements that both data management and efficient text retrieval be supported simultaneously in information systems such as health care systems, bulletin boards, XML data management, and digital libraries. Recently, DB-IR integration has emerged as an issue in the research field. The great divide between DB and IR has led to different approaches to index maintenance for newly arriving documents. DB has extended its SQL layer to cope with text fields because it lacks a native mechanism for building an IR-like index, whereas IR usually treats a block of new documents as a logical unit of index maintenance since it has no concept of integrity constraints. Under DB-IR integration, however, a transaction that adds or updates a document should include maintenance of the postings lists that accompany the document; hence per-document basis transactional index maintenance. In this paper, we evaluate the performance of several strategies for per-document transactions that insert documents: direct index update, stand-alone auxiliary index, and pulsing auxiliary index. Results on the KRISTAL-IRMS show that the pulsing auxiliary strategy, in which long postings lists in the auxiliary index are updated in place into the main index whereas short lists are updated directly in the auxiliary index, is a promising candidate for text field indexing in DB-IR integration.
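The pulsing auxiliary strategy described above can be sketched roughly as follows. This is an in-memory illustration only, not the KRISTAL-IRMS implementation: the class name, the `pulse_threshold` parameter, and its default value are assumptions, and the transactional guarantees the paper is concerned with are not modeled.

```python
from collections import defaultdict


class PulsingIndex:
    """Toy inverted index with a main index and a small auxiliary index."""

    def __init__(self, pulse_threshold=3):
        self.main = defaultdict(list)  # term -> postings already merged
        self.aux = defaultdict(list)   # term -> recent, short postings lists
        self.pulse_threshold = pulse_threshold  # hypothetical tuning knob

    def add_document(self, doc_id, text):
        """Index one document as a single unit of maintenance."""
        for term in sorted(set(text.lower().split())):
            postings = self.aux[term]
            postings.append(doc_id)
            # A list that has grown long is "pulsed": moved into the main
            # index and dropped from the auxiliary index.
            if len(postings) >= self.pulse_threshold:
                self.main[term].extend(postings)
                del self.aux[term]

    def search(self, term):
        """Return document ids for a term from both indexes."""
        term = term.lower()
        return sorted(self.main.get(term, []) + self.aux.get(term, []))


idx = PulsingIndex(pulse_threshold=2)
idx.add_document(1, "index maintenance")
idx.add_document(2, "index update")
print(idx.search("index"), idx.search("update"))
```

Frequent terms reach the threshold quickly and end up in the main index, while rare terms stay in the small auxiliary structure, so a search always has to consult both.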

International Conference on Advanced Language Processing and Web Information Technology | 2008

Finding Similar Texts Using U-WIN

Kang-seop Shim; Cheol-Young Ock; Dong-Meong Kim; Ho-Seop Choe; Chang-Hwan Kim

Much research abroad on finding similar texts uses semantic language resources such as WordNet. In Korea, however, language resources like WordNet are still insufficient, and so is research on methods for finding similar texts that are based on or make use of such resources. Most previous Korean research on finding similar texts used only the words that occur in the texts. In this paper, we propose a semantics-based method of finding similar texts that determines the meaning of words using semantic similarity measurement with U-WIN.

Knowledge Science, Engineering and Management | 2007

Extracting features for verifying WordNet

Altangerel Chagnaa; Cheol-Young Ock; Ho-Seop Choe

WordNet is a semantic lexicon for the English language, and many countries have been developing their own WordNets. Almost all of these WordNets are built manually, and unfortunately they are used in many knowledge-based applications without being verified. In this paper we aim at clustering-based verification of a manually built lexical taxonomy, namely the Korean WordNet, U-WIN. Two kinds of clustering methods are used for this purpose: a K-Means approach and an ICA-based approach. The ICA-based approach gives the better result and proves very effective for extracting features.

International Conference on Advanced Language Processing and Web Information Technology | 2008

Development of Korean Concept & Instance Classification System

Young-Jun Bae; Cheol-Young Ock; Ho-Seop Choe; Wang-Woo Lee; Haw-Mook Yoon

Manual ontology construction can be more accurate than automatic ontology construction, but it requires much time and effort, and it is hard to add newly created concepts, instances, and relations to the ontology one by one. Automatic ontology construction is therefore needed, along with tools and systems that support it. In this paper, we study automatic concept/instance classification, which is one such task. To classify concepts and instances, we derived concept rules and instance rules from patterns in terminology dictionary entries and built a system on them. The classification precision is 97.7% for concepts and 86.4% for instances, and the overall precision is 92.1%.

International Conference on Advanced Language Processing and Web Information Technology | 2007

A Terminology Tagging System Using Knowledge in an Encyclopedia

Young-Jun Bae; Ji-Hui Im; Cheol-Young Ock; Kang-seop Shim; Dong-Myoung Kim; Ho-Seop Choe

In this paper, we develop a terminology tagging system that uses knowledge from an encyclopedia on the web. Using the encyclopedia, we first constructed a terminology-tagged corpus. The terminology tag set consists of the category information of each term and its identification in the encyclopedia. The corpus was used both to extract context rules and heuristic rules for the terminology tagging system and to serve as an answer set for evaluating the system. The context rules are extracted automatically from the terminology-tagged corpus, and the heuristic rules follow syntactic patterns and morphological characteristics. For speed, the terminology tagging system uses the context rules, the heuristic rules, and a dictionary of postpositions and word endings instead of a morphological analyzer. Evaluated on the tagged corpus, the system reaches a precision of 98.0%.

Asia Information Retrieval Symposium | 2005

Extracting the significant terms from a sentence-term matrix by removal of the noise in term usage

Chang Beom Lee; Ho-Seop Choe; Hyuk Ro Park; Cheol-Young Ock

In this paper, we propose an approach to extracting the significant terms in a document using two quantification methods: singular value decomposition (SVD) and principal component analysis (PCA). The SVD can remove the noise caused by variability in term usage from the original sentence-term matrix by using the singular values obtained from the decomposition. The adjusted sentence-term matrix, with its noisy term usage removed, can then be used to perform the PCA, since the dimensionality of the revised matrix is the same as that of the original. Because the PCA extracts significant terms on the basis of the eigenvalue-eigenvector pairs of the sentence-term matrix, the terms extracted from the revised matrix rather than the original can be regarded as more effective or appropriate. Experimental results on automatic summarization of Korean newspaper articles show that the proposed method outperforms using PCA alone.

Journal of KIISE:Software and Applications | 2008