Publication


Featured research published by Seung-Hoon Na.


Information Processing and Management | 2009

On co-authorship for author disambiguation

In-Su Kang; Seung-Hoon Na; Seungwoo Lee; Hanmin Jung; Pyung Kim; Won-Kyung Sung; Jong-Hyeok Lee

Author name disambiguation deals with clustering same-name authors into distinct individuals. To address the problem, many studies have employed a variety of disambiguation features such as coauthors, titles of papers/publications, topics of articles, emails/affiliations, etc. Among these, co-authorship is the most easily accessible and influential, since the inter-person acquaintances represented by co-authorship can discriminate the identities of authors more clearly than other features. This study explores the net effect of co-authorship on author clustering in bibliographic data. First, to handle the shortage of explicit coauthors listed in known citations, a web-assisted technique for acquiring implicit coauthors of the target author to be disambiguated is proposed. Then, the coauthor disambiguation hypothesis that the identity of an author can be determined by his/her coauthors is examined and confirmed through a variety of author disambiguation experiments.
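
As a rough illustration of the coauthor hypothesis, the sketch below merges citations of an ambiguous name whenever they share a coauthor; the merging rule and the toy data are assumptions for illustration, not the paper's exact clustering procedure.

```python
# Illustrative sketch: citations of an ambiguous author name are merged into
# the same cluster whenever their coauthor sets overlap. This is an assumed
# single-link style rule, not the paper's exact method.

def cluster_by_coauthors(instances):
    """instances: list of coauthor sets, one per citation of the ambiguous name."""
    clusters = []
    for i, coauthors in enumerate(instances):
        # find every existing cluster that shares at least one coauthor
        overlapping = [c for c in clusters if c["coauthors"] & coauthors]
        merged = {"members": [i], "coauthors": set(coauthors)}
        for c in overlapping:
            merged["members"] += c["members"]
            merged["coauthors"] |= c["coauthors"]
            clusters.remove(c)
        clusters.append(merged)
    return [sorted(c["members"]) for c in clusters]

# Example: three citations by the same name string; the first two share a coauthor.
citations = [
    {"Hanmin Jung", "Pyung Kim"},
    {"Pyung Kim", "Won-Kyung Sung"},
    {"Jong-Hyeok Lee"},
]
print(cluster_by_coauthors(citations))   # [[0, 1], [2]]
```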


Information Processing and Management | 2007

Cluster-based patent retrieval

In-Su Kang; Seung-Hoon Na; Jungi Kim; Jong-Hyeok Lee

Through the recent NTCIR workshops, patent retrieval has posed many challenging issues to the information retrieval community. Unlike newspaper articles, patent documents are very long and well structured. These characteristics raise the need to reassess existing retrieval techniques that have been developed mainly for short, structure-less documents such as newspaper articles. This study investigates cluster-based retrieval in the context of the invalidity search task of patent retrieval. Cluster-based retrieval assumes that clusters provide additional evidence for matching a user's information need. Thus far, cluster-based retrieval approaches have relied on automatically created clusters. Fortunately, all patents carry manually assigned cluster information in the form of International Patent Classification (IPC) codes. The IPC is a standard taxonomy for classifying patents, currently comprising about 69,000 nodes organized into a five-level hierarchy. Patent documents therefore provide an excellent test bed for developing and evaluating cluster-based retrieval techniques. Experiments on the NTCIR-4 patent collection show that a cluster-based language model can help improve over the cluster-less baseline language model.
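
As a rough illustration of how cluster information can enter a language-modeling retrieval function, the sketch below interpolates document, cluster (e.g. IPC class), and collection statistics; the three-way mixture and the weights are assumptions for illustration, not the exact model evaluated in the paper.

```python
import math
from collections import Counter

# Sketch of a cluster-smoothed query-likelihood score. The document's cluster
# (here, the text of patents sharing its IPC class) provides an intermediate
# smoothing source between the document and the whole collection. The linear
# interpolation and the default weights are illustrative assumptions.

def cluster_lm_score(query_terms, doc_tokens, cluster_tokens, coll_tokens,
                     lam_doc=0.6, lam_cluster=0.2):
    d_tf, d_len = Counter(doc_tokens), len(doc_tokens)
    c_tf, c_len = Counter(cluster_tokens), len(cluster_tokens)
    g_tf, g_len = Counter(coll_tokens), len(coll_tokens)
    log_p = 0.0
    for w in query_terms:
        # add-one smoothing on the collection term keeps the probability positive
        p = (lam_doc * d_tf[w] / d_len
             + lam_cluster * c_tf[w] / c_len
             + (1 - lam_doc - lam_cluster) * (g_tf[w] + 1) / (g_len + len(g_tf)))
        log_p += math.log(p)
    return log_p

# Toy usage with made-up patent text.
doc = "rotor blade angle adjustment mechanism".split()
cluster = "rotor blade turbine wind energy rotor".split()
collection = "rotor blade turbine wind energy angle mechanism device patent claim".split()
print(cluster_lm_score(["rotor", "turbine"], doc, cluster, collection))
```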


European Conference on Information Retrieval | 2009

Improving Opinion Retrieval Based on Query-Specific Sentiment Lexicon

Seung-Hoon Na; Yeha Lee; Sang-Hyob Nam; Jong-Hyeok Lee

Lexicon-based approaches have been widely used for opinion retrieval due to their simplicity. However, no previous work has addressed the domain-dependency problem in opinion lexicon construction. This paper proposes a simple feedback-style method for learning a query-specific opinion lexicon from the set of top-retrieved documents in response to a query. The proposed method starts from an initial domain-independent general lexicon and creates a query-specific lexicon by re-estimating the opinion probabilities of the initial lexicon based on the top-retrieved documents. Experimental results on recent TREC test sets show that the query-specific lexicon provides a significant improvement over previous approaches, especially on BLOG-06 topics.
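
A minimal sketch of the feedback-style adaptation idea follows; the particular feedback statistic (co-occurrence with high-prior opinion words in top-retrieved documents) and the mixing weight are illustrative assumptions, not the paper's estimator.

```python
from collections import Counter

# Sketch of feedback-style lexicon adaptation: each word's opinion probability
# in a general lexicon is interpolated with a simple statistic from the
# top-retrieved documents. The co-occurrence statistic and alpha are assumed
# for illustration only.

def adapt_lexicon(general_lexicon, top_docs, alpha=0.5, strong=0.8):
    """general_lexicon: {word: prior opinion probability}; top_docs: list of token lists."""
    strong_words = {w for w, p in general_lexicon.items() if p >= strong}
    cooccur, freq = Counter(), Counter()
    for doc in top_docs:
        has_strong = any(w in strong_words for w in doc)
        for w in doc:
            freq[w] += 1
            if has_strong:
                cooccur[w] += 1          # word appears alongside a strong opinion word
    adapted = {}
    for w, prior in general_lexicon.items():
        feedback = cooccur[w] / freq[w] if freq[w] else prior
        adapted[w] = alpha * prior + (1 - alpha) * feedback
    return adapted

# Toy usage: in an audio-review "domain", "sound" becomes more opinion-bearing.
lexicon = {"good": 0.9, "bad": 0.85, "sound": 0.4, "movie": 0.1}
docs = [["the", "sound", "was", "good"], ["bad", "sound", "quality"], ["a", "movie", "review"]]
print(adapt_lexicon(lexicon, docs))
```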


European Conference on Information Retrieval | 2009

DiffPost: Filtering Non-relevant Content Based on Content Difference between Two Consecutive Blog Posts

Sang-Hyob Nam; Seung-Hoon Na; Yeha Lee; Jong-Hyeok Lee

One of the important issues for blog search engines is extracting clean text from blog posts. In practice, this extraction is confronted with much non-relevant content in the original blog post, such as menus, banners, and site descriptions, which makes ranking less effective. The problem is that this non-relevant content is not encoded in a unified way but in many different ways across blog sites. A commercial blog search vendor would therefore have to perform per-site tuning, such as writing hand-crafted rules for eliminating non-relevant content for every blog site, and such tuning is very inefficient. Rather than this labor-intensive approach, this paper first observes that much of this non-relevant content does not change across consecutive blog posts, and then proposes a simple and effective algorithm, DiffPost, that eliminates it based on the content difference between two consecutive posts from the same blog site. Experimental results on the TREC Blog Track are remarkable, showing that the retrieval system using DiffPost achieves an important performance improvement of about a 10% increase in MAP (Mean Average Precision) over the same system without DiffPost.
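
A minimal sketch of the DiffPost idea follows, assuming posts are segmented by line breaks; any block segmentation could be substituted.

```python
# Sketch of the DiffPost idea: text segments that also appear in the previous
# post from the same blog (menus, banners, site descriptions) are discarded,
# keeping only the content that differs between the two consecutive posts.
# Splitting on line breaks is an illustrative assumption.

def diffpost(current_post, previous_post):
    previous_blocks = set(previous_post.splitlines())
    kept = [line for line in current_post.splitlines()
            if line.strip() and line not in previous_blocks]
    return "\n".join(kept)

prev = "My Blog\nAbout | Archive | Contact\nYesterday I wrote about retrieval."
curr = "My Blog\nAbout | Archive | Contact\nToday I am writing about DiffPost."
print(diffpost(curr, prev))   # only the new sentence survives
```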


European Conference on Information Retrieval | 2008

Improving term frequency normalization for multi-topical documents and application to language modeling approaches

Seung-Hoon Na; In-Su Kang; Jong-Hyeok Lee

Term frequency normalization is a serious issue because document lengths vary. Generally, documents become long for two different reasons: verbosity and multi-topicality. Verbosity means that the same topic is repeatedly mentioned with terms related to that topic, so term frequencies are higher than in a well-summarized document. Multi-topicality means that a document broadly discusses multiple topics rather than a single one. Although these document characteristics should be handled differently, all previous term frequency normalization methods have ignored the distinction and used a simplified length-driven approach that decreases term frequency based only on document length, causing unreasonable penalization. To address this problem, we propose a novel TF normalization method that takes a partially axiomatic approach. We first formulate two formal constraints that a retrieval model should satisfy for verbose and multi-topical documents, respectively. Then, we modify language modeling approaches to better satisfy these two constraints and derive novel smoothing methods. Experimental results show that the proposed method significantly increases precision for keyword queries and substantially improves MAP (Mean Average Precision) for verbose queries.


International ACM SIGIR Conference on Research and Development in Information Retrieval | 2009

A 2-poisson model for probabilistic coreference of named entities for improved text retrieval

Seung-Hoon Na; Hwee Tou Ng

Text retrieval queries frequently contain named entities. The standard approach of term frequency weighting does not work well when estimating the term frequency of a named entity, since anaphoric expressions (such as he, she, the movie, etc.) are frequently used to refer to named entities in a document, causing the term frequency of named entities to be underestimated. In this paper, we propose a novel 2-Poisson model to estimate the frequency of anaphoric expressions of a named entity, without explicitly resolving the anaphoric expressions. Our key assumption is that the frequency of anaphoric expressions is distributed over the named entities in a document according to the probabilities that the document is elite for each named entity. This assumption leads us to formulate our proposed Co-referentially Enhanced Entity Frequency (CEEF). Experimental results on the text collection of the TREC Blog Track show that CEEF achieves significant and consistent improvements over state-of-the-art retrieval methods that use standard term frequency estimation. In particular, we achieve a 3% increase in MAP over the best performing run of the TREC 2008 Blog Track.
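
A minimal sketch of the distribution step behind CEEF follows, assuming the per-entity eliteness probabilities are already available and shared proportionally; the paper instead derives the allocation from a 2-Poisson model, which is not reproduced here.

```python
# Sketch of the CEEF idea: the count of anaphoric expressions in a document is
# distributed over its named entities in proportion to each entity's eliteness
# probability, and added to the raw term frequency. The proportional-sharing
# rule and given eliteness probabilities are illustrative assumptions.

def ceef(raw_tf, elite_prob, anaphor_count):
    """raw_tf, elite_prob: dicts keyed by named entity; anaphor_count: int."""
    total = sum(elite_prob.values()) or 1.0
    return {e: raw_tf[e] + anaphor_count * elite_prob[e] / total for e in raw_tf}

# Example: 5 anaphors in a document, mostly credited to the more elite entity.
print(ceef({"Obama": 3, "McCain": 1}, {"Obama": 0.9, "McCain": 0.3}, 5))
```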


Asia Information Retrieval Symposium | 2008

Completely-arbitrary passage retrieval in language modeling approach

Seung-Hoon Na; In-Su Kang; Yeha Lee; Jong-Hyeok Lee

Passage retrieval has been expected to be an alternative way to resolve the length-normalization problem, since passages have more uniform lengths and topics than documents. An important issue in passage retrieval is determining the type of passage. Among several passage types, the arbitrary passage type, which varies dynamically according to the query, has shown the best performance. However, the previous arbitrary passage type has not been fully explored, since it still imposes a fixed-length restriction such as n consecutive words. This paper proposes a new type of passage, the completely-arbitrary passage, obtained by eliminating all restrictions on both passage length and starting position, thereby maximally relaxing the original arbitrary passage type. The main advantage of completely-arbitrary passages is that the proximity of query terms can be well supported in passage retrieval, which non-completely arbitrary passages cannot clearly support. Extensive experimental results on six standard TREC test collections show that, in the context of language modeling approaches, passage retrieval using completely-arbitrary passages significantly improves over both document retrieval and passage retrieval using previous non-completely arbitrary passages.
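
A rough sketch of scoring with completely-arbitrary passages follows; the exhaustive span enumeration and the Dirichlet-style smoothing are assumptions for illustration, not the paper's exact formulation or its efficient computation.

```python
import math
from collections import Counter

# Sketch of completely-arbitrary passage scoring: every span (start, end) of
# the document is a candidate passage, scored with a smoothed query-likelihood
# model, and the best span's score is kept. Spans containing the query terms
# close together naturally score highest, which is how proximity is rewarded.
# The brute-force O(n^2) enumeration is for illustration only.

def best_passage_score(query_terms, doc_tokens, coll_prob, mu=50.0):
    """coll_prob: {word: collection probability}; unseen words get a small floor."""
    best = float("-inf")
    n = len(doc_tokens)
    for start in range(n):
        tf = Counter()
        for end in range(start, n):
            tf[doc_tokens[end]] += 1
            plen = end - start + 1
            score = sum(math.log((tf[w] + mu * coll_prob.get(w, 1e-6)) / (plen + mu))
                        for w in query_terms)
            best = max(best, score)
    return best

doc = "the movie was long but the acting in the movie was great".split()
coll = {"movie": 0.01, "acting": 0.005, "great": 0.01}
print(best_passage_score(["movie", "acting"], doc, coll))
```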


Applications of Natural Language to Data Bases | 2004

Lightweight Natural Language Database Interfaces

In-Su Kang; Seung-Hoon Na; Jong-Hyeok Lee; Gijoo Yang

Most natural language database interfaces suffer from the translation knowledge portability problem, and are vulnerable to ill-formed questions because of their deep analysis. To alleviate those problems, this paper proposes a lightweight approach to natural language interfaces, where translation knowledge is semi-automatically acquired and user questions are only syntactically analyzed. For the acquisition of translation knowledge, first, a target database is reverse-engineered into a physical database schema on which domain experts annotate linguistic descriptions to produce a pER (physically-derived Entity-Relationship) schema. Next, from the pER schema, initial translation knowledge is automatically extracted. Then, it is extended with synonyms from lexical databases. In the stage of question answering, this semi-automatically constructed translation knowledge is then used to resolve translation ambiguities.


International ACM SIGIR Conference on Research and Development in Information Retrieval | 2011

Enriching document representation via translation for improved monolingual information retrieval

Seung-Hoon Na; Hwee Tou Ng

Word ambiguity and vocabulary mismatch are critical problems in information retrieval. To deal with these problems, this paper proposes the use of translated words to enrich document representation, going beyond the words in the original source language to represent a document. In our approach, each original document is automatically translated into an auxiliary language, and the resulting translated document serves as a semantically enhanced representation for supplementing the original bag of words. The core of our translation representation is the expected term frequency of a word in a translated document, which is calculated by averaging the term frequencies over all possible translations, rather than focusing on the 1-best translation only. To achieve better efficiency of translation, we do not rely on full-fledged machine translation, but instead use monotonic translation by removing the time-consuming reordering component. Experiments carried out on standard TREC test collections show that our proposed translation representation leads to statistically significant improvements over using only the original language of the document collection.
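
A minimal sketch of the expected term frequency computation follows, using a toy lexical translation table as an assumption; the paper obtains the translation alternatives from a monotonic machine translation system rather than a static table.

```python
from collections import defaultdict

# Sketch of the expected term frequency in the translated representation: each
# source-language occurrence contributes its translation probabilities to the
# target-language counts, i.e. the count is averaged over all possible
# translations instead of committing to the 1-best one. The toy translation
# table below is an illustrative assumption.

def expected_translated_tf(source_tokens, translation_table):
    etf = defaultdict(float)
    for s in source_tokens:
        for target_word, prob in translation_table.get(s, {}).items():
            etf[target_word] += prob
    return dict(etf)

table = {"bank": {"banque": 0.7, "rive": 0.3}, "river": {"rivière": 0.9, "fleuve": 0.1}}
print(expected_translated_tf(["bank", "river", "bank"], table))
# {'banque': 1.4, 'rive': 0.6, 'rivière': 0.9, 'fleuve': 0.1}
```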


Asia Information Retrieval Symposium | 2005

An empirical study of query expansion and cluster-based retrieval in language modeling approach

Seung-Hoon Na; In-Su Kang; Ji-Eun Roh; Jong-Hyeok Lee

In information retrieval, the word mismatch problem is a critical issue. To resolve it, several techniques have been developed, such as query expansion, cluster-based retrieval, and dimensionality reduction. Of these, this paper presents an empirical study of query expansion and cluster-based retrieval. We examine the effect of using parsimony in query expansion and the effect of clustering algorithms in cluster-based retrieval. In addition, query expansion and cluster-based retrieval are compared, and their combinations are evaluated in terms of retrieval performance. From experiments on seven test collections from NTCIR and TREC, we conclude that 1) query expansion using parsimony performs well, 2) cluster-based retrieval with agglomerative clustering is better than with partitioning clustering, 3) query expansion is generally more effective than cluster-based retrieval in resolving the word mismatch problem, and 4) their combinations are effective when each method individually improves significantly over the baseline.

Collaboration


Dive into Seung-Hoon Na's collaborations.

Top Co-Authors

Jong-Hyeok Lee
Pohang University of Science and Technology

In-Su Kang
Pohang University of Science and Technology

Yeha Lee
Pohang University of Science and Technology

Jungi Kim
Pohang University of Science and Technology

Yun Jin
Electronics and Telecommunications Research Institute

Chang Hyun Kim
Electronics and Telecommunications Research Institute

Eun jin Park
Electronics and Telecommunications Research Institute

Jong Hun Shin
Electronics and Telecommunications Research Institute

Ki Young Lee
Electronics and Telecommunications Research Institute

Oh Woog Kwon
Electronics and Telecommunications Research Institute