Kyung-Soon Lee | Researchain

Publication

Featured researches published by Kyung-Soon Lee.

International ACM SIGIR Conference on Research and Development in Information Retrieval | 2008

A cluster-based resampling method for pseudo-relevance feedback

Kyung-Soon Lee; W. Bruce Croft; James Allan

Typical pseudo-relevance feedback methods assume the top-retrieved documents are relevant and use these pseudo-relevant documents to expand terms. The initial retrieval set can, however, contain a great deal of noise. In this paper, we present a cluster-based resampling method to select better pseudo-relevant documents based on the relevance model. The main idea is to use document clusters to find dominant documents for the initial retrieval set, and to repeatedly feed the documents to emphasize the core topics of a query. Experimental results on large-scale web TREC collections show significant improvements over the relevance model. For justification of the resampling approach, we examine relevance density of feedback documents. A higher relevance density will result in greater retrieval accuracy, ultimately approaching true relevance feedback. The resampling approach shows higher relevance density than the baseline relevance model on all collections, resulting in better retrieval accuracy in pseudo-relevance feedback. This result indicates that the proposed method is effective for pseudo-relevance feedback.
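
The abstract uses relevance density to justify the resampling approach but does not define it here. A minimal sketch, assuming it means the fraction of feedback documents that are judged relevant; the function name and the inputs are illustrative, not taken from the paper:

```python
def relevance_density(feedback_doc_ids, relevant_doc_ids):
    """Fraction of feedback documents that are judged relevant.

    Assumption: a document fed back more than once (as resampling may do)
    is counted once per occurrence.
    """
    feedback = list(feedback_doc_ids)
    if not feedback:
        return 0.0
    relevant = set(relevant_doc_ids)
    return sum(1 for doc_id in feedback if doc_id in relevant) / len(feedback)


if __name__ == "__main__":
    # Hypothetical judgments: two of the four feedback documents are relevant.
    print(relevance_density(["d1", "d2", "d3", "d4"], {"d1", "d3"}))
```

Under this reading, true relevance feedback corresponds to a density of 1.0, which matches the abstract's remark that higher density approaches true relevance feedback.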

Information Processing and Management | 2001

Re-ranking model based on document clusters

Kyung-Soon Lee; Yc Park; Key-Sun Choi

In this paper, we describe a model of an information retrieval system based on a document re-ranking method that uses document clusters. In the first step, we retrieve documents using the inverted-file method. Next, we analyze the retrieved documents using document clusters and re-rank them. In this step, we use static clusters and a dynamic cluster view. Consequently, we can produce clusters that are tailored to the characteristics of the query. We focus on the merits of both the inverted-file method and cluster analysis: we retrieve documents with the inverted-file method and analyze all terms in a document through cluster analysis. With these two steps, the retrieved results take into account the context of all terms in a document as well as the query terms. We show that our method achieves significant improvements over the method based on similarity search ranking alone.
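
The two-step idea (retrieve first, then re-rank with clusters) can be illustrated roughly as follows. This is a sketch under stated assumptions, not the paper's model: the linear interpolation, the `alpha` parameter, the use of cluster centroids, and all function names are hypothetical choices made for the example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def centroid(vectors):
    """Average of a non-empty list of sparse term-weight dicts."""
    total = {}
    for vec in vectors:
        for term, w in vec.items():
            total[term] = total.get(term, 0.0) + w
    return {term: w / len(vectors) for term, w in total.items()}


def rerank(query, docs, clusters, alpha=0.5):
    """Re-rank retrieved documents with cluster evidence.

    query: term -> weight; docs: doc id -> term-weight dict;
    clusters: list of lists of doc ids.
    Assumed scoring: alpha * sim(query, doc)
                     + (1 - alpha) * best sim(query, centroid of doc's cluster).
    """
    centroids = [centroid([docs[d] for d in members]) for members in clusters]
    scores = {}
    for doc_id, vec in docs.items():
        cluster_sims = [
            cosine(query, centroids[i])
            for i, members in enumerate(clusters)
            if doc_id in members
        ]
        cluster_score = max(cluster_sims, default=0.0)
        scores[doc_id] = alpha * cosine(query, vec) + (1 - alpha) * cluster_score
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    query = {"jaguar": 1.0, "car": 1.0}
    docs = {
        "d1": {"jaguar": 1.0, "cat": 1.0},
        "d2": {"jaguar": 1.0, "car": 1.0, "engine": 1.0},
        "d3": {"car": 1.0, "engine": 1.0},
    }
    print(rerank(query, docs, [["d2", "d3"], ["d1"]]))
```

In the toy data, d1 and d3 match the query equally well on their own; the cluster containing d2 and d3 is closer to the query, so d3 moves ahead of d1 after re-ranking.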

Information Processing and Management | 2013

A deterministic resampling method using overlapping document clusters for pseudo-relevance feedback

Kyung-Soon Lee; W. Bruce Croft

Typical pseudo-relevance feedback methods assume the top-retrieved documents are relevant and use these pseudo-relevant documents to expand terms. The initial retrieval set can, however, contain a great deal of noise. In this paper, we present a cluster-based resampling method to select novel pseudo-relevant documents based on Lavrenko's relevance model approach. The main idea is to use overlapping clusters to find dominant documents for the initial retrieval set, and to repeatedly use these documents to emphasize the core topics of a query. The proposed resampling method can skip some documents in the initial high-ranked documents and deterministically construct overlapping clusters as sampling units. The hypothesis behind using overlapping clusters is that a good representative document for a query may have several nearest neighbors with high similarities, participating in several different clusters. Experimental results on large-scale web TREC collections show significant improvements over the baseline relevance model. To justify the proposed approach, we examine the relevance density and redundancy ratio of feedback documents. A higher relevance density will result in greater retrieval accuracy, ultimately approaching true relevance feedback. The resampling approach shows higher relevance density than the baseline relevance model on all collections, resulting in better retrieval accuracy in pseudo-relevance feedback.
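
The overlapping-cluster hypothesis can be sketched as follows. This is a simplification under stated assumptions: each top-ranked document seeds one cluster made of itself and its k nearest neighbors, and a document's dominance is approximated by the number of clusters it appears in. The abstract does not give the cluster ranking or sampling details, so they are not modeled, and all names are hypothetical.

```python
from collections import Counter


def knn_clusters(similarity, k):
    """Build one cluster per document: the document plus its k nearest neighbors.

    similarity: dict mapping doc id -> {other doc id -> similarity score}.
    The construction is deterministic, and clusters may overlap.
    """
    clusters = {}
    for doc_id, row in similarity.items():
        others = [other for other in row if other != doc_id]
        neighbors = sorted(others, key=lambda other: row[other], reverse=True)[:k]
        clusters[doc_id] = [doc_id] + neighbors
    return clusters


def membership_counts(clusters):
    """Count how many clusters each document participates in."""
    counts = Counter()
    for members in clusters.values():
        counts.update(members)
    return counts


if __name__ == "__main__":
    # Hypothetical symmetric similarities among four top-ranked documents.
    sim = {
        "a": {"b": 0.9, "c": 0.8, "d": 0.1},
        "b": {"a": 0.9, "c": 0.7, "d": 0.2},
        "c": {"a": 0.8, "b": 0.7, "d": 0.3},
        "d": {"a": 0.1, "b": 0.2, "c": 0.3},
    }
    print(membership_counts(knn_clusters(sim, k=1)).most_common())
```

Document a is the nearest neighbor of both b and c, so it participates in three clusters, while the outlier d appears only in its own.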

Information Processing and Management | 2004

Implicit ambiguity resolution using incremental clustering in cross-language information retrieval

Kyung-Soon Lee; Kyo Kageura; Key-Sun Choi

This paper presents a method to implicitly resolve ambiguities using dynamic incremental clustering in cross-language information retrieval (CLIR), such as Korean-to-English and Japanese-to-English CLIR. The main objective of this paper is to show that document clusters can effectively resolve the ambiguities that increase dramatically in translated queries, as well as take into account the context of all the terms in a document. In the framework we propose, a query in Korean or Japanese is first translated into English by looking up bilingual dictionaries; documents are then retrieved for the translated query terms using the vector space retrieval model or the probabilistic retrieval model. For the top-ranked retrieved documents, query-oriented document clusters are incrementally created, and the weight of each retrieved document is recalculated using the clusters. In the experiment on the TREC CLIR test collection, our method achieved 39.41% and 36.79% improvements over translated queries without ambiguity resolution in Korean-to-English CLIR, and 17.89% and 30.46% improvements in Japanese-to-English CLIR, on vector space retrieval and probabilistic retrieval, respectively. Our method achieved a 12.30% improvement for all translation queries, compared with blind feedback for probabilistic retrieval in Korean-to-English CLIR. These results indicate that cluster analysis helps to resolve ambiguity.

DaEng | 2014

A Graph-Based Reliable User Classification

Bayar Tsolmon; Kyung-Soon Lee

When a hot social issue or event occurs, the number of comments and retweets on Twitter increases significantly on that day. However, as the amount of SNS data increases, the noise increases with it, so a reliable user classification method is required. In this paper, we classify the users who are interested in an issue as “socially well-known users” and “reliable and highly active users”. A graph-based user reliability measurement and a weekly user activity measurement are introduced to classify these users. Experiments on eight social issues in Twitter data were conducted to verify the validity of the proposed method. The top 10 results achieved an average precision at 10 (P@10) of 76.8%. The experimental results show that the proposed method is effective for classifying users in a Twitter corpus.
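
For reference, P@10 is conventionally the fraction of the top 10 results that are judged correct, averaged here over the issues. A minimal sketch of that standard definition; the function names and the sample data are illustrative and do not come from the paper:

```python
def precision_at_k(ranked_ids, relevant_ids, k=10):
    """Fraction of the top-k ranked items that are judged relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    relevant = set(relevant_ids)
    hits = sum(1 for item in ranked_ids[:k] if item in relevant)
    return hits / k


def mean_precision_at_k(runs, k=10):
    """Average P@k over (ranked_ids, relevant_ids) pairs, e.g. one pair per issue."""
    if not runs:
        return 0.0
    return sum(precision_at_k(ranked, rel, k) for ranked, rel in runs) / len(runs)


if __name__ == "__main__":
    # Hypothetical ranking of ten users, eight of whom are judged reliable.
    ranked = [f"u{i}" for i in range(1, 11)]
    reliable = {f"u{i}" for i in range(1, 9)}
    print(precision_at_k(ranked, reliable))
```

Dividing by k rather than by the number of returned items means a ranking shorter than k is penalized, which is the usual convention for P@k.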

Applications of Natural Language to Data Bases | 2012

Extracting social events based on timeline and sentiment analysis in twitter corpus

Bayar Tsolmon; A-Rong Kwon; Kyung-Soon Lee

We propose a novel method for extracting social events based on timeline and sentiment analysis from social streams such as Twitter. When a big social issue or event occurs, the number of tweets tends to increase dramatically. Users write tweets to express their opinions. Our method uses these timeline and sentiment properties of social media streams to extract social events. On timelines, term significance is calculated using the chi-square measure. Evaluated on a Korean tweet collection for 30 events, our method achieved 94.3% average precision in the top 10 extracted events. The result indicates that our method is effective for social event extraction.
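
The abstract does not say how the chi-square measure is set up on the timeline. One common formulation, assumed here for illustration only, compares how often a term occurs in tweets on a target day against all other days using a 2x2 contingency table; the function name, the argument layout, and the counts are hypothetical.

```python
def chi_square_2x2(a, b, c, d):
    """Chi-square statistic for a 2x2 contingency table.

    Assumed layout:
      a: tweets on the target day that contain the term
      b: tweets on the target day that do not contain the term
      c: tweets on other days that contain the term
      d: tweets on other days that do not contain the term
    """
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        return 0.0  # an empty row or column gives no evidence of association
    return n * (a * d - b * c) ** 2 / denominator


if __name__ == "__main__":
    # Hypothetical counts: the term is far more frequent on the target day.
    print(chi_square_2x2(30, 70, 10, 890))
    # Same term rate on the target day and on other days: no association.
    print(chi_square_2x2(10, 90, 90, 810))
```

A large value indicates that the term's frequency on the target day departs strongly from its frequency elsewhere on the timeline; a value of zero indicates the same rate on both.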

Meeting of the Association for Computational Linguistics | 2000

Term recognition using technical dictionary hierarchy

Jong-Hoon Oh; Kyung-Soon Lee; Key-Sun Choi

In recent years, statistical approaches to ATR (Automatic Term Recognition) have achieved good results. However, there is still scope to improve the performance of term extraction. For example, domain dictionaries can improve the performance of ATR. This paper focuses on a method for extracting terms using a dictionary hierarchy. Our method produces relatively good results for this task.

International ACM SIGIR Conference on Research and Development in Information Retrieval | 2014

An event extraction model based on timeline and user analysis in Latent Dirichlet allocation

Bayar Tsolmon; Kyung-Soon Lee

Social media such as Twitter has come to reflect the reaction of the general public to major events. Since posts are short and noisy, it is hard to extract reliable events based on word frequency. Even if an event term appears with particularly low frequency, it should be extracted as long as at least one reliable user mentions it. This paper proposes an event extraction method that combines user reliability and timeline analysis. The Latent Dirichlet Allocation (LDA) topic model is adapted with the weights of event terms on the timeline and of reliable users to extract social events. Reliable users are detected on Twitter according to their tweeting behaviors: socially well-known users and active users. Reliable and low-frequency events can be detected based on reliable users. To assess the effectiveness of the proposed method, experiments were conducted on a Korean tweet collection; the proposed model achieved 72% precision. This shows that LDA with timeline and reliable users is effective for extracting events on the Twitter test collection.

International Conference on Computational Linguistics | 2002

Implicit ambiguity resolution using incremental clustering in Korean-to-English cross-language information retrieval

Kyung-Soon Lee; Kyo Kageura; Key-Sun Choi

This paper presents a method to implicitly resolve ambiguities using dynamic incremental clustering in Korean-to-English cross-language information retrieval. In the framework we propose, a query in Korean is first translated into English by looking up a Korean-English dictionary; documents are then retrieved for the translated query terms using vector space retrieval. For the top-ranked retrieved documents, query-oriented document clusters are incrementally created, and the weight of each retrieved document is recalculated using the clusters. In an experiment on the TREC-6 CLIR test collection, our method achieved a 28.29% performance improvement over translated queries without ambiguity resolution. This corresponds to 97.27% of the monolingual performance for the original queries. When we combine our method with query ambiguity resolution, it even outperforms monolingual retrieval.

Grid and Pervasive Computing | 2013

Follower Classification through Social Network Analysis in Twitter

Jae-Wook Seol; Kwang-Yong Jeong; Kyung-Soon Lee

On Twitter, a social network service, people form relationships through the ‘Follow’ function. Users have different purposes, so there are various kinds of followers: some follow a user because they favor or support that user, some follow for no particular reason, and some follow to criticize or watch the user’s behavior or tweets (the user’s comments). In this paper, we propose a model that explains why followers follow certain users, using the network relations between followers. A user’s influential supporters and influential non-supporters are extracted, and followers are then classified as supporters, neutrals, or non-supporters using their retweet information, profiles, and sentiment analysis of their recent tweets. To verify the validity of the proposed model, we ran an experiment on 30,000 randomly selected users who follow one of five politicians. The experiment showed that classifying supporters and non-supporters with the help of influential supporting followers and influential non-supporting followers was effective.
