Jaehui Park | Researchain

Publication

Featured research published by Jaehui Park.

Knowledge and Information Systems | 2011

Keyword search in relational databases

Jaehui Park; Sang-goo Lee

This paper surveys research on enabling keyword search in relational databases. We present fundamental characteristics and discuss research dimensions, including data representation, ranking, efficient processing, query representation, and result presentation. Various approaches for developing the search system are described and compared within a common framework. We discuss the evolution of new research strategies to resolve the issues associated with probabilistic models, efficient top-k query processing, and schema analysis in relational databases.

web information and data management | 2008

Web content summarization using social bookmarks: a new approach for social summarization

Jaehui Park; Tomohiro Fukuhara; Ikki Ohmukai; Hideaki Takeda; Sang-goo Lee

An increasing number of Web applications allow users to play more active roles in enriching the source content. The enriched data can be used for various applications such as text summarization, opinion mining, and ontology creation. In this paper, we propose a novel Web content summarization method that creates a text summary by exploiting user feedback (comments and tags) in a social bookmarking service. We manually analyzed user feedback in several representative social services including del.icio.us, Digg, YouTube, and Amazon.com. We found that (1) user comments in each social service have their own characteristics with respect to summarization, and (2) a tag's frequency rank does not necessarily represent its usefulness for summarization. Based on these observations, we conjecture that user feedback in social bookmarking services is more suitable for summarization than feedback in other types of social services. We implemented a prototype system called SSNote that analyzes tags and user comments in del.icio.us and extracts summaries. Performance evaluations of the system were conducted by comparing its output summaries with manual summaries generated by human evaluators. Experimental results show that our approach highlights the potential benefits of user feedback in social bookmarking services.

conference on information and knowledge management | 2010

Probabilistic ranking for relational databases based on correlations

Jaehui Park; Sang-goo Lee

This paper proposes a ranking method to exploit statistical correlations among pairs of attribute values in relational databases. For a given query, the correlations of the query are aggregated with each of the attribute values in a tuple to estimate the relevance of that tuple to the query. We extend Bayesian network models to provide a probabilistic ranking function based on a limited assumption of value independence. Experimental results show that our model improves the retrieval effectiveness on real datasets and has a reasonable query processing time compared to related work.

Knowledge and Information Systems | 2014

A graph-theoretic approach to optimize keyword queries in relational databases

Jaehui Park; Sang-goo Lee

Keyword search provides users with an easy method to query large and complex databases without any knowledge of structured query languages or the underlying database schema. Most existing studies have focused on generating candidate structured queries relevant to the keywords. Because the set of generated queries is large, the execution costs may be prohibitive. However, existing studies lack a generalized method to optimize the plan for this large set of generated queries. In this paper, we introduce a graph-theoretic optimization approach. We propose a general graph model, the Weighted Operator Graph, to address the costs of keyword query evaluation plans. The proposed model is flexible enough to integrate all of the cost-based plans in a uniform way. We define a Keyword Query Optimization Problem based on a theoretical cost model as a graph-theoretic problem and show it to be an NP-hard problem. We propose a greedy heuristic, Maximum Propagation, that reduces the size of the intermediate result as early as possible. The proposed algorithm allows us to achieve efficiency in terms of query evaluation costs. Experimental studies on both synthetic and real data sets show that our work outperforms existing work.
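The idea of shrinking intermediate results as early as possible can be illustrated with a generic greedy join-ordering sketch. This is not the Maximum Propagation algorithm or the Weighted Operator Graph model from the paper, whose details the abstract does not give; the function name, the relation names, the row counts, and the selectivity-based cost estimate are all assumptions chosen for illustration.

```python
def greedy_join_order(sizes, selectivity):
    """Order joins greedily by smallest estimated intermediate size.

    sizes: dict mapping relation name -> row count (hypothetical statistics).
    selectivity: dict mapping frozenset({a, b}) -> join selectivity in (0, 1].
    Returns (join order, estimated size after each join).
    """
    # Start from the smallest relation.
    start = min(sizes, key=sizes.get)
    joined = [start]
    current = float(sizes[start])
    remaining = set(sizes) - {start}
    trace = []
    while remaining:
        best, best_size = None, None
        for rel in sorted(remaining):  # sorted so ties break deterministically
            # Multiply the selectivities of all edges to already-joined relations.
            sel, connected = 1.0, False
            for done in joined:
                edge = frozenset((rel, done))
                if edge in selectivity:
                    sel *= selectivity[edge]
                    connected = True
            if not connected:
                continue  # skip candidates that would form a cross product
            est = current * sizes[rel] * sel
            if best_size is None or est < best_size:
                best, best_size = rel, est
        if best is None:
            raise ValueError("join graph is disconnected")
        joined.append(best)
        remaining.remove(best)
        current = best_size
        trace.append(best_size)
    return joined, trace


# Hypothetical relations and statistics, for illustration only.
sizes = {"A": 10, "B": 1000, "C": 100}
selectivity = {
    frozenset(("A", "B")): 0.01,
    frozenset(("A", "C")): 0.05,
    frozenset(("B", "C")): 0.001,
}
order, trace = greedy_join_order(sizes, selectivity)
```

In this toy instance the sketch joins C before B because the estimated result of joining A with C is smaller than that of joining A with B. A greedy choice of this kind is a heuristic and does not guarantee a globally optimal plan.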

Expert Systems With Applications | 2013

Pragmatic correlation analysis for probabilistic ranking over relational data

Jaehui Park; Sang-goo Lee

It is widely recognized that effective ranking methods for relational data (e.g., tuples) enable users to overcome the limitations of the traditional Boolean retrieval model and the difficulty of writing structured queries. To determine the rank of a tuple, term frequency-based methods, such as tf×idf (term frequency × inverse document frequency) schemes, have been commonly adopted in the literature by simply considering a tuple as a single document. However, in many cases, we have noted that tf×idf schemes may not produce effective rankings or specific orderings for relational data with categorical attributes, which are pervasive today. To support fundamental aspects of relational data, we apply the notions of correlation analysis to estimate the extent of relationships between queries and data. This paper proposes a probabilistic ranking model to exploit statistical relationships that exist in relational data with categorical attributes. Given a set of query terms, information on attribute values correlated with the query terms is used to estimate the relevance of the tuple to the query. To quantify the information, we compute the extent of the dependency between correlated attribute values on a Bayesian network. Moreover, we avoid the prohibitive cost of computing insignificant ranking features based on a limited assumption of node independence. Our probabilistic ranking model is domain-independent and leverages only data statistics without any prior knowledge such as user query logs. Experimental results show that our work improves the effectiveness of rankings for real-world datasets and has a reasonable query processing efficiency compared to related work.
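The limitation of tf×idf on categorical data can be seen in a minimal sketch that treats each tuple as a one-document bag of attribute values. The table, the function name, and the plain log(N/df) weighting are assumptions for illustration and are not taken from the paper.

```python
import math
from collections import Counter


def tfidf_scores(rows, query_terms):
    """Score each row by summing tf * idf over the query terms.

    Each row is treated as a tiny document whose terms are its attribute
    values (an illustrative simplification).
    """
    n = len(rows)
    # Document frequency: number of rows in which each value appears.
    df = Counter()
    for row in rows:
        df.update(set(row))
    scores = []
    for row in rows:
        tf = Counter(row)
        score = 0.0
        for term in query_terms:
            if tf[term]:
                score += tf[term] * math.log(n / df[term])
        scores.append(score)
    return scores


# Hypothetical car listings: (make, body type, color).
cars = [
    ("honda", "sedan", "red"),
    ("honda", "sedan", "blue"),
    ("honda", "coupe", "red"),
    ("toyota", "sedan", "red"),
]
scores = tfidf_scores(cars, ["honda"])
```

Because a categorical value occurs at most once per tuple, the term frequency is always 1 for a match, so every tuple matching the query receives the same score and the scheme cannot order them. This is the kind of gap that a correlation-based ranking is meant to fill.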

database systems for advanced applications | 2011

Exploiting correlation to rank database query results

Jaehui Park; Sang-goo Lee

In recent years, effective ranking strategies for relational databases have been extensively studied. Existing approaches have adopted empirical term-weighting strategies called tf×idf (term frequency times inverse document frequency) schemes from the field of information retrieval (IR) without careful consideration of the relational model. This paper proposes a novel ranking scheme that exploits statistical correlations, which represent the underlying semantics of the relational model. We extend Bayesian network models to provide a dependence structure in relational databases. Furthermore, a limited assumption of value independence is defined to reduce the otherwise unrealistic execution cost of the probabilistic model. Experimental results show that our model is competitive in terms of efficiency without losing the quality of query results.

database and expert systems applications | 2010

Ranking objects based on attribute value correlation

Jaehui Park; Sang-goo Lee

There has been a great deal of interest in recent years in ranking query results in relational databases. This paper presents a novel method to rank objects (e.g., tuples) by exploiting the correlations among their attribute values. Given a query, each attribute value is assigned a score according to its mutual occurrences with the query and its distribution in the columns of the attribute. These attribute value scores are aggregated to obtain a final score for an object. Furthermore, a concept vector is proposed to provide a synopsis of an attribute value in a given database. The concept vector is used to retrieve similar objects. Experimental results demonstrate the performance of our ranking method, RAVC (Ranking with Attribute Value Correlation), in terms of search quality and efficiency.
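Scoring attribute values by their mutual occurrences with the query, then aggregating per object, can be sketched as follows. This is not the RAVC scoring function, which the abstract does not specify. The conditional co-occurrence frequency, the plain sum used for aggregation, the function name, and the sample table are all assumptions, and the distribution component and concept vectors are omitted.

```python
from collections import Counter


def rank_by_cooccurrence(rows, query_value):
    """Rank rows matching query_value by how often their other
    attribute values co-occur with it (conditional frequency)."""
    matching = [row for row in rows if query_value in row]
    if not matching:
        return []
    # For each (column, value), count the matching rows that carry it.
    co = Counter()
    for row in matching:
        for col, value in enumerate(row):
            if value != query_value:
                co[(col, value)] += 1
    ranked = []
    for row in matching:
        # Aggregate the per-value scores into one score for the row.
        score = sum(
            co[(col, value)] / len(matching)
            for col, value in enumerate(row)
            if value != query_value
        )
        ranked.append((score, row))
    # Highest aggregated score first; the sort is stable for ties.
    ranked.sort(key=lambda pair: pair[0], reverse=True)
    return ranked


# Hypothetical car listings: (make, body type, color).
listings = [
    ("honda", "sedan", "red"),
    ("honda", "sedan", "blue"),
    ("honda", "coupe", "red"),
    ("toyota", "sedan", "red"),
]
ranked = rank_by_cooccurrence(listings, "honda")
```

Here the red sedan ranks first among the Honda rows because both of its other values are the ones that most often appear together with the query value, so rows that match the query equally well can still be ordered.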

database and expert systems applications | 2010