Seung-won Hwang | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Seung-won Hwang is active.

Explore More

Publication

Featured researches published by Seung-won Hwang.

string processing and information retrieval | 2007

Efficient text proximity search

Ralf Schenkel; Andreas Broschart; Seung-won Hwang; Martin Theobald; Gerhard Weikum

In addition to purely occurrence-based relevance models, term proximity has been frequently used to enhance retrieval quality of keyword-oriented retrieval systems. While there have been approaches on effective scoring functions that incorporate proximity, there has not been much work on algorithms or access methods for their efficient evaluation. This paper presents an efficient evaluation framework including a proximity scoring function integrated within a top-k query engine for text retrieval. We propose precomputed and materialized index structures that boost performance. The increased retrieval effectiveness and efficiency of our framework are demonstrated through extensive experiments on a very large text benchmark collection. In combination with static index pruning for the proximity lists, our algorithm achieves an improvement of two orders of magnitude compared to a term-based top-k evaluation, with a significantly improved result quality.

ACM Transactions on Information Systems | 2013

Enriching Documents with Examples: A Corpus Mining Approach

Jinhan Kim; Sanghoon Lee; Seung-won Hwang; Sunghun Kim

Software developers increasingly rely on information from the Web, such as documents or code examples on application programming interfaces (APIs), to facilitate their development processes. However, API documents often do not include enough information for developers to fully understand how to use the APIs, and searching for good code examples requires considerable effort. To address this problem, we propose a novel code example recommendation system that combines the strength of browsing documents and searching for code examples and returns API documents embedded with high-quality code example summaries mined from the Web. Our evaluation results show that our approach provides code examples with high precision and boosts programmer productivity.

international conference on data engineering | 2013

Attribute extraction and scoring: A probabilistic approach

Taesung Lee; Zhongyuan Wang; Haixun Wang; Seung-won Hwang

Knowledge bases, which consist of concepts, entities, attributes and relations, are increasingly important in a wide range of applications. We argue that knowledge about attributes (of concepts or entities) plays a critical role in inferencing. In this paper, we propose methods to derive attributes for millions of concepts and we quantify the typicality of the attributes with regard to their corresponding concepts. We employ multiple data sources such as web documents, search logs, and existing knowledge bases, and we derive typicality scores for attributes by aggregating different distributions derived from different sources using different methods. To the best of our knowledge, ours is the first approach to integrate concept- and instance-based patterns into probabilistic typicality scores that scale to broad concept space. We have conducted extensive experiments to show the effectiveness of our approach.

international conference on management of data | 2006

Boolean + ranking: querying a database by k-constrained optimization

Zhen Zhang; Seung-won Hwang; Kevin Chen Chuan Chang; Min Wang; Christian A. Lang; Yuan Chi Chang

The wide spread of databases for managing structured data, compounded with the expanded reach of the Internet, has brought forward interesting data retrieval and analysis scenarios to RDBMS. In such settings, queries often take the form of k-constrained optimization, with a Boolean constraint and a numeric optimization expression as the goal function, retrieving only the top-k tuples. This paper proposes the concept of supporting such queries, as their nature implies, by a functional optimization machinery over the search space of multiple indices. To realize this concept, we combine the dual perspectives of discrete state search (from the view of indices) and continuous function optimization (from the view of goal functions). We present, as the marriage of the two perspectives, the OPT* framework, which encodes k-constrained optimization as an A* search over the composite space of multiple indices, driven by functional optimization for providing tight heuristics. By processing queries as optimization, OPT* significantly outperforms baseline approaches, with up to 3 orders of magnitude margins.

international acm sigir conference on research and development in information retrieval | 2014

Predictive parallelization: taming tail latencies in web search

Myeongjae Jeon; Saehoon Kim; Seung-won Hwang; Yuxiong He; Sameh Elnikety; Alan L. Cox; Scott Rixner

Web search engines are optimized to reduce the high-percentile response time to consistently provide fast responses to almost all user queries. This is a challenging task because the query workload exhibits large variability, consisting of many short-running queries and a few long-running queries that significantly impact the high-percentile response time. With modern multicore servers, parallelizing the processing of an individual query is a promising solution to reduce query execution time, but it gives limited benefits compared to sequential execution since most queries see little or no speedup when parallelized. The root of this problem is that short-running queries, which dominate the workload, do not benefit from parallelization. They incur a large parallelization overhead, taking scarce resources from long-running queries. On the other hand, parallelization substantially reduces the execution time of long-running queries with low overhead and high parallelization efficiency. Motivated by these observations, we propose a predictive parallelization framework with two parts: (1) predicting long-running queries, and (2) selectively parallelizing them. For the first part, prediction should be accurate and efficient. For accuracy, we study a comprehensive feature set covering both term features (reflecting dynamic pruning efficiency) and query features (reflecting query complexity). For efficiency, to keep overhead low, we avoid expensive features that have excessive requirements such as large memory footprints. For the second part, we use the predicted query execution time to parallelize long-running queries and process short-running queries sequentially. We implement and evaluate the predictive parallelization framework in Microsoft Bing search. Our measurements show that under moderate to heavy load, the predictive strategy reduces the 99th-percentile response time by 50% (from 200 ms to 100 ms) compared with prior approaches that parallelize all queries.

extending database technology | 2010

BSkyTree: scalable skyline computation using a balanced pivot selection

Jongwuk Lee; Seung-won Hwang

Skyline queries have gained a lot of attention for multi-criteria analysis in large-scale datasets. While existing skyline algorithms have focused mostly on exploiting data dominance to achieve efficiency, we propose that data incomparability should be treated as another key factor in optimizing skyline computation. Specifically, to optimize both factors, we first identify common modules shared by existing non-index skyline algorithms, and then analyze them to develop a cost model to guide a balanced pivot point selection. Based on the cost model, we lastly implement our balanced pivot selection in two algorithms, BSkyTree-S and BSkyTree-P, treating both dominance and incomparability as key factors. Our experimental results demonstrate that proposed algorithms outperform state-of-the-art skyline algorithms up to two orders of magnitude.

IEEE Transactions on Knowledge and Data Engineering | 2007

Probe Minimization by Schedule Optimization: Supporting Top-K Queries with Expensive Predicates

Seung-won Hwang; Kevin Chen Chuan Chang

This paper addresses the problem of evaluating ranked top-k queries with expensive predicates. As major DBMSs now all support expensive user-defined predicates for Boolean queries, we believe such support for ranked queries can be even more important: first, ranked queries often need to model user-specific concepts of preference, relevance, or similarity, which call for dynamic user-defined functions. Second, middleware systems must incorporate external predicates for integrating autonomous sources typically accessible only by per-object queries. Third, ranked queries often accompany Boolean ranking conditions, which may turn predicates into expensive ones, as the index structure on the predicate built on the base table may be no longer effective in retrieving the filtered objects in order. Fourth, fuzzy joins are inherently expensive, as they are essentially user-defined operations that dynamically associate multiple relations. These predicates, being dynamically defined or externally accessed, cannot rely on index mechanisms to provide zero-time sorted output, and must instead require per-object probe to evaluate. To enable probe minimization, we develop the problem as cost-based optimization of searching over potential probe schedules. In particular, we decouple probe scheduling into object and predicate scheduling problems and develop an analytical object scheduling optimization and a dynamic predicate scheduling optimization, which combined together form a cost-effective probe schedule

symposium on large spatial databases | 2009

Spatial Skyline Queries: An Efficient Geometric Algorithm

Wanbin Son; Mu-Woong Lee; Hee-Kap Ahn; Seung-won Hwang

As more data-intensive applications emerge, advanced retrieval semantics, such as ranking and skylines, have attracted attention. Geographic information systems are such an application with massive spatial data. Our goal is to efficiently support skyline queries over massive spatial data. To achieve this goal, we first observe that the best known algorithm VS 2, despite its claim, may fail to deliver correct results. In contrast, we present a simple and efficient algorithm that computes the correct results. To validate the effectiveness and efficiency of our algorithm, we provide an extensive empirical comparison of our algorithm and VS 2 in several aspects.

ACM Transactions on Database Systems | 2007

Optimizing top-k queries for middleware access: A unified cost-based approach

Seung-won Hwang; Kevin Chen Chuan Chang

This article studies optimizing top-k queries in middlewares. While many assorted algorithms have been proposed, none is generally applicable to a wide range of possible scenarios. Existing algorithms lack both the “generality” to support a wide range of access scenarios and the systematic “adaptivity” to account for runtime specifics. To fulfill this critical lacking, we aim at taking a cost-based optimization approach: By runtime search over a space of algorithms, cost-based optimization is general across a wide range of access scenarios, yet adaptive to the specific access costs at runtime. While such optimization has been taken for granted for relational queries from early on, it has been clearly lacking for ranked queries. In this article, we thus identify and address the barriers of realizing such a unified framework. As the first barrier, we need to define a “comprehensive” space encompassing all possibly optimal algorithms to search over. As the second barrier and a conflicting goal, such a space should also be “focused” enough to enable efficient search. For SQL queries that are explicitly composed of relational operators, such a space, by definition, consists of schedules of relational operators (or “query plans”). In contrast, top-k queries do not have logical tasks, such as relational operators. We thus define the logical tasks of top-k queries as building blocks to identify a comprehensive and focused space for top-k queries. We then develop efficient search schemes over such space for identifying the optimal algorithm. Our study indicates that our framework not only unifies, but also outperforms existing algorithms specifically designed for their scenarios.

automated software engineering | 2009

Adding Examples into Java Documents

Jinhan Kim; Sanghoon Lee; Seung-won Hwang; Sunghun Kim

Code examples play an important role to explain the usage of Application Programming Interfaces (APIs), but most API documents do not provide sufficient code examples. For example, for the JDK 5 documents (JavaDocs), only 2% of APIs have code examples. In this paper, we propose a technique that automatically augments API documents with code examples. Our approach finds and embeds code examples for more than 75% of the APIs in JavaDocs 5.

Explore More