Hyunsouk Cho
Pohang University of Science and Technology
Publication
Featured research published by Hyunsouk Cho.
very large data bases | 2013
Jongwuk Lee; Hyunsouk Cho; Jin-woo Park; Young-rok Cha; Seung-won Hwang; Zaiqing Nie; Ji-Rong Wen
Query result clustering has attracted considerable attention as a means of providing users with a concise overview of results. However, little research effort has been devoted to organizing the query results for entities which refer to real-world concepts, e.g., people, products, and locations. Entity-level result clustering is more challenging because diverse similarity notions between entities need to be supported in heterogeneous domains, e.g., image resolution is an important feature for cameras, but not for fruits. To address this challenge, we propose a hybrid relationship clustering algorithm, called Hydra, using co-occurrence and numeric features. Algorithm Hydra captures diverse user perceptions from co-occurrence and disambiguates different senses using feature-based similarity. In addition, we extend Hydra into \(\mathsf{Hydra}_{\mathsf{gData}}\) with different sources, i.e., entity types and crowdsourcing. Experimental results show that the proposed algorithms achieve effectiveness and efficiency in real-life and synthetic datasets.
international conference on data engineering | 2012
Jongwuk Lee; Hyunsouk Cho; Seung-won Hwang
database systems for advanced applications | 2014
Kinda El Maarry; Wolf-Tilo Balke; Hyunsouk Cho; Seung-won Hwang; Yukino Baba
IEEE Transactions on Knowledge and Data Engineering | 2014
Jongwuk Lee; Hyunsouk Cho; Sunyou Lee; Seung-won Hwang
international conference on management of data | 2017
Jouyon Park; Hyunsouk Cho; Seung-won Hwang
international conference on management of data | 2018
Hoyeop Lee; Jong-Seon Park; Hyungjun Kim; Hyunsouk Cho; Geonsoo Kim
Top-k queries have gained considerable attention as an effective means for narrowing down an overwhelming amount of data. This paper studies the problem of constructing an indexing structure that efficiently supports top-k queries for varying scoring functions and retrieval sizes. Existing work can be categorized into three classes: list-, layer-, and view-based approaches. This paper focuses on the layer-based approach, which pre-materializes tuples into multiple consecutive layers. The layer-based index enables us to return top-k answers efficiently by restricting access to tuples in the k layers. However, we observe that the number of tuples accessed in each layer can be reduced further. For this purpose, we propose a dual-resolution layer structure. Specifically, we iteratively build coarse-level layers using skylines, and divide each coarse-level layer into fine-level sublayers using convex skylines. The dual-resolution layer is able to leverage not only the dominance relationship between coarse-level layers, named for-all dominance (∀-dominance), but also a relaxed dominance relationship between fine-level sublayers, named exists-dominance (∃-dominance). Our extensive evaluation results demonstrate that the proposed method accesses significantly fewer tuples than the state-of-the-art methods.
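As a rough illustration of the layer-based idea described in this abstract, the following Python sketch peels a small dataset into skyline layers and answers a top-k query by scanning only the first k layers. It is a minimal sketch under stated assumptions, not the dual-resolution index itself: it uses plain skyline layers only (no convex-skyline sublayers), assumes larger attribute values are better, and assumes a linear scoring function with non-negative weights. The names `dominates`, `skyline_layers`, and `top_k` are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(a: Point, b: Point) -> bool:
    """a dominates b if a is >= b on every attribute and > on at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def skyline_layers(points: Sequence[Point]) -> List[List[Point]]:
    """Repeatedly peel off the skyline (non-dominated tuples) of what remains."""
    remaining = list(points)
    layers: List[List[Point]] = []
    while remaining:
        layer = [p for p in remaining
                 if not any(dominates(q, p) for q in remaining)]
        layers.append(layer)
        remaining = [p for p in remaining if p not in layer]
    return layers


def top_k(layers: List[List[Point]], weights: Sequence[float], k: int) -> List[Point]:
    """Score only tuples in the first k layers (assumes non-negative weights)."""
    candidates = [p for layer in layers[:k] for p in layer]

    def score(p: Point) -> float:
        return sum(w * x for w, x in zip(weights, p))

    return sorted(candidates, key=score, reverse=True)[:k]


points = [(9, 1), (1, 9), (6, 6), (5, 5), (4, 4), (8, 0), (2, 2)]
layers = skyline_layers(points)
print(layers)
print(top_k(layers, weights=(0.8, 0.2), k=2))
```

In this toy dataset the second-best tuple for the weights (0.8, 0.2) sits in the second layer, which is why the first k layers, rather than the first layer alone, have to be scanned.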
pacific asia conference on knowledge discovery and data mining | 2017
Kyungjae Lee; Hyunsouk Cho; Seung-won Hwang
Crowdsourcing continues to gain momentum as its potential becomes more widely recognized. Nevertheless, the associated quality aspect remains a valid concern, which introduces uncertainty in the results obtained from the crowd. We identify the different aspects that dynamically affect the overall quality of a crowdsourcing task. Accordingly, we propose a skill ontology-based model that caters for these aspects, as a management technique to be adopted by crowdsourcing platforms. The model maintains a dynamically evolving ontology of skills, with libraries of standardized and personalized assessments for awarding workers skills. Aligning a worker’s set of skills to that required by a task boosts the ultimate resulting quality. We visualize the model’s components and workflow, and consider how to guard it against malicious or unqualified workers, whose responses introduce this uncertainty and degrade the overall quality.
IEEE Transactions on Knowledge and Data Engineering | 2017
Jinyoung Yeo; Hyunsouk Cho; Jin-woo Park; Seung-won Hwang
A top-k query retrieves the best \(k\) tuples by assigning scores for each tuple in a target relation with respect to a user-specific scoring function. This paper studies the problem of constructing an indexing structure for supporting top-k queries over varying scoring functions and retrieval sizes. The existing research efforts can be categorized into three approaches: list-, layer-, and view-based approaches. In this paper, we mainly focus on the layer-based approach that pre-materializes tuples into consecutive multiple layers. We first propose a dual-resolution layer that consists of coarse-level and fine-level layers. Specifically, we build coarse-level layers using skylines, and divide each coarse-level layer into fine-level sublayers using convex skylines. To make our proposed dual-resolution layer scalable, we then address the following optimization directions: 1) index construction; 2) disk-based storage scheme; 3) the design of the virtual layer; and 4) index maintenance for tuple updates. Our evaluation results show that our proposed method is more scalable than the state-of-the-art methods.
advances in social networks analysis and mining | 2016
Hyunsouk Cho; Seung-won Hwang
The Financial Entity Identification and Information Integration (FEIII) task aims to understand relationships among financial entities and their roles, using three sentences extracted from each financial contract that contain the target word. The FEIII task poses two challenges: 1) data sparseness: small training sets (9% of test data) and 2) context sparseness: limited context (three sentences). Existing statistical approaches, such as Bayes and TF-IDF, cannot evaluate the importance of words unobserved in the training data, which makes them vulnerable to both challenges. We overcome each challenge by considering 1) the concepts of words from knowledge bases (Probase) in addition to the words themselves (conceptual feature) and 2) word semantics from distributed representations such as word2vec (semantic feature). We empirically evaluate the proposed classification model on four-class classification (highly relevant, relevant, neutral, and irrelevant), and show that the proposed model improves the F1-score by 18% compared to the statistical baselines.
national conference on artificial intelligence | 2018
Jinyoung Yeo; Geungyu Wang; Hyunsouk Cho; Seungtaek Choi; Seung-won Hwang
The Financial Entity Identification and Information Integration (FEIII) is a competition on understanding relationships between financial entities. Predicting the competitor relation between two entities poses three challenges: 1) extracting relevant features from the various released datasets, 2) handling missing entity information, and 3) handling imbalanced training data. To address these challenges, we propose a model named PREFER, which considers 1) relation trend and context feature extraction from the released datasets, 2) K-NN estimation with the concept graph of a knowledge base (Probase), and 3) oversampling from the true-labeled data. The model improves the F1-score by 34% compared to the baseline method.
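The abstract does not spell out the oversampling scheme, so the following Python sketch shows only the generic technique: random oversampling with replacement, which duplicates examples of under-represented classes until every class matches the size of the largest one. The function name `oversample`, the seed parameter, and the toy data are assumptions for illustration and are not taken from PREFER.

```python
import random
from collections import Counter
from typing import Hashable, List, Sequence, Tuple


def oversample(examples: Sequence[object],
               labels: Sequence[Hashable],
               seed: int = 0) -> Tuple[List[object], List[Hashable]]:
    """Random oversampling: resample minority classes with replacement
    until each class has as many examples as the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())

    out_x, out_y = list(examples), list(labels)
    for label, n in counts.items():
        pool = [x for x, y in zip(examples, labels) if y == label]
        for _ in range(target - n):
            out_x.append(rng.choice(pool))  # duplicate an existing example
            out_y.append(label)
    return out_x, out_y


# Toy data: one positive ("true") pair against four negatives.
pairs = ["A-B", "A-C", "B-D", "C-D", "E-F"]
is_competitor = [True, False, False, False, False]
bal_x, bal_y = oversample(pairs, is_competitor)
print(Counter(bal_y))
```

Oversampling should be applied to the training split only, so that duplicated examples do not leak into evaluation data.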