Is this you? Create Your Porfile

Hwanjo Yu

Pohang University of Science and Technology

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Hwanjo Yu is active.

Explore More

Publication

Featured researches published by Hwanjo Yu.

international conference on data engineering | 2013

Scalable and parallelizable processing of influence maximization for large-scale social networks?

Jinha Kim; Seungkeol Kim; Hwanjo Yu

As social network services connect people across the world, influence maximization, i.e., finding the most influential nodes (or individuals) in the network, is being actively researched with applications to viral marketing. One crucial challenge in scalable influence maximization processing is evaluating influence, which is #P-hard and thus hard to solve in polynomial time. We propose a scalable influence approximation algorithm, Independent Path Algorithm (IPA) for Independent Cascade (IC) diffusion model. IPA efficiently approximates influence by considering an independent influence path as an influence evaluation unit. IPA are also easily parallelized by simply adding a few lines of OpenMP meta-programming expressions. Also, overhead of maintaining influence paths in memory is relieved by safely throwing away insignificant influence paths. Extensive experiments conducted on large-scale real social networks show that IPA is an order of magnitude faster and uses less memory than the state of the art algorithms. Our experimental results also show that parallel versions of IPA speeds up further as the number of CPU cores increases, and more speed-up is achieved for larger datasets. The algorithms have been implemented in our demo application for influence maximization (available at http://dm.postech.ac.kr/ipa demo), which efficiently finds the most influential nodes in a social network.

conference on recommender systems | 2016

Convolutional Matrix Factorization for Document Context-Aware Recommendation

Dong Hyun Kim; Chanyoung Park; Jinoh Oh; Sungyoung Lee; Hwanjo Yu

Sparseness of user-to-item rating data is one of the major factors that deteriorate the quality of recommender system. To handle the sparsity problem, several recommendation techniques have been proposed that additionally consider auxiliary information to improve rating prediction accuracy. In particular, when rating data is sparse, document modeling-based approaches have improved the accuracy by additionally utilizing textual data such as reviews, abstracts, or synopses. However, due to the inherent limitation of the bag-of-words model, they have difficulties in effectively utilizing contextual information of the documents, which leads to shallow understanding of the documents. This paper proposes a novel context-aware recommendation model, convolutional matrix factorization (ConvMF) that integrates convolutional neural network (CNN) into probabilistic matrix factorization (PMF). Consequently, ConvMF captures contextual information of documents and further enhances the rating prediction accuracy. Our extensive evaluations on three real-world datasets show that ConvMF significantly outperforms the state-of-the-art recommendation models even when the rating data is extremely sparse. We also demonstrate that ConvMF successfully captures subtle contextual difference of a word in a document. Our implementation and datasets are available at http://dm.postech.ac.kr/ConvMF.

knowledge discovery and data mining | 2011

Protecting location privacy using location semantics

Byoungyoung Lee; Jinoh Oh; Hwanjo Yu; Jong Kim

As the use of mobile devices increases, a location-based service (LBS) becomes increasingly popular because it provides more convenient context-aware services. However, LBS introduces problematic issues for location privacy due to the nature of the service. Location privacy protection methods based on k-anonymity and l-diversity have been proposed to provide anonymized use of LBS. However, the k-anonymity and l-diversity methods still can endanger the users privacy because location semantic information could easily be breached while using LBS. This paper presents a novel location privacy protection technique, which protects the location semantics from an adversary. In our scheme, location semantics are first learned from location data. Then, the trusted-anonymization server performs the anonymization using the location semantic information by cloaking with semantically heterogeneous locations. Thus, the location semantic information is kept secure as the cloaking is done with semantically heterogeneous locations and the true location information is not delivered to the LBS applications. This paper proposes algorithms for learning location semantics and achieving semantically secure cloaking.

international conference on data mining | 2011

Novel Recommendation Based on Personal Popularity Tendency

Jinoh Oh; Sun Park; Hwanjo Yu; Min Song; Seung-Taek Park

Recently, novel recommender systems have attracted considerable attention in the research community. Recommending popular items may not always satisfy users. For example, although most users likely prefer popular items, such items are often not very surprising or novel because users may already know about the items. Also, such recommender systems hardly satisfy a group of users who prefer relatively obscure items. Existing novel recommender systems, however, still recommend mainly popular items or degrade the quality of recommendation. They do so because they do not consider the balance between novelty and preference-based recommendation. This paper proposes an efficient novel-recommendation method called Personal Popularity Tendency Matching (PPTM) which recommends novel items by considering an individuals Personal Popularity Tendency (or PPT). Considering PPT helps to diversify recommendations by reasonably penalizing popular items while improving the recommendation accuracy. We experimentally show that the proposed method, PPTM, is better than other methods in terms of both novelty and accuracy.

international conference on management of data | 2014

OPT: a new framework for overlapped and parallel triangulation in large-scale graphs

Jin-Ha Kim; Wook-Shin Han; Sangyeon Lee; Kyungyeol Park; Hwanjo Yu

Graph triangulation, which finds all triangles in a graph, has been actively studied due to its wide range of applications in the network analysis and data mining. With the rapid growth of graph data size, disk-based triangulation methods are in demand but little researched. To handle a large-scale graph which does not fit in memory, we must iteratively load small parts of the graph. In the existing literature, achieving the ideal cost has been considered to be impossible for billion-scale graphs due to the memory size constraint. In this paper, we propose an overlapped and parallel disk-based triangulation framework for billion-scale graphs, OPT, which achieves the ideal cost by (1) full overlap of the CPU and I/O operations and (2) full parallelism of multi-core CPU and FlashSSD I/O. In OPT, triangles in memory are called the internal triangles while triangles constituting vertices in memory and vertices in external memory are called the external triangles. At the macro level, OPT overlaps the internal triangulation and the external triangulation, while it overlaps the CPU and I/O operations at the micro level. Thereby, the cost of OPT is close to the ideal cost. Moreover, OPT instantiates both vertex-iterator and edge-iterator models and benefits from multi-thread parallelism on both types of triangulation. Extensive experiments conducted on large-scale datasets showed that (1) OPT achieved the elapsed time close to that of the ideal method with less than 7% of overhead under the limited memory budget, (2) OPT achieved linear speed-up with an increasing number of CPU cores, (3) OPT outperforms the state-of-the-art parallel method by up to an order of magnitude with 6 CPU cores, and (4) for the first time in the literature, the triangulation results are reported for a billion-vertex scale real-world graph.

PLOS ONE | 2016

Personality Factors Predicting Smartphone Addiction Predisposition: Behavioral Inhibition and Activation Systems, Impulsivity, and Self-Control.

Yejin Kim; Jo-Eun Jeong; Hyun Cho; Dong-Jin Jung; Minjung Kwak; Mi Jung Rho; Hwanjo Yu; Dai-Jin Kim; In Young Choi

The purpose of this study was to identify personality factor-associated predictors of smartphone addiction predisposition (SAP). Participants were 2,573 men and 2,281 women (n = 4,854) aged 20–49 years (Mean ± SD: 33.47 ± 7.52); participants completed the following questionnaires: the Korean Smartphone Addiction Proneness Scale (K-SAPS) for adults, the Behavioral Inhibition System/Behavioral Activation System questionnaire (BIS/BAS), the Dickman Dysfunctional Impulsivity Instrument (DDII), and the Brief Self-Control Scale (BSCS). In addition, participants reported their demographic information and smartphone usage pattern (weekday or weekend average usage hours and main use). We analyzed the data in three steps: (1) identifying predictors with logistic regression, (2) deriving causal relationships between SAP and its predictors using a Bayesian belief network (BN), and (3) computing optimal cut-off points for the identified predictors using the Youden index. Identified predictors of SAP were as follows: gender (female), weekend average usage hours, and scores on BAS-Drive, BAS-Reward Responsiveness, DDII, and BSCS. Female gender and scores on BAS-Drive and BSCS directly increased SAP. BAS-Reward Responsiveness and DDII indirectly increased SAP. We found that SAP was defined with maximal sensitivity as follows: weekend average usage hours > 4.45, BAS-Drive > 10.0, BAS-Reward Responsiveness > 13.8, DDII > 4.5, and BSCS > 37.4. This study raises the possibility that personality factors contribute to SAP. And, we calculated cut-off points for key predictors. These findings may assist clinicians screening for SAP using cut-off points, and further the understanding of SA risk factors.

Information Sciences | 2011

Hessian matrix distribution for Bayesian policy gradient reinforcement learning

Ngo Anh Vien; Hwanjo Yu; TaeChoong Chung

Bayesian policy gradient algorithms have been recently proposed for modeling the policy gradient of the performance measure in reinforcement learning as a Gaussian process. These methods were known to reduce the variance and the number of samples needed to obtain accurate gradient estimates in comparison to the conventional Monte-Carlo policy gradient algorithms. In this paper, we propose an improvement over previous Bayesian frameworks for the policy gradient. We use the Hessian matrix distribution as a learning rate schedule to improve the performance of the Bayesian policy gradient algorithm in terms of the variance and the number of samples. As in computing the policy gradient distributions, the Bayesian quadrature method is used to estimate the Hessian matrix distributions. We prove that the posterior mean of the Hessian distribution estimate is symmetric, one of the important properties of the Hessian matrix. Moreover, we prove that with an appropriate choice of kernel, the computational complexity of Hessian distribution estimate is equal to that of the policy gradient distribution estimates. Using simulations, we show encouraging experimental results comparing the proposed algorithm to the Bayesian policy gradient and the Bayesian policy natural gradient algorithms described in Ghavamzadeh and Engel [10].

BMC Bioinformatics | 2011

Combining active learning and semi-supervised learning techniques to extract protein interaction sentences

Min Song; Hwanjo Yu; Wook-Shin Han

BackgroundProtein-protein interaction (PPI) extraction has been a focal point of many biomedical research and database curation tools. Both Active Learning and Semi-supervised SVMs have recently been applied to extract PPI automatically. In this paper, we explore combining the AL with the SSL to improve the performance of the PPI task.MethodsWe propose a novel PPI extraction technique called PPISpotter by combining Deterministic Annealing-based SSL and an AL technique to extract protein-protein interaction. In addition, we extract a comprehensive set of features from MEDLINE records by Natural Language Processing (NLP) techniques, which further improve the SVM classifiers. In our feature selection technique, syntactic, semantic, and lexical properties of text are incorporated into feature selection that boosts the system performance significantly.ResultsBy conducting experiments with three different PPI corpuses, we show that PPISpotter is superior to the other techniques incorporated into semi-supervised SVMs such as Random Sampling, Clustering, and Transductive SVMs by precision, recall, and F-measure.ConclusionsOur system is a novel, state-of-the-art technique for efficiently extracting protein-protein interaction pairs.

knowledge discovery and data mining | 2015

Fast and Robust Parallel SGD Matrix Factorization

Jinoh Oh; Wook-Shin Han; Hwanjo Yu; Xiaoqian Jiang

Matrix factorization is one of the fundamental techniques for analyzing latent relationship between two entities. Especially, it is used for recommendation for its high accuracy. Efficient parallel SGD matrix factorization algorithms have been developed for large matrices to speed up the convergence of factorization. However, most of them are designed for a shared-memory environment thus fail to factorize a large matrix that is too big to fit in memory, and their performances are also unreliable when the matrix is skewed. This paper proposes a fast and robust parallel SGD matrix factorization algorithm, called MLGF-MF, which is robust to skewed matrices and runs efficiently on block-storage devices (e.g., SSD disks) as well as shared-memory. MLGF-MF uses Multi-Level Grid File (MLGF) for partitioning the matrix and minimizes the cost for scheduling parallel SGD updates on the partitioned regions by exploiting partial match queries processing}. Thereby, MLGF-MF produces reliable results efficiently even on skewed matrices. MLGF-MF is designed with asynchronous I/O permeated in the algorithm such that CPU keeps executing without waiting for I/O to complete. Thereby, MLGF-MF overlaps the CPU and I/O processing, which eventually offsets the I/O cost and maximizes the CPU utility. Recent flash SSD disks support high performance parallel I/O, thus are appropriate for executing the asynchronous I/O. From our extensive evaluations, MLGF-MF significantly outperforms (or converges faster than) the state-of-the-art algorithms in both shared-memory and block-storage environments. In addition, the outputs of MLGF-MF is significantly more robust to skewed matrices. Our implementation of MLGF-MF is available at http://dm.postech.ac.kr/MLGF-MF as executable files.

international conference on data mining | 2012

CT-IC: Continuously Activated and Time-Restricted Independent Cascade Model for Viral Marketing

Wonyeol Lee; Jinha Kim; Hwanjo Yu

Influence maximization problem with applications to viral marketing has gained much attention. Underlying influence diffusion models affect influence maximizing nodes because they focus on difference aspect of influence diffusion. Nevertheless, existing diffusion models overlook two important aspects of real-world marketing - continuous trials and time restriction. This paper proposes a new realistic influence diffusion model called Continously activated and Time-restricted IC (CT-IC) model which generalizes the IC model by embedding the above two aspects. We first prove that CT-IC model satisfies two crucial properties - monotonicity and submodularity. We then provide an efficient method for calculating exact influence spread when a social network is restricted to a directed tree and a simple path. Finally, we propose a scalable algorithm for influence maximization under CT-IC model called CT-IPA. Our experiments show that CT-IC model provides seeds of higher influence spread than IC model and CT-IPA is four orders of magnitude faster than the greedy algorithm while providing similar influence spread to the greedy algorithm.

Explore More