Yongsub Lim | Researchain

Publication

Featured research published by Yongsub Lim.

Knowledge Discovery and Data Mining | 2015

MASCOT: Memory-efficient and Accurate Sampling for Counting Local Triangles in Graph Streams

Yongsub Lim; U Kang

How can we estimate local triangle counts accurately in a graph stream without storing the whole graph? Local triangle counting, which counts triangles for each node in a graph, is an important problem with wide applications in social network analysis, anomaly detection, web mining, etc. In this paper, we propose MASCOT, a memory-efficient and accurate method for local triangle estimation in a graph stream based on edge sampling. To develop MASCOT, we first present two naive local triangle counting algorithms for a graph stream: MASCOT-C and MASCOT-A. MASCOT-C is based on constant edge sampling, and MASCOT-A improves its accuracy by using more memory. MASCOT achieves both the accuracy and the memory efficiency of the two algorithms by counting triangles unconditionally for each new edge, regardless of whether the edge is sampled. In contrast to the existing algorithm, which requires prior knowledge of the target graph and appropriately set parameters, MASCOT requires only one simple parameter, the edge sampling probability. Through extensive experiments, we show that for the same number of sampled edges, MASCOT provides the best accuracy compared with the existing algorithm as well as MASCOT-C and MASCOT-A. Thanks to MASCOT, we also discover interesting anomalous patterns in real graphs, such as core-peripheries in the web and ambiguous author names in DBLP.
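
The idea of unconditional counting with a single sampling probability can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `estimate_local_triangles`, the in-memory adjacency sets, and the `1/p^2` scaling are assumptions made here. The scaling follows from the observation that a triangle is seen when its last edge arrives only if its two earlier edges were both sampled, which happens with probability `p^2`. The sketch also assumes a simple graph stream with no repeated edges.

```python
import random
from collections import defaultdict


def estimate_local_triangles(edge_stream, p, seed=None):
    """Estimate per-node triangle counts from a stream of (u, v) edges.

    Hypothetical sketch: every arriving edge is checked against the sampled
    edges (unconditional counting), and only afterwards is it kept with
    probability p.
    """
    rng = random.Random(seed)
    sampled_adj = defaultdict(set)   # adjacency over sampled edges only
    estimates = defaultdict(float)   # node -> estimated triangle count
    scale = 1.0 / (p * p)            # assumed correction for two sampled edges
    empty = frozenset()

    for u, v in edge_stream:
        if u == v:
            continue  # ignore self-loops
        # Count triangles closed by (u, v), whether or not it will be sampled.
        common = sampled_adj.get(u, empty) & sampled_adj.get(v, empty)
        for w in common:
            estimates[u] += scale
            estimates[v] += scale
            estimates[w] += scale
        # Sampling decision for the new edge happens after counting.
        if rng.random() < p:
            sampled_adj[u].add(v)
            sampled_adj[v].add(u)
    return dict(estimates)
```

With `p = 1.0` every edge is kept, so the sketch reduces to exact local triangle counting, which is a convenient sanity check.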

European Conference on Computer Vision | 2010

Energy minimization under constraints on label counts

Yongsub Lim; Kyomin Jung; Pushmeet Kohli

Many computer vision problems, such as object segmentation or reconstruction, can be formulated in terms of labeling a set of pixels or voxels. In certain scenarios, we may know the number of pixels or voxels that can be assigned to a particular label. For instance, in the reconstruction problem, we may know the size of the object to be reconstructed. Such label count constraints are extremely powerful and have recently been shown to result in good solutions for many vision problems. Traditional energy minimization algorithms used in vision cannot handle label count constraints. This paper proposes a novel algorithm for minimizing energy functions under constraints on the number of variables that can be assigned to a particular label. Our algorithm is deterministic in nature and outputs ε-approximate solutions for all possible counts of labels. We also develop a variant of the above algorithm that is much faster, produces solutions under almost all label count constraints, and can be applied to all submodular quadratic pseudo-Boolean functions. We evaluate the algorithm on the two-label (foreground/background) image segmentation problem and compare its performance with the state-of-the-art parametric maximum flow and max-sum diffusion based algorithms. Experimental results show that our method is practical and is able to generate impressive segmentation results in reasonable time.

Conference on Information and Knowledge Management | 2014

Fast, Accurate, and Space-efficient Tracking of Time-weighted Frequent Items from Data Streams

Yongsub Lim; Jihoon Choi; U Kang

How can we discover interesting patterns from time-evolving high-speed data streams? How can we analyze the data streams quickly and accurately, with little space overhead? High-speed data streams have been receiving increasing attention owing to their wide applications in sensors, network traffic, social networks, etc. One of the most fundamental tasks on a data stream is to find frequent items; in particular, finding recently frequent items has become important in real-world applications. In this paper, we propose TwMinSwap, a fast, accurate, and space-efficient method for tracking recent frequent items. TwMinSwap is a deterministic version of our motivating algorithm TwSample, a sampling-based randomized algorithm with nice theoretical guarantees. TwMinSwap improves on TwSample in terms of speed, accuracy, and memory usage. Both require only O(k) memory space and do not require any prior knowledge of the stream, such as its length or the number of distinct items in it. Through extensive experiments, we demonstrate that TwMinSwap outperforms all competitors in terms of accuracy and memory usage, with fast running time. Thanks to TwMinSwap, we report interesting discoveries in real-world data streams, including the difference in trends between the winner and the loser among U.S. presidential candidates, and doubly-active patterns of movies.
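
To make the notion of time-weighted counting in O(k) memory concrete, here is a small sketch. The abstract does not spell out the update rule, so everything specific below is an assumption for illustration and should not be read as the definition of TwMinSwap: the function name `track_recent_frequent`, the exponential `decay` factor applied at every arrival, and the rule that an unmonitored item replaces the weakest monitored item only when that item's count has fallen below 1.

```python
def track_recent_frequent(stream, k, decay):
    """Keep at most k items with exponentially decayed counts.

    Hypothetical sketch: older occurrences are down-weighted by `decay`
    (0 < decay <= 1) at every arrival, so recent items dominate.
    """
    counts = {}  # item -> time-weighted count, never more than k entries
    for item in stream:
        # Age every monitored item by one time step.
        for key in counts:
            counts[key] *= decay
        if item in counts:
            counts[item] += 1.0
        elif len(counts) < k:
            counts[item] = 1.0
        else:
            # Assumed swap rule: evict the weakest item only if the
            # newcomer's initial weight of 1 already exceeds it.
            weakest = min(counts, key=counts.get)
            if counts[weakest] < 1.0:
                del counts[weakest]
                counts[item] = 1.0
    return counts
```

Setting `decay = 1.0` with a large enough `k` turns the sketch into plain frequency counting, while smaller values of `decay` make the summary forget old items faster.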

IEEE Transactions on Pattern Analysis and Machine Intelligence | 2014

Efficient Energy Minimization for Enforcing Label Statistics

Yongsub Lim; Kyomin Jung; Pushmeet Kohli

Energy minimization algorithms, such as graph cuts, enable the computation of the MAP solution under certain probabilistic models such as Markov random fields. However, for many computer vision problems, the MAP solution under the model is not the ground truth solution. In many problem scenarios, the system has access to certain statistics of the ground truth. For instance, in image segmentation, the area and boundary length of the object may be known. In these cases, we want to estimate the most probable solution that is consistent with such statistics, i.e., satisfies certain equality or inequality constraints. The above constrained energy minimization problem is NP-hard in general, and is usually solved using Linear Programming formulations, which relax the integrality constraints. This paper proposes a novel method that directly finds the discrete approximate solution of such problems by maximizing the corresponding Lagrangian dual. This method can be applied to any constrained energy minimization problem whose unconstrained version is polynomial-time solvable, and can handle multiple, equality or inequality, and linear or non-linear constraints. One important advantage of our method is the ability to handle second-order constraints with both-side inequalities under a weak restriction, which is not trivial in relaxation-based methods; we show that the restriction does not affect the accuracy in our cases. We demonstrate the efficacy of our method on the foreground/background image segmentation problem, and show that it produces impressive segmentation results with less error, and runs more than 20 times faster than the state-of-the-art LP relaxation based approaches.
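
For readers unfamiliar with the Lagrangian dual mentioned above, the generic construction for a single equality constraint can be written as follows. The symbols are assumptions introduced here, not the paper's notation: E is the energy, h is the constrained statistic, b is its target value, and λ is the multiplier.

```latex
% Assumed notation: E = energy, h = constrained statistic, b = target value.
\begin{align*}
\text{Primal:}\quad & \min_{x \in \{0,1\}^n} E(x) \quad \text{s.t. } h(x) = b \\
\text{Dual function:}\quad & g(\lambda) = \min_{x \in \{0,1\}^n} \bigl[ E(x) + \lambda\,(h(x) - b) \bigr] \\
\text{Dual problem:}\quad & \max_{\lambda \in \mathbb{R}} g(\lambda),
  \qquad g(\lambda) \le E(x) \ \text{for every feasible } x
\end{align*}
```

For an inequality constraint h(x) ≤ b the multiplier is restricted to λ ≥ 0. As an example, with a label-count statistic such as h(x) = Σ x_i, the λ term only shifts the unary costs, so each evaluation of g(λ) is as easy as the unconstrained problem.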

International Conference on Big Data and Smart Computing | 2015

Discovering large subsets with high quality partitions in real world graphs

Yongsub Lim; Won-Jo Lee; Ho-Jin Choi; U Kang

Given a real-world graph, how can we find a large subgraph whose partition quality is much better than that of the original? Graph partitioning has received great attention in graph mining, and balanced graph partitioning in particular is required in many real-world applications. However, balanced graph partitioning is known to be NP-hard, and moreover it is known that there is no good cut at a large scale for real graphs. Because of this difficulty, we propose a new paradigm for graph partitioning in this paper. Instead of dealing with the whole graph, our focus is on finding a large subgraph with high-quality partitions, in terms of conductance. We show that removing problematic nodes, i.e., the large-degree nodes called hub nodes in real graphs, remarkably decreases conductance for the remaining giant connected component (GCC), while the number of nodes in the GCC is still significant. In experiments, we demonstrate that our method finds a subgraph of quite a large size with low-conductance graph partitions, compared with competing methods. We also show that the competitors cannot find connected subgraphs while our method does, by construction. This improvement in partition quality for the subgraph is especially noticeable for large-scale cuts: for a balanced partition, conductance goes down to 14% of the original with a GCC size of 70% of the total. As a result, the found subgraph has clear partitions at almost all scales compared with the original, and this result especially helps find communities that are well-formed but hidden by hubs at various scales in real-world graphs like social networks.
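
The hub-removal step and the conductance measure described above can be sketched in a few lines. This is only an illustration under stated assumptions: the helper names `giant_component_without_hubs` and `conductance`, the choice to remove a fixed number `num_hubs` of highest-degree nodes, and the standard definition of conductance as cut size divided by the smaller side's volume are introduced here and are not taken from the paper. The edge list is assumed to contain no duplicates.

```python
from collections import defaultdict, deque


def giant_component_without_hubs(edges, num_hubs):
    """Remove the num_hubs highest-degree nodes, return (hubs, largest component)."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    by_degree = sorted(adj, key=lambda n: len(adj[n]), reverse=True)
    hubs = set(by_degree[:num_hubs])
    remaining = set(adj) - hubs

    best, seen = set(), set()
    for start in remaining:
        if start in seen:
            continue
        # Breadth-first search restricted to non-hub nodes.
        component, queue = {start}, deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            for nxt in adj[node]:
                if nxt in remaining and nxt not in seen:
                    seen.add(nxt)
                    component.add(nxt)
                    queue.append(nxt)
        if len(component) > len(best):
            best = component
    return hubs, best


def conductance(edges, part):
    """Cut size divided by the smaller volume of the two sides."""
    cut = vol_in = vol_out = 0
    for u, v in edges:
        if u == v:
            continue
        for node in (u, v):
            if node in part:
                vol_in += 1
            else:
            	vol_out += 1
        if (u in part) != (v in part):
            cut += 1
    smaller = min(vol_in, vol_out)
    return cut / smaller if smaller else float("inf")
```

To evaluate a partition of the extracted subgraph, `conductance` would be called on the edges whose endpoints both lie in the returned component.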

ACM Transactions on Knowledge Discovery From Data | 2018

Memory-Efficient and Accurate Sampling for Counting Local Triangles in Graph Streams: From Simple to Multigraphs

Yongsub Lim; Minsoo Jung; U Kang

How can we estimate local triangle counts accurately in a graph stream without storing the whole graph? How can we handle duplicated edges in local triangle counting for a graph stream? Local triangle counting, which computes the number of triangles attached to each node in a graph, is a very important problem with wide applications in social network analysis, anomaly detection, web mining, and the like. In this article, we propose algorithms for local triangle counting in a graph stream based on edge sampling: Mascot for a simple graph, and MultiBMascot and MultiWMascot for a multigraph. To develop Mascot, we first present two naive local triangle counting algorithms for a graph stream, called Mascot-C and Mascot-A. Mascot-C is based on constant edge sampling, and Mascot-A improves its accuracy by using more memory. Mascot achieves both the accuracy and the memory efficiency of the two algorithms by counting triangles unconditionally for each new edge, regardless of whether the edge is sampled. Extending the idea to a multigraph, we develop two algorithms, MultiBMascot and MultiWMascot. MultiBMascot enables local triangle counting on the corresponding simple graph of a streamed multigraph without explicit graph conversion; MultiWMascot considers repeated occurrences of an edge as its weight and counts each triangle as the product of its three edge weights. In contrast to the existing algorithm, which requires prior knowledge of the target graph and appropriately set parameters, our proposed algorithms require only one parameter, the edge sampling probability. Through extensive experiments, we show that for the same number of sampled edges, Mascot provides the best accuracy compared with the existing algorithm as well as Mascot-C and Mascot-A. We also demonstrate that MultiBMascot on a multigraph is comparable to Mascot-C on the counterpart simple graph, and that MultiWMascot becomes more accurate for higher-degree nodes. Thanks to Mascot, we also discover interesting anomalous patterns in real graphs, including core-peripheries in the web, a bimodal call pattern in a phone call history, and intensive collaboration in DBLP.

World Wide Web | 2017

MTP: discovering high quality partitions in real world graphs

Yongsub Lim; Won-Jo Lee; Ho-Jin Choi; U Kang

Given a real-world graph, how can we find a large subgraph whose partition quality is much better than that of the original? How can we use a partition of that subgraph to discover a high-quality global partition? Although graph partitioning, especially with balanced sizes, has received attention in various applications, it is known to be NP-hard, and it is also known that there is no good cut at a large scale for real graphs. In this paper, we propose a novel approach for graph partitioning. Our first focus is on finding a large subgraph with high-quality partitions, in terms of conductance. Despite the difficulty of the task for the whole graph, we observe that there is a large connected subgraph whose partition quality is much better than that of the original. Our proposed method MTP finds such a subgraph by removing “hub” nodes with large degrees and taking the remaining giant connected component. Further, we extend MTP to gbMTP (Global Balanced MTP) for discovering a global balanced partition. gbMTP greedily attaches the nodes excluded by MTP to the partition found by MTP. In experiments, we demonstrate that MTP finds a subgraph of a large size with low-conductance graph partitions, compared with competing methods. We also show that the competitors cannot find connected subgraphs while our method does, by construction. This improvement in partition quality for the subgraph is especially noticeable for large-scale cuts: for a balanced partition, conductance goes down to 14% of the original with a subgraph size of 70% of the total. As a result, the found subgraph has clear partitions at almost all scales compared with the original. Moreover, gbMTP generally discovers global balanced partitions whose conductance is lower than that of partitions found by METIS, the state-of-the-art graph partitioning method.

PLOS ONE | 2017

Hierarchical ordering with partial pairwise hierarchical relationships on the macaque brain data sets

Woosang Lim; Jungsoo Lee; Yongsub Lim; Doo-Hwan Bae; Haesun Park; Dae-Shik Kim; Kyomin Jung

Hierarchical organizations of information processing in brain networks are known to exist and have been widely studied. To find proper hierarchical structures in the macaque brain, traditional methods need the complete set of pairwise hierarchical relationships between cortical areas. In this paper, we present a new method that discovers hierarchical structures of macaque brain networks by using partial information about pairwise hierarchical relationships. Our method uses graph-based manifold learning to exploit inherent relationships, and computes pseudo distances of hierarchical levels for every pair of cortical areas. Then, we compute hierarchy levels of all cortical areas by minimizing the sum of squared hierarchical distance errors, given the hierarchical information of a few cortical areas. We evaluate our method on the macaque brain data sets whose true hierarchical levels are known as the FV91 model. The experimental results show that hierarchy levels computed by our method are similar to the FV91 model, and that its errors are much smaller than those of hierarchical clustering approaches.

Knowledge and Information Systems | 2017

Time-weighted counting for recently frequent pattern mining in data streams

Yongsub Lim; U Kang

How can we discover interesting patterns from time-evolving high-speed data streams? How can we analyze the data streams quickly and accurately, with little space overhead? How can we guarantee that the found patterns are self-consistent? High-speed data streams have been receiving increasing attention owing to their wide applications in sensors, network traffic, social networks, etc. The most fundamental task on a data stream is frequent pattern mining; in particular, focusing on recentness is important in real applications. In this paper, we develop two algorithms for discovering recently frequent patterns in data streams. First, we propose TwMinSwap to find top-k recently frequent items in data streams; it is a deterministic version of our motivating algorithm TwSample, which provides theoretical guarantees based on item sampling. TwMinSwap improves on TwSample in terms of speed, accuracy, and memory usage. Both require only O(k) memory space and do not require any prior knowledge of the stream, such as its length or the number of distinct items in it. Second, we propose TwMinSwap-Is to find top-k recently frequent itemsets in data streams. We especially focus on keeping the discovered itemsets self-consistent, which is the most important property for reliable results, while using O(k) memory space under the assumption of a constant itemset size. Through extensive experiments, we demonstrate that TwMinSwap outperforms all competitors in terms of accuracy and memory usage, with fast running time. We also show that TwMinSwap-Is is more accurate than the competitor and discovers recently frequent itemsets with reasonably large sizes (at most 5–7) depending on the dataset. Thanks to TwMinSwap and TwMinSwap-Is, we report interesting discoveries in real-world data streams, including the difference in trends between the winner and the loser among U.S. presidential candidates, and temporal human contact patterns.

International Conference on Intelligent Systems, Modelling and Simulation | 2013

Decentralized Control for Intelligent Robot System to Avoid Moving Obstacles

Yongsub Lim; Kyomin Jung

We propose a novel control law to deploy a decentralized multiagent mobile intelligent robot system in a dynamic environment containing moving obstacles. Since a collision with an obstacle causes failure of the robot and may break the whole system, obstacle avoidance is one of the fundamental tasks for a mobile robot system. Potential-based approaches have been widely applied to deploying a mobile robot system for various goals, including obstacle avoidance, coverage maximization, and target tracking. However, previous methods consider only a snapshot at each time step to decide the dynamics of a robot. Therefore, they have limitations in dynamic environments, possibly leading to frequent collisions with obstacles. In contrast, we show that by considering the velocities of the obstacles, the number of collisions can be significantly reduced. In our method, each robot records a short history of sensed obstacles and, based on this record, predicts the future trajectory of each obstacle. We introduce new dynamics for robots by combining the current state information with the prediction. Through extensive experiments, we compare our method with others and show that with our method robots rarely collide with moving obstacles while keeping the potential low.