Network


Latest external collaborations at the country level. Dive into details by clicking on the dots.

Hotspot


Dive into the research topics where Kijung Shin is active.

Publication


Featured research published by Kijung Shin.


International Conference on Data Mining | 2014

Distributed Methods for High-Dimensional and Large-Scale Tensor Factorization

Kijung Shin; U Kang

Given a high-dimensional and large-scale tensor, how can we decompose it into latent factors? Can we process it on commodity computers with limited memory? These questions are closely related to recommendation systems exploiting context information such as time and location. They require tensor factorization methods scalable with both the dimension and size of a tensor. In this paper, we propose two distributed tensor factorization methods, SALS and CDTF. Both methods are scalable with all aspects of data, and they show an interesting trade-off between convergence speed and memory requirements. SALS updates a subset of the columns of a factor matrix at a time, and CDTF, a special case of SALS, updates one column at a time. In our experiments, only our methods factorize a 5-dimensional tensor with 1B observable entries, 10M mode length, and 1K rank, while all other state-of-the-art methods fail. Moreover, our methods require several orders of magnitude less memory than the competitors. We implement our methods on MapReduce with two widely applicable optimization techniques: local disk caching and greedy row assignment.
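As a concrete illustration, here is a minimal single-machine sketch of the column-wise coordinate descent idea behind CDTF (and, with blocks of columns, SALS), applied to a small dense 3-way tensor. The function name and the dense-tensor simplification are ours; the paper's methods operate on sparse, distributed data.

```python
import numpy as np

# Illustrative sketch of column-wise coordinate descent (the CDTF idea)
# for a rank-R CP model of a small dense 3-way tensor. Not the paper's
# distributed implementation.
def cdtf_dense(X, rank, n_iters=20, reg=0.1):
    rng = np.random.default_rng(0)
    factors = [rng.standard_normal((d, rank)) for d in X.shape]
    for _ in range(n_iters):
        for r in range(rank):  # one rank-1 component at a time
            # residual tensor excluding component r
            R = X - sum(np.einsum('i,j,k->ijk', factors[0][:, s],
                                  factors[1][:, s], factors[2][:, s])
                        for s in range(rank) if s != r)
            for m in range(3):  # update column r of each factor matrix
                others = [factors[i][:, r] for i in range(3) if i != m]
                # regularized least squares: project the residual onto the
                # outer product of the other two columns
                num = np.einsum('ijk,j,k->i', np.moveaxis(R, m, 0), *others)
                den = (others[0] @ others[0]) * (others[1] @ others[1]) + reg
                factors[m][:, r] = num / den
    return factors
```

Updating C columns at a time instead of one recovers the SALS trade-off described above: faster convergence in exchange for more memory.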


International Conference on Management of Data | 2015

BEAR: Block Elimination Approach for Random Walk with Restart on Large Graphs

Kijung Shin; Jinhong Jung; Lee Sael; U Kang

Given a large graph, how can we calculate the relevance between nodes fast and accurately? Random walk with restart (RWR) provides a good measure for this purpose and has been applied to diverse data mining applications including ranking, community detection, link prediction, and anomaly detection. Since calculating RWR from scratch takes a long time, various preprocessing methods, most of which are related to inverting adjacency matrices, have been proposed to speed up the calculation. However, these methods do not scale to large graphs because they usually produce large, dense matrices that do not fit into memory. In this paper, we propose BEAR, a fast, scalable, and accurate method for computing RWR on large graphs. BEAR comprises a preprocessing step and a query step. In the preprocessing step, BEAR reorders the adjacency matrix of a given graph so that it contains a large and easy-to-invert submatrix, and precomputes several matrices including the Schur complement of the submatrix. In the query step, BEAR quickly computes the RWR scores for a given query node using a block elimination approach with the matrices computed in the preprocessing step. Through extensive experiments, we show that BEAR significantly outperforms other state-of-the-art methods in terms of preprocessing and query speed, space efficiency, and accuracy.
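The block-elimination algebra at BEAR's core fits in a few lines of dense NumPy. This toy sketch omits what makes BEAR practical: the reordering that makes the top-left block easy to invert, sparse storage, and reuse of the precomputed matrices across many queries. Names are illustrative.

```python
import numpy as np

# Toy sketch of RWR via block elimination: solve (I - c*A_norm^T) r = (1-c) q
# by partitioning the system and using the Schur complement of H11.
def rwr_block_elim(A, c, q, k):
    A_norm = A / A.sum(axis=1, keepdims=True)      # row-normalized adjacency
    H = np.eye(len(A)) - c * A_norm.T
    H11, H12, H21, H22 = H[:k, :k], H[:k, k:], H[k:, :k], H[k:, k:]
    b1, b2 = (1 - c) * q[:k], (1 - c) * q[k:]
    H11_inv = np.linalg.inv(H11)                   # precomputed once in BEAR
    S = H22 - H21 @ H11_inv @ H12                  # Schur complement (precomputed)
    r2 = np.linalg.solve(S, b2 - H21 @ (H11_inv @ b1))
    r1 = H11_inv @ (b1 - H12 @ r2)
    return np.concatenate([r1, r2])

# Query-time usage: one RWR score vector per seed node.
A = np.array([[0, 1, 1, 0], [1, 0, 1, 0], [1, 1, 0, 1], [0, 0, 1, 0]], float)
q = np.array([1.0, 0.0, 0.0, 0.0])                # restart at node 0
print(rwr_block_elim(A, c=0.85, q=q, k=2))
```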


IEEE Transactions on Knowledge and Data Engineering | 2017

Fully Scalable Methods for Distributed Tensor Factorization

Kijung Shin; Lee Sael; U Kang

Given a high-order large-scale tensor, how can we decompose it into latent factors? Can we process it on commodity computers with limited memory? These questions are closely related to recommender systems, which have modeled rating data not as a matrix but as a tensor to utilize contextual information such as time and location. This increase in order requires tensor-factorization methods scalable with both the order and size of a tensor. In this paper, we propose two distributed tensor factorization methods, CDTF and SALS. Both methods are scalable with all aspects of data and show a trade-off between convergence speed and memory requirements. CDTF, based on coordinate descent, updates one parameter at a time, while SALS generalizes the number of parameters updated at a time. In our experiments, only our methods factorized a five-order tensor with 1 billion observable entries, 10M mode length, and 1K rank, while all other state-of-the-art methods failed. Moreover, our methods required several orders of magnitude less memory than their competitors. We implemented our methods on MapReduce with two widely applicable optimization techniques: local disk caching and greedy row assignment. They sped up our methods by up to 98.2× and the competitors by up to 5.9×.
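A back-of-envelope calculation shows why holding all factor matrices in memory fails at this scale, and why column-wise updates help. The 8-byte-double assumption and the numbers below are ours, not the paper's exact accounting.

```python
# Memory estimate for the abstract's setting: an order-5 tensor with
# 10M mode length factorized at rank 1K, assuming 8-byte doubles.
order, mode_len, rank, B = 5, 10_000_000, 1_000, 8

all_factors = order * mode_len * rank * B    # every factor matrix at once
per_column_set = order * mode_len * B        # one column of every factor

print(f"all factor matrices:    {all_factors / 1e9:.0f} GB")        # ~400 GB
print(f"CDTF (1 column/mode):   {per_column_set / 1e6:.0f} MB")     # ~400 MB
print(f"SALS (10 columns/mode): {10 * per_column_set / 1e9:.0f} GB")  # ~4 GB
```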


International Conference on Data Mining | 2016

CoreScope: Graph Mining Using k-Core Analysis — Patterns, Anomalies and Algorithms

Kijung Shin; Tina Eliassi-Rad; Christos Faloutsos

What do the k-core structures of real-world graphs look like? What are the common patterns and the anomalies? How can we use them for algorithm design and applications? A k-core is the maximal subgraph in which all vertices have degree at least k. This concept has been applied to areas as diverse as hierarchical structure analysis, graph visualization, and graph clustering. Here, we explore pervasive patterns related to k-cores that emerge in graphs from several diverse domains. Our discoveries are as follows: (1) Mirror Pattern: the coreness of vertices (i.e., the maximum k such that a vertex belongs to the k-core) is strongly correlated with their degree. (2) Core-Triangle Pattern: the degeneracy of a graph (i.e., the maximum k such that the k-core exists in the graph) obeys a 3-to-1 power law with respect to the count of triangles. (3) Structured Core Pattern: degeneracy-cores are not cliques but have non-trivial structures such as core-periphery and communities. Our algorithmic contributions show the usefulness of these patterns. (1) Core-A, which measures the deviation from Mirror Pattern, successfully finds anomalies in real-world graphs, complementing densest-subgraph-based anomaly detection methods. (2) Core-D, a single-pass streaming algorithm based on Core-Triangle Pattern, accurately estimates the degeneracy of billion-scale graphs up to 7× faster than a recent multipass algorithm. (3) Core-S, inspired by Structured Core Pattern, identifies influential spreaders up to 17× faster than top competitors with comparable accuracy.
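Coreness and degeneracy, the quantities these patterns are built on, come from standard k-core peeling: repeatedly remove a minimum-degree vertex and record the degree threshold at which each vertex leaves. Below is a compact self-contained sketch; it is the textbook algorithm, not the paper's streaming estimator Core-D or anomaly score Core-A.

```python
# Textbook k-core peeling. Returns each vertex's coreness and the
# graph's degeneracy (the maximum coreness). Unoptimized sketch.
def coreness(adj):
    deg = {v: len(ns) for v, ns in adj.items()}
    remaining, core, k = set(adj), {}, 0
    while remaining:
        v = min(remaining, key=deg.__getitem__)   # peel a min-degree vertex
        k = max(k, deg[v])                        # the peeling threshold never drops
        core[v] = k
        remaining.remove(v)
        for u in adj[v]:
            if u in remaining:
                deg[u] -= 1
    return core, k

# Toy usage: a triangle with one pendant vertex.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(coreness(adj))  # pendant vertex 3 has coreness 1; the triangle, 2
```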


ACM Transactions on Database Systems | 2016

Random Walk with Restart on Large Graphs Using Block Elimination

Jinhong Jung; Kijung Shin; Lee Sael; U Kang

Given a large graph, how can we calculate the relevance between nodes fast and accurately? Random walk with restart (RWR) provides a good measure for this purpose and has been applied to diverse data mining applications including ranking, community detection, link prediction, and anomaly detection. Since calculating RWR from scratch takes a long time, various preprocessing methods, most of which are related to inverting adjacency matrices, have been proposed to speed up the calculation. However, these methods do not scale to large graphs because they usually produce large, dense matrices that do not fit into memory. In addition, the existing methods are inappropriate when graphs change dynamically, because the expensive preprocessing task needs to be performed repeatedly. In this article, we propose Bear, a fast, scalable, and accurate method for computing RWR on large graphs. Bear has two versions: a preprocessing method BearS for static graphs and an incremental update method BearD for dynamic graphs. BearS consists of a preprocessing step and a query step. In the preprocessing step, BearS reorders the adjacency matrix of a given graph so that it contains a large and easy-to-invert submatrix, and precomputes several matrices including the Schur complement of the submatrix. In the query step, BearS quickly computes the RWR scores for a given query node using a block elimination approach with the matrices computed in the preprocessing step. For dynamic graphs, BearD efficiently updates the changed parts of the preprocessed matrices of BearS, based on the observation that only small parts of the preprocessed matrices change when a few edges are inserted or deleted. Through extensive experiments, we show that BearS significantly outperforms other state-of-the-art methods in terms of preprocessing and query speed, space efficiency, and accuracy. We also show that BearD quickly updates the preprocessed matrices and immediately computes queries when the graph changes.
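For contrast with Bear's one-time preprocessing, the from-scratch computation the abstract calls slow is typically power iteration: one pass over the graph per step, repeated from zero for every query. A minimal sketch:

```python
import numpy as np

# From-scratch RWR by power iteration -- the per-query cost that Bear's
# precomputed matrices avoid. Minimal dense sketch; names are ours.
def rwr_iterative(A, c=0.85, seed=0, tol=1e-9, max_iters=1000):
    A_norm = A / A.sum(axis=1, keepdims=True)    # row-normalized adjacency
    q = np.zeros(len(A)); q[seed] = 1.0          # restart vector
    r = q.copy()
    for _ in range(max_iters):
        r_next = c * A_norm.T @ r + (1 - c) * q  # walk step plus restart
        if np.abs(r_next - r).sum() < tol:
            break
        r = r_next
    return r
```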


European Conference on Machine Learning | 2016

M-Zoom: Fast Dense-Block Detection in Tensors with Quality Guarantees

Kijung Shin; Bryan Hooi; Christos Faloutsos

Given a large-scale and high-order tensor, how can we find dense blocks in it? Can we find them in near-linear time but with a quality guarantee? Extensive previous work has shown that dense blocks in tensors as well as graphs indicate anomalous or fraudulent behavior, e.g., lockstep behavior in social networks. However, available methods for detecting such dense blocks are not satisfactory in terms of speed, accuracy, or flexibility. In this work, we propose M-Zoom, a flexible framework for finding dense blocks in tensors, which works with a broad class of density measures. M-Zoom has the following properties: (1) Scalable: M-Zoom scales linearly with all aspects of tensors and is up to 114× faster than state-of-the-art methods with similar accuracy. (2) Provably accurate: M-Zoom provides a guarantee on the lowest density of the blocks it finds. (3) Flexible: M-Zoom supports multi-block detection and size bounds as well as diverse density measures. (4) Effective: M-Zoom successfully detected edit wars and bot activities in Wikipedia, and spotted network attacks from a TCP dump with near-perfect accuracy (AUC = 0.98). The data and software related to this paper are available at http://www.cs.cmu.edu/~kijungs/codes/mzoom/.
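The flavor of M-Zoom's greedy search is easiest to see in the 2-way (graph) special case, where peeling the minimum-degree vertex and keeping the densest intermediate subgraph is Charikar's classic 1/2-approximation for average-degree density; M-Zoom generalizes such peeling to N-way tensors and other density measures. A simplified sketch with our own names:

```python
# Greedy peeling for the densest subgraph under average-degree density,
# a simplified 2-way instance of M-Zoom-style dense-block search.
def densest_by_peeling(adj):
    nodes = set(adj)
    deg = {v: len(adj[v]) for v in nodes}
    n_edges = sum(deg.values()) // 2
    best, best_density = set(nodes), 2 * n_edges / len(nodes)
    while len(nodes) > 1:
        v = min(nodes, key=deg.__getitem__)       # remove the min-degree vertex
        n_edges -= deg[v]
        nodes.remove(v)
        for u in adj[v]:
            if u in nodes:
                deg[u] -= 1
        density = 2 * n_edges / len(nodes)        # average degree of what remains
        if density > best_density:                # keep the densest snapshot
            best, best_density = set(nodes), density
    return best, best_density

# Toy usage: a 4-clique with a pendant vertex attached to node 0.
adj = {0: {1, 2, 3, 4}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {0, 1, 2}, 4: {0}}
print(densest_by_peeling(adj))  # ({0, 1, 2, 3}, 3.0)
```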


Web Search and Data Mining | 2017

D-Cube: Dense-Block Detection in Terabyte-Scale Tensors

Kijung Shin; Bryan Hooi; Jisu Kim; Christos Faloutsos


International World Wide Web Conference | 2016

Incorporating Side Information in Tensor Completion

Hemank Lamba; Vaishnavh Nagarajan; Kijung Shin; Naji Shajarisales



Knowledge and Information Systems | 2018

Patterns and anomalies in k-cores of real-world graphs with applications

Kijung Shin; Tina Eliassi-Rad; Christos Faloutsos


Conference on Information and Knowledge Management | 2014

Data/Feature Distributed Stochastic Coordinate Descent for Logistic Regression

Dongyeop Kang; Woosang Lim; Kijung Shin; Lee Sael; U Kang


Collaboration


Dive into Kijung Shin's collaborations.

Top Co-Authors

Bryan Hooi
Carnegie Mellon University

U Kang
Seoul National University

Euiwoong Lee
Carnegie Mellon University

Lee Sael
State University of New York System

Jinoh Oh
Pohang University of Science and Technology

Alex Beutel
Carnegie Mellon University

Hemank Lamba
Carnegie Mellon University

Hyun Ah Song
Carnegie Mellon University

Jisu Kim
Carnegie Mellon University