Kijung Shin
Carnegie Mellon University
Publications
Featured research published by Kijung Shin.
international conference on data mining | 2014
Kijung Shin; U Kang
Given a high-dimensional and large-scale tensor, how can we decompose it into latent factors? Can we process it on commodity computers with limited memory? These questions are closely related to recommendation systems exploiting context information such as time and location. They require tensor factorization methods scalable with both the dimension and size of a tensor. In this paper, we propose two distributed tensor factorization methods, SALS and CDTF. Both methods are scalable with all aspects of data, and they show an interesting trade-off between convergence speed and memory requirements. SALS updates a subset of the columns of a factor matrix at a time, and CDTF, a special case of SALS, updates one column at a time. In our experiments, only our methods factorize a 5-dimensional tensor with 1B observable entries, 10M mode length, and 1K rank, while all other state-of-the-art methods fail. Moreover, our methods require several orders of magnitude less memory than the competitors. We implement our methods on MapReduce with two widely applicable optimization techniques: local disk caching and greedy row assignment.
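To make the column-wise update concrete, here is a minimal dense, single-machine sketch of CDTF-style coordinate descent in Python/NumPy. It is a toy under stated assumptions (a small dense 3-dimensional tensor, squared loss, no regularization) rather than the paper's distributed MapReduce implementation, and all names are illustrative.

```python
import numpy as np

def cdtf_dense(X, rank, n_iters=20, seed=0):
    """Column-wise coordinate descent for a 3-dimensional CP decomposition.

    A dense, single-machine toy illustrating the update rule only; the
    paper's CDTF handles sparse tensors and runs distributed on MapReduce.
    """
    rng = np.random.default_rng(seed)
    I, J, K = X.shape
    A = rng.standard_normal((I, rank))
    B = rng.standard_normal((J, rank))
    C = rng.standard_normal((K, rank))
    for _ in range(n_iters):
        for r in range(rank):
            # Residual tensor with the r-th rank-one component excluded.
            Rres = X - np.einsum('ir,jr,kr->ijk', A, B, C) \
                     + np.einsum('i,j,k->ijk', A[:, r], B[:, r], C[:, r])
            # Closed-form least-squares update of one column per factor.
            bc = np.outer(B[:, r], C[:, r])
            A[:, r] = np.einsum('ijk,jk->i', Rres, bc) / (bc ** 2).sum()
            ac = np.outer(A[:, r], C[:, r])
            B[:, r] = np.einsum('ijk,ik->j', Rres, ac) / (ac ** 2).sum()
            ab = np.outer(A[:, r], B[:, r])
            C[:, r] = np.einsum('ijk,ij->k', Rres, ab) / (ab ** 2).sum()
    return A, B, C
```

Each column update has a closed form because, with the other factors fixed, the loss is quadratic in that column; this is what lets CDTF keep only one column's worth of parameters in memory at a time.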
international conference on management of data | 2015
Kijung Shin; Jinhong Jung; Sael Lee; U Kang
Given a large graph, how can we calculate the relevance between nodes fast and accurately? Random walk with restart (RWR) provides a good measure for this purpose and has been applied to diverse data mining applications including ranking, community detection, link prediction, and anomaly detection. Since calculating RWR from scratch takes a long time, various preprocessing methods, most of which are related to inverting adjacency matrices, have been proposed to speed up the calculation. However, these methods do not scale to large graphs because they usually produce large, dense matrices that do not fit into memory. In this paper, we propose BEAR, a fast, scalable, and accurate method for computing RWR on large graphs. BEAR comprises a preprocessing step and a query step. In the preprocessing step, BEAR reorders the adjacency matrix of a given graph so that it contains a large and easy-to-invert submatrix, and precomputes several matrices including the Schur complement of the submatrix. In the query step, BEAR quickly computes the RWR scores for a given query node using a block elimination approach with the matrices computed in the preprocessing step. Through extensive experiments, we show that BEAR significantly outperforms other state-of-the-art methods in terms of preprocessing and query speed, space efficiency, and accuracy.
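A minimal sketch of the block-elimination idea, assuming the formulation (I - c * A_norm.T) r = (1 - c) q with a row-normalized adjacency matrix A_norm; the function names, the dense matrices, and the choice of leading block are assumptions (BEAR additionally reorders the graph so the leading block is block-diagonal and cheap to invert, and keeps everything sparse):

```python
import numpy as np

def bear_like_preprocess(A_norm, c, n1):
    """Precompute block inverses for solving (I - c * A_norm.T) r = (1 - c) q.

    A_norm : row-normalized adjacency matrix,
    c      : probability of continuing the walk (1 - restart probability),
    n1     : size of the leading block after reordering.
    """
    n = A_norm.shape[0]
    H = np.eye(n) - c * A_norm.T
    H11, H12 = H[:n1, :n1], H[:n1, n1:]
    H21, H22 = H[n1:, :n1], H[n1:, n1:]
    H11_inv = np.linalg.inv(H11)
    S = H22 - H21 @ H11_inv @ H12              # Schur complement of H11
    return H11_inv, np.linalg.inv(S), H12, H21

def bear_like_query(q, c, pre):
    """RWR scores for a seed distribution q via block elimination."""
    H11_inv, S_inv, H12, H21 = pre
    n1 = H11_inv.shape[0]
    b1, b2 = (1 - c) * q[:n1], (1 - c) * q[n1:]
    r2 = S_inv @ (b2 - H21 @ (H11_inv @ b1))   # solve the Schur system first
    r1 = H11_inv @ (b1 - H12 @ r2)             # back-substitute for the rest
    return np.concatenate([r1, r2])
```

The payoff is that each query costs only a few matrix-vector products against precomputed factors, instead of inverting (or iterating over) the full n-by-n system per query.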
IEEE Transactions on Knowledge and Data Engineering | 2017
Kijung Shin; Lee Sael; U Kang
Given a high-order large-scale tensor, how can we decompose it into latent factors? Can we process it on commodity computers with limited memory? These questions are closely related to recommender systems, which have modeled rating data not as a matrix but as a tensor to utilize contextual information such as time and location. This increase in the order requires tensor-factorization methods scalable with both the order and size of a tensor. In this paper, we propose two distributed tensor factorization methods, CDTF and SALS. Both methods are scalable with all aspects of data and show a trade-off between convergence speed and memory requirements. CDTF, based on coordinate descent, updates one parameter at a time, while SALS generalizes on the number of parameters updated at a time. In our experiments, only our methods factorized a five-order tensor with 1 billion observable entries, 10M mode length, and 1K rank, while all other state-of-the-art methods failed. Moreover, our methods required several orders of magnitude less memory than their competitors. We implemented our methods on MapReduce with two widely applicable optimization techniques: local disk caching and greedy row assignment. These techniques sped up our methods by up to 98.2× and the competitors by up to 5.9×.
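Where CDTF updates a single column, SALS refits a block of columns jointly by solving one small least-squares problem per row, trading a little memory for faster convergence. A dense toy sketch of one such block update (the dense setting and all names are assumptions, not the paper's distributed code):

```python
import numpy as np

def sals_update_block(X, A, B, C, cols):
    """One SALS-style joint update of the columns `cols` of factor A.

    Toy dense sketch: the residual excludes the block's rank-one
    components, and each row of A is refit by a small least-squares
    solve with one unknown per column in the block.
    """
    I, J, K = X.shape
    others = [r for r in range(A.shape[1]) if r not in cols]
    # Residual with the block's rank-one components removed.
    Rres = X - np.einsum('ir,jr,kr->ijk',
                         A[:, others], B[:, others], C[:, others])
    # Design matrix: one column per factor column in the block (J*K x |cols|).
    M = np.stack([np.outer(B[:, r], C[:, r]).ravel() for r in cols], axis=1)
    for i in range(I):
        A[i, cols], *_ = np.linalg.lstsq(M, Rres[i].ravel(), rcond=None)
    return A
```

Because the rows are solved independently, the per-row problems can be partitioned across machines, which is what makes this style of update distributable.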
international conference on data mining | 2016
Kijung Shin; Tina Eliassi-Rad; Christos Faloutsos
What do the k-core structures of real-world graphs look like? What are the common patterns and the anomalies? How can we use them for algorithm design and applications? A k-core is the maximal subgraph in which all vertices have degree at least k. This concept has been applied to such diverse areas as hierarchical structure analysis, graph visualization, and graph clustering. Here, we explore pervasive patterns related to k-cores that emerge in graphs from several diverse domains. Our discoveries are as follows: (1) Mirror Pattern: the coreness of vertices (i.e., the maximum k such that each vertex belongs to the k-core) is strongly correlated with their degree. (2) Core-Triangle Pattern: the degeneracy of a graph (i.e., the maximum k such that the k-core exists in the graph) obeys a 3-to-1 power law with respect to the count of triangles. (3) Structured Core Pattern: degeneracy-cores are not cliques but have non-trivial structures such as core-periphery and communities. Our algorithmic contributions show the usefulness of these patterns. (1) Core-A, which measures the deviation from Mirror Pattern, successfully finds anomalies in real-world graphs, complementing densest-subgraph-based anomaly detection methods. (2) Core-D, a single-pass streaming algorithm based on Core-Triangle Pattern, accurately estimates the degeneracy of billion-scale graphs up to 7× faster than a recent multi-pass algorithm. (3) Core-S, inspired by Structured Core Pattern, identifies influential spreaders up to 17× faster than top competitors with comparable accuracy.
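For reference, the coreness of every vertex (and hence the degeneracy, which is the maximum coreness) can be computed by the standard peeling procedure; a simple sketch, not the paper's code, after which the Mirror Pattern can be checked by correlating core[v] with len(adj[v]):

```python
def coreness(adj):
    """Coreness of every vertex via peeling: repeatedly remove a
    minimum-degree vertex; the running maximum of removed degrees is
    that vertex's coreness. Degeneracy is max(core.values()).

    adj: dict mapping each vertex to the set of its neighbors.
    This O(V^2) version favors clarity; a bucket queue makes it O(V + E).
    """
    deg = {v: len(ns) for v, ns in adj.items()}
    core, k = {}, 0
    remaining = set(adj)
    while remaining:
        v = min(remaining, key=deg.get)   # peel a minimum-degree vertex
        k = max(k, deg[v])
        core[v] = k
        remaining.remove(v)
        for u in adj[v]:
            if u in remaining:
                deg[u] -= 1
    return core

# A triangle (0, 1, 2) with a pendant vertex 3: the triangle is the 2-core.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
assert coreness(adj) == {0: 2, 1: 2, 2: 2, 3: 1}
```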
international conference on management of data | 2016
Jinhong Jung; Kijung Shin; Lee Sael; U Kang
Given a large graph, how can we calculate the relevance between nodes fast and accurately? Random walk with restart (RWR) provides a good measure for this purpose and has been applied to diverse data mining applications including ranking, community detection, link prediction, and anomaly detection. Since calculating RWR from scratch takes a long time, various preprocessing methods, most of which are related to inverting adjacency matrices, have been proposed to speed up the calculation. However, these methods do not scale to large graphs because they usually produce large, dense matrices that do not fit into memory. In addition, the existing methods are inappropriate when graphs change dynamically, because the expensive preprocessing task needs to be repeated. In this article, we propose Bear, a fast, scalable, and accurate method for computing RWR on large graphs. Bear has two versions: a preprocessing method BearS for static graphs and an incremental update method BearD for dynamic graphs. BearS consists of a preprocessing step and a query step. In the preprocessing step, BearS reorders the adjacency matrix of a given graph so that it contains a large and easy-to-invert submatrix, and precomputes several matrices including the Schur complement of the submatrix. In the query step, BearS quickly computes the RWR scores for a given query node using a block elimination approach with the matrices computed in the preprocessing step. For dynamic graphs, BearD efficiently updates the changed parts of the preprocessed matrices of BearS, based on the observation that only small parts of the preprocessed matrices change when a few edges are inserted or deleted. Through extensive experiments, we show that BearS significantly outperforms other state-of-the-art methods in terms of preprocessing and query speed, space efficiency, and accuracy. We also show that BearD quickly updates the preprocessed matrices and immediately computes queries when the graph changes.
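The observation behind BearD, that a few edge changes perturb the preprocessed matrices only slightly, can be illustrated with the textbook Sherman-Morrison rank-one inverse update; this is a generic stand-in for intuition, not BearD's actual update rules:

```python
import numpy as np

def sherman_morrison(H_inv, u, v):
    """Given H^{-1}, return (H + u v^T)^{-1} in O(n^2) time instead of the
    O(n^3) cost of re-inverting from scratch.

    Inserting or deleting an edge renormalizes one row of the adjacency
    matrix, which is a rank-one change to the RWR system matrix
    H = I - c * A_norm.T, so a precomputed inverse can be patched cheaply.
    This textbook identity only illustrates why incremental maintenance,
    as in BearD, beats redoing the preprocessing from scratch.
    """
    Hu = H_inv @ u                  # n-vector
    vH = v @ H_inv                  # n-vector
    return H_inv - np.outer(Hu, vH) / (1.0 + v @ Hu)
```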
european conference on machine learning | 2016
Kijung Shin; Bryan Hooi; Christos Faloutsos
Given a large-scale and high-order tensor, how can we find dense blocks in it? Can we find them in near-linear time but with a quality guarantee? Extensive previous work has shown that dense blocks in tensors as well as graphs indicate anomalous or fraudulent behavior, e.g., lockstep behavior in social networks. However, available methods for detecting such dense blocks are not satisfactory in terms of speed, accuracy, or flexibility. In this work, we propose M-Zoom, a flexible framework for finding dense blocks in tensors, which works with a broad class of density measures. M-Zoom has the following properties: (1) Scalable: M-Zoom scales linearly with all aspects of tensors and is up to 114× faster than state-of-the-art methods with similar accuracy. (2) Provably accurate: M-Zoom provides a guarantee on the lowest density of the blocks it finds. (3) Flexible: M-Zoom supports multi-block detection and size bounds as well as diverse density measures. (4) Effective: M-Zoom successfully detected edit wars and bot activities in Wikipedia, and spotted network attacks from a TCP dump with near-perfect accuracy (AUC = 0.98). The data and software related to this paper are available at http://www.cs.cmu.edu/~kijungs/codes/mzoom/.
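As a rough illustration of the greedy strategy (not M-Zoom's implementation), the sketch below peels a 2-order tensor, i.e., a matrix of nonnegative entries: it repeatedly removes the attribute value with minimum mass and returns the snapshot with the highest arithmetic average mass. The real M-Zoom handles N-order tensors, several density measures, size bounds, and multiple blocks, and runs in near-linear time with better bookkeeping.

```python
from collections import defaultdict

def greedy_dense_block(entries):
    """Greedy peeling for a dense block in a 2-order tensor (a matrix):
    repeatedly remove the attribute value (row or column) with minimum
    mass, and return the snapshot maximizing arithmetic average mass,
    total mass / number of remaining attribute values.

    entries: list of (row, col, value) triples with nonnegative values.
    This toy rescans the entry list at every step, so it is quadratic
    in the worst case.
    """
    mass = defaultdict(float)
    total = 0.0
    for i, j, v in entries:
        mass[('row', i)] += v
        mass[('col', j)] += v
        total += v
    alive, peeled = set(mass), []
    best_density, best_cut = float('-inf'), 0
    while alive:
        density = total / len(alive)
        if density > best_density:
            best_density, best_cut = density, len(peeled)
        s = min(alive, key=mass.get)            # minimum-mass slice
        alive.remove(s)
        peeled.append(s)
        kept = []
        for i, j, v in entries:                 # drop entries in the peeled slice
            if ('row', i) == s or ('col', j) == s:
                total -= v
                other = ('col', j) if ('row', i) == s else ('row', i)
                mass[other] -= v
            else:
                kept.append((i, j, v))
        entries = kept
    removed = set(peeled[:best_cut])
    block = {a for a in mass if a not in removed}
    return block, best_density
```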
web search and data mining | 2017
Kijung Shin; Bryan Hooi; Jisu Kim; Christos Faloutsos
international world wide web conferences | 2016
Hemank Lamba; Vaishnavh Nagarajan; Kijung Shin; Naji Shajarisales
Knowledge and Information Systems | 2018
Kijung Shin; Tina Eliassi-Rad; Christos Faloutsos
conference on information and knowledge management | 2014
Dongyeop Kang; Woosang Lim; Kijung Shin; Lee Sael; U Kang