
Sangkyum Kim

University of Illinois at Urbana–Champaign


Publication

Featured researches published by Sangkyum Kim.

international conference on data mining | 2010

Tru-Alarm: Trustworthiness Analysis of Sensor Networks in Cyber-Physical Systems

Lu An Tang; Xiao Yu; Sangkyum Kim; Jiawei Han; Chih-Chieh Hung; Wen-Chih Peng

A Cyber-Physical System (CPS) integrates physical devices (e.g., sensors, cameras) with cyber (or informational) components to form a situation-integrated analytical system that responds intelligently to dynamic changes in real-world scenarios. One key issue in CPS research is trustworthiness analysis of the observed data: due to technology limitations and environmental influences, CPS data are inherently noisy and may trigger many false alarms. It is highly desirable to sift meaningful information from a large volume of noisy data. In this paper, we propose a method called Tru-Alarm, which identifies trustworthy alarms and increases the feasibility of CPS. Tru-Alarm estimates the locations of objects causing alarms, constructs an object-alarm graph and carries out trustworthiness inference based on linked information in the graph. Extensive experiments show that Tru-Alarm filters out noise and false information efficiently and guarantees that no meaningful alarms are missed.

Journal of Computer and System Sciences | 2013

Trustworthiness analysis of sensor data in cyber-physical systems

Lu An Tang; Xiao Yu; Sangkyum Kim; Quanquan Gu; Jiawei Han; Alice Leung; Thomas F. La Porta

A Cyber-Physical System (CPS) is an integration of sensor networks with informational devices. CPS can be used for many promising applications, such as traffic observation, battlefield surveillance, and sensor-network-based monitoring. One key issue in CPS research is trustworthiness analysis of sensor data. Due to technology limitations and environmental influences, the sensor data collected by CPS are inherently noisy and may trigger many false alarms. It is highly desirable to sift meaningful information from a large volume of noisy data. In this study, we propose a method called Tru-Alarm, which increases the capability of a CPS to recognize trustworthy alarms. Tru-Alarm estimates the locations of objects causing alarms, constructs an object-alarm graph and carries out trustworthiness inference based on the graph links. The study also reveals that the alarm trustworthiness and sensor reliability could be mutually enhanced. The property is used to help prune the large search space of object-alarm graph, filter out the alarms generated by unreliable sensors and improve the algorithm's efficiency. Extensive experiments are conducted on both real and synthetic datasets, and the results show that Tru-Alarm filters out noise and false information efficiently and effectively, while ensuring that no meaningful alarms are missed.

intelligence and security informatics | 2006

Motion-Alert: automatic anomaly detection in massive moving objects

Xiaolei Li; Jiawei Han; Sangkyum Kim

With recent advances in sensory and mobile computing technology, enormous amounts of data about moving objects are being collected. With such data, it becomes possible to automatically identify suspicious behavior in object movements. Anomaly detection in massive sets of moving objects has many important applications, especially in surveillance, law enforcement, and homeland security. Due to the sheer volume of spatiotemporal and non-spatial data (such as weather and object type) associated with moving objects, it is challenging to develop a method that can efficiently and effectively detect anomalies in complex scenarios. The problem is further complicated by the fact that anomalies may occur at various levels of abstraction and be associated with different time and location granularities. In this paper, we analyze the problem of anomaly detection in moving objects and propose an efficient and scalable classification method, Motion-Alert, which proceeds with the following three steps. Object movement features, called motifs, are extracted from the object paths. Each path consists of a sequence of motif expressions, associated with the values related to time and location. To discover anomalies in object movements, motif-based generalization is performed that clusters similar object movement fragments and generalizes the movements based on the associated motifs. With motif-based generalization, objects are put into a multi-level feature space and are classified by a classifier that can handle high-dimensional feature spaces. We implemented the above method as one of the core components in our moving-object anomaly detection system, motion-alert. Our experiments show that the system is more accurate than traditional classification techniques.

international acm sigir conference on research and development in information retrieval | 2011

Authorship classification: a discriminative syntactic tree mining approach

Sangkyum Kim; Hyungsul Kim; Tim Weninger; Jiawei Han; Hyun Duk Kim

In the past, there have been dozens of studies on automatic authorship classification, and many of these studies concluded that the writing style is one of the best indicators for original authorship. From among the hundreds of features which were developed, syntactic features were best able to reflect an author's writing style. However, due to the high computational complexity of extracting and computing syntactic features, only simple variations of basic syntactic features such as function words, POS (part-of-speech) tags, and rewrite rules were considered. In this paper, we propose a new feature set of k-embedded-edge subtree patterns that holds more syntactic information than previous feature sets. We also propose a novel approach to directly mining them from a given set of syntactic trees. We show that this approach reduces the computational burden of using complex syntactic structures as the feature set. Comprehensive experiments on real-world datasets demonstrate that our approach is reliable and more accurate than previous studies.

european conference on machine learning | 2010

NDPMine: efficiently mining discriminative numerical features for pattern-based classification

Hyungsul Kim; Sangkyum Kim; Tim Weninger; Jiawei Han; Tarek F. Abdelzaher

Pattern-based classification has demonstrated its power in recent studies, but because mining discriminative patterns as features for classification is very expensive, several efficient algorithms have been proposed to rectify this problem. These algorithms assume that feature values of the mined patterns are binary, i.e., a pattern either exists or not. In some problems, however, the number of times a pattern appears is more informative than whether a pattern appears or not. To resolve these deficiencies, we propose a mathematical programming method that directly mines discriminative patterns as numerical features for classification. We also propose a novel search space shrinking technique which addresses the inefficiencies in iterative pattern mining algorithms. Finally, we show that our method is an order of magnitude faster, significantly more memory efficient and more accurate than current approaches.
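The distinction between binary and numerical pattern features can be illustrated with a minimal sketch. It assumes, purely for illustration, that a pattern is a contiguous run of tokens in a sequence; the function names and the toy data are hypothetical and are not taken from the paper.

```python
def count_occurrences(sequence, pattern):
    """Count (possibly overlapping) contiguous matches of pattern in sequence.

    Assumption for this sketch: a pattern is a contiguous run of tokens.
    """
    if not pattern:
        raise ValueError("pattern must be non-empty")
    m = len(pattern)
    return sum(
        1
        for i in range(len(sequence) - m + 1)
        if sequence[i:i + m] == pattern
    )


def binary_feature(sequence, pattern):
    """1 if the pattern appears at least once, else 0."""
    return 1 if count_occurrences(sequence, pattern) > 0 else 0


# Hypothetical toy data: both sequences contain the pattern,
# so the binary feature cannot tell them apart.
pattern = ["a", "b"]
seq_rare = ["a", "b", "c"]
seq_frequent = ["a", "b", "a", "b", "a", "b"]

for seq in (seq_rare, seq_frequent):
    print(binary_feature(seq, pattern), count_occurrences(seq, pattern))
```

Both sequences receive the same binary feature value, while the occurrence counts differ, which is the extra information a numerical feature carries.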

european conference on machine learning | 2011

Efficient mining of top correlated patterns based on null-invariant measures

Sangkyum Kim; Marina Barsky; Jiawei Han

Mining strong correlations from transactional databases often leads to more meaningful results than mining association rules. In such mining, null (transaction)-invariance is an important property of the correlation measures. Unfortunately, some useful null-invariant measures such as Kulczynski and Cosine, which can discover correlations even for the very unbalanced cases, lack the (anti)-monotonicity property. Thus, they could only be applied to frequent itemsets as the post-evaluation step. For large datasets and for low supports, this approach is computationally prohibitive. This paper presents new properties for all known null-invariant measures. Based on these properties, we develop efficient pruning techniques and design the Apriori-like algorithm NICOMINER for mining strongly correlated patterns directly. We develop both the threshold-bounded and the top-k variations of the algorithm, where top-k is used when the optimal correlation threshold is not known in advance and to give the user control over the output size. We test NICOMINER on real-life datasets from different application domains, using Cosine as an example of the null-invariant correlation measure. We show that NICOMINER outperforms the support-based approach by more than an order of magnitude, and that it is very useful for discovering top correlations in itemsets with low support.
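Null-invariance can be made concrete with a small sketch using the standard two-itemset definitions of Kulczynski and Cosine. The support counts below are hypothetical, and lift is included only as a familiar measure that is not null-invariant; none of this reproduces the NICOMINER algorithm itself.

```python
import math


def kulczynski(sup_a, sup_b, sup_ab):
    """Average of the conditional probabilities P(B|A) and P(A|B)."""
    return 0.5 * (sup_ab / sup_a + sup_ab / sup_b)


def cosine(sup_a, sup_b, sup_ab):
    """Joint support over the geometric mean of the individual supports."""
    return sup_ab / math.sqrt(sup_a * sup_b)


def lift(sup_a, sup_b, sup_ab, n_transactions):
    """P(A,B) / (P(A) * P(B)); depends on the total transaction count."""
    return (sup_ab * n_transactions) / (sup_a * sup_b)


# Hypothetical support counts for items A and B.
sup_a, sup_b, sup_ab = 1000, 1000, 100

# Adding transactions that contain neither A nor B (null transactions)
# changes only the total count, not the three supports.
for n_transactions in (10_000, 1_000_000):
    print(
        n_transactions,
        kulczynski(sup_a, sup_b, sup_ab),
        cosine(sup_a, sup_b, sup_ab),
        lift(sup_a, sup_b, sup_ab, n_transactions),
    )
```

Kulczynski and Cosine take no total-count argument at all, so null transactions cannot affect them, whereas lift grows with the number of null transactions.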

Proceedings of the Tenth International Workshop on Multimedia Data Mining | 2010

DisIClass: discriminative frequent pattern-based image classification

Sangkyum Kim; Xin Jin; Jiawei Han

Owing to the rapid growth of image data, image classification has attracted considerable research effort. Several diverse research disciplines have converged on this important theme, looking for more powerful solutions. In this paper, we propose a novel image representation method B2S (Bag to Set) that keeps all frequency information and is more discriminative than traditional histogram based bag representation. Based on B2S, we construct two different image classification approaches. First, we apply B2S to a state-of-the-art image classification algorithm SPM in computer vision. Second, we design a framework DisIClass (Discriminative Frequent Pattern-Based Image Classification) to utilize data mining algorithms to classify images, which was hardly done before due to the intrinsic differences between the data of computer vision and data mining fields. DisIClass adapts the locality property of image data, and applies a sequential covering method to induce the most discriminative feature sets from a closed frequent itemset mining method. Our experiments with real image data show the high accuracy and good scalability of both approaches.

International Journal of Distributed Sensor Networks | 2012

Multidimensional Sensor Data Analysis in Cyber-Physical System: An Atypical Cube Approach

Lu An Tang; Xiao Yu; Sangkyum Kim; Jiawei Han; Wen-Chih Peng; Yizhou Sun; Alice Leung; Thomas F. La Porta

A Cyber-Physical System (CPS) is an integration of distributed sensor networks with computational devices. CPS has many promising applications, such as traffic observation, battlefield surveillance, and sensor-network-based monitoring. One important topic in CPS research is atypical event analysis, that is, retrieving the events from massive sensor data and analyzing them with spatial, temporal, and other multidimensional information. Many traditional methods are not feasible for such analysis since they cannot describe the complex atypical events. In this paper, we propose a novel model of atypical cluster to effectively represent such events and efficiently retrieve them from massive data. The basic cluster is designed to summarize an individual event, and the macrocluster is used to integrate the information from multiple events. To facilitate scalable, flexible, and online analysis, the atypical cube is constructed, and a guided clustering algorithm is proposed to retrieve significant clusters in an efficient manner. We conduct experiments on real sensor datasets with the size of more than 50 GB; the results show that the proposed method can provide more accurate information at only 15% to 20% of the time cost of the baselines.

international conference on data engineering | 2012

Multidimensional Analysis of Atypical Events in Cyber-Physical Data

Lu An Tang; Xiao Yu; Sangkyum Kim; Jiawei Han; Wen-Chih Peng; Yizhou Sun; Hector Gonzalez; Sebastian Seith

A Cyber-Physical System (CPS) integrates physical devices (e.g., sensors, cameras) with cyber (or informational) components to form a situation-integrated analytical system that may respond intelligently to dynamic changes in real-world situations. CPS has many promising applications, such as traffic observation, battlefield surveillance and sensor-network-based monitoring. One important research topic in CPS is atypical event analysis, i.e., retrieving the events from large amounts of data and analyzing them with spatial, temporal and other multi-dimensional information. Many traditional approaches are not feasible for such analysis since they use numeric measures and cannot describe the complex atypical events. In this study, we propose a new model of atypical cluster to effectively represent those events and efficiently retrieve them from massive data. The micro-cluster is designed to summarize individual events, and the macro-cluster is used to integrate the information from multiple events. To facilitate scalable, flexible and online analysis, the concept of significant cluster is defined and a guided clustering algorithm is proposed to retrieve significant clusters in an efficient manner. We conduct experiments on real datasets with the size of more than 50 GB; the results show that the proposed method can provide more accurate information at only 15% to 20% of the time cost of the baselines.

knowledge discovery and data mining | 2010

Authorship classification: a syntactic tree mining approach

Sangkyum Kim; Hyungsul Kim; Tim Weninger; Jiawei Han

In the past, there have been dozens of studies on automatic authorship classification, and many of these studies concluded that the writing style is one of the best indicators of original authorship. From among the hundreds of features which were developed, syntactic features were best able to reflect an author's writing style. However, due to the high computational complexity of extracting and computing syntactic features, only simple variations of basic syntactic features of function words and part-of-speech tags were considered. In this paper, we propose a novel approach to mining discriminative k-embedded-edge subtree patterns from a given set of syntactic trees that reduces the computational burden of using complex syntactic structures as a feature set. This method is shown to increase the classification accuracy. We also design a new kernel based on these features. Comprehensive experiments on real datasets of news articles and movie reviews demonstrate that our approach is reliable and more accurate than previous studies.
