Publication


Featured research published by Chang-shing Perng.


IBM Systems Journal | 2002

Discovering actionable patterns in event data

Joseph L. Hellerstein; Sheng Ma; Chang-shing Perng

Applications such as those for systems management and intrusion detection employ an automated real-time operation system in which sensor data are collected and processed in real time. Although such a system effectively reduces the need for operation staff, it requires constructing and maintaining correlation rules. Currently, rule construction requires experts to identify problem patterns, a process that is time-consuming and error-prone. In this paper, we propose reducing this burden by mining historical data that are readily available. Specifically, we first present efficient algorithms to mine three types of important patterns from historical event data: event bursts, periodic patterns, and mutually dependent patterns. We then discuss a framework for efficiently mining events that have multiple attributes. Last, we present Event Correlation Constructor, a tool that validates and extends correlation knowledge.
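As a rough illustration of the first pattern type named above, event bursts, the following minimal sketch flags fixed-size time windows that contain at least a threshold number of events. It is not the paper's algorithm; the function name `find_bursts`, the fixed-window bucketing, and the sample timestamps are assumptions made for this example.

```python
from collections import Counter


def find_bursts(timestamps, window, threshold):
    """Return the start times of windows holding at least `threshold` events.

    Hypothetical helper: buckets integer timestamps into fixed windows of
    length `window` and counts events per bucket.
    """
    counts = Counter(t // window for t in timestamps)
    return sorted(w * window for w, n in counts.items() if n >= threshold)


# Hypothetical event arrival times, in seconds.
times = [1, 2, 3, 4, 50, 61, 62, 63, 64, 65]
bursts = find_bursts(times, window=10, threshold=4)
print(bursts)
```

Fixed windows are the simplest choice; a burst that straddles a window boundary can be missed, which a sliding window would avoid at higher cost.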


International Conference on Data Mining | 2011

ASAP: A Self-Adaptive Prediction System for Instant Cloud Resource Demand Provisioning

Yexi Jiang; Chang-shing Perng; Tao Li; Rong N. Chang

The promise of cloud computing is to provide computing resources instantly whenever they are needed. State-of-the-art virtual machine (VM) provisioning technology can provision a VM in tens of minutes. This latency is unacceptable for jobs that need to scale out during computation. To truly enable on-the-fly scaling, a new VM needs to be ready within seconds of a request. In this paper, we present an online temporal data mining system called ASAP to model and predict cloud VM demand. ASAP aims to extract high-level characteristics from the VM provisioning request stream and notify the provisioning system to prepare VMs in advance. To quantify prediction quality, we propose Cloud Prediction Cost, which encodes the cost and constraints of the cloud and guides the training of prediction algorithms. Moreover, we utilize a two-level ensemble method to capture the characteristics of the highly transient demand time series. Experimental results using historical data from an IBM cloud in operation demonstrate that ASAP significantly improves cloud service quality and makes on-the-fly provisioning possible.
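The idea of letting a cloud-specific cost guide the choice of predictor can be sketched as follows. This is not the paper's Cloud Prediction Cost or its two-level ensemble: the asymmetric weights, the two toy predictors, and the select-the-cheapest rule are all assumptions chosen to keep the example small.

```python
def asymmetric_cost(predicted, actual, under_weight=3.0, over_weight=1.0):
    """Hypothetical cost: under-prediction (customers wait) is weighted
    more heavily than over-prediction (prepared VMs sit idle)."""
    if predicted < actual:
        return under_weight * (actual - predicted)
    return over_weight * (predicted - actual)


def last_value(history):
    """Predict that the next demand equals the most recent demand."""
    return history[-1]


def moving_average(history, window=3):
    """Predict the mean of the last `window` observations."""
    recent = history[-window:]
    return sum(recent) / len(recent)


def pick_predictor(history, predictors, warmup=3):
    """Replay one-step-ahead forecasts over the history and return the
    name of the predictor with the lowest total asymmetric cost."""
    totals = {name: 0.0 for name in predictors}
    for t in range(warmup, len(history)):
        for name, predict in predictors.items():
            totals[name] += asymmetric_cost(predict(history[:t]), history[t])
    return min(totals, key=totals.get)


# Hypothetical VM request counts per time slot.
demand = [4, 5, 6, 7, 8, 9]
best = pick_predictor(
    demand, {"last_value": last_value, "moving_average": moving_average}
)
print(best)
```

On steadily rising demand the moving average lags further behind than the last observed value, so the asymmetric cost penalizes it more.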


Knowledge Discovery and Data Mining | 2007

Event summarization for system management

Wei Peng; Chang-shing Perng; Tao Li; Haixun Wang

In system management applications, an overwhelming amount of data is generated and collected in the form of temporal events. While mining temporal event data to discover interesting and frequent patterns has attracted rapidly increasing research effort, users of the applications are overwhelmed by the mining results. The extracted patterns are generally large in volume and hard to interpret; they may lack emphasis and appear intricate and meaningless to non-experts, and even to domain experts. While traditional research efforts focus on finding interesting patterns, in this paper we take a novel approach called event summarization towards the understanding of the seemingly chaotic temporal data. Event summarization aims at providing a concise interpretation of the data, so that domain experts may take actions upon the summarized models. Event summarization decomposes the temporal information into many independent subsets and finds well-fitted models to describe each subset.


International Conference on Web Services | 2013

Ranking Services by Service Network Structure and Service Attributes

Yang Zhou; Ling Liu; Chang-shing Perng; Anca Sailer; Ignacio Silva-Lepe; Zhiyuan Su

Service network analysis is an essential aspect of web service discovery, search, mining and recommendation. Many popular web service networks are content-rich in terms of heterogeneous types of entities, attributes and links. A main challenge for ranking services is how to incorporate multiple complex and heterogeneous factors, such as service attributes, relationships between services, and relationships between services and service providers or service consumers, into the design of service ranking functions. In this paper, we model services, attributes, and the associated entities, such as providers and consumers, by a heterogeneous service network. We propose a unified neighborhood random walk distance measure, which integrates various types of links and vertex attributes by a locally optimal weight assignment. Based on this unified distance measure, a reinforcement algorithm, ServiceRank, is provided to tightly integrate ranking and clustering by mutually and simultaneously enhancing each other such that the performance of both can be improved. An additional clustering matching strategy is proposed to efficiently align clusters from different types of objects. Our extensive evaluation on both synthetic and real service networks demonstrates the effectiveness of ServiceRank in terms of the quality of both clustering and ranking among multiple types of entity, link and attribute similarities in a service network.


Knowledge Discovery and Data Mining | 2003

Data-driven validation, completion and construction of event relationship networks

Chang-shing Perng; David Thoenen; Genady Grabarnik; Sheng Ma; Joseph L. Hellerstein

Event management is a focal point in building and maintaining high quality information infrastructures. We have witnessed the shift of the paradigm of event management in practice from root cause analysis (RCA) to action-oriented analysis (AOA). IBM has developed a pioneering event management methodology (EMD) based on the AOA paradigm and applied it to more than two hundred production sites with success. We expect that more and more event management professionals will apply AOA in different incarnations in building proactive management facilities. As a result, building correct and effective Event Relationship Networks (ERNs) becomes the dominant activity in the AOA service design process. Currently, the quality of ERNs and the cost of building them largely depend on the knowledge of domain experts. We believe that historical event logs can be used to shorten the ERN design process and improve the quality of ERNs. In this paper, we describe in detail how to apply this data-driven approach in ERN validation, completion and construction.


Conference on Information and Knowledge Management | 2011

LogSig: generating system events from raw textual logs

Liang Tang; Tao Li; Chang-shing Perng

Modern computing systems generate large amounts of log data. System administrators or domain experts utilize the log data to understand and optimize system behaviors. Most system logs are raw, unstructured text. A fundamental challenge in automated log analysis is the generation of system events from raw textual logs. Log messages are relatively short text messages but may have a large vocabulary, which often results in poor performance when applying traditional text clustering techniques to the log data. Other related methods have various limitations and only work well for some particular system logs. In this paper, we propose a message-signature-based algorithm, logSig, to generate system events from textual log messages. By searching for the most representative message signatures, logSig categorizes log messages into a set of event types. logSig can handle various types of log data and is able to incorporate human domain knowledge to achieve high performance. We conduct experiments on five real system log datasets. Experiments show that logSig outperforms alternative algorithms in terms of overall performance.


IEEE International Conference on Services Computing | 2012

Self-Adaptive Cloud Capacity Planning

Yexi Jiang; Chang-shing Perng; Tao Li; Rong N. Chang

The popularity of cloud services spurs increasing demand for cloud resources from cloud service providers. Along with the new business opportunities, the pay-as-you-go model drastically changes the usage pattern and brings technology challenges to effective capacity planning. In this paper, we propose a new method for cloud capacity planning with the goal of fully utilizing the physical resources, as we believe this is one of the emerging problems for cloud providers. To solve this problem, we present an integrated system with intelligent cloud capacity prediction. Considering the unique characteristic of cloud services that virtual machines are provisioned and de-provisioned frequently to meet business needs, we propose an asymmetric and heterogeneous measure for modeling the over-estimation and under-estimation of the capacity. To accurately forecast the capacity, we first divide the change of cloud capacity demand into provisioning and de-provisioning components, and then estimate the individual components separately. The future provisioning demand is predicted by an ensemble time-series prediction method, while the future de-provisioning is inferred based on the life span distribution and the number of active virtual machines. Our proposed solution is simple and computationally efficient, which makes it practical for development and deployment. Our solution also has the advantage of generating interpretable predictions. The experimental results on the IBM Smart Cloud Enterprise trace data demonstrate the effectiveness, accuracy and efficiency of our solution.
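The decomposition described above, where provisioning is forecast separately and de-provisioning is inferred from the life span distribution and the active VMs, can be sketched as follows. This is an illustration under stated assumptions, not the paper's estimator: the empirical hazard computation, the function names, and the sample life spans are hypothetical, and the provisioning forecast is simply passed in as a number.

```python
def hazard_from_lifespans(lifespans):
    """Empirical probability that a VM of a given age (in periods) ends
    during the next period, from observed completed life spans."""
    hazard = {}
    for age in range(max(lifespans)):
        at_risk = sum(1 for s in lifespans if s > age)
        ending = sum(1 for s in lifespans if s == age + 1)
        if at_risk:
            hazard[age] = ending / at_risk
    return hazard


def expected_deprovisions(active_ages, hazard):
    """Expected number of currently active VMs that end next period."""
    return sum(hazard.get(age, 0.0) for age in active_ages)


def forecast_active(active_ages, predicted_provisions, hazard):
    """Next-period active VM count: current + provisioned - de-provisioned."""
    return (
        len(active_ages)
        + predicted_provisions
        - expected_deprovisions(active_ages, hazard)
    )


# Hypothetical completed life spans (in periods) and ages of active VMs.
hazard = hazard_from_lifespans([1, 1, 2, 4])
active_ages = [0, 0, 1, 3]
leaving = expected_deprovisions(active_ages, hazard)
capacity = forecast_active(active_ages, predicted_provisions=4, hazard=hazard)
print(leaving, capacity)
```

Keeping the two components separate also keeps the forecast interpretable: the expected departures can be read off per VM age.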


European Conference on Principles of Data Mining and Knowledge Discovery | 2002

A Classification Approach for Prediction of Target Events in Temporal Sequences

Carlotta Domeniconi; Chang-shing Perng; Ricardo Vilalta; Sheng Ma

Learning to predict significant events from sequences of data with categorical features is an important problem in many application areas. We focus on events for system management, and formulate the problem of prediction as a classification problem. We perform co-occurrence analysis of events by means of Singular Value Decomposition (SVD) of the examples constructed from the data. This process is combined with Support Vector Machine (SVM) classification, to obtain efficient and accurate predictions. We conduct an analysis of statistical properties of event data, which explains why SVM classification is suitable for such data, and perform an empirical study using real data.


SIGKDD Explorations | 2002

Discovery in multi-attribute data with user-defined constraints

Chang-shing Perng; Haixun Wang; Sheng Ma; Joseph L. Hellerstein

There has been a growing interest in mining frequent itemsets in relational data with multiple attributes. A key step in this approach is to select a set of attributes that group data into transactions and a separate set of attributes that label data into items. Unsupervised and unrestricted mining, however, is stymied by the combinatorial complexity and the quantity of patterns as the number of attributes grows. In this paper, we focus on leveraging the semantics of the underlying data for mining frequent itemsets. For instance, there are usually taxonomies in the data schema and functional dependencies among the attributes. Domain knowledge and user preferences often have the potential to significantly reduce the exponentially growing mining space. These observations motivate the design of a user-directed data mining framework that allows such domain knowledge to guide the mining process and control the mining strategy. We show examples of tremendous reduction in computation by using domain knowledge in mining relational data with multiple attributes.
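The key step mentioned above, choosing one attribute set to group rows into transactions and another to label items, can be illustrated with a small brute-force sketch. It enumerates itemsets exhaustively rather than using an efficient miner, and it does not implement the paper's user-directed framework; the function name, the record layout, and the sample rows are assumptions.

```python
from collections import Counter, defaultdict
from itertools import combinations


def frequent_itemsets(rows, group_attrs, item_attrs, min_support, max_size=2):
    """Group rows into transactions, then count itemsets up to `max_size`.

    rows: list of dicts (hypothetical relational records)
    group_attrs: attributes whose values identify a transaction
    item_attrs: attributes whose values label an item
    """
    transactions = defaultdict(set)
    for row in rows:
        tid = tuple(row[a] for a in group_attrs)
        item = tuple(row[a] for a in item_attrs)
        transactions[tid].add(item)

    counts = Counter()
    for items in transactions.values():
        ordered = sorted(items)  # canonical order so equal itemsets match
        for size in range(1, max_size + 1):
            for combo in combinations(ordered, size):
                counts[combo] += 1
    return {itemset: n for itemset, n in counts.items() if n >= min_support}


# Hypothetical event records with three attributes.
events = [
    {"host": "a", "minute": 1, "type": "link_down"},
    {"host": "a", "minute": 1, "type": "cpu_high"},
    {"host": "b", "minute": 1, "type": "link_down"},
    {"host": "b", "minute": 1, "type": "cpu_high"},
    {"host": "a", "minute": 2, "type": "link_down"},
]
result = frequent_itemsets(events, ["host", "minute"], ["type"], min_support=2)
print(result)
```

Swapping the attribute roles, for example grouping by `minute` alone, yields a different set of transactions from the same rows, which is why the number of possible minings grows quickly with the number of attributes.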


Conference on Information and Knowledge Management | 2011

Natural event summarization

Yexi Jiang; Chang-shing Perng; Tao Li

Event mining is a useful way to understand computer system behaviors. The focus of recent work on event mining has shifted from discovering frequent patterns to event summarization. Event summarization seeks to provide a comprehensible explanation of the event sequence on certain aspects. Previous methods have several limitations, such as ignoring temporal information, generating the same set of boundaries for all event patterns, and providing a summary that is difficult for humans to understand. In this paper, we propose a novel framework called natural event summarization that summarizes an event sequence using inter-arrival histograms to capture the temporal relationship among events. Our framework uses the minimum description length principle to guide the process in order to balance accuracy and brevity. Also, we use multi-resolution analysis for pruning the problem space. We demonstrate how the principles can be applied to generate summaries with periodic patterns and correlation patterns in the framework. Experimental results on synthetic and real data show that our method is capable of producing usable event summaries and is robust to noise and scalable.
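An inter-arrival histogram of the kind mentioned above can be sketched in a few lines. The definition used here, the gap from each source event to the next destination event strictly after it, is an assumption for illustration and may differ from the paper's; the function name and the sample timestamps are likewise hypothetical.

```python
from bisect import bisect_right
from collections import Counter


def inter_arrival_histogram(src_times, dst_times):
    """Count the gaps from each source event to the next destination
    event that occurs strictly after it."""
    dst_sorted = sorted(dst_times)
    hist = Counter()
    for t in src_times:
        i = bisect_right(dst_sorted, t)  # first destination time > t
        if i < len(dst_sorted):
            hist[dst_sorted[i] - t] += 1
    return hist


# Hypothetical timestamps for two event types.
backup = [0, 10, 20, 30, 41]
alarm = [2, 12, 22]

periodic = inter_arrival_histogram(backup, backup)   # same type: period
correlated = inter_arrival_histogram(backup, alarm)  # two types: lag
print(dict(periodic), dict(correlated))
```

A histogram concentrated on one gap suggests a periodic pattern when both event types are the same, and a correlation with a fixed lag when they differ.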
