Guofei Jiang | Researchain

Publication

Featured researches published by Guofei Jiang.

international conference on distributed computing systems | 2009

Modeling Probabilistic Measurement Correlations for Problem Determination in Large-Scale Distributed Systems

Jing Gao; Guofei Jiang; Haifeng Chen; Jiawei Han

With the growing complexity of computer systems, detecting and diagnosing problems in today's large-scale distributed systems has become a real challenge. The correlations between measurements collected across a distributed system usually contain rich information about system behavior, so a reasonable model of these correlations is crucial for detecting and locating system problems. In this paper, we propose a transition probability model based on Markov properties to characterize pairwise measurement correlations. The proposed method can discover both spatial (across system measurements) and temporal (across observation time) correlations, so the model can represent the system's normal profile. Problem determination and localization under this framework is fast and convenient. The framework is general enough to discover any type of correlation (e.g., linear or non-linear). Model updating, problem detection, and diagnosis can also be conducted effectively and efficiently. Experimental results show that the proposed method can detect anomalous events and locate the problematic sources by analyzing real monitoring data collected from three companies' infrastructures.
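The abstract above does not give the model's details, so the following is only a minimal sketch of the general idea, under stated assumptions: each pair of measurements is discretized onto an equal-width grid, transitions between grid cells at consecutive time steps are counted, and a transition with low estimated probability is treated as suspicious. The grid layout, the Laplace smoothing constant `alpha`, and all function names are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def cell_of(x, y, lo, hi, bins):
    """Map a measurement pair to a grid cell index (assumed equal-width grid)."""
    def idx(v):
        i = int((v - lo) / (hi - lo) * bins)
        return min(max(i, 0), bins - 1)  # clamp out-of-range values to the edge cells
    return idx(x) * bins + idx(y)


def fit_transitions(pairs, lo, hi, bins):
    """Count cell-to-cell transitions between consecutive observations."""
    counts = defaultdict(lambda: defaultdict(int))
    cells = [cell_of(x, y, lo, hi, bins) for x, y in pairs]
    for a, b in zip(cells, cells[1:]):
        counts[a][b] += 1
    return counts


def transition_prob(counts, a, b, n_cells, alpha=1.0):
    """Laplace-smoothed estimate of P(next cell = b | current cell = a)."""
    row = counts.get(a, {})
    total = sum(row.values())
    return (row.get(b, 0) + alpha) / (total + alpha * n_cells)


# Usage: two measurements that normally move together (y == x).
bins = 5
history = [(i % 10, i % 10) for i in range(200)]
model = fit_transitions(history, lo=0, hi=10, bins=bins)
here = cell_of(1, 1, 0, 10, bins)
p_normal = transition_prob(model, here, cell_of(2, 2, 0, 10, bins), bins * bins)
p_odd = transition_prob(model, here, cell_of(1, 9, 0, 10, bins), bins * bins)
```

In this toy run `p_normal` is much larger than `p_odd`, because the second transition breaks the learned correlation between the two measurements.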

knowledge discovery and data mining | 2014

Temporal skeletonization on sequential data: patterns, categorization, and visualization

Chuanren Liu; Kai Zhang; Hui Xiong; Guofei Jiang; Qiang Yang

Sequential pattern analysis aims at finding statistically relevant temporal structures where the values are delivered in a sequence. With the growing complexity of real-world dynamic scenarios, more and more symbols are often needed to encode the sequential values. This is the so-called “curse of cardinality”, which can impose significant challenges on the design of sequential analysis methods in terms of computational efficiency and practical use. Indeed, given the overwhelming scale and the heterogeneous nature of sequential data, new visions and strategies are needed to face these challenges. To this end, in this paper, we propose a “temporal skeletonization” approach to proactively reduce the cardinality of the representation for sequences by uncovering significant, hidden temporal structures. The key idea is to summarize the temporal correlations in an undirected graph, and use the “skeleton” of the graph as a higher granularity on which hidden temporal patterns are more likely to be identified. As a consequence, the embedding topology of the graph allows us to translate the rich temporal content into a metric space. This opens up new possibilities to explore, quantify, and visualize sequential data. Our approach has been shown to greatly alleviate the curse of cardinality in challenging tasks of sequential pattern mining and clustering. Evaluation on a business-to-business (B2B) marketing application demonstrates that our approach can effectively discover critical buying paths from noisy customer event data.
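The first step described above, summarizing temporal correlations in an undirected graph, might be sketched as follows. This is an illustration only: the edge weight used here (how often two distinct symbols occur within `window` positions of each other) and the function name `temporal_graph` are assumptions, and the later skeleton extraction and embedding steps are not shown.

```python
from collections import Counter


def temporal_graph(sequences, window=2):
    """Build an undirected, weighted graph over symbols.

    The weight of edge {a, b} counts how often a and b appear within
    `window` positions of each other (a hypothetical weighting choice).
    """
    weights = Counter()
    for seq in sequences:
        for i, a in enumerate(seq):
            for b in seq[i + 1 : i + 1 + window]:
                if a != b:
                    weights[frozenset((a, b))] += 1
    return weights


# Usage on two short symbolic sequences.
graph = temporal_graph([["a", "b", "c"], ["a", "b", "d"]], window=1)
```

Symbols that tend to follow one another closely end up joined by heavy edges, which is the structure a later grouping step could exploit to reduce cardinality.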

Tsinghua Science & Technology | 2014

Mining sensor data in cyber-physical systems

Lu An Tang; Jiawei Han; Guofei Jiang

A Cyber-Physical System (CPS) integrates physical devices (i.e., sensors) with cyber (i.e., informational) components to form a context-sensitive system that responds intelligently to dynamic changes in real-world situations. Such a system has wide applications in scenarios such as traffic control, battlefield surveillance, and environmental monitoring. A core element of CPS is the collection and assessment of information from noisy, dynamic, and uncertain physical environments integrated with many types of cyber-space resources. The potential of this integration is unbounded. To achieve this potential, the raw data acquired from the physical world must be transformed into usable knowledge in real time. CPS therefore brings a new dimension to knowledge discovery because of the emerging synergism of the physical and the cyber. The various properties of the physical world must be addressed in information management and knowledge discovery. This paper discusses the problems of mining sensor data in CPS: with a large number of wireless sensors deployed in a designated area, the task is real-time detection of intruders that enter the area based on noisy sensor data. The IntruMine framework is introduced to discover intruders from untrustworthy sensor data. IntruMine first analyzes the trustworthiness of sensor data, then detects the intruders' locations, and verifies the detections based on a graph model of the relationships between sensors and intruders.

international joint conference on artificial intelligence | 2017

A dual-stage attention-based recurrent neural network for time series prediction

Yao Qin; Dongjin Song; Haifeng Chen; Wei Cheng; Guofei Jiang; Garrison W. Cottrell

The nonlinear autoregressive exogenous (NARX) model, which predicts the current value of a time series based upon its previous values as well as the current and past values of multiple driving (exogenous) series, has been studied for decades. Although various NARX models have been developed, few of them can capture long-term temporal dependencies appropriately and select the relevant driving series to make predictions. In this paper, we propose a dual-stage attention-based recurrent neural network (DA-RNN) to address these two issues. In the first stage, we introduce an input attention mechanism to adaptively extract relevant driving series (a.k.a. input features) at each time step by referring to the previous encoder hidden state. In the second stage, we use a temporal attention mechanism to select relevant encoder hidden states across all time steps. With this dual-stage attention scheme, our model can not only make predictions effectively, but can also be easily interpreted. Thorough empirical studies based upon the SML 2010 dataset and the NASDAQ 100 Stock dataset demonstrate that the DA-RNN can outperform state-of-the-art methods for time series prediction.
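The two attention stages can be illustrated with a small, untrained sketch. It is not the DA-RNN implementation: the learned scoring networks and the recurrent cells are omitted, a plain dot product is swapped in as the score, and the hidden size is assumed to equal the window length so that the dot product is defined. All function names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def input_attention(prev_hidden, driving_windows, x_t):
    """Stage 1: weight each driving series at the current time step.

    driving_windows[k] is the k-th driving series over the window and
    x_t[k] is its current value. Dot-product scoring is an assumption.
    """
    weights = softmax([dot(prev_hidden, w) for w in driving_windows])
    return [a * x for a, x in zip(weights, x_t)], weights


def temporal_attention(decoder_state, encoder_states):
    """Stage 2: weight encoder hidden states across all time steps."""
    weights = softmax([dot(decoder_state, h) for h in encoder_states])
    dim = len(encoder_states[0])
    context = [sum(w * h[i] for w, h in zip(weights, encoder_states))
               for i in range(dim)]
    return context, weights


# Usage with toy vectors.
weighted_x, in_w = input_attention([1.0, 0.0], [[5.0, 0.0], [0.0, 5.0]], [2.0, 4.0])
context, tmp_w = temporal_attention([0.0, 3.0], [[1.0, 0.0], [0.0, 1.0]])
```

Because both sets of weights sum to one, they can be read directly as the relative importance of each driving series and each time step, which is the source of the interpretability the abstract mentions.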

ACM Transactions on Knowledge Discovery From Data | 2014

Ranking Metric Anomaly in Invariant Networks

Yong Ge; Guofei Jiang; Min Ding; Hui Xiong

The management of large-scale distributed information systems relies on the effective use and modeling of monitoring data collected at various points in those systems. A traditional approach to modeling monitoring data is to discover invariant relationships among the data. Indeed, we can discover all invariant relationships among all pairs of monitoring data and generate invariant networks, where a node is a monitoring data source (metric) and a link indicates an invariant relationship between two metrics. Such an invariant network representation can help system experts localize and diagnose system faults by examining broken invariant relationships and their related metrics, since system faults usually propagate among the monitoring data and eventually lead to some broken invariant relationships. However, at any one time, there are usually many broken links (invariant relationships) within an invariant network. Without proper guidance, it is difficult for system experts to manually inspect this large number of broken links. To this end, in this article, we propose the problem of ranking metrics according to their anomaly levels for a given invariant network, which is a nontrivial task due to the uncertainties and the complex nature of invariant networks. Specifically, we propose two types of algorithms for ranking metric anomaly by link analysis in invariant networks. Along this line, we first define two measurements to quantify the anomaly level of each metric, and introduce the mRank algorithm. We also provide a weighted score mechanism and develop the gRank algorithm, which involves an iterative process to obtain a score to measure the anomaly levels. In addition, some extended algorithms based on mRank and gRank are developed by taking into account the probability of being broken as well as noisy links. Finally, we validate all the proposed algorithms on a large number of real-world and synthetic data sets to illustrate the effectiveness and efficiency of the different algorithms.

knowledge discovery and data mining | 2016

Ranking Causal Anomalies via Temporal and Dynamical Analysis on Vanishing Correlations

Wei Cheng; Kai Zhang; Haifeng Chen; Guofei Jiang; Zhengzhang Chen; Wei Wang

The modern world has witnessed a dramatic increase in our ability to collect, transmit, and distribute real-time monitoring and surveillance data from large-scale information systems and cyber-physical systems. Detecting system anomalies thus attracts a significant amount of interest in many fields such as security, fault management, and industrial optimization. Recently, the invariant network has been shown to be a powerful way of characterizing complex system behaviours. In the invariant network, a node represents a system component and an edge indicates a stable, significant interaction between two components. Structures and evolutions of the invariant network, in particular the vanishing correlations, can shed important light on locating causal anomalies and performing diagnosis. However, existing approaches to detecting causal anomalies with the invariant network often use the percentage of vanishing correlations to rank possible causal components, which has several limitations: 1) fault propagation in the network is ignored; 2) the root causal anomalies may not always be the nodes with a high percentage of vanishing correlations; 3) temporal patterns of vanishing correlations are not exploited for robust detection. To address these limitations, in this paper we propose a network diffusion based framework to identify significant causal anomalies and rank them. Our approach can effectively model fault propagation over the entire invariant network, and can perform joint inference on both the structural and the time-evolving broken invariance patterns. As a result, it can locate high-confidence anomalies that are truly responsible for the vanishing correlations, and can compensate for unstructured measurement noise in the system. Extensive experiments on synthetic datasets, bank information system datasets, and coal plant cyber-physical system datasets demonstrate the effectiveness of our approach.
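For reference, the baseline that this abstract criticizes, ranking components by their percentage of vanishing correlations, can be sketched in a few lines. This shows the baseline only, not the proposed diffusion framework. The edge-list representation and the name `broken_ratio_ranking` are assumptions made for illustration.

```python
def broken_ratio_ranking(edges, broken):
    """Rank nodes by the fraction of their invariant links that are broken."""
    broken_set = {frozenset(e) for e in broken}
    total, lost = {}, {}
    for u, v in edges:
        for n in (u, v):
            total[n] = total.get(n, 0) + 1
            if frozenset((u, v)) in broken_set:
                lost[n] = lost.get(n, 0) + 1
    scores = {n: lost.get(n, 0) / total[n] for n in total}
    # Highest ratio first; ties broken by node name for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Usage: node "a" has lost two of its three invariant links.
edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c")]
ranking = broken_ratio_ranking(edges, broken=[("a", "b"), ("a", "c")])
```

The score of each node depends only on its own links, so a fault that propagates to neighbours can raise their ratios as well. This is the first limitation listed in the abstract.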

IEEE Transactions on Knowledge and Data Engineering | 2016

Temporal Skeletonization on Sequential Data: Patterns, Categorization, and Visualization

Chuanren Liu; Kai Zhang; Hui Xiong; Guofei Jiang; Qiang Yang

knowledge discovery and data mining | 2015

Efficient Long-Term Degradation Profiling in Time Series for Complex Physical Systems

Liudmila Ulanova; Tan Yan; Haifeng Chen; Guofei Jiang; Eamonn J. Keogh; Kai Zhang

The long-term operation of physical systems inevitably leads to their wearing out, and may cause degradations in performance or the unexpected failure of the entire system. To reduce the possibility of such unanticipated failures, the system must be monitored for tell-tale symptoms of degradation that are suggestive of imminent failure. In this work, we introduce a novel time series analysis technique that allows the decomposition of the time series into trend and fluctuation components, providing the monitoring software with actionable information about changes in the system's behavior over time. We analyze the underlying problem and formulate it as a Quadratic Programming (QP) problem that can be solved with existing QP-solvers. However, when the profiling resolution is high, as generally required by real-world applications, such a decomposition becomes intractable for general QP-solvers. To speed up the problem solving, we further transform the problem and present a novel QP formulation, Non-negative QP, and demonstrate a tractable solution that bypasses the use of slow general QP-solvers. We demonstrate our ideas on both synthetic and real datasets, showing that our method allows us to accurately extract the degradation phenomenon of time series. We further demonstrate the generality of our ideas by applying them beyond classic machine prognostics to problems in identifying the influence of news events on currency exchange rates and stock prices. We fully implement our profiling system and deploy it in several physical systems, such as chemical plants and nuclear power plants, where it greatly helps to detect the degradation phenomenon and diagnose the corresponding components.
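The abstract does not state the QP formulation, so the sketch below swaps in a different, well-known technique to illustrate what a trend/fluctuation decomposition looks like: a least-squares non-decreasing fit computed with the pool-adjacent-violators algorithm (isotonic regression, itself a small QP with monotonicity constraints). It is not the paper's Non-negative QP, and the assumption that degradation appears as a non-decreasing trend is made purely for illustration.

```python
def monotone_trend(series):
    """Least-squares non-decreasing fit via pool-adjacent-violators."""
    blocks = []  # each block holds [sum of values, number of values]
    for v in series:
        blocks.append([float(v), 1])
        # Merge while the previous block's mean exceeds the last block's mean.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    trend = []
    for s, c in blocks:
        trend.extend([s / c] * c)
    return trend


def decompose(series):
    """Split a series into a monotone trend and the residual fluctuation."""
    trend = monotone_trend(series)
    fluctuation = [v - t for v, t in zip(series, trend)]
    return trend, fluctuation


# Usage on a short, noisy, slowly rising signal.
trend, fluctuation = decompose([1, 3, 2, 4, 3, 5])
```

Here the trend component carries the slow drift, while the fluctuation component keeps the short-term variation around it.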

siam international conference on data mining | 2016

Integrating Community and Role Detection in Information Networks

Ting Chen; Lu An Tang; Yizhou Sun; Zhengzhang Chen; Haifeng Chen; Guofei Jiang

Community detection and role detection in information networks have received wide attention recently, where the former aims to detect groups of nodes that are closely connected to each other and the latter aims to discover the underlying roles of nodes in the network. Traditional studies treat these two problems as orthogonal issues and propose algorithms for the two tasks separately. In this paper, we propose to integrate communities and roles in a unified model and detect both of them simultaneously for information networks. Intuitively, (1) correctly detecting the communities in a network will help in detecting the roles of nodes, such as opinion leaders and followers in social networks; and (2) correctly identifying the roles of the nodes will lead to better network modeling and thus better detection of communities. A novel probabilistic network model, the Mixed Membership Community and Role model (MMCR), is then proposed, which models the latent community and role of each node at the same time, and the probability of links is defined accordingly. By testing our model on synthetic networks and two real-world networks, we demonstrate that our approach leads to better performance for both community detection and role detection. Moreover, our model offers a better interpretation of link generation in networks, as shown by the link prediction task.

international conference on data mining | 2013

Efficient Invariant Search for Distributed Information Systems

Yong Ge; Guofei Jiang; Yuan Ge

In today's distributed information systems, a large amount of monitoring data, such as log files, is collected. Monitoring data gathered at various points of a distributed information system provides unparalleled opportunities to characterize and track the system by effectively correlating all monitoring data across it. Jiang [1] proposed a concept named flow intensity to measure the intensity with which the monitoring data reacts to the volume of different user requests. The autoregressive model with exogenous inputs (ARX) was used to quantify the relationship between each pair of flow intensities measured at various points across distributed systems. If such relationships hold all the time, they are considered invariants of the underlying systems. Such invariants have been successfully used to characterize complex systems and support various system management tasks, such as system fault detection and localization. However, searching for the complete set of invariants of large-scale systems is very time-consuming, and existing algorithms do not scale to thousands of flow intensity measurements. To this end, in this paper, we develop effective pruning techniques based on the identified upper bounds. Accordingly, two efficient algorithms are proposed to search for the complete set of invariants based on the pruning techniques. Finally, we demonstrate the efficiency and effectiveness of our algorithms with both real-world and synthetic data sets.
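As an illustration of how a pairwise ARX relationship might be fitted and checked, the sketch below fits the simplest first-order form, y[t] = a*y[t-1] + b*x[t], by least squares and scores how well it explains the data. The model order, the normalized fitness score, and the function names are assumptions made for this example. The abstract does not specify them, and the pruning techniques it proposes are not shown.

```python
import math


def fit_arx(x, y):
    """Least-squares fit of y[t] = a*y[t-1] + b*x[t] (assumed first-order form)."""
    # Accumulate the 2x2 normal equations for the regressors (y[t-1], x[t]).
    s_yy = s_yx = s_xx = r_y = r_x = 0.0
    for t in range(1, len(y)):
        p, u, target = y[t - 1], x[t], y[t]
        s_yy += p * p
        s_yx += p * u
        s_xx += u * u
        r_y += p * target
        r_x += u * target
    det = s_yy * s_xx - s_yx * s_yx
    if abs(det) < 1e-12:
        raise ValueError("regressors are collinear; cannot fit")
    a = (r_y * s_xx - r_x * s_yx) / det
    b = (r_x * s_yy - r_y * s_yx) / det
    return a, b


def fitness(x, y, a, b):
    """Normalized fitness in (-inf, 1]; values near 1 mean the model fits well."""
    pred = [a * y[t - 1] + b * x[t] for t in range(1, len(y))]
    actual = y[1:]
    mean = sum(actual) / len(actual)
    err = sum((v - p) ** 2 for v, p in zip(actual, pred))
    var = sum((v - mean) ** 2 for v in actual)
    return 1.0 - math.sqrt(err / var) if var > 0 else 0.0


# Usage: synthetic flow intensities that obey the relationship exactly.
x = [math.sin(0.3 * t) + 1.5 for t in range(100)]
y = [0.0]
for t in range(1, 100):
    y.append(0.5 * y[t - 1] + 2.0 * x[t])
a, b = fit_arx(x, y)
score = fitness(x, y, a, b)
```

A pair whose score stays above a chosen threshold over time would be kept as a candidate invariant. Repeating this fit for every pair is what makes the exhaustive search expensive on thousands of measurements.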
