Zhengzhang Chen | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Zhengzhang Chen is active.

Explore More

Publication

Featured researches published by Zhengzhang Chen.

intelligent information systems | 2012

Community-based anomaly detection in evolutionary networks

Zhengzhang Chen; William Hendrix; Nagiza F. Samatova

Networks of dynamic systems, including social networks, the World Wide Web, climate networks, and biological networks, can be highly clustered. Detecting clusters, or communities, in such dynamic networks is an emerging area of research; however, less work has been done in terms of detecting community-based anomalies. While there has been some previous work on detecting anomalies in graph-based data, none of these anomaly detection approaches have considered an important property of evolutionary networks—their community structure. In this work, we present an approach to uncover community-based anomalies in evolutionary networks characterized by overlapping communities. We develop a parameter-free and scalable algorithm using a proposed representative-based technique to detect all six possible types of community-based anomalies: grown, shrunken, merged, split, born, and vanished communities. We detail the underlying theory required to guarantee the correctness of the algorithm. We measure the performance of the community-based anomaly detection algorithm by comparison to a non–representative-based algorithm on synthetic networks, and our experiments on synthetic datasets show that our algorithm achieves a runtime speedup of 11–46 over the baseline algorithm. We have also applied our algorithm to two real-world evolutionary networks, Food Web and Enron Email. Significant and informative community-based anomaly dynamics have been detected in both cases.

Scientific Reports | 2015

A predictive machine learning approach for microstructure optimization and materials design.

Ruoqian Liu; Abhishek Kumar; Zhengzhang Chen; Ankit Agrawal; Veera Sundararaghavan; Alok N. Choudhary

This paper addresses an important materials engineering question: How can one identify the complete space (or as much of it as possible) of microstructures that are theoretically predicted to yield the desired combination of properties demanded by a selected application? We present a problem involving design of magnetoelastic Fe-Ga alloy microstructure for enhanced elastic, plastic and magnetostrictive properties. While theoretical models for computing properties given the microstructure are known for this alloy, inversion of these relationships to obtain microstructures that lead to desired properties is challenging, primarily due to the high dimensionality of microstructure space, multi-objective design requirement and non-uniqueness of solutions. These challenges render traditional search-based optimization methods incompetent in terms of both searching efficiency and result optimality. In this paper, a route to address these challenges using a machine learning methodology is proposed. A systematic framework consisting of random data generation, feature selection and classification algorithms is developed. Experiments with five design problems that involve identification of microstructures that satisfy both linear and nonlinear property constraints show that our framework outperforms traditional optimization methods with the average running time reduced by as much as 80% and with optimality that would not be achieved otherwise.

international conference on data mining | 2010

Detecting and Tracking Community Dynamics in Evolutionary Networks

Zhengzhang Chen; Kevin A. Wilson; Ye Jin; William Hendrix; Nagiza F. Samatova

Community structure or clustering is ubiquitous in many evolutionary networks including social networks, biological networks and financial market networks. Detecting and tracking community deviations in evolutionary networks can uncover important and interesting behaviors that are latent if we ignore the dynamic information. In biological networks, for example, a small variation in a gene community may indicate an event, such as gene fusion, gene fission, or gene decay. In contrast to the previous work on detecting communities in static graphs or tracking conserved communities in time-varying graphs, this paper first introduces the concept of community dynamics, and then shows that the baseline approach by enumerating all communities in each graph and comparing all pairs of communities between consecutive graphs is infeasible and impractical. We propose an efficient method for detecting and tracking community dynamics in evolutionary networks by introducing graph representatives and community representatives to avoid generating redundant communities and limit the search space. We measure the performance of the representative-based algorithm by comparison to the baseline algorithm on synthetic networks, and our experiments show that our algorithm achieves a runtime speedup of 11â€“46. The method has also been applied to two real-world evolutionary networks including Food Web and Enron Email. Significant and informative community dynamics have been detected in both cases.

IEEE Intelligent Systems | 2014

MuSES: Multilingual Sentiment Elicitation System for Social Media Data

Yusheng Xie; Zhengzhang Chen; Kunpeng Zhang; Yu Cheng; Daniel Honbo; Ankit Agrawal; Alok N. Choudhary

A multilingual sentiment identification system (MuSES) implements three different sentiment identification algorithms. The first algorithm augments previous compositional semantic rules by adding rules specific to social media. The second algorithm defines a scoring function that measures the degree of a sentiment, instead of simply classifying a sentiment into binary polarities. All such scores are calculated based on a large volume of customer reviews. Due to the special characteristics of social media texts, a third algorithm takes emoticons, negation word position, and domain-specific words into account. In addition, a proposed label-free process transfers multilingual sentiment knowledge between different languages. The authors conduct their experiments on user comments from Facebook, tweets from Twitter, and multilingual product reviews from Amazon.

ieee international conference on high performance computing data and analytics | 2014

NUMARCK: machine learning algorithm for resiliency and checkpointing

Zhengzhang Chen; Seung Woo Son; William Hendrix; Ankit Agrawal; Wei-keng Liao; Alok N. Choudhary

Data check pointing is an important fault tolerance technique in High Performance Computing (HPC) systems. As the HPC systems move towards exascale, the storage space and time costs of check pointing threaten to overwhelm not only the simulation but also the post-simulation data analysis. One common practice to address this problem is to apply compression algorithms to reduce the data size. However, traditional lossless compression techniques that look for repeated patterns are ineffective for scientific data in which high-precision data is used and hence common patterns are rare to find. This paper exploits the fact that in many scientific applications, the relative changes in data values from one simulation iteration to the next are not very significantly different from each other. Thus, capturing the distribution of relative changes in data instead of storing the data itself allows us to incorporate the temporal dimension of the data and learn the evolving distribution of the changes. We show that an order of magnitude data reduction becomes achievable within guaranteed user-defined error bounds for each data point. We propose NUMARCK, North western University Machine learning Algorithm for Resiliency and Check pointing, that makes use of the emerging distributions of data changes between consecutive simulation iterations and encodes them into an indexing space that can be concisely represented. We evaluate NUMARCK using two production scientific simulations, FLASH and CMIP5, and demonstrate a superior performance in terms of compression ratio and compression accuracy. More importantly, our algorithm allows users to specify the maximum tolerable error on a per point basis, while compressing the data by an order of magnitude.

knowledge discovery and data mining | 2013

JobMiner: a real-time system for mining job-related patterns from social media

Yu Cheng; Yusheng Xie; Zhengzhang Chen; Ankit Agrawal; Alok N. Choudhary; Songtao Guo

The various kinds of booming social media not only provide a platform where people can communicate with each other, but also spread useful domain information, such as career and job market information. For example, LinkedIn publishes a large amount of messages either about people who want to seek jobs or companies who want to recruit new members. By collecting information, we can have a better understanding of the job market and provide insights to job-seekers, companies and even decision makers. In this paper, we analyze the job information from the social network point of view. We first collect the job-related information from various social media sources. Then we construct an inter-company job-hopping network, with the vertices denoting companies and the edges denoting flow of personnel between companies. We subsequently employ graphmining techniques to mine influential companies and related company groups based on the job-hopping network model. Demonstration on LinkedIn data shows that our system JobMiner can provide a better understanding of the dynamic processes and a more accurate identification of important entities in the job market.

Data Mining and Knowledge Discovery | 2013

Discovery of extreme events-related communities in contrasting groups of physical system networks

Zhengzhang Chen; William Hendrix; Hang Guan; Isaac K. Tetteh; Alok N. Choudhary; Fredrick H. M. Semazzi; Nagiza F. Samatova

The latent behavior of a physical system that can exhibit extreme events such as hurricanes or rainfalls, is complex. Recently, a very promising means for studying complex systems has emerged through the concept of complex networks. Networks representing relationships between individual objects usually exhibit community dynamics. Conventional community detection methods mainly focus on either mining frequent subgraphs in a network or detecting stable communities in time-varying networks. In this paper, we formulate a novel problem—detection of predictive and phase-biased communities in contrasting groups of networks, and propose an efficient and effective machine learning solution for finding such anomalous communities. We build different groups of networks corresponding to different system’s phases, such as higher or low hurricane activity, discover phase-related system components as seeds to help bound the search space of community generation in each network, and use the proposed contrast-based technique to identify the changing communities across different groups. The detected anomalous communities are hypothesized (1) to play an important role in defining the target system’s state(s) and (2) to improve the predictive skill of the system’s states when used collectively in the ensemble of predictive models. When tested on the two important extreme event problems—identification of tropical cyclone-related and of African Sahel rainfall-related climate indices—our algorithm demonstrated the superior performance in terms of various skill and robustness metrics, including 8–16 % accuracy increase, as well as physical interpretability of detected communities. The experimental results also show the efficiency of our algorithm on synthetic datasets.

BMC Systems Biology | 2012

Spice: discovery of phenotype-determining component interplays

Zhengzhang Chen; Kanchana Padmanabhan; Andrea M. Rocha; Yekaterina Shpanskaya; James R. Mihelcic; Kathleen M. Scott; Nagiza F. Samatova

BackgroundA latent behavior of a biological cell is complex. Deriving the underlying simplicity, or the fundamental rules governing this behavior has been the Holy Grail of systems biology. Data-driven prediction of the system components and their component interplays that are responsible for the target system’s phenotype is a key and challenging step in this endeavor.ResultsThe proposed approach, which we call System Phenotype-related Interplaying Components Enumerator (Spice), iteratively enumerates statistically significant system components that are hypothesized (1) to play an important role in defining the specificity of the target system’s phenotype(s); (2) to exhibit a functionally coherent behavior, namely, act in a coordinated manner to perform the phenotype-specific function; and (3) to improve the predictive skill of the system’s phenotype(s) when used collectively in the ensemble of predictive models. Spice can be applied to both instance-based data and network-based data. When validated, Spice effectively identified system components related to three target phenotypes: biohydrogen production, motility, and cancer. Manual results curation agreed with the known phenotype-related system components reported in literature. Additionally, using the identified system components as discriminatory features improved the prediction accuracy by 10% on the phenotype-classification task when compared to a number of state-of-the-art methods applied to eight benchmark microarray data sets.ConclusionWe formulate a problem—enumeration of phenotype-determining system component interplays—and propose an effective methodology (Spice) to address this problem. Spice improved identification of cancer-related groups of genes from various microarray data sets and detected groups of genes associated with microbial biohydrogen production and motility, many of which were reported in literature. Spice also improved the predictive skill of the system’s phenotype determination compared to individual classifiers and/or other ensemble methods, such as bagging, boosting, random forest, nearest shrunken centroid, and random forest variable selection method.

conference on information and knowledge management | 2013

Feedback-driven multiclass active learning for data streams

Yu Cheng; Zhengzhang Chen; Lu Liu; Jiang Wang; Ankit Agrawal; Alok N. Choudhary

Active learning is a promising way to efficiently build up training sets with minimal supervision. Most existing methods consider the learning problem in a pool-based setting. However, in a lot of real-world learning tasks, such as crowdsourcing, the unlabeled samples, arrive sequentially in the form of continuous rapid streams. Thus, preparing a pool of unlabeled data for active learning is impractical. Moreover, performing exhaustive search in a data pool is expensive, and therefore unsuitable for supporting on-the-fly interactive learning in large scale data. In this paper, we present a systematic framework for stream-based multi-class active learning. Following the reinforcement learning framework, we propose a feedback-driven active learning approach by adaptively combining different criteria in a time-varying manner. Our method is able to balance exploration and exploitation during the learning process. Extensive evaluation on various benchmark and real-world datasets demonstrates the superiority of our framework over existing methods.

knowledge discovery and data mining | 2016

Ranking Causal Anomalies via Temporal and Dynamical Analysis on Vanishing Correlations

Wei Cheng; Kai Zhang; Haifeng Chen; Guofei Jiang; Zhengzhang Chen; Wei Wang

Modern world has witnessed a dramatic increase in our ability to collect, transmit and distribute real-time monitoring and surveillance data from large-scale information systems and cyber-physical systems. Detecting system anomalies thus attracts significant amount of interest in many fields such as security, fault management, and industrial optimization. Recently, invariant network has shown to be a powerful way in characterizing complex system behaviours. In the invariant network, a node represents a system component and an edge indicates a stable, significant interaction between two components. Structures and evolutions of the invariance network, in particular the vanishing correlations, can shed important light on locating causal anomalies and performing diagnosis. However, existing approaches to detect causal anomalies with the invariant network often use the percentage of vanishing correlations to rank possible casual components, which have several limitations: 1) fault propagation in the network is ignored; 2) the root casual anomalies may not always be the nodes with a high-percentage of vanishing correlations; 3) temporal patterns of vanishing correlations are not exploited for robust detection. To address these limitations, in this paper we propose a network diffusion based framework to identify significant causal anomalies and rank them. Our approach can effectively model fault propagation over the entire invariant network, and can perform joint inference on both the structural, and the time-evolving broken invariance patterns. As a result, it can locate high-confidence anomalies that are truly responsible for the vanishing correlations, and can compensate for unstructured measurement noise in the system. Extensive experiments on synthetic datasets, bank information system datasets, and coal plant cyber-physical system datasets demonstrate the effectiveness of our approach.

Explore More