William Hendrix | Researchain

Publication

Featured researches published by William Hendrix.

intelligent information systems | 2012

Community-based anomaly detection in evolutionary networks

Zhengzhang Chen; William Hendrix; Nagiza F. Samatova

Networks of dynamic systems, including social networks, the World Wide Web, climate networks, and biological networks, can be highly clustered. Detecting clusters, or communities, in such dynamic networks is an emerging area of research; however, less work has been done in terms of detecting community-based anomalies. While there has been some previous work on detecting anomalies in graph-based data, none of these anomaly detection approaches have considered an important property of evolutionary networks—their community structure. In this work, we present an approach to uncover community-based anomalies in evolutionary networks characterized by overlapping communities. We develop a parameter-free and scalable algorithm using a proposed representative-based technique to detect all six possible types of community-based anomalies: grown, shrunken, merged, split, born, and vanished communities. We detail the underlying theory required to guarantee the correctness of the algorithm. We measure the performance of the community-based anomaly detection algorithm by comparison to a non–representative-based algorithm on synthetic networks, and our experiments on synthetic datasets show that our algorithm achieves a runtime speedup of 11–46 over the baseline algorithm. We have also applied our algorithm to two real-world evolutionary networks, Food Web and Enron Email. Significant and informative community-based anomaly dynamics have been detected in both cases.
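The six event types named in this abstract can be illustrated with a small sketch. This is not the representative-based algorithm from the paper. It is a naive pairwise comparison of two snapshots, and its matching rules are assumptions made for illustration: two communities "overlap" if they share at least one member, and grown/shrunken require a strict superset/subset. The function name `classify_events` is hypothetical.

```python
from typing import Dict, FrozenSet, List

Community = FrozenSet[str]
EVENT_TYPES = ("grown", "shrunken", "merged", "split", "born", "vanished")


def classify_events(prev: List[Community], curr: List[Community]) -> Dict[str, list]:
    """Label community changes between two consecutive snapshots.

    Assumed rule (illustrative only): communities are related when they
    share at least one member.
    """
    events: Dict[str, list] = {name: [] for name in EVENT_TYPES}
    # Successors of each old community and predecessors of each new one.
    succ = {c: [d for d in curr if c & d] for c in prev}
    pred = {d: [c for c in prev if c & d] for d in curr}

    for c, nxt in succ.items():
        if not nxt:
            events["vanished"].append(c)      # no trace in the new snapshot
        elif len(nxt) > 1:
            events["split"].append(c)         # scattered over several communities

    for d, prv in pred.items():
        if not prv:
            events["born"].append(d)          # no ancestor in the old snapshot
        elif len(prv) > 1:
            events["merged"].append(d)        # absorbs several old communities
        elif len(succ[prv[0]]) == 1:          # one-to-one match
            c = prv[0]
            if c < d:
                events["grown"].append((c, d))
            elif d < c:
                events["shrunken"].append((c, d))
    return events


prev = [frozenset("ab"), frozenset("cde"), frozenset("fg"),
        frozenset("hi"), frozenset("jklm"), frozenset("no")]
curr = [frozenset("abz"), frozenset("cd"), frozenset("fghi"),
        frozenset("jk"), frozenset("lm"), frozenset("uv")]
events = classify_events(prev, curr)
```

Comparing every pair of communities, as done here, is the kind of baseline the abstract describes as slower than the representative-based approach.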

international conference on data mining | 2010

Detecting and Tracking Community Dynamics in Evolutionary Networks

Zhengzhang Chen; Kevin A. Wilson; Ye Jin; William Hendrix; Nagiza F. Samatova

Community structure or clustering is ubiquitous in many evolutionary networks including social networks, biological networks and financial market networks. Detecting and tracking community deviations in evolutionary networks can uncover important and interesting behaviors that are latent if we ignore the dynamic information. In biological networks, for example, a small variation in a gene community may indicate an event, such as gene fusion, gene fission, or gene decay. In contrast to the previous work on detecting communities in static graphs or tracking conserved communities in time-varying graphs, this paper first introduces the concept of community dynamics, and then shows that the baseline approach by enumerating all communities in each graph and comparing all pairs of communities between consecutive graphs is infeasible and impractical. We propose an efficient method for detecting and tracking community dynamics in evolutionary networks by introducing graph representatives and community representatives to avoid generating redundant communities and limit the search space. We measure the performance of the representative-based algorithm by comparison to the baseline algorithm on synthetic networks, and our experiments show that our algorithm achieves a runtime speedup of 11–46. The method has also been applied to two real-world evolutionary networks including Food Web and Enron Email. Significant and informative community dynamics have been detected in both cases.

Data Mining and Knowledge Discovery | 2013

Discovery of extreme events-related communities in contrasting groups of physical system networks

Zhengzhang Chen; William Hendrix; Hang Guan; Isaac K. Tetteh; Alok N. Choudhary; Fredrick H. M. Semazzi; Nagiza F. Samatova

The latent behavior of a physical system that can exhibit extreme events, such as hurricanes or rainfalls, is complex. Recently, a very promising means for studying complex systems has emerged through the concept of complex networks. Networks representing relationships between individual objects usually exhibit community dynamics. Conventional community detection methods mainly focus on either mining frequent subgraphs in a network or detecting stable communities in time-varying networks. In this paper, we formulate a novel problem, the detection of predictive and phase-biased communities in contrasting groups of networks, and propose an efficient and effective machine learning solution for finding such anomalous communities. We build different groups of networks corresponding to different phases of the system, such as high or low hurricane activity; discover phase-related system components as seeds to help bound the search space of community generation in each network; and use the proposed contrast-based technique to identify the changing communities across different groups. The detected anomalous communities are hypothesized (1) to play an important role in defining the target system’s state(s) and (2) to improve the predictive skill of the system’s states when used collectively in the ensemble of predictive models. When tested on two important extreme event problems (identification of tropical cyclone-related and of African Sahel rainfall-related climate indices), our algorithm demonstrated superior performance in terms of various skill and robustness metrics, including an 8–16% accuracy increase, as well as physical interpretability of the detected communities. The experimental results also show the efficiency of our algorithm on synthetic datasets.

Journal of Physics: Conference Series | 2008

Coupling graph perturbation theory with scalable parallel algorithms for large-scale enumeration of maximal cliques in biological graphs

Nagiza F. Samatova; Matthew C. Schmidt; William Hendrix; Paul Breimyer; Kevin Thomas; Byung-Hoon Park

Data-driven construction of predictive models for biological systems faces challenges from data intensity, uncertainty, and computational complexity. Data-driven model inference is often considered a combinatorial graph problem where an enumeration of all feasible models is sought. The data-intensive and the NP-hard nature of such problems, however, challenges existing methods to meet the required scale of data size and uncertainty, even on modern supercomputers. Maximal clique enumeration (MCE) in a graph derived from such biological data is often a rate-limiting step in detecting protein complexes in protein interaction data, finding clusters of co-expressed genes in microarray data, or identifying clusters of orthologous genes in protein sequence data. We report two key advances that address this challenge. We designed and implemented the first (to the best of our knowledge) parallel MCE algorithm that scales linearly on thousands of processors running MCE on real-world biological networks with thousands and hundreds of thousands of vertices. In addition, we proposed and developed the Graph Perturbation Theory (GPT) that establishes a foundation for efficiently solving the MCE problem in perturbed graphs, which model the uncertainty in the data. GPT formulates necessary and sufficient conditions for detecting the differences between the sets of maximal cliques in the original and perturbed graphs and reduces the enumeration time by more than 80% compared to complete recomputation.

knowledge discovery and data mining | 2009

On perturbation theory and an algorithm for maximal clique enumeration in uncertain and noisy graphs

William Hendrix; Matthew C. Schmidt; Paul Breimyer; Nagiza F. Samatova

The maximal clique enumeration (MCE) problem can be used to find very tightly-coupled collections of objects inside a network or graph of relationships. However, when such networks are based on noisy or uncertain data, the solutions to the MCE problem for several closely related graphs may be necessary to accurately define the collections. Thus, we propose an algorithm that efficiently solves the MCE problem on altered, or perturbed, graphs. The algorithm utilizes the enumeration of a baseline graph and identifies only those maximal cliques that the perturbation adds and/or removes. We detail the algorithm and the underlying theory required to guarantee correctness. Further, we report average runtime speedups of 7 and 9 for our algorithm over traditional enumeration techniques in the cases of adding and removing edges, respectively, from graphs constructed from protein interaction data.
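To make the problem concrete, the sketch below enumerates maximal cliques with the standard Bron-Kerbosch algorithm (pivoting variant) and then reports which maximal cliques an edge removal adds or removes. It is not the perturbation algorithm from the paper: the difference is obtained here by full re-enumeration, which is exactly the cost the paper's method is designed to avoid. The helper names `maximal_cliques`, `remove_edge`, and `clique_diff` are hypothetical.

```python
from typing import Dict, FrozenSet, Set, Tuple

Graph = Dict[int, Set[int]]  # adjacency sets of an undirected graph


def maximal_cliques(graph: Graph) -> Set[FrozenSet[int]]:
    """Enumerate all maximal cliques using Bron-Kerbosch with pivoting."""
    cliques: Set[FrozenSet[int]] = set()

    def expand(r: Set[int], p: Set[int], x: Set[int]) -> None:
        if not p and not x:
            cliques.add(frozenset(r))  # r cannot be extended: it is maximal
            return
        # Pivot on the vertex with the most neighbours among the candidates.
        pivot = max(p | x, key=lambda v: len(graph[v] & p))
        for v in list(p - graph[pivot]):
            expand(r | {v}, p & graph[v], x & graph[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(graph), set())
    cliques.discard(frozenset())  # only produced for an empty graph
    return cliques


def remove_edge(graph: Graph, u: int, v: int) -> Graph:
    """Return a copy of the graph without the edge (u, v)."""
    perturbed = {node: set(nbrs) for node, nbrs in graph.items()}
    perturbed[u].discard(v)
    perturbed[v].discard(u)
    return perturbed


def clique_diff(before: Graph, after: Graph) -> Tuple[set, set]:
    """Maximal cliques added and removed by a perturbation (naive recomputation)."""
    old, new = maximal_cliques(before), maximal_cliques(after)
    return new - old, old - new


# Triangle 1-2-3 with a pendant edge 3-4.
g = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
added, removed = clique_diff(g, remove_edge(g, 1, 2))
```

In this toy graph, removing the edge between vertices 1 and 2 removes the clique {1, 2, 3} and adds {1, 3} and {2, 3}, while {3, 4} is untouched. An incremental method would need to visit only the cliques containing the perturbed edge.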

international conference on data mining | 2010

The Multiple Alignment Algorithm for Metabolic Pathways without Abstraction

Wenbin Chen; Andrea M. Rocha; William Hendrix; Matthew C. Schmidt; Nagiza F. Samatova

Computational problems associated with metabolic pathways have been extensively studied in computational biology. The problem of aligning multiple metabolic pathways is very challenging. Tohsato et al.’s algorithm for aligning multiple metabolic pathways is based on similarities between enzymes; however, a metabolic pathway consists of three types of entities: reactions, compounds, and enzymes. In this paper, we propose the first algorithm for the problem of aligning multiple metabolic pathways based on the similarities among reactions, compounds, enzymes, and pathway topology. First, we compute a weight between each pair of like entities in different input pathways based on the entities’ similarity score and topological structure using the methods by Ferhat Ay et al. We then construct a weighted k-partite graph for the reactions, compounds, and enzymes. We extract a mapping between these entities by solving the maximum-weighted k-partite matching problem with a novel heuristic algorithm. By analyzing the alignment results of multiple pathways in different organisms, we show that the alignments found by our algorithm correctly identify common subnetworks among multiple pathways.

2008 Workshop on Ultrascale Visualization | 2008

An outlook into ultra-scale visualization of large-scale biological data

Nagiza F. Samatova; Paul Breimyer; William Hendrix; Matthew C. Schmidt; Theresa-Marie Rhyne

As bioinformatics has evolved from a reductionistic approach to a complementary multi-scale integrative approach, new challenges in ultra-scale visualization have arisen. Even though visualization is a critical component to large-scale biological data analysis, the ultra-scale nature of systems biology has given rise to novel problems in visualization that are not addressed by existing methods. Visualization is a rich and actively researched domain, and there are many open research questions pertaining to the increasing demands of visualization in bioinformatics. In this paper, we present several broadly important ultra-scale visualization challenges and discuss specific examples of ultra-scale applications in systems biology.

international joint conference on artificial intelligence | 2011

Biclustering-driven ensemble of Bayesian belief network classifiers for underdetermined problems

Tatdow Pansombut; William Hendrix; Zekai Jacob Gao; Brent Harrison; Nagiza F. Samatova

In this paper, we present BENCH (Biclustering-driven ENsemble of Classifiers), an algorithm to construct an ensemble of classifiers through concurrent feature and data point selection guided by unsupervised knowledge obtained from biclustering. BENCH is designed for underdetermined problems. In our experiments, we use Bayesian Belief Network (BBN) classifiers as base classifiers in the ensemble; however, BENCH can be applied to other classification models as well. We show that BENCH is able to increase prediction accuracy of a single classifier and traditional ensemble of classifiers by up to 15% on three microarray datasets using various weighting schemes for combining individual predictions in the ensemble.

cluster computing and the grid | 2015

Proceedings - 2015 IEEE/ACM 15th International Symposium on Cluster, Cloud, and Grid Computing, CCGrid 2015

Chen Jin; Qiang Fu; Huahua Wang; William Hendrix; Zhengzhang Chen; Ankit Agrawal; Arindam Banerjee; Alok N. Choudhary

An important problem in discrete graphical models is the maximum a posteriori (MAP) inference problem. Recent research has focused on the development of parallel MAP inference algorithms that scale to graphical models with millions of nodes. In this paper, we introduce a parallel implementation of the recently proposed Bethe-ADMM algorithm using the Message Passing Interface (MPI), which allows us to fully utilize the computing power provided by modern supercomputers with thousands of cores. Experimental results demonstrate that for a broad class of problems, our parallel implementation of Bethe-ADMM scales almost linearly even with thousands of cores.

cluster computing and the grid | 2015