Mohammad Ahmad Munawar

Publication

Featured research published by Mohammad Ahmad Munawar.

international conference on autonomic computing | 2009

System monitoring with metric-correlation models: problems and solutions

Miao Jiang; Mohammad Ahmad Munawar; Thomas Reidemeister; Paul Ward

Correlations among management metrics in software systems allow errors to be detected and their cause localized. Prior research shows that linear models can capture many of these correlations. However, our research shows that several factors may prevent linear models from accurately describing correlations, even if the underlying relationship is linear. Two common phenomena we have observed are relationships that evolve, typically with time, and heterogeneous variance of the correlated metrics. Two-variable linear models proposed thus far fail to capture these phenomena, and thus fail to describe system dynamics correctly. Often, these phenomena are caused by a missing variable. However, searching for three-variable correlations is O(n^3) for n metrics, which is costly for systems with many metrics. In this paper we address the above challenges by improving on two-variable Ordinary Least Squares regression models. We validate our models using a realistic Java-Enterprise-Edition application. Using fault-injection experiments we show that our improved models capture system behavior accurately. We detect errors within 8 sample periods on average from the injection of the fault, which is less than half the time required by the current linear-model approach.
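As a rough illustration of the baseline this abstract builds on, the sketch below fits a plain two-variable Ordinary Least Squares model to a pair of metrics and flags a sample whose residual is unusually large. It is not the improved model from the paper: the function names, the sample data, and the three-sigma threshold `k` are all assumptions made for the example.

```python
from statistics import mean

def fit_ols(x, y):
    """Fit y = a + b*x by ordinary least squares; return (a, b, residual_std)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    # Residual standard deviation with n - 2 degrees of freedom.
    sigma = (sum(r * r for r in residuals) / (len(x) - 2)) ** 0.5
    return a, b, sigma

def is_anomalous(model, x_new, y_new, k=3.0):
    """Flag a sample whose residual exceeds k residual standard deviations.
    The threshold k is an assumption, not a value from the paper."""
    a, b, sigma = model
    return abs(y_new - (a + b * x_new)) > k * sigma

# Hypothetical training data: two metrics that track each other linearly.
requests = [10, 20, 30, 40, 50, 60]
queries = [21, 39, 61, 80, 99, 121]
model = fit_ols(requests, queries)

print(is_anomalous(model, 35, 70))  # sample consistent with the learned relation
print(is_anomalous(model, 35, 40))  # sample that breaks the relation
```

A single fixed residual threshold like this is exactly what struggles when the relationship drifts over time or when the variance of the metrics is not constant, which are the two phenomena the abstract highlights.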

dependable systems and networks | 2009

Automatic fault detection and diagnosis in complex software systems by information-theoretic monitoring

Miao Jiang; Mohammad Ahmad Munawar; Thomas Reidemeister; Paul Ward

Management metrics of complex software systems exhibit stable correlations which can enable fault detection and diagnosis. Current approaches use specific analytic forms, typically linear, for modeling correlations. In this paper we use Normalized Mutual Information as a similarity measure to identify clusters of correlated metrics, without knowing the specific form. We show how we can apply the Wilcoxon Rank-Sum test to identify anomalous behaviour. We present two diagnosis algorithms to locate faulty components: RatioScore, based on the Jaccard Coefficient, and SigScore, which incorporates knowledge of component dependencies. We evaluate our mechanisms in the context of a complex enterprise application. Through fault-injection experiments, we show that we can detect 17 out of 22 faults without any false positives. We diagnose the faulty component in the top five anomaly scores 7 times out of 17 using SigScore, which is 40% better than when system structure is ignored.
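To make the similarity measure concrete, here is a minimal sketch of a normalized mutual information computation between two metric series. The abstract does not say how metrics are discretized or which normalization is used, so the equal-width binning, the bin count, and the normalization 2·I(X;Y) / (H(X) + H(Y)) are all assumptions of this example.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a sequence of discrete labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def discretize(values, bins=4):
    """Map values to equal-width bin indices (binning scheme is an assumption)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # avoid division by zero for constant metrics
    return [min(int((v - lo) / width), bins - 1) for v in values]

def nmi(x, y, bins=4):
    """Normalized mutual information: 2 * I(X;Y) / (H(X) + H(Y)), in [0, 1]."""
    bx, by = discretize(x, bins), discretize(y, bins)
    hx, hy = entropy(bx), entropy(by)
    hxy = entropy(list(zip(bx, by)))  # joint entropy H(X, Y)
    mi = hx + hy - hxy                # I(X;Y) = H(X) + H(Y) - H(X, Y)
    denom = hx + hy
    return 2 * mi / denom if denom > 0 else 0.0

# Hypothetical metrics: y depends on x nonlinearly, z is unrelated to x.
x = list(range(16))
y = [v * v for v in x]
z = [v % 4 for v in x]
w = [v // 4 for v in x]

print(nmi(x, y))  # high: a nonlinear dependency is still detected
print(nmi(z, w))  # zero: the two series are independent by construction
```

Because the measure depends only on the joint distribution of bin labels, it does not require choosing a functional form such as a line, which is the property the abstract relies on.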

conference of the centre for advanced studies on collaborative research | 2007

A comparative study of pairwise regression techniques for problem determination

Mohammad Ahmad Munawar; Paul Ward

Many runtime metrics can be collected from modern software systems. Stable statistical relationships exist among these metrics. Deviation from these stable relationships indicates potential problems, allowing diagnosis of failures. There exist many modeling techniques to represent these relationships. However, which one to use is a question that has yet to be studied. In this paper we compare the use of simple linear regression (SLR) to some of its more complex variants, including autoregressive regression and locally weighted regression. We consider the component coverage, model robustness, accuracy of diagnosis, and computation cost. Our study finds that while more flexible models can improve diagnosis accuracy, they achieve it at the cost of reduced robustness. In particular, we found the autoregressive regression model with exogenous input (ARX) to provide the most accurate diagnosis; however, it is the least robust of the techniques considered and the second most expensive. This study also finds that smoothing and other data transformations can noticeably improve results of SLR, thus providing an efficient alternative to ARX.

IEEE Transactions on Dependable and Secure Computing | 2011

Efficient Fault Detection and Diagnosis in Complex Software Systems with Information-Theoretic Monitoring

Miao Jiang; Mohammad Ahmad Munawar; Thomas Reidemeister; Paul Ward

Management metrics of complex software systems exhibit stable correlations which can enable fault detection and diagnosis. Current approaches use specific analytic forms, typically linear, for modeling correlations. In practice, more complex nonlinear relationships exist between metrics. Moreover, most intermetric correlations form clusters rather than simple pairwise correlations. These clusters provide additional information and offer the possibility for optimization. In this paper, we address these issues by using Normalized Mutual Information (NMI) as a similarity measure to identify clusters of correlated metrics, without assuming any specific form for the metric relationships. We show how to apply the Wilcoxon Rank-Sum test on the entropy measures to detect errors in the system. We also present three diagnosis algorithms to locate faulty components: RatioScore, based on the Jaccard coefficient, SigScore, which incorporates knowledge of component dependencies, and BayesianScore, which uses Bayesian inference to assign a fault probability to each component. We evaluate our approach in the context of a complex enterprise application, and show that 1) stable, nonlinear correlations exist and can be captured with our approach; 2) we can detect a large fraction of faults with a low false positive rate (we detect up to 18 of the 22 faults we injected); and 3) we improve the diagnosis with our new diagnosis algorithms.
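The detection step named above can be sketched with a small, self-contained Wilcoxon rank-sum test. This version uses the normal approximation without a tie correction, which is a simplification; the input values stand in for per-window entropy measurements and are hypothetical, as is the choice of comparing a baseline sample against a recent window.

```python
from statistics import NormalDist

def rank_sum_test(baseline, window):
    """Two-sided Wilcoxon rank-sum test (normal approximation, no tie correction).
    Returns (z, p_value)."""
    n1, n2 = len(baseline), len(window)
    combined = sorted((v, i) for i, v in enumerate(list(baseline) + list(window)))
    ranks = [0.0] * (n1 + n2)
    pos = 0
    while pos < len(combined):
        end = pos
        # Extend over a run of tied values.
        while end + 1 < len(combined) and combined[end + 1][0] == combined[pos][0]:
            end += 1
        avg_rank = (pos + end) / 2 + 1  # 1-based ranks; ties share the mean rank
        for k in range(pos, end + 1):
            ranks[combined[k][1]] = avg_rank
        pos = end + 1
    w = sum(ranks[:n1])                 # rank sum of the baseline sample
    mu = n1 * (n1 + n2 + 1) / 2         # expected rank sum under the null hypothesis
    sigma = (n1 * n2 * (n1 + n2 + 1) / 12) ** 0.5
    z = (w - mu) / sigma
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p

# Hypothetical entropy values: a healthy baseline, a similar window, a shifted window.
baseline = [2.1, 2.3, 1.9, 2.0, 2.2, 2.4, 1.8, 2.5]
similar = [2.0, 2.2, 1.9, 2.3, 2.1, 2.4, 1.8, 2.5]
shifted = [3.1, 3.3, 2.9, 3.0, 3.2, 3.4, 2.8, 3.5]

print(rank_sum_test(baseline, similar))
print(rank_sum_test(baseline, shifted))
```

A rank-based test is a natural fit here because it makes no assumption about the distribution of the entropy values; a small p-value for the recent window would be treated as a sign of anomalous behaviour.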

conference of the centre for advanced studies on collaborative research | 2009

Diagnosis of recurrent faults using log files

Thomas Reidemeister; Mohammad Ahmad Munawar; Miao Jiang; Paul Ward

Enterprise software systems (ESS) are becoming larger and increasingly complex. Failure in business-critical systems is expensive, leading to consequences such as loss of critical data, loss of sales, customer dissatisfaction, and even lawsuits. Therefore, detecting failures and diagnosing their root cause in a timely manner is essential. Many studies suggest that a large fraction of failures encountered in practice are recurrent (i.e., they have been seen before). Fast and accurate detection of these failures can accelerate problem determination, and thereby improve system reliability. To this effect, we explore machine learning techniques, including the Naïve Bayes classifier, partially-supervised learning, and decision trees (using C4.5), to automatically recognize symptoms of recurrent faults and to derive detection rules from samples of log data. This work focuses on log files, since they are readily available and they do not put any additional computational burden on the component generating the data. The methods explored in this work can aid the development of tools to assist support personnel in problem determination tasks. Instead of requiring the operators to manually define patterns for identifying recurrent problems, such tools can be trained using prior, solved and unsolved cases from existing support databases.

conference of the centre for advanced studies on collaborative research | 2008

Information-theoretic modeling for tracking the health of complex software systems

Miao Jiang; Mohammad Ahmad Munawar; Thomas Reidemeister; Paul A. S. Ward

Stable correlation models are effective in detecting errors in complex software systems. However, most studies assume a specific mathematical form, typically linear, for the underlying correlations. In practice, more complex non-linear relationships exist between metrics. Moreover, most inter-metric correlations form clusters rather than simple pairwise correlations. These clusters provide additional information for error detection and offer the possibility for optimization. We address these issues by adopting the Normalized Mutual Information as a similarity measure. We also employ the entropy of metrics in clusters to monitor system state. Our approach does not require learning specific correlation models, thus reducing computation overhead. We have implemented the proposed approach and show, through experiments with a multi-tier enterprise software system, that it is effective. Our evaluation shows that (i) stable non-linear correlations exist in practice; (ii) the entropy of system metrics in clusters can efficiently detect anomalies caused by faults and provide information for diagnosis; and (iii) we can detect errors which were not captured by previous linear-correlation approaches.

international symposium on parallel and distributed processing and applications | 2007

Leveraging many simple statistical models to adaptively monitor software systems

Mohammad Ahmad Munawar; Paul A. S. Ward

Self-managing systems require continuous monitoring to ensure correct operation. Detailed monitoring is often too costly to use in production. An alternative is adaptive monitoring, whereby monitoring is kept to a minimal level while the system behaves as expected, and the monitoring level is increased if a problem is suspected. To enable such an approach, we must model the system, both at a minimal level to ensure correct operation, and at a detailed level, to diagnose faulty components. To avoid the complexity of developing an explicit model based on the system structure, we employ simple statistical techniques to identify relationships in the monitored data. These relationships are used to characterize normal operation and identify problematic areas. We develop and evaluate a prototype for the adaptive monitoring of J2EE applications. We experiment with 29 different fault scenarios of three general types, and show that we are able to detect the presence of faults in 80% of cases, where all but one instance of non-detection is attributable to a single fault type. We are able to shortlist the faulty component in 65% of cases where anomalies are observed.

software engineering for adaptive and self managing systems | 2008

Monitoring multi-tier clustered systems with invariant metric relationships

Mohammad Ahmad Munawar; Michael Jiang; Paul Ward

To ensure high availability, self-managing systems require self-monitoring and a system model against which to analyze monitoring data. Characterizing relationships between system metrics has been shown to model simple multi-tier transaction systems effectively, enabling failure detection and fault diagnosis. In this paper we show how to extend this invariant metric-relationships approach to clustered multi-tier systems. We show through analysis and experimentation that naive application of the approach increases cost dramatically while reducing diagnosis accuracy. We demonstrate that randomization at the load balancer during the invariant-identification phase will improve diagnosis accuracy, though it neither completely eliminates the problem nor reduces the cost; indeed, it may increase the cost, as this approach will require a long learning phase to remove all accidental correlations. Finally, we argue that knowing the system structure is necessary to effectively apply invariants to the clustered environment.

high-assurance systems engineering | 2008

Detection and Diagnosis of Recurrent Faults in Software Systems by Invariant Analysis

Miao Jiang; Mohammad Ahmad Munawar; Thomas Reidemeister; Paul A. S. Ward

A correctly functioning enterprise-software system exhibits long-term, stable correlations between many of its monitoring metrics. Some of these correlations no longer hold when there is an error in the system, potentially enabling error detection and fault diagnosis. However, existing approaches are inefficient, requiring a large number of metrics to be monitored and ignoring the relative discriminative properties of different metric correlations. In enterprise-software systems, similar faults tend to reoccur. It is therefore possible to significantly improve existing correlation-analysis approaches by learning the effects of common recurrent faults on correlations. We present methods to determine the most significant correlations to track for efficient error detection, and the correlations that contribute the most to diagnosis accuracy. We apply machine learning to identify the relevant correlations, removing the need for manually configured correlation thresholds, as used in the prior approaches. We validate our work on a multi-tier enterprise-software system. We are able to detect and correctly diagnose 8 of 10 injected faults to within three possible causes, and to within two in 7 out of 8 cases. This compares favourably with the existing approaches whose diagnosis accuracy is 3 out of 10 to within 3 possible causes. We achieve a precision of at least 95%.

military communications conference | 2003

Parameterized neighborhood based flooding for ad hoc wireless networks

Vijay Dheap; Mohammad Ahmad Munawar; Sagar Naik; Paul Ward

Flooding is a simple routing technique that can be used to transmit data from one node to every other node in a network. The focus of this paper is to investigate improvements to flooding techniques used in ad hoc wireless networks. Recent work has focused on using topological information to reduce the number of broadcasts. The number of broadcasts necessary to flood the network was the major performance metric used to compare previous neighborhood-based flooding algorithms. We build upon this foundation by first presenting a parameterized neighborhood-based flooding (PNBF) algorithm, which provides a single platform for the performance comparison of various multihop neighborhood-based flooding algorithms. We also introduce and motivate the use of additional performance metrics, including total number of collisions and percentage of nodes that receive the message, for comparing flooding algorithms. An analysis is given of how different network properties, such as average node degree and communication patterns, affect the performance of the different neighborhood-based flooding algorithms. Our simulation results demonstrate that our algorithm is capable of handling a wide variety of situations where properties of ad hoc networks along with the relative importance of the performance criteria are taken into consideration.
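For readers unfamiliar with the broadcast-count metric, the sketch below simulates flooding on a small graph and counts broadcasts and nodes reached. It is not the PNBF algorithm: it shows blind flooding plus a simple one-hop self-pruning rule as one illustrative way neighborhood information can cut broadcasts. The idealized channel (no collisions, no loss), the graph, and all names are assumptions of the example.

```python
from collections import deque

def flood(graph, source, self_pruning=False):
    """Simulate flooding on an undirected graph given as {node: set(neighbors)}.
    Assumes an ideal channel with no collisions or losses.
    Returns (number_of_broadcasts, set_of_nodes_reached)."""
    received = {source}
    broadcasts = 0
    queue = deque([(source, None)])  # (node deciding whether to send, node it heard from)
    while queue:
        node, sender = queue.popleft()
        if self_pruning and sender is not None:
            # Skip the rebroadcast if every neighbor of this node was already
            # covered by the sender's own transmission.
            if graph[node] <= graph[sender] | {sender}:
                continue
        broadcasts += 1
        for neighbor in graph[node]:
            if neighbor not in received:
                received.add(neighbor)
                queue.append((neighbor, node))
    return broadcasts, received

# Hypothetical topology: a triangle (0, 1, 2) with a tail node 3 attached to node 2.
graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}

print(flood(graph, 0))                     # blind flooding: every node rebroadcasts
print(flood(graph, 0, self_pruning=True))  # pruned: redundant rebroadcasts are skipped
```

On a real wireless channel the saving matters beyond the raw count, since fewer simultaneous rebroadcasts also means fewer collisions, which is one of the additional metrics the abstract proposes.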
