Microprocess. Microsystems | 2019
Exploring operational profiles and anomalies in computer performance logs
Abstract
Operational and functional problems in computer systems can be identified by monitoring and exploring performance metrics. These metrics can also be used to evaluate system activity profiles and to manage the relevant infrastructure (hardware and software). The critical point is finding features that make it possible to distinguish normal from abnormal system behaviour and to reveal emerging trends. This paper proposes a systematic methodology for deriving such features, based on diverse observation perspectives in time (direct and aggregated) and on specifically defined data objects. We introduce a novel data model that combines collected samples into higher-level observation objects (pulses and their compositions). The model is supported by original analysis algorithms for evaluating system behaviour. These provide useful sample and object statistics, significantly enhanced with derived correlation formulas and periodicity properties. Compared with classical approaches, our model provides deeper and more accurate insight into system operation under real workload conditions, as it uses new evaluation metrics and a wider scope of observation features (e.g. those related to pulse objects). In the paper, we perform exploratory studies covering real performance logs from several university servers. The results are interpreted in light of various statistical data views, including multidimensional correlation findings covering performance, event and process logs.