Ioana Giurgiu | Researchain

Publication

Featured research published by Ioana Giurgiu.

dependable systems and networks | 2014

Failure Analysis of Virtual and Physical Machines: Patterns, Causes and Characteristics

Robert Birke; Ioana Giurgiu; Lydia Y. Chen; Dorothea Wiesmann; Ton Engbersen

In today's commercial data centers, computation density grows continuously as the number of hardware components and workloads, in units of virtual machines, increases. The service availability guaranteed by data centers depends heavily on the reliability of the physical and virtual servers. In this study, we analyze 10K virtual and physical machines hosted in five commercial data centers over an observation period of one year. Our objective is to establish a sound understanding of the differences and similarities between failures of physical and virtual machines. We first capture their failure patterns, i.e., the failure rates, the distributions of times between failures and of repair times, as well as the time and space dependency of failures. Moreover, we correlate failures with resource capacity and run-time usage to identify the characteristics of failing servers. Finally, we discuss how virtual machine management actions, i.e., consolidation and on/off frequency, impact virtual machine failures.

conference on network and service management | 2013

Classifying server behavior and predicting impact of modernization actions

Jasmina Bogojeska; David Lanyi; Ioana Giurgiu; George E. Stark; Dorothea Wiesmann

Today, the decision of when to modernize which elements of the server HW/SW stack is often made manually, based on simple business rules. In this paper, we alleviate this problem by supporting the decision process with an automated approach based on incident tickets and server attribute data. First, we identify and rank servers with problematic behavior as candidates for modernization using a random forest classifier. Second, we use this predictive model to evaluate the impact of different modernization actions and suggest the most effective ones. We show that our chosen model yields high-quality predictions and outperforms traditional linear regression models on a large set of real data.

knowledge discovery and data mining | 2016

Predicting Disk Replacement towards Reliable Data Centers

Mirela Botezatu; Ioana Giurgiu; Jasmina Bogojeska; Dorothea Wiesmann

Disks are among the most frequently failing components in today's IT environments. Despite defense mechanisms such as RAID, the availability and reliability of the system are still often severely impacted. In this paper, we present a highly accurate SMART-based analysis pipeline that can correctly predict the necessity of a disk replacement 10-15 days in advance. Our method has been built and evaluated on more than 30,000 disks from two major manufacturers, monitored over 17 months. Our approach employs statistical techniques to automatically detect which SMART parameters correlate with disk replacement and uses them to predict the replacement of a disk with up to 98% accuracy.

knowledge discovery and data mining | 2015

Multi-View Incident Ticket Clustering for Optimal Ticket Dispatching

Mirela Botezatu; Jasmina Bogojeska; Ioana Giurgiu; Hagen Voelzer; Dorothea Wiesmann

We present a novel technique that optimizes the dispatching of incident tickets to the agents in an IT Service Support Environment. Unlike common skill-based dispatching, our approach also takes into account empirical evidence on the agents' speed from historical data. Our solution consists of two parts. First, a novel technique clusters historic tickets into incident categories that are discriminative in terms of agents' performance. Second, a dispatching policy selects, for an incoming ticket, the fastest available agent according to the target cluster. We show that, for ticket data collected from several Service Delivery Units, our new dispatching technique can reduce service time between 35% and 44%.

network operations and management symposium | 2014

Impact of HW and OS type and currency on server availability derived from problem ticket analysis

Jasmina Bogojeska; Ioana Giurgiu; David Lanyi; George E. Stark; Dorothea Wiesmann

conference on information and knowledge management | 2015

Comprehensible Models for Reconfiguring Enterprise Relational Databases to Avoid Incidents

Ioana Giurgiu; Mirela Botezatu; Dorothea Wiesmann

integrated network management | 2015

Do you know how to configure your enterprise relational database to reduce incidents?

Ioana Giurgiu; Adela-Diana Almasi; Dorothea Wiesmann

Proceedings of the 18th ACM/IFIP/USENIX Middleware Conference on Industrial Track | 2017

Predicting DRAM reliability in the field with machine learning

Ioana Giurgiu; Jacint Szabo; Dorothea Wiesmann; John J. Bird

IBM Journal of Research and Development | 2017

On the adoption and impact of predictive analytics for server incident reduction

Ioana Giurgiu; Dorothea Wiesmann; Jasmina Bogojeska; David Lanyi; George E. Stark; Rodney B. Wallace; M. M. Pereira; A. A. Hidalgo

Technology refresh is an important component of data center management. The goal of this paper is to assess the impact of HW and OS currency on server availability, based on a large set of incident tickets and server attribute data collected from several different IT environments. To achieve this, we first identify server failure incidents using a machine learning method for automatic ticket classification. We then analyze the data to inspect the impact of HW and OS type, along with their currency, on the rates of server failures. The results can be used to derive guidelines that support technology refresh decisions in data centers.

cluster computing and the grid | 2014

Analysis of Labor Efforts and their Impact Factors to Solve Server Incidents in Datacenters

Ioana Giurgiu; Jasmina Bogojeska; Sergii Nikolaiev; George E. Stark; Dorothea Wiesmann

Configuring enterprise database management systems is a notoriously hard problem. The combinatorial parameter space makes it intractable to run and observe the DBMS behavior in all scenarios. The database administrator therefore faces the difficult task of choosing among DBMS configurations, some of which can lead to critical incidents that hinder availability or performance. We propose using machine learning to understand how configuring a DBMS can lead to such high-risk incidents. We collect historical data from three IT environments that run both IBM DB2 and Oracle DBMS. We then implement several linear and non-linear multivariate models to identify and learn from high-risk configurations, and analyze their performance in terms of accuracy, cost, generalization and interpretability. Results show that high-risk configurations can be identified with extremely high accuracy and that the database administrator can potentially use the extracted rules to reconfigure the DBMS and prevent incidents.