Network


Latest external collaborations at the country level. Dive into the details by clicking on the dots.

Hotspot


Dive into the research topics where Juan F. Pérez is active.

Publication


Featured research published by Juan F. Pérez.


Journal of Internet Services and Applications | 2014

Quality-of-service in cloud computing: modeling techniques and their applications

Danilo Ardagna; Giuliano Casale; Michele Ciavotta; Juan F. Pérez; Weikun Wang

Recent years have seen the massive migration of enterprise applications to the cloud. One of the challenges posed by cloud applications is Quality-of-Service (QoS) management, which is the problem of allocating resources to the application to guarantee a service level along dimensions such as performance, availability and reliability. This paper aims at supporting research in this area by providing a survey of the state of the art of QoS modeling approaches suitable for cloud systems. We also review and classify their early application to some decision-making problems arising in cloud QoS management.


International Symposium on Symbolic and Numeric Algorithms for Scientific Computing (SYNASC) | 2013

Assessing SLA Compliance from Palladio Component Models

Juan F. Pérez; Giuliano Casale

Service providers face the challenge of meeting service-level agreements (SLAs) under uncertainty about the application's actual performance. The performance depends heavily on the characteristics of the hardware on which the application is deployed, on the application architecture, and on the user workload. Although many models have been proposed for the performance prediction of software applications, most of them focus on average measures, e.g., mean response times. However, SLAs are often set in terms of percentiles, so that a given portion of requests receives a predefined service level, e.g., 95% of the requests should face a response time of at most 10 ms. To enable the effective prediction of this type of measure, in this paper we use fluid models to compute the probability distribution of performance measures relevant for SLAs. Our models are built automatically from a Palladio Component Model (PCM) instance, thus allowing SLA assessment directly from the PCM specification. This provides a scalable alternative for SLA assessment within the PCM framework, which is currently supported only by means of simulation.
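
The fluid-model machinery itself is beyond a short snippet, but the kind of percentile target the paper assesses can be made concrete. The sketch below is a minimal, hypothetical helper (not part of the PCM tooling) that checks a percentile SLA of the form quoted above, 95% of requests within 10 ms, against response-time samples obtained, for instance, from measurement or simulation.

```python
import numpy as np

def sla_compliant(response_times_ms, percentile=95.0, target_ms=10.0):
    """Check a percentile SLA: the given percentile of the observed
    response times must not exceed the target (e.g., the 95th percentile
    must be at most 10 ms)."""
    observed = np.percentile(response_times_ms, percentile)
    return observed <= target_ms, observed

# Illustrative use with synthetic response times (exponential, mean 3 ms).
rng = np.random.default_rng(1)
samples = rng.exponential(scale=3.0, size=100_000)
ok, p95 = sla_compliant(samples)
print(f"95th percentile = {p95:.2f} ms, SLA met: {ok}")
```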


IEEE Transactions on Software Engineering | 2015

Estimating Computational Requirements in Multi-Threaded Applications

Juan F. Pérez; Giuliano Casale; Sergio Pacheco-Sanchez

Performance models provide effective support for managing the quality-of-service (QoS) and costs of enterprise applications. However, obtaining key model parameters, such as the CPU consumption of individual requests, would require expensive high-resolution monitoring, so these parameters are more commonly estimated from other measures. Unfortunately, current estimators are often inaccurate in accounting for scheduling in multi-threaded application servers. To cope with this problem, we propose novel linear regression and maximum likelihood estimators. Our algorithms take response time and resource queue measurements as inputs and return estimates of the CPU consumption of individual request types. Results on simulated and real application datasets indicate that our algorithms provide accurate estimates and scale effectively with the threading level.
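
As an illustration of the regression flavor of such estimators, the sketch below fits per-class CPU demands from response times and the queue length seen on arrival, under a deliberately simplified FCFS view in which a request that finds n requests ahead of it takes roughly (n + 1) times its mean demand to complete. This is a hedged sketch of the general idea, not the paper's scheduling-aware estimator for multi-threaded servers; all function and variable names are illustrative.

```python
import numpy as np

def estimate_demand_fcfs(resp_times, queue_seen, classes):
    """Least-squares demand estimate per request class, assuming an
    idealized FCFS server where a request that finds n requests ahead
    of it takes roughly (n + 1) * demand to complete.
    resp_times : measured response times (seconds)
    queue_seen : number of requests found in the queue on arrival
    classes    : class label of each request
    Returns {class: estimated mean CPU demand}."""
    resp_times = np.asarray(resp_times, dtype=float)
    work = np.asarray(queue_seen, dtype=float) + 1.0
    classes = np.asarray(classes)
    demands = {}
    for c in np.unique(classes):
        mask = classes == c
        # One-parameter least squares through the origin: R ~ D * (n + 1).
        demands[int(c)] = (resp_times[mask] @ work[mask]) / (work[mask] @ work[mask])
    return demands

# Synthetic check: two classes with true demands 10 ms and 40 ms.
# Simplification: queued work is assumed to be of the same class as the arrival.
rng = np.random.default_rng(0)
true_d = {0: 0.010, 1: 0.040}
cls = rng.integers(0, 2, size=5000)
n_seen = rng.integers(0, 8, size=5000)
r = np.array([rng.gamma(shape=n + 1, scale=true_d[int(c)]) for n, c in zip(n_seen, cls)])
print(estimate_demand_fcfs(r, n_seen, cls))   # roughly {0: 0.01, 1: 0.04}
```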


Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS) | 2013

An Offline Demand Estimation Method for Multi-threaded Applications

Juan F. Pérez; Sergio Pacheco-Sanchez; Giuliano Casale

Parameterizing performance models for multi-threaded enterprise applications requires finding the service rates offered by worker threads to incoming requests. Statistical inference on monitoring data helps reduce the overhead of application profiling and recover missing information. While linear regression of utilization data is often used to estimate service rates, it suffers from erratic performance and ignores a large part of the application monitoring data, e.g., response times. Yet inference from other metrics, such as response times or queue-length samples, is complicated by their dependence on scheduling policies. To address these issues, we propose novel scheduling-aware estimation approaches for multi-threaded applications based on linear regression and maximum likelihood estimators. The proposed methods estimate demands from samples of the number of requests in execution in the worker threads at the admission instant of a new request. Validation results are presented on simulated and real application datasets for systems with multi-class requests, class switching, and admission control.
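
For contrast, the baseline that the abstract mentions, linear regression of utilization data, can be sketched in a few lines: the utilization law U ~ sum_c X_c * D_c relates per-window class throughputs X_c to the unknown mean demands D_c, which least squares then recovers. This is the commonly used approach the paper improves on, not the proposed scheduling-aware method; the helper and data below are purely illustrative.

```python
import numpy as np

def regress_demands_from_utilization(throughputs, utilizations):
    """Baseline utilization-regression estimator: in each monitoring
    window the utilization law gives U ~ sum_c X_c * D_c, where X_c is
    the measured throughput of class c and D_c its unknown mean demand.
    Solving the windows jointly by least squares yields demand estimates.
    throughputs  : (windows x classes) matrix of per-class throughputs (req/s)
    utilizations : length-`windows` vector of CPU utilizations in [0, 1]."""
    X = np.asarray(throughputs, dtype=float)
    U = np.asarray(utilizations, dtype=float)
    demands, *_ = np.linalg.lstsq(X, U, rcond=None)
    return np.clip(demands, 0.0, None)   # demands cannot be negative

# Synthetic example: two classes with true demands 5 ms and 20 ms.
rng = np.random.default_rng(2)
true_d = np.array([0.005, 0.020])
X = rng.uniform(1, 30, size=(200, 2))           # per-window throughputs
U = X @ true_d + rng.normal(0, 0.01, size=200)  # noisy utilization samples
print(regress_demands_from_utilization(X, U))   # roughly [0.005, 0.020]
```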


Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS) | 2014

Assessing the Impact of Concurrent Replication with Canceling in Parallel Jobs

Zhan Qiu; Juan F. Pérez

Parallel job processing has become a key feature of many software applications, e.g., in scientific computing. Parallelization allows these applications to exploit large resource pools, such as cloud or grid data centers. However, a job composed of a large number of parallel tasks fails if any of its tasks fails, requiring reprocessing and causing additional delays. In this paper, we explore the effect that replicating parallel jobs has on job reliability and response time, as well as on resource utilization. The replication mechanism consists of processing replicas concurrently, at either the job or the task level, retrieving the results of the replica that finishes first, if any, and canceling any replicas still in process. We propose a stochastic model that explicitly considers parallel job processing, replication at both the job and the task level, and general arrival processes. We develop a numerically efficient algorithm to solve large-scale instances of the model and compute key performance metrics. We observe that the task-cancellation mechanism offers an effective way of limiting the increase in resource utilization, allowing the use of replicas that not only increase job reliability but also have the potential to reduce response times.
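
The max/min structure behind job-level versus task-level replication can be illustrated with a small Monte Carlo sketch. This is not the paper's analytical model: it ignores queueing and arrivals, assumes exponential task times with mean 1, and assumes each attempt fails independently with a fixed probability (a failed attempt contributes an infinite completion time unless a replica succeeds).

```python
import numpy as np

rng = np.random.default_rng(3)

def simulate(n_tasks=10, fail_prob=0.05, reps=100_000):
    """Monte Carlo sketch of concurrent replication with canceling for a
    job of n_tasks parallel tasks, ignoring queueing."""
    def attempt():
        t = rng.exponential(1.0, size=(reps, n_tasks))
        t[rng.random((reps, n_tasks)) < fail_prob] = np.inf   # failed attempts
        return t

    base = attempt().max(axis=1)                       # no replication
    job_rep = np.minimum(attempt().max(axis=1),        # two full job copies,
                         attempt().max(axis=1))        # first to finish wins
    task_rep = np.minimum(attempt(), attempt()).max(axis=1)  # per-task copies

    for name, t in [("none", base), ("job-level", job_rep), ("task-level", task_rep)]:
        ok = np.isfinite(t)
        print(f"{name:10s} reliability={ok.mean():.4f} "
              f"mean resp (successful jobs)={t[ok].mean():.3f}")

simulate()
```

Even this crude experiment reproduces the qualitative observation in the abstract: task-level replication raises reliability the most and can also shorten the job response time.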


IEEE International Conference on Computer Communications (INFOCOM) | 2015

Enhancing reliability and response times via replication in computing clusters

Zhan Qiu; Juan F. Pérez

Computing clusters have been widely deployed for scientific and engineering applications to support intensive computation and massive data operations. As applications and resources in a cluster are subject to failures, fault-tolerance strategies are commonly adopted, sometimes at the expense of additional delays in job response times or unnecessary increases in resource usage. In this paper, we explore concurrent replication with canceling, a fault-tolerance approach where jobs and their replicas are processed concurrently and the successful completion of either triggers the removal of the other. We propose a stochastic model to study how this approach affects the cluster's service-level objectives (SLOs), particularly the offered response-time percentiles. In addition to the expected gains in reliability, the proposed model allows us to determine the utilization regions in which introducing replication with canceling effectively reduces response times. Moreover, we show how this model can support resource-provisioning decisions with reliability and response-time guarantees.
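
A back-of-the-envelope sketch (not the paper's model) hints at why such a utilization region exists. Assume two servers, Poisson arrivals, exponential service, and a crude independence approximation for the replicated copies; also ignore the load reduction that canceling provides, which would only enlarge the region where replication wins.

```python
# Two servers, total arrival rate lam, exponential service at rate mu each.
# Without replication each server is an M/M/1 with arrival rate lam/2;
# with full replication (independence approximation, no canceling credit)
# every job joins both queues and we take the faster of two M/M/1 response
# times.  An M/M/1 response time is exponential with rate mu - arrival_rate.
def mean_response_times(lam, mu=1.0):
    no_rep = 1.0 / (mu - lam / 2.0)            # single copy per job
    if lam >= mu:                              # replicated system unstable
        rep = float("inf")
    else:
        rep = 1.0 / (2.0 * (mu - lam))         # mean of min of two copies
    return no_rep, rep

print(f"{'util':>6} {'no-rep':>8} {'replicated':>11}")
for util in [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]:
    lam = 2.0 * util          # utilization of the non-replicated system
    nr, r = mean_response_times(lam)
    print(f"{util:6.2f} {nr:8.3f} {r:11.3f}")
```

Under these rough assumptions replication lowers the mean response time only below roughly one-third utilization; the paper's model captures the cancellation effect and the response-time percentiles properly.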


IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid) | 2015

Evaluating the Effectiveness of Replication for Tail-Tolerance

Zhan Qiu; Juan F. Pérez

Computing clusters (CCs) are a cost-effective high-performance platform for computation-intensive scientific and engineering applications. A key challenge in managing CCs is to consistently achieve low response times. In particular, tail-tolerant methods aim to keep the tail of the response-time distribution short. In this paper we explore concurrent replication with cancelling, a tail-tolerant approach that processes requests and their replicas concurrently, retrieves the result from the first replica that completes, and cancels all other replicas. We propose a stochastic model that considers any number of replicas and general processing and inter-arrival times, and computes the response-time distribution. We show that replication can be very effective in keeping the response-time tail short, but these benefits depend strongly on the processing-time distribution, as well as on the CC utilization and the statistical characteristics of the arrival process. We also exploit the model to support the selection of the optimal number of replicas and a resource-provisioning strategy that meets service-level objectives on the response-time percentiles.
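
When queueing is ignored and the d replicas of a request finish after i.i.d. times with distribution F, the first-completion time exceeds t with probability (1 - F(t))^d. The snippet below is an illustration of that dependence on the processing-time distribution only (the paper's model also accounts for the arrival process and the cluster utilization); the two distributions are arbitrary examples.

```python
import numpy as np

def tail_of_first_replica(samples, d, t):
    """P(first of d i.i.d. replicas finishes after t) = (1 - F(t))^d,
    estimated from samples of a single processing time (queueing ignored)."""
    tail_one = np.mean(samples > t)
    return tail_one ** d

rng = np.random.default_rng(4)
t = 5.0
for name, s in [("exponential (mean 1)", rng.exponential(1.0, 1_000_000)),
                ("lognormal (heavy tail, mean ~1)", rng.lognormal(-0.5, 1.0, 1_000_000))]:
    tails = [tail_of_first_replica(s, d, t) for d in (1, 2, 3)]
    print(name, ["%.2e" % x for x in tails])
```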


Workshop on Software and Performance | 2015

Towards a DevOps Approach for Software Quality Engineering

Juan F. Pérez; Weikun Wang; Giuliano Casale

DevOps is a recent trend in software engineering that aims at bridging the gap between development and operations, in particular by giving the developer greater control over deployment and the application at runtime. Here we consider the problem of designing a tool capable of providing feedback to the developer on the performance, reliability, and, more generally, quality characteristics of the application at runtime. This raises a number of questions about what measurement information should be carried back from runtime to design time and what degrees of freedom should be given to the developer in the evaluation of performance data. To answer these questions, we describe the design of a filling-the-gap (FG) tool, a software system capable of automatically analyzing performance data either directly or through statistical inference. A natural application of the FG tool is the continuous training of stochastic performance models, such as layered queueing networks, that can inform developers on how to refactor the software architecture.


IEEE International Conference on Computer Communications (INFOCOM) | 2016

Variability-aware request replication for latency curtailment

Zhan Qiu; Juan F. Pérez; Peter G. Harrison

Processing-time variability is commonplace in distributed systems, where resources display disparate performance due to, e.g., different workload levels, background processes, and contention in virtualized environments. However, it is paramount for service providers to keep response-time variability under control in order to offer responsive services. We investigate how request replication can exploit processing-time variability to reduce response times, considering not only mean values but also the tail of the response-time distribution. We focus on the distributed setup, where replication is achieved by running copies of requests on multiple servers that otherwise evolve independently, and waiting for the first replica to complete service. We construct models that capture the evolution of a system with replicated requests using approximate methods and observe that highly variable service times offer the best opportunities for replication, reducing the response-time tail in particular. Further, the effect of replication is non-uniform over the response-time distribution: gains in one metric, e.g., the mean, can come at the cost of another, e.g., the tail percentiles. This is demonstrated in a wide range of numerical experiments. Capturing service-time variability is thus key to evaluating latency-tolerance strategies and to their design.
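
The role of service-time variability can be illustrated with a service-time-only experiment (queueing and the extra load of replicas are ignored here, although the paper's models account for them): compare a single copy with the faster of two independent copies for a low-variability and a high-variability distribution, both scaled to unit mean. The distributions and sample sizes below are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(5)
N = 1_000_000

def summary(x):
    return x.mean(), np.percentile(x, 99)

# Two service-time distributions, both with mean ~1:
low_var  = rng.uniform(0.5, 1.5, (N, 2))                      # low variability
high_var = rng.lognormal(mean=-1.125, sigma=1.5, size=(N, 2))  # heavy-tailed

for name, s in [("low variability", low_var), ("high variability", high_var)]:
    m1, p1 = summary(s[:, 0])            # single copy
    m2, p2 = summary(s.min(axis=1))      # faster of two replicas
    print(f"{name:17s} single: mean={m1:.2f} p99={p1:.2f} | "
          f"replicated: mean={m2:.2f} p99={p2:.2f}")
```

In this simplified setting the heavy-tailed case shows by far the larger reduction in the 99th percentile, in line with the observation that high variability offers the best opportunities for replication.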


IEEE Transactions on Parallel and Distributed Systems | 2016

Evaluating Replication for Parallel Jobs: An Efficient Approach

Zhan Qiu; Juan F. Pérez

Many modern software applications rely on parallel job processing to exploit large resource pools available in cloud and grid infrastructures. The response time of a parallel job, made of many subtasks, is determined by the last subtask that finishes. Thus, a single laggard subtask or a failure, requiring re-processing, may increase the response time substantially. To overcome these issues, we explore concurrent replication with canceling. This mechanism executes two job replicas concurrently, and retrieves the result of the first replica that completes, immediately canceling the other one. To analyze this mechanism we propose a stochastic model that considers replication at both job-level and task-level. We find that task-level replication achieves a much higher reliability and shorter response times than job-level replication. We also observe that the impact of replication depends on the system utilization, the subtask reliability, and the correlation among replica failures. Based on the model, we propose a resource-provisioning strategy that determines the minimum number of computing nodes needed to achieve a service-level objective (SLO) defined as a response-time percentile. This strategy is evaluated by considering realistic traffic patterns from a parallel cluster, where task-level replication shows the potential to reduce the resource requirements for tight response-time SLOs.
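
The reliability gap between job-level and task-level replication follows from standard independent-failure arithmetic, sketched below for two-way replication (replica failures are assumed independent here, whereas the paper's model also covers correlated failures, response times, and utilization).

```python
def reliability(n_tasks, p_task, level):
    """Success probability of a parallel job with n_tasks independent
    subtasks, each succeeding with probability p_task, under two-way
    concurrent replication (independent replica failures assumed)."""
    if level == "none":
        return p_task ** n_tasks
    if level == "job":                    # duplicate the whole job
        return 1.0 - (1.0 - p_task ** n_tasks) ** 2
    if level == "task":                   # duplicate each subtask
        return (1.0 - (1.0 - p_task) ** 2) ** n_tasks
    raise ValueError(level)

n, p = 100, 0.99
for level in ("none", "job", "task"):
    print(f"{level:4s}: {reliability(n, p, level):.4f}")
# roughly: none ~ 0.37, job ~ 0.60, task ~ 0.99
```

With 100 subtasks that each succeed with probability 0.99, the unreplicated job succeeds only about 37% of the time, job-level replication lifts this to about 60%, and task-level replication to about 99%.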

Collaboration


Dive into Juan F. Pérez's collaborations.

Top Co-Authors


Zhan Qiu

Imperial College London


Weikun Wang

Imperial College London


Julio C. Góez

Norwegian School of Economics
