Publication


Featured research published by Zhan Qiu.


Modeling, Analysis, and Simulation of Computer and Telecommunication Systems | 2014

Assessing the Impact of Concurrent Replication with Canceling in Parallel Jobs

Zhan Qiu; Juan F. Pérez

Parallel job processing has become a key feature of many software applications, e.g., in scientific computing. Parallelization allows these applications to exploit large resource pools, such as cloud or grid data centers. However, a job composed of a large number of parallel tasks suffers a failure if any of its tasks fails, requiring reprocessing and causing additional delays. In this paper, we explore the effect that the replication of parallel jobs has on job reliability and response time, as well as on resource utilization. The replication mechanism consists of concurrently processing replicas, at either the job or the task level, retrieving the results of the replica that finishes first, if any, and canceling any remaining replicas in process. We propose a stochastic model that explicitly considers parallel job processing, replication at both the job and the task level, and general arrival processes. We develop a numerically efficient algorithm to solve large-scale instances of the model and compute key performance metrics. We observe that the task-cancellation mechanism offers an effective way of limiting the increase in resource utilization, allowing the use of replicas that not only increase job reliability but also have the potential to reduce response times.
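
A minimal Monte-Carlo sketch of the comparison discussed in this abstract, assuming replicas run on otherwise idle, independent servers, so queueing, failures and cancellation overheads are ignored; the lognormal task times, the number of tasks and all other parameters are hypothetical choices for illustration, not values from the paper.

# Illustrative comparison of no replication, job-level replication and
# task-level replication with canceling (the first replica to finish wins).
# Queueing, failures and cancellation costs are ignored; all parameters
# are hypothetical.
import numpy as np

rng = np.random.default_rng(0)
N_TASKS = 10        # parallel tasks per job
N_JOBS = 100_000    # number of simulated jobs

def task_times(shape):
    # Hypothetical, moderately heavy-tailed task processing times.
    return rng.lognormal(mean=0.0, sigma=1.0, size=shape)

# No replication: a job finishes when its slowest task finishes.
base = task_times((N_JOBS, N_TASKS)).max(axis=1)

# Job-level replication: run two full copies of the job and keep the
# copy whose slowest task finishes first.
job_rep = np.minimum(task_times((N_JOBS, N_TASKS)).max(axis=1),
                     task_times((N_JOBS, N_TASKS)).max(axis=1))

# Task-level replication: replicate each task individually and take,
# per task, the replica that finishes first; the job still waits for
# its slowest (replicated) task.
task_rep = np.minimum(task_times((N_JOBS, N_TASKS)),
                      task_times((N_JOBS, N_TASKS))).max(axis=1)

for name, sample in [("no replication", base),
                     ("job-level replication", job_rep),
                     ("task-level replication", task_rep)]:
    print(f"{name:25s} mean={sample.mean():5.2f}  "
          f"p99={np.percentile(sample, 99):6.2f}")

In this toy setup, task-level replication yields the shortest response times, followed by job-level replication; the paper quantifies when such gains survive queueing and what they cost in resource utilization.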


International Conference on Computer Communications | 2015

Enhancing reliability and response times via replication in computing clusters

Zhan Qiu; Juan F. Pérez

Computing clusters have been widely deployed for scientific and engineering applications to support intensive computation and massive data operations. As applications and resources in a cluster are subject to failures, fault-tolerance strategies are commonly adopted, sometimes at the expense of additional delays in job response times or unnecessary increases in resource usage. In this paper, we explore concurrent replication with canceling, a fault-tolerance approach where jobs and their replicas are processed concurrently, and the successful completion of either triggers the removal of the other. We propose a stochastic model to study how this approach affects the cluster service-level objectives (SLOs), particularly the offered response-time percentiles. In addition to the expected gains in reliability, the proposed model allows us to determine the regions of utilization where introducing replication with canceling effectively reduces response times. Moreover, we show how this model can support resource-provisioning decisions with reliability and response-time guarantees.


IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing | 2015

Evaluating the Effectiveness of Replication for Tail-Tolerance

Zhan Qiu; Juan F. Pérez

Computing clusters (CC) are a cost-effective high-performance platform for computation-intensive scientific and engineering applications. A key challenge in managing CCs is to consistently achieve low response times. In particular, tail-tolerant methods aim to keep the tail of the response-time distribution short. In this paper we explore concurrent replication with cancelling, a tail-tolerant approach that involves processing requests and their replicas concurrently, retrieving the result from the first replica that completes, and cancelling all other replicas. We propose a stochastic model that considers any number of replicas, general processing and inter-arrival times, and computes the response time distribution. We show that replication can be very effective in keeping the response-time tail short, but these benefits highly depend on the processing-time distribution, as well as on the CC utilization and the statistical characteristics of the arrival process. We also exploit the model to support the selection of the optimal number of replicas, and a resource provisioning strategy that meets service-level objectives on the response-time percentiles.
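
The tail-compression effect the paper studies can be previewed with a back-of-the-envelope calculation. Assume, purely for illustration, that n replicas of a request run on otherwise idle, independent servers with i.i.d. processing times S (so the queueing and utilization effects analyzed in the paper are ignored). The response time is then the minimum over the replicas, with tail

\[
  \Pr[R_{\min} > t] \;=\; \prod_{i=1}^{n} \Pr[S_i > t] \;=\; \bigl(\Pr[S > t]\bigr)^{n},
\]

so a Pareto-like tail \Pr[S > t] \approx t^{-\alpha} is compressed to roughly t^{-n\alpha}, while a nearly deterministic processing time gains almost nothing. This is one way to see why the benefit of replication depends so strongly on the processing-time distribution.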


Performance Evaluation | 2015

Beyond the mean in fork-join queues

Zhan Qiu; Juan F. Pérez; Peter G. Harrison

Fork-join queues are natural models for various computer and communications systems that involve parallel multitasking and the splitting and resynchronizing of data, such as parallel computing, query processing in distributed databases, and parallel disk access. Job response time in a fork-join queue is a critical performance indicator, but its exact analysis is challenging. We introduce a stochastic model for K-node homogeneous fork-join queues (K ≥ 2) that focuses on the difference in length between any node-queue and the shortest one, truncating the state space such that the maximum difference is at most a constant C. Whilst most previous methods focus on the mean response time, our model is also able to evaluate the response time distribution, as well as accommodating phase-type processing times and Markovian arrival processes. In order to tackle scenarios with high loads, which require a large value of C to provide sufficient accuracy, we develop an efficient algorithm using matrix-analytic methods. Tests against simulation show that the proposed model yields accurate results for 2-node fork-join queues. As the model becomes numerically intractable for large values of K, we further propose an approximate approach, based on properties of order statistics and extreme values. The approximation gives a high degree of accuracy on response time tails, and has the advantage of being efficient and scalable, requiring only the analytical results for single-node and 2-node fork-join queues, which we obtain with the aforementioned matrix-analytic model. Comparison with simulation results shows that our approximation yields good fits for the tails, even in very large cases with general processing and inter-arrival times.
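
The order-statistics idea behind the scalable approximation can be illustrated as follows (this is not the paper's construction, which also uses the 2-node solution to correct for the correlation that synchronized arrivals induce across node queues). If the K per-node sojourn times were independent with a common distribution F_1, the job response time would be their maximum, with distribution

\[
  \Pr[T_K \le t] \;=\; \Pr\Bigl[\max_{1 \le k \le K} T^{(k)} \le t\Bigr] \;\approx\; \bigl(F_1(t)\bigr)^{K}.
\]

Extreme-value theory then characterizes how this maximum grows with K, which is what keeps the approximation cheap for large K: only the single-node (and, in the paper, 2-node) solutions need to be computed exactly.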


IEEE International Conference on Computer and Communications | 2016

Variability-aware request replication for latency curtailment

Zhan Qiu; Juan F. Pérez; Peter G. Harrison

Processing time variability is commonplace in distributed systems, where resources display disparate performance due to, e.g., different workload levels, background processes, and contention in virtualized environments. However, it is paramount for service providers to keep variability in response time under control in order to offer responsive services. We investigate how request replication can be used to exploit processing time variability to reduce response times, considering not only mean values but also the tail of the response time distribution. We focus on the distributed setup, where replication is achieved by running copies of requests on multiple servers that otherwise evolve independently, and waiting for the first replica to complete service. We construct models that capture the evolution of a system with replicated requests using approximate methods and observe that highly variable service times offer the best opportunities for replication, reducing the response time tail in particular. Further, the effect of replication is non-uniform over the response time distribution: gains in one metric, e.g., the mean, can come at the cost of another, e.g., the tail percentiles. This is demonstrated in a wide range of numerical experiments. Capturing service time variability is thus key to both the evaluation and the design of latency-tolerance strategies.
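
A small numerical sketch of the central observation, under the same idealization as the earlier sketch: two replicas per request on otherwise idle, independent servers, queueing ignored, and both service-time distributions hypothetical (chosen here with mean 1 so that only their variability differs).

# Effect of 2-way replication (keep the faster copy) under low- and
# high-variability service times. Queueing and cancellation costs are
# ignored; the distributions are illustrative, not from the paper.
import numpy as np

rng = np.random.default_rng(1)
N = 200_000

scenarios = {
    "low variability (exponential, mean 1)":
        rng.exponential(scale=1.0, size=(2, N)),
    "high variability (lognormal, mean 1)":
        rng.lognormal(mean=-1.125, sigma=1.5, size=(2, N)),
}

for name, samples in scenarios.items():
    single = samples[0]                 # no replication
    replicated = samples.min(axis=0)    # first of two replicas to finish
    print(name)
    print(f"  mean: {single.mean():.2f} -> {replicated.mean():.2f}")
    print(f"  p99 : {np.percentile(single, 99):.2f} -> "
          f"{np.percentile(replicated, 99):.2f}")

The high-variability case shows a much larger relative reduction, especially at the 99th percentile, which is the qualitative effect the paper analyzes in the presence of queueing.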


European Workshop on Performance Engineering | 2013

Performance Enhancement by Means of Task Replication

Peter G. Harrison; Zhan Qiu

To make systems in which tasks may fail fault-tolerant, traditional methods deploy multiple servers as replicas that perform the same task. Further, in real-time systems, computations have to meet strict time constraints, a delayed output being unacceptable even if correct. The effectiveness of sending task replicas to multiple servers simultaneously, and using the result from whichever one responds first, is considered in this paper as a means of reducing response time and improving fault-tolerance. Once a request completes execution successfully at one server, it immediately cancels (kills) the replicas that remain at other servers. We assume a Markovian system and use the generating function method to determine the Laplace transform of the response time probability distribution, jointly with the probability that not all replicas fail, in the case of two replicas. When the failure rate of each task is greater than the service rate of the server, we make the approximation that the queues are independent, each with a geometric queue-length probability distribution at equilibrium. We compare our approximation with simulation results as well as with the exact solution in a truncated state space and find that, for failure rates in that region, the approximation is generally good. At lower failure rates, the method of spectral expansion provides an excellent approximation in a truncated, multi-mode, two-dimensional Markov process.
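
A purely illustrative special case shows what the independence approximation buys (it is not the failure-prone model analyzed in the paper). Suppose there are no failures, requests arrive in a Poisson stream of rate \lambda, and every request is replicated to two exponential servers of rate \mu whose queues are treated as independent M/M/1 queues. Each sojourn time is then exponentially distributed with rate \mu - \lambda, so the first-to-complete response time satisfies

\[
  \Pr[R > t] \;\approx\; e^{-(\mu-\lambda)t} \cdot e^{-(\mu-\lambda)t} \;=\; e^{-2(\mu-\lambda)t},
\]

i.e. the replicated request sees roughly twice the effective service capacity. The paper's analysis replaces this crude picture with the joint Laplace transform of the response time and the probability that not all replicas fail.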


IEEE Transactions on Parallel and Distributed Systems | 2016

Evaluating Replication for Parallel Jobs: An Efficient Approach

Zhan Qiu; Juan F. Pérez

Many modern software applications rely on parallel job processing to exploit large resource pools available in cloud and grid infrastructures. The response time of a parallel job, made of many subtasks, is determined by the last subtask that finishes. Thus, a single laggard subtask or a failure, requiring re-processing, may increase the response time substantially. To overcome these issues, we explore concurrent replication with canceling. This mechanism executes two job replicas concurrently, and retrieves the result of the first replica that completes, immediately canceling the other one. To analyze this mechanism we propose a stochastic model that considers replication at both job-level and task-level. We find that task-level replication achieves a much higher reliability and shorter response times than job-level replication. We also observe that the impact of replication depends on the system utilization, the subtask reliability, and the correlation among replica failures. Based on the model, we propose a resource-provisioning strategy that determines the minimum number of computing nodes needed to achieve a service-level objective (SLO) defined as a response-time percentile. This strategy is evaluated by considering realistic traffic patterns from a parallel cluster, where task-level replication shows the potential to reduce the resource requirements for tight response-time SLOs.
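
The gap between the two replication levels can be illustrated with a simple independence calculation (the paper also treats correlated replica failures, which this ignores). For a job of n tasks, each succeeding with probability p, and two replicas per job or per task,

\[
  R_{\text{job}} \;=\; 1 - \bigl(1 - p^{\,n}\bigr)^{2},
  \qquad
  R_{\text{task}} \;=\; \bigl(1 - (1 - p)^{2}\bigr)^{n} \;=\; \bigl(2p - p^{2}\bigr)^{n},
\]

and R_task >= R_job for every p in [0, 1] and n >= 1. For instance, with n = 10 and p = 0.95, job-level replication reaches a reliability of about 0.84, while task-level replication reaches about 0.975, in line with the paper's finding that task-level replication is markedly more reliable.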


International Conference on Computer Communications | 2017

Power of redundancy: Designing partial replication for multi-tier applications

Robert Birke; Juan F. Pérez; Zhan Qiu; Mathias Bjorkqvist; Lydia Y. Chen

Replicating redundant requests has been shown to be an effective mechanism to defend application performance from high capacity variability, a common pitfall in the cloud. While the prior art centers on single-tier systems, it remains an open question how to design replication strategies for distributed multi-tier systems, where interference from neighboring workloads is entangled with complex tier interdependency. In this paper, we design a first-of-its-kind PArtial REplication system, sPARE, that replicates and dispatches read-only workloads for multi-tier web applications, determining replication factors per tier. The two key components of sPARE are (i) the variability-aware replicator, which coordinates the replication levels on all tiers via an iterative searching algorithm, and (ii) the replication-aware arbiter, which uses a novel token-based arbitration algorithm (TAD) to dispatch requests in each tier. We evaluate sPARE on web serving and web searching applications, i.e., MediaWiki and Solr, deployed on our private cloud testbed. Our results, based on various interference patterns and traffic loads, show that sPARE improves the tail latency of MediaWiki and Solr by factors of almost 2.7 and 2.9, respectively.
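
The abstract names two sPARE-specific algorithms (the iterative search of the variability-aware replicator and the token-based arbitration algorithm TAD) whose details are not reproduced here. As a heavily simplified, hypothetical illustration of the general idea of searching over per-tier replication factors, a greedy coordinate search might look as follows; measure_p99 is a stand-in hook for measuring the deployed system's tail latency and is not part of sPARE.

# Toy greedy search over per-tier replication factors. This is NOT the
# sPARE algorithm; it only illustrates iteratively raising the factor of
# whichever tier yields a measured tail-latency improvement.
# measure_p99 is a hypothetical measurement hook supplied by the caller.
from typing import Callable

def search_replication_factors(
    tiers: list[str],
    measure_p99: Callable[[dict[str, int]], float],
    max_factor: int = 4,
) -> dict[str, int]:
    factors = {tier: 1 for tier in tiers}   # start with no replication
    best = measure_p99(factors)
    improved = True
    while improved:
        improved = False
        for tier in tiers:
            if factors[tier] >= max_factor:
                continue
            trial = dict(factors, **{tier: factors[tier] + 1})
            p99 = measure_p99(trial)
            if p99 < best:                  # keep the change if the tail improves
                factors, best, improved = trial, p99, True
    return factors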


international conference on performance engineering | 2016

Tackling Latency via Replication in Distributed Systems

Zhan Qiu; Juan F. Pérez; Peter G. Harrison

Consistently high reliability and low latency are twin requirements common to many forms of distributed processing; for example, server farms and mirrored storage access. To address them, we consider replication of requests with canceling - i.e. initiate multiple concurrent replicas of a request and use the first successful result returned, canceling all outstanding replicas. This scheme has been studied recently, but mostly for systems with a single central queue, while server farms exploit distributed resources for scalability and robustness. We develop an approximate stochastic model to determine the response-time distribution in a system with distributed queues, and compare its performance against its centralized counterpart. Validation against simulation indicates that our model is accurate for not only the mean response time but also its percentiles, which are particularly relevant for deadline-driven applications. Further, we show that in the distributed set-up, replication with canceling has the potential to reduce response times, even at relatively high utilization. We also find that it offers response times close to those of the centralized system, especially at medium-to-high request reliability. These findings support the use of replication with canceling as an effective mechanism for both fault- and delay-tolerance.


International Conference on Computer Communications | 2016

Defeating variability in cloud applications by multi-tier workload redundancy

Robert Birke; Zhan Qiu; Juan F. Pérez; Lydia Y. Chen

Workload redundancy emerges as an effective method to guarantee quality-of-service (QoS) targets, especially tail latency, in environments with strong capacity variability such as clouds. Nevertheless, mostly single-tier replication strategies have been studied, while multi-tier architectures are a very popular choice for cloud-deployed applications. In this poster we raise awareness of the challenges of devising a multi-tier replication strategy and of its possible benefits, demonstrated through a 2.3x tail-latency improvement on an exemplary real-world application.

Collaboration


Dive into Zhan Qiu's collaborations.

Top Co-Authors

Juan F. Pérez

Imperial College London
