Featured Research

Performance

Anonymity Mixes as (Partial) Assembly Queues: Modeling and Analysis

Anonymity platforms route traffic over a network of special routers known as mixes, which implement various traffic disruption techniques to hide the identities of communicating users. Batch mixes in particular anonymize communicating peers by allowing message exchange to take place only after a sufficient number of messages (a batch) has accumulated, thus introducing delay. We introduce a queueing model for a batch mix and study its delay properties. Our analysis shows that the delay of a batch mix grows quickly as the batch size approaches the number of senders connected to the mix. We then propose a randomized batch mixing strategy and show that it achieves much better delay scaling in terms of the batch size. However, randomization is shown to reduce the anonymity-preserving capabilities of the mix. We also observe that queueing models are particularly useful for studying anonymity metrics that are more practically relevant, such as the time-to-deanonymize metric.
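The growth of delay as the batch size approaches the number of senders can be illustrated with a deliberately simplified calculation. The sketch below is not the queueing model of the paper: it assumes (hypothetically) that the mix starts empty, that each of n senders submits messages as an independent Poisson process with the same rate, and that a batch is released once k distinct senders have contributed. Under those assumptions the expected fill time is a partial coupon-collector sum. The function name and parameters are illustrative only.

```python
from fractions import Fraction


def expected_fill_time(n: int, k: int, rate: Fraction = Fraction(1)) -> Fraction:
    """Expected time until k distinct senders (out of n) have each sent a message.

    Assumptions (not taken from the paper): the mix starts empty and every
    sender is an independent Poisson source with the same rate. While i senders
    have already contributed, the next new sender arrives at rate (n - i) * rate.
    """
    if not 1 <= k <= n:
        raise ValueError("batch size k must satisfy 1 <= k <= n")
    return sum(Fraction(1) / ((n - i) * rate) for i in range(k))


if __name__ == "__main__":
    n = 20
    for k in (5, 10, 15, 19, 20):
        print(k, float(expected_fill_time(n, k)))
```

In this toy setting the last few senders dominate the waiting time, which is consistent in spirit with the abstract's observation that delay rises sharply as the batch size nears the number of senders.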

Performance

Application of the Computer Capacity to the Analysis of Processors Evolution

The notion of computer capacity was proposed in 2012, and this quantity has since been estimated for computers of different kinds. In this paper we show that, when designing new processors, manufacturers change the parameters that affect the computer capacity. This allows us to predict the parameter values of future processors. As the main example we use Intel processors, because detailed descriptions of all their technical characteristics are accessible.

Performance

Approximate Solution Approach and Performability Evaluation of Large Scale Beowulf Clusters

Beowulf clusters are very popular and are deployed worldwide in support of scientific computing because of their high computational power and performance. However, they also pose several challenges while still needing to provide high availability. In practice, large-scale Beowulf clusters, even fault-tolerant ones, can produce unpredictable and often detrimental outcomes. Achieving high performance in storing and processing huge amounts of data in large-scale clusters requires accurate quality of service (QoS) evaluation. This motivates the design and development of analytical models for understanding and predicting complex system behaviour in order to ensure the availability of large-scale systems. Exact modelling of such clusters is not feasible because of the large number of nodes and the diversity of user requests. An analytical model for the QoS of large-scale server farms, together with suitable solution approaches, is therefore necessary. In this paper, analytical modelling of large-scale Beowulf clusters is considered together with availability issues. A generic and flexible approximate solution approach is developed to handle a large number of nodes for performability evaluation. The proposed analytical model and the approximate solution approach provide flexibility in evaluating QoS measures for such systems. To show the efficacy and accuracy of the proposed approach, the results obtained from the analytical model are validated against results obtained from discrete event simulations.

Performance

Approximation of LRU Caches Miss Rate: Application to Power-law Popularities

Building on the pioneering 1977 work of R. Fagin, we give a closed-form expression for the approximate miss rate (MR) of LRU caches assuming a power-law popularity. The asymptotic behavior of this expression is already known when the power-law parameter is above 1; we extend it to any value of the parameter. In addition, we provide a new analysis of the conditions (relative cache size, popularity parameter) under which the ratio of LRU MR to Static MR is worst-case.
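To make the quantities in this abstract concrete, the following sketch numerically evaluates a working-set style approximation of the LRU miss rate that is commonly attributed to Fagin: find the window length tau at which the expected number of distinct items referenced equals the cache size, then take the probability that the next reference falls outside that set. This is a numerical illustration under the independent reference model, not the closed-form expression derived in the paper. The function names, the bisection solver, and the Zipf-like popularity with exponent `alpha` are assumptions made for the example.

```python
def zipf_popularity(n_items: int, alpha: float) -> list[float]:
    """Power-law popularity: item i is requested with probability proportional to i**(-alpha)."""
    weights = [1.0 / (i ** alpha) for i in range(1, n_items + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def lru_miss_rate(popularity: list[float], cache_size: int) -> float:
    """Working-set approximation of the LRU miss rate (numerical, not closed form)."""
    if not 0 < cache_size < len(popularity):
        raise ValueError("cache_size must be between 1 and len(popularity) - 1")

    def distinct_items(tau: float) -> float:
        # Expected number of distinct items seen in a window of tau references.
        return sum(1.0 - (1.0 - p) ** tau for p in popularity)

    # Bracket the window length, then bisect until it matches the cache size.
    lo, hi = 0.0, 1.0
    while distinct_items(hi) < cache_size:
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if distinct_items(mid) < cache_size:
            lo = mid
        else:
            hi = mid
    tau = (lo + hi) / 2.0
    # Probability that the next request is for an item not seen in the window.
    return sum(p * (1.0 - p) ** tau for p in popularity)


def static_miss_rate(popularity: list[float], cache_size: int) -> float:
    """Miss rate of a static cache that permanently holds the most popular items."""
    return 1.0 - sum(sorted(popularity, reverse=True)[:cache_size])


if __name__ == "__main__":
    pop = zipf_popularity(10_000, alpha=0.8)
    for size in (100, 1_000, 5_000):
        lru, static = lru_miss_rate(pop, size), static_miss_rate(pop, size)
        print(size, round(lru, 4), round(static, 4), round(lru / static, 3))
```

The last printed column is the ratio of LRU MR to Static MR, the quantity whose worst case the abstract refers to.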

Performance

Approximations and Bounds for (n, k) Fork-Join Queues: A Linear Transformation Approach

Compared to basic fork-join queues, a job in an (n, k) fork-join queue only needs k of its n sub-tasks to finish. Since (n, k) fork-join queues are prevalent in popular distributed systems, erasure-coding-based cloud storage, and modern network protocols such as multipath routing, estimating the sojourn time of such queues is critical for performance measurement and resource planning in computer clusters. However, this estimation has remained a well-known open challenge for years, and only rough bounds for a limited range of load factors have been given. In this paper, we develop a closed-form linear transformation technique for jointly-identical random variables: an order statistic can be represented by a linear combination of maxima. This new technique is then used to transform the sojourn time of non-purging (n, k) fork-join queues into a linear combination of the sojourn times of basic (k, k), (k+1, k+1), ..., (n, n) fork-join queues. Consequently, existing approximations for basic fork-join queues can be carried over to approximations for non-purging (n, k) fork-join queues. The resulting approximations are then used to improve the upper bounds for purging (n, k) fork-join queues. Simulation experiments show that this linear transformation approach works well for moderate n and relatively large k.
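The idea that an order statistic can be written as a linear combination of maxima can be checked on a small example. The sketch below uses a classical identity for the mean of the k-th smallest of n exchangeable random variables, E[X(k:n)] = sum over i from k to n of (-1)^(i-k) C(i-1, k-1) C(n, i) E[X(i:i)], where X(i:i) is the maximum of i of the variables. Whether the paper's transformation takes exactly this form is an assumption; the check uses independent unit-rate exponentials (not fork-join sojourn times), for which both sides are known exactly. All function names are illustrative.

```python
from fractions import Fraction
from math import comb


def harmonic(m: int) -> Fraction:
    """H_m = 1 + 1/2 + ... + 1/m, the mean of the maximum of m i.i.d. Exp(1) variables."""
    return sum((Fraction(1, j) for j in range(1, m + 1)), Fraction(0))


def order_stat_mean_from_maxima(n: int, k: int, mean_max) -> Fraction:
    """Mean of the k-th smallest of n variables as a linear combination of means of maxima.

    mean_max(i) must return E[max of i variables]; the coefficients follow the
    classical identity for exchangeable random variables.
    """
    return sum(
        (-1) ** (i - k) * comb(i - 1, k - 1) * comb(n, i) * mean_max(i)
        for i in range(k, n + 1)
    )


def exp_order_stat_mean(n: int, k: int) -> Fraction:
    """Known closed form for i.i.d. Exp(1): E[X(k:n)] = H_n - H_(n-k)."""
    return harmonic(n) - harmonic(n - k)


if __name__ == "__main__":
    n, k = 6, 4
    print(order_stat_mean_from_maxima(n, k, harmonic), exp_order_stat_mean(n, k))
```

Exact rational arithmetic is used so that the alternating sum, which involves large cancelling terms, can be compared without rounding error.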

Performance

Architecture-Aware Configuration and Scheduling of Matrix Multiplication on Asymmetric Multicore Processors

Asymmetric multicore processors (AMPs) have recently emerged as an appealing technology for severely energy-constrained environments, especially in mobile appliances where heterogeneity in applications is mainstream. In addition, given the growing interest in low-power high performance computing, this type of architecture is also being investigated as a means to improve the throughput-per-Watt of complex scientific applications. In this paper, we design and embed several architecture-aware optimizations into a multi-threaded general matrix multiplication (gemm), a key operation of the BLAS, in order to obtain a high performance implementation for ARM big.LITTLE AMPs. Our solution is based on the reference implementation of gemm in the BLIS library, and integrates a cache-aware configuration as well as asymmetric static and dynamic scheduling strategies that carefully tune and distribute the operation's micro-kernels among the big and LITTLE cores of the target processor. The experimental results on a Samsung Exynos 5422, a system-on-chip with ARM Cortex-A15 and Cortex-A7 clusters that implements the big.LITTLE model, show that our cache-aware versions of gemm with asymmetric scheduling attain important performance gains over their architecture-oblivious counterparts while exploiting all the resources of the AMP to deliver considerable energy efficiency.

Performance

Asymptotic Miss Ratio of LRU Caching with Consistent Hashing

To efficiently scale data caching infrastructure to support emerging big data applications, many caching systems rely on consistent hashing to group a large number of servers into a cooperative cluster. These servers are organized according to a random hash function and jointly provide a unified but distributed hash table to serve swift and voluminous data item requests. Unlike the single least-recently-used (LRU) server, which has already been extensively studied, a cluster consisting of multiple LRU servers has yet to be characterized theoretically. These servers are not simply added together; the random hashing complicates the behavior. To this end, we derive the asymptotic miss ratio of data item requests on an LRU cluster with consistent hashing. We show that the individual cache spaces on different servers can be effectively viewed as if they were pooled together to form a single virtual LRU cache space parametrized by an appropriate cache size. This equivalence can be established rigorously under the condition that the cache sizes of the individual servers are large, a condition that is common in typical data caching systems. Our theoretical framework provides a convenient abstraction that allows results for the simpler single LRU cache to be applied directly to the more complex LRU cluster with consistent hashing.

Performance

Asymptotic Performance Evaluation of Battery Swapping and Charging Station for Electric Vehicles

A battery swapping and charging station (BSCS) is an energy refueling station, where i) electric vehicles (EVs) with depleted batteries (DBs) can swap their DBs for fully-charged ones, and ii) the swapped DBs are then charged until they are fully charged. Successful deployment of a BSCS system requires careful planning of swapping- and charging-related infrastructure, and thus a comprehensive performance evaluation of the BSCS is becoming crucial. This paper studies this performance evaluation problem with a novel mixed queueing network (MQN) model and validates the model with extensive numerical simulation. We adopt the EVs' blocking probability as our quality-of-service measure and focus on the impact of the key parameters of the BSCS (e.g., the numbers of parking spaces, swapping islands, chargers, and batteries) on the blocking probability. We prove a necessary and sufficient condition for the ergodicity of the MQN when the number of batteries approaches infinity, and further prove that the blocking probability has two different types of asymptotic behavior. For each type of asymptotic behavior, we analytically derive the asymptotic lower bound of the blocking probability.

Performance

Asymptotically Optimal Load Balancing in Large-scale Heterogeneous Systems with Multiple Dispatchers

We consider the load balancing problem in large-scale heterogeneous systems with multiple dispatchers. We introduce a general framework called Local-Estimation-Driven (LED). Under this framework, each dispatcher keeps local (possibly outdated) estimates of queue lengths for all the servers, and the dispatching decision is made purely based on these local estimates. The local estimates are updated via infrequent communications between dispatchers and servers. We derive sufficient conditions for LED policies to achieve throughput optimality and delay optimality in heavy-traffic, respectively. These conditions directly imply delay optimality for many previous local-memory based policies in heavy traffic. Moreover, the results enable us to design new delay optimal policies for heterogeneous systems with multiple dispatchers. Finally, the heavy-traffic delay optimality of the LED framework directly resolves a recent open problem on how to design optimal load balancing schemes using delayed information.
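The mechanism described above, in which each dispatcher routes jobs using only its own possibly outdated queue-length estimates and refreshes them occasionally, can be sketched as a toy discrete-time simulation. Everything specific here is an assumption made for illustration and is not taken from the paper: the join-the-shortest-estimated-queue rule, the increment of the local estimate on dispatch, the random single-server refresh, and the names `LedDispatcher`, `refresh_prob`, and `service_prob`. Servers are also treated as homogeneous for brevity.

```python
import random


class LedDispatcher:
    """Dispatcher that decides purely from its local queue-length estimates."""

    def __init__(self, n_servers: int) -> None:
        self.estimates = [0] * n_servers

    def dispatch(self) -> int:
        # Assumed policy: send the job to the server with the smallest local estimate.
        target = min(range(len(self.estimates)), key=self.estimates.__getitem__)
        self.estimates[target] += 1  # account locally for the job just sent
        return target

    def refresh(self, server: int, true_length: int) -> None:
        # Infrequent communication: overwrite one estimate with the true queue length.
        self.estimates[server] = true_length


def simulate(n_servers=4, n_dispatchers=2, steps=2000,
             refresh_prob=0.1, service_prob=0.5, seed=0):
    """Toy simulation: one arrival per step, Bernoulli service at every busy server."""
    rng = random.Random(seed)
    queues = [0] * n_servers
    dispatchers = [LedDispatcher(n_servers) for _ in range(n_dispatchers)]
    for _ in range(steps):
        d = rng.choice(dispatchers)          # the arrival lands at a random dispatcher
        queues[d.dispatch()] += 1
        for s in range(n_servers):           # each busy server may finish one job
            if queues[s] > 0 and rng.random() < service_prob:
                queues[s] -= 1
        if rng.random() < refresh_prob:      # occasional estimate update
            s = rng.randrange(n_servers)
            d.refresh(s, queues[s])
    return queues, dispatchers


if __name__ == "__main__":
    final_queues, _ = simulate()
    print(final_queues)
```

Lowering `refresh_prob` makes the estimates staler, which is the regime the framework is designed to tolerate; the sketch does not attempt to reproduce the paper's optimality conditions.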

Performance

Asymptotics of Insensitive Load Balancing and Blocking Phases

We address the problem of giving robust performance bounds based on the study of the asymptotic behavior of insensitive load balancing schemes when the number of servers and the load scale jointly. These schemes have the desirable property that the stationary distribution of the resulting stochastic network depends on the distribution of job sizes only through its mean. It was shown that they give good estimates of performance indicators for systems with finite buffers, thereby generalizing Erlang's formula, whereas optimal policies are already theoretically and computationally out of reach for networks of moderate size. We study a single class of traffic acting on a symmetric set of processor sharing queues with finite buffers, and we consider the case where the load scales with the number of servers. Using central limit theorems and large deviations, we characterize the response of symmetric systems under those schemes at different scales and show that three amplitudes of deviations can be identified. A central limit scaling takes place for a sub-critical load; for $\rho = 1$, the number of free servers scales like $n^{\theta/(\theta+1)}$ ($\theta$ being the buffer depth and $n$ the number of servers); and it is of order 1 for super-critical loads. This further implies the existence of different phases for the blocking probability. Before a (refined) critical load $\rho_c(n) = 1 - a\, n^{-\theta/(\theta+1)}$, the blocking is exponentially small, and it becomes of order $n^{-\theta/(\theta+1)}$ at $\rho_c(n)$. This generalizes the well-known Quality and Efficiency Driven (QED) regime, or Halfin-Whitt regime, for a one-dimensional queue, and leads to a generalized staffing rule for a given target blocking probability.
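The scalings in this abstract are easy to tabulate. The sketch below evaluates the refined critical load, rho_c(n) = 1 - a * n^(-theta/(theta+1)), and the order of the blocking probability at that load, n^(-theta/(theta+1)), for a few system sizes. The constant `a` is not specified in the abstract, so the default of 1.0 is a placeholder, and the function names are illustrative. Only orders of magnitude are shown, not the paper's exact blocking probabilities.

```python
def critical_exponent(theta: float) -> float:
    """Exponent theta / (theta + 1), where theta is the buffer depth."""
    if theta <= 0:
        raise ValueError("buffer depth theta must be positive")
    return theta / (theta + 1.0)


def critical_load(n: int, theta: float, a: float = 1.0) -> float:
    """Refined critical load rho_c(n) = 1 - a * n**(-theta / (theta + 1)).

    The constant a is a placeholder; its value is not given in the abstract.
    """
    return 1.0 - a * n ** (-critical_exponent(theta))


def blocking_order_at_critical_load(n: int, theta: float) -> float:
    """Order of magnitude n**(-theta / (theta + 1)) of the blocking probability at rho_c(n)."""
    return n ** (-critical_exponent(theta))


if __name__ == "__main__":
    for n in (100, 1_000, 10_000):
        for theta in (1, 2, 5):
            print(n, theta,
                  round(critical_load(n, theta), 4),
                  round(blocking_order_at_critical_load(n, theta), 5))
```

For a fixed number of servers, a deeper buffer pushes the exponent toward 1, so the critical load moves closer to 1 and the blocking order at that load shrinks.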

