Featured Research

Performance

An approach to define Very High Capacity Networks with improved quality at an affordable cost

This paper proposes one possible approach to setting performance targets for VHCNs (Very High Capacity Networks) that can promote efficient investments by operators and, at the same time, improve the benefits for end-users. To this aim, we suggest relying on specific KPIs (Key Performance Indicators), especially throughput - i.e., the bandwidth as perceived by the customer - measured at the application layer rather than the physical-layer data-rate. In this regard, the paper underlines that the perceived bandwidth is strictly linked to latency. The most important implication is that the requirements of some of the most demanding services envisaged for the future (e.g., mobile virtual and augmented reality, the tactile internet) cannot be met by merely increasing the low-level protocol data-rate. Therefore, for VHCNs, reducing latency through Edge Cloud Computing (ECC) is a mandatory prerequisite.
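As a minimal sketch of why application-layer throughput is latency-bound (not taken from the paper; the link rate, window size, and RTT values below are hypothetical), consider a window-based transport protocol, whose throughput cannot exceed window/RTT no matter how fast the physical layer is:

def app_layer_throughput(link_rate_bps, window_bytes, rtt_s):
    """Achievable throughput of a window-based transport protocol:
    capped by both the physical data-rate and window/RTT."""
    return min(link_rate_bps, 8 * window_bytes / rtt_s)

# A 10 Gb/s link with a hypothetical 1 MiB window: raising the link
# rate no longer helps once window/RTT becomes the binding constraint.
for rtt_ms in (1, 10, 50):
    thr = app_layer_throughput(10e9, 2**20, rtt_ms / 1000)
    print(f"RTT {rtt_ms:3d} ms -> {thr / 1e9:.2f} Gb/s")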

Performance

An infinite-server queueing model MMAPkGk in semi-Markov random environment with marked MAP arrival and subject to catastrophes

In the present paper, the infinite-server MMAPkGk queueing model with a random resource vector of customers, marked MAP arrivals, and semi-Markov (SM) arrivals of catastrophes is considered. The joint probability generating functions (PGFs) of the transient and stationary distributions of the number of busy servers and the numbers of served customers of different types are found, as well as the Laplace transforms (LTs) of the joint distributions of the total resources accumulated in the model at a given moment and the total resources accumulated by customers served during a time interval. The basic differential and renewal equations for the transient and stationary PGFs of the customer queue sizes are derived.
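For intuition, in the classical M/G/infinity queue - a far simpler special case with Poisson arrivals of rate \(\lambda\), service-time distribution \(G\), and no marks, random environment, or catastrophes - the transient number of busy servers \(N(t)\), starting from an empty system, is Poisson, and its PGF has a closed form:

\[
\mathbb{E}\!\left[z^{N(t)}\right]
  = \exp\!\left( \lambda (z - 1) \int_0^t \bigl(1 - G(u)\bigr)\, du \right),
  \qquad |z| \le 1,
\]

so \(N(t)\) is Poisson with mean \(\lambda \int_0^t (1 - G(u))\, du\). The PGFs derived in the paper generalize this structure to marked arrivals, a semi-Markov environment, and catastrophes.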

Performance

An optimal scheduling architecture for accelerating batch algorithms on Neural Network processor architectures

In neural network topologies, algorithms run on batches of data tensors. These batches are typically scheduled onto computing cores that execute in parallel. For such batch algorithms, an optimal batch-scheduling architecture that suitably utilizes hardware resources is much needed, as it can significantly reduce training and inference time. In this paper, we propose to accelerate batch algorithms for neural networks through a scheduling architecture that enables optimal utilization of compute power. The proposed scheduling architecture can be built into hardware or implemented in software alone, and either form can be leveraged to accelerate batch algorithms. The results demonstrate that the proposed architecture speeds up batch algorithms compared to previous solutions. The proposed idea applies to any HPC architecture meant for neural networks.
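As a rough illustration of scheduling batches onto parallel cores (a minimal greedy sketch under assumptions of my own, not the architecture proposed in the paper), the classic longest-processing-time heuristic assigns each batch to the currently least-loaded core:

import heapq

def schedule_batches(batch_costs, num_cores):
    """Longest-processing-time greedy: repeatedly give the most expensive
    unassigned batch to the currently least-loaded core."""
    heap = [(0.0, core) for core in range(num_cores)]  # (load, core id)
    heapq.heapify(heap)
    assignment = [[] for _ in range(num_cores)]
    for batch, cost in sorted(enumerate(batch_costs), key=lambda bc: -bc[1]):
        load, core = heapq.heappop(heap)
        assignment[core].append(batch)
        heapq.heappush(heap, (load + cost, core))
    makespan = max(load for load, _ in heap)
    return assignment, makespan

# Hypothetical batch costs (e.g., proportional to tensor sizes) on 4 cores:
print(schedule_batches([7, 3, 5, 2, 8, 4, 6, 1], 4))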

Performance

Analysis of Interference between RDMA and Local Access on Hybrid Memory System

A hybrid memory system consisting of DRAM and Intel Optane DC Persistent Memory (called DCPM in this paper) can now be used, as DCPM has been commercially available since April 2019. Although the latency of DCPM is several times higher than that of DRAM, its capacity is several times larger, its cost is several times lower, and it is non-volatile. A server with this hybrid memory system could improve the performance of in-memory database systems and virtual machine (VM) systems, because these systems often consume a large amount of memory. Moreover, a high-speed shared storage system can be implemented by accessing DCPM via remote direct memory access (RDMA). I assume that part of the DCPM is often assigned as an area shared with remote servers, because applications executed on a server with a hybrid memory system often cannot use the entire capacity of DCPM. This paper evaluates the interference between local memory access and RDMA from a remote server. The results indicate that the interference on this hybrid memory system is significantly different from that on a conventional DRAM-only memory system. I also believe that some kind of throttling implementation is needed when this interference occurs.
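To make the closing remark concrete, one conceivable throttling mechanism (a minimal sketch of my own, not something proposed or evaluated in the paper) is a token bucket that caps the rate of RDMA traffic admitted to DCPM so local accesses retain headroom:

import time

class TokenBucket:
    """Simple token-bucket throttle: admit an access of `cost` bytes only
    when enough tokens have accumulated at `rate` bytes per second."""
    def __init__(self, rate, burst):
        self.rate, self.burst = rate, burst
        self.tokens, self.last = burst, time.monotonic()

    def admit(self, cost):
        now = time.monotonic()
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False  # caller should defer or retry the RDMA access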

Performance

Analysis of an M/G/1 system for the optimization of the RTG performances in the delivery of containers in Abidjan Terminal

Faced with the major challenge of increasing productivity while satisfying its customers, it is important today for Abidjan Terminal to establish in advance the operational performance of its RTGs (rubber-tyred gantry cranes). In this article, using an M/G/1 retrial queue model, we obtain the average number of parked delivery trucks as well as their waiting time. Finally, we use Matlab to plot these quantities and analyze the RTG performance as a function of the traffic rate.
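For reference, in the classical M/G/1 queue without retrials (a simpler baseline than the retrial model actually used in the article), the Pollaczek-Khinchine formula gives the mean waiting time and, via Little's law, the mean queue length:

def mg1_metrics(lam, mean_s, scv_s):
    """Pollaczek-Khinchine results for a classical M/G/1 queue.
    lam: truck arrival rate, mean_s: mean service time,
    scv_s: squared coefficient of variation of the service time."""
    rho = lam * mean_s
    assert rho < 1, "queue is unstable"
    wq = rho * mean_s * (1 + scv_s) / (2 * (1 - rho))  # mean wait in queue
    lq = lam * wq                                      # Little's law
    return lq, wq

# Hypothetical: trucks arrive at 8/hour, mean service 0.1 h, SCV 1.5:
print(mg1_metrics(8.0, 0.1, 1.5))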

Performance

Analysis of the Leakage Queue: A Queueing Model for Energy Storage Systems with Self-discharge

Energy storage is a crucial component of the smart grid, since it provides the ability to buffer transient fluctuations of the energy supply from renewable sources. Even without a load, energy storage systems experience a reduction of the stored energy through self-discharge. In some storage technologies, the rate of self-discharge can exceed 50% of the stored energy per day. In this paper, we investigate the self-discharge phenomenon in energy storage using a queueing system model, which we refer to as leakage queue. When the average net charge is positive, we discover that the leakage queue operates in one of two regimes: a leakage-dominated regime and a capacity-dominated regime. We find that in the leakage-dominated regime, the stored energy stabilizes at a point that is below the storage capacity. Under suitable independence assumptions for energy supply and demand, the stored energy in this regime closely follows a normal distribution. We present two methods for computing probabilities of underflow and overflow at a leakage queue. The methods are validated in a numerical example where the energy supply resembles a wind energy source.
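A minimal simulation sketch of a leakage queue, under assumptions of my own rather than the paper's exact formulation (Gaussian supply and demand with positive mean net charge, a fixed per-step self-discharge fraction), illustrates the underflow/overflow truncation the analysis deals with:

import random

def simulate_leakage_queue(capacity, leak_rate, steps, seed=0):
    """Discrete-time leakage queue: each step the store loses a fixed
    fraction of its content (self-discharge), then gains supply minus
    demand, truncated at empty (underflow) and at capacity (overflow)."""
    rng = random.Random(seed)
    x, under, over = 0.0, 0, 0
    for _ in range(steps):
        supply = rng.gauss(12.0, 4.0)   # hypothetical wind-like supply
        demand = rng.gauss(10.0, 1.0)   # hypothetical demand
        x = (1.0 - leak_rate) * x + supply - demand
        if x < 0.0:
            x, under = 0.0, under + 1
        elif x > capacity:
            x, over = capacity, over + 1
    return x, under / steps, over / steps

# High leak rate: the stored energy stabilizes below capacity.
print(simulate_leakage_queue(capacity=100.0, leak_rate=0.3, steps=100_000))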

Performance

Analytic Performance Modeling and Analysis of Detailed Neuron Simulations

Big science initiatives are trying to reconstruct and model the brain by attempting to simulate brain tissue at larger scales and with increasingly more biological detail than previously thought possible. The exponential growth of parallel computer performance has been supporting these developments, and at the same time maintainers of neuroscientific simulation code have strived to optimally and efficiently exploit new hardware features. Current state-of-the-art software for the simulation of biological networks has so far been developed using performance engineering practices, but a thorough analysis and modeling of the computational and performance characteristics, especially in the case of morphologically detailed neuron simulations, is lacking. Other computational sciences have successfully used analytic performance engineering and modeling methods to gain insight into the computational properties of simulation kernels, aid developers in performance optimizations, and eventually drive co-design efforts; to our knowledge, however, a model-based performance analysis of neuron simulations has not yet been conducted. We present a detailed study of the shared-memory performance of morphologically detailed neuron simulations based on the Execution-Cache-Memory (ECM) performance model. We demonstrate that this model can deliver accurate predictions of the runtime of almost all the kernels that constitute the neuron models under investigation. The gained insight is used to identify the main mechanisms governing performance bottlenecks in the simulation. The implications of this analysis for the optimization of neural simulation software and, eventually, the co-design of future hardware architectures are discussed. In this sense, our work represents a valuable conceptual and quantitative contribution to understanding the performance properties of biological network simulations.
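For readers unfamiliar with ECM, the model predicts the runtime of a loop kernel per unit of work (e.g., one cache line of iterations) from in-core execution cycles and inter-level data-transfer cycles. A minimal sketch, assuming the common non-overlapping machine model and hypothetical cycle counts:

def ecm_prediction(t_ol, t_nol, t_transfers):
    """ECM-style runtime estimate, in core cycles per unit of work.
    t_ol: in-core cycles that can overlap with data transfers,
    t_nol: in-core cycles that cannot overlap,
    t_transfers: cycles for each data path (L1-L2, L2-L3, L3-memory).
    Assumes transfers through the hierarchy do not overlap each other."""
    return max(t_ol, t_nol + sum(t_transfers))

# Hypothetical kernel: 4 overlapping cycles, 3 non-overlapping,
# 2 cycles each for L1-L2 and L2-L3, 4 cycles for L3-memory:
print(ecm_prediction(4, 3, [2, 2, 4]))  # -> 11 cycles per cache line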

Performance

Analytical Cost Metrics: Days of Future Past

As we move towards the exascale era, new architectures must be capable of running massive computational problems efficiently. Scientists and researchers are continuously investing in tuning the performance of extreme-scale computational problems. These problems arise in almost all areas of computing, ranging from big data analytics, artificial intelligence, search, machine learning, virtual/augmented reality, computer vision, and image/signal processing to computational science and bioinformatics. With Moore's law driving the evolution of hardware platforms towards exascale, the dominant performance metric (time efficiency) has now expanded to also incorporate power/energy efficiency. Therefore, the major challenge we face in computing systems research is: "how do we solve massive-scale computational problems in the most time/power/energy-efficient manner?" Architectures are constantly evolving, making current performance-optimization strategies less applicable and requiring new strategies to be invented. The solution is for new architectures, new programming models, and applications to move forward together. Doing this is, however, extremely hard: there are too many design choices in too many dimensions. We propose the following strategy to solve the problem: (i) Models - develop accurate analytical models (e.g., execution time, energy, silicon area) to predict the cost of executing a given program, and (ii) Complete System Design - simultaneously optimize all the cost models for the programs (computational problems) to obtain the most time/area/power/energy-efficient solution. Such an optimization problem evokes the notion of co-design.
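A toy sketch of step (ii), under assumptions of my own (illustrative cost models, a one-dimensional design space, energy-delay product as the objective), just to make the co-design optimization concrete:

from itertools import product

def search_design_space(time_model, energy_model, choices):
    """Exhaustively evaluate analytical cost models over a small design
    space; return the configuration minimizing energy-delay product."""
    best = None
    for cfg in product(*choices.values()):
        point = dict(zip(choices.keys(), cfg))
        cost = time_model(point) * energy_model(point)  # EDP objective
        if best is None or cost < best[0]:
            best = (cost, point)
    return best

# Hypothetical models: more cores cut time but raise power draw.
time_m = lambda p: 100.0 / p["cores"] + 2.0 * p["cores"] ** 0.5
energy_m = lambda p: time_m(p) * (10.0 + 1.5 * p["cores"])
print(search_design_space(time_m, energy_m, {"cores": [1, 2, 4, 8, 16]}))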

Performance

Analytical Performance Modeling of NoCs under Priority Arbitration and Bursty Traffic

Networks-on-Chip (NoCs) used in commercial many-core processors typically incorporate priority arbitration. Moreover, they experience bursty traffic due to application workloads. However, most state-of-the-art NoC analytical performance analysis techniques assume fair arbitration and simple traffic models. To address these limitations, we propose an analytical modeling technique for priority-aware NoCs under bursty traffic. Experimental evaluations with synthetic and bursty traffic show that the proposed approach has less than 10% modeling error with respect to a cycle-accurate NoC simulator.
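For context, the classical non-preemptive M/G/1 priority queue - a textbook building block for priority-aware analysis, not the paper's bursty-traffic model - yields per-class mean waiting times as follows (class 0 is the highest priority):

def priority_mg1_wait(lams, means, second_moments):
    """Mean waiting time per class in a non-preemptive M/G/1 priority
    queue, classes ordered from highest to lowest priority.
    lams: arrival rates; means: mean service times;
    second_moments: second moments of the service times."""
    r = sum(l * s2 for l, s2 in zip(lams, second_moments)) / 2.0  # residual work
    waits, sigma_prev = [], 0.0
    for l, m in zip(lams, means):
        sigma = sigma_prev + l * m  # cumulative utilization through this class
        waits.append(r / ((1.0 - sigma_prev) * (1.0 - sigma)))
        sigma_prev = sigma
    return waits

# Two hypothetical classes with exponential-like service (E[S^2] = 2 E[S]^2):
print(priority_mg1_wait([0.3, 0.4], [1.0, 1.0], [2.0, 2.0]))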

Performance

Analytical Performance Models for NoCs with Multiple Priority Traffic Classes

Networks-on-chip (NoCs) have become the standard interconnect solution in industrial designs ranging from client CPUs to many-core chip-multiprocessors. Since NoCs play a vital role in system performance and power consumption, pre-silicon evaluation environments include cycle-accurate NoC simulators. Long simulations increase the execution time of evaluation frameworks, which are already notoriously slow, and prohibit design-space exploration. Existing analytical NoC models, which assume fair arbitration, cannot replace these simulations, since industrial NoCs typically employ priority schedulers and multiple priority classes. To address this limitation, we propose a systematic approach to construct priority-aware analytical performance models using micro-architecture specifications and input traffic. Specifically, we introduce two novel transformations of the queuing system, along with an algorithm that iteratively applies them to decompose the given NoC into individual queues with modified service times, enabling accurate and scalable end-to-end latency computations. Experimental evaluations using real architectures and applications show a high accuracy of 97% and up to 2.5x speedup in full-system simulation.
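A schematic sketch of the decomposition idea, under assumptions of my own (each hop modeled as an independent queue with a pluggable waiting-time estimate; the M/M/1 formula below is a placeholder, not the paper's transformed service-time model):

def end_to_end_latency(path, wait_fn):
    """Estimate end-to-end latency by decomposing a route into individual
    queues: sum each hop's service time and its estimated waiting time.
    path: list of (arrival_rate, mean_service_time) per router;
    wait_fn: maps one queue's parameters to a mean waiting time."""
    return sum(svc + wait_fn(lam, svc) for lam, svc in path)

# Placeholder per-queue waiting-time estimate (M/M/1): Wq = rho*s/(1-rho).
mm1_wait = lambda lam, svc: (lam * svc * svc) / (1.0 - lam * svc)
print(end_to_end_latency([(0.2, 1.0), (0.3, 1.0), (0.1, 2.0)], mm1_wait))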

