Featured Research

Performance

Cultivating Software Performance in Cloud Computing

Cloud performance is measured by a multitude of metrics, including workload performance, application placement, software/hardware optimization, scalability, capacity, reliability, and agility. In this paper, we consider jointly optimizing the performance of software applications in the cloud. The challenges lie in bringing a diversity of raw data into a tidy data format, unifying performance data from multiple systems based on timestamps, and assessing the quality of the processed performance data. Even after the quality of the performance data has been verified, further challenges stand in the way of optimizing cloud computing. We identify these challenges from the perspectives of the computing environment, data collection, performance analytics, and the production environment.
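
A minimal sketch of the timestamp-unification step described above (the file names and column names "ts", "latency_ms", and "cpu_pct" are hypothetical, not from the paper), aligning samples from two systems with pandas:

# Minimal sketch: merge performance samples from two systems on timestamps.
# File names and column names are assumptions for illustration.
import pandas as pd

app = pd.read_csv("app_metrics.csv", parse_dates=["ts"])    # e.g. ts, latency_ms
host = pd.read_csv("host_metrics.csv", parse_dates=["ts"])  # e.g. ts, cpu_pct

app = app.sort_values("ts")
host = host.sort_values("ts")

# Align each application sample with the nearest earlier host sample
# within a 5-second window.
tidy = pd.merge_asof(app, host, on="ts", tolerance=pd.Timedelta("5s"))

# Simple quality check: report how many samples failed to align.
missing = tidy["cpu_pct"].isna().mean()
print(f"{missing:.1%} of samples lack a matching host measurement")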


DINAMITE: A modern approach to memory performance profiling

Diagnosing and fixing performance problems on multicore machines with deep memory hierarchies is extremely challenging. Certain problems are best addressed when we can analyze the entire trace of program execution, e.g., every memory access. Unfortunately, such detailed execution logs are very large and cannot be analyzed by direct inspection. We present DINAMITE: a toolkit for Dynamic INstrumentation and Analysis for MassIve Trace Exploration. DINAMITE is a collection of tools for end-to-end performance analysis: from the LLVM compiler pass that instruments the program to plug-and-play tools that use the modern data analytics engine Spark Streaming for trace introspection. Using DINAMITE, we found opportunities to improve data layout in several applications, resulting in 15-20% performance improvements, and found a shared-variable bottleneck in a popular key-value store whose elimination improved performance by 20x.
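
As a stand-in for the kind of trace introspection such tools perform (the record layout below is an assumption, not DINAMITE's actual trace format), a per-variable access histogram over a memory-access trace might look like:

# Count accesses per variable in a memory-access trace. The assumed record
# layout is: thread id, address, variable name, read/write flag.
from collections import Counter

def access_counts(trace_path):
    counts = Counter()
    with open(trace_path) as trace:
        for line in trace:
            tid, addr, var, rw = line.split()
            counts[var] += 1
    return counts

if __name__ == "__main__":
    # Print the ten most heavily accessed variables.
    for var, n in access_counts("trace.log").most_common(10):
        print(f"{var:20s} {n}")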


DRESS: Dynamic RESource-reservation Scheme for Congested Data-intensive Computing Platforms

In the past few years, we have witnessed an increasing number of businesses driven by big data analytics, such as Amazon recommendations and Google advertisements. At the back end, these businesses are powered by big data processing platforms that quickly extract information and make decisions. Running on top of a computing cluster, these platforms rely on scheduling algorithms to allocate resources. An efficient scheduler is crucial to system performance given limited resources, e.g. CPU and memory, and a large number of user demands. However, beyond client requests and the current status of the system, the scheduler has limited knowledge of the execution length of running jobs and the resource demands of incoming jobs, which makes assigning resources a challenging task. If most of the resources are occupied by a long-running job, other jobs must keep waiting until it releases them. This paper presents a new scheduling strategy, named DRESS, that aims to optimize allocation among jobs with varying demands. Specifically, it classifies jobs into two categories based on their requests, reserves a portion of resources for each category, and dynamically adjusts the reserved ratio by monitoring pending requests and estimating the release patterns of running jobs. The results demonstrate that DRESS significantly reduces the completion time for one category, by up to 76.1% in our experiments, while maintaining stable overall system performance.
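
An illustrative sketch of such a two-category reservation loop (our simplification, not the paper's exact algorithm): capacity is split between short and long jobs, and the reserved ratio drifts toward the observed demand.

# DRESS-style reservation sketch (illustrative only): shift the reserved
# ratio a small step toward whichever queue has more pending demand.
def adjust_ratio(ratio, pending_short, pending_long, step=0.05):
    total = pending_short + pending_long
    if total == 0:
        return ratio
    target = pending_short / total          # demand share of short jobs
    ratio += step if target > ratio else -step
    return min(0.9, max(0.1, ratio))        # keep both pools non-empty

capacity = 100                              # e.g. CPU cores
ratio = 0.5                                 # share reserved for short jobs
for pending_short, pending_long in [(8, 2), (9, 1), (3, 7)]:
    ratio = adjust_ratio(ratio, pending_short, pending_long)
    print(f"short pool: {capacity * ratio:.0f} cores, "
          f"long pool: {capacity * (1 - ratio):.0f} cores")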


DV-DVFS: Merging Data Variety and DVFS Technique to Manage the Energy Consumption of Big Data Processing

Data variety is one of the most important features of big data. It is the result of aggregating data from multiple sources and of the uneven distribution of data, and it causes high variation in the consumption of processing resources such as CPU. This issue has been overlooked in previous work. To overcome it, we use Dynamic Voltage and Frequency Scaling (DVFS) to reduce the energy consumption of computation, considering two types of deadlines as constraints. Before applying DVFS to the compute nodes, we estimate the processing time and the frequency needed to meet the deadline. In the evaluation phase, we used a set of datasets and applications. The experimental results show that our proposed approach surpasses the other scenarios in processing real datasets: DV-DVFS achieves up to a 15% improvement in energy consumption.
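
A back-of-the-envelope illustration of the frequency-estimation step (the cycle-count model and all numbers are assumptions, not the paper's): pick the lowest available DVFS level whose frequency still meets the deadline.

# Estimate the work of a task in cycles (e.g. from a profiled sample),
# then choose the slowest frequency that finishes before the deadline.
def pick_frequency(estimated_cycles, deadline_s, freqs_hz):
    needed = estimated_cycles / deadline_s    # minimum frequency (Hz)
    for f in sorted(freqs_hz):
        if f >= needed:
            return f
    return max(freqs_hz)                      # deadline cannot be met

freqs = [1.2e9, 1.6e9, 2.0e9, 2.4e9]          # hypothetical DVFS levels
cycles = 3.0e9                                # profiled work estimate
print(pick_frequency(cycles, deadline_s=2.0, freqs_hz=freqs))  # -> 1.6e9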


Data-Unit-Size Distribution Model with Retransmitted Packet Size Preservation Property and Its Application to Goodput Analysis for Stop-and-Wait Protocol: Case of Independent Packet Losses

This paper proposes a data-unit-size distribution model to represent the retransmitted packet size preservation (RPSP) property in a scenario where independently lost packets are retransmitted by a stop-and-wait protocol. RPSP means that retransmitted packets with the same sequence number are equal in size to the originally transmitted packet, which is identical to the packet generated from a message through the segmentation function (the generated packet). Furthermore, we derive a goodput formula using an approach based on the data-unit-size distribution. We investigate the effect of RPSP on frame size distributions and goodput in a simple collision-free case over a bit-error-prone wireless network using the IEEE 802.11 Distributed Coordination Function, a typical example of a stop-and-wait protocol. Numerical results show that the effect grows stronger as the bit error rate increases and when the maximum size of the generated packets is larger than the mean size, given a sufficiently large packet retry limit, because longer packets are repeatedly corrupted and retransmitted more often as a result of RPSP.
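
For intuition, the textbook goodput of stop-and-wait under independent losses (a standard simplification that this paper refines, not its full RPSP derivation): with loss probability p and a large retry limit, each packet needs 1/(1-p) transmissions on average.

# Standard stop-and-wait goodput model under independent packet losses.
# RPSP is implicit here: every retransmission has the original size.
def goodput(payload_bits, loss_prob, rtt_s, rate_bps):
    expected_tx = 1.0 / (1.0 - loss_prob)                     # mean attempts
    time_per_packet = expected_tx * (payload_bits / rate_bps + rtt_s)
    return payload_bits / time_per_packet                     # bits per second

# Hypothetical numbers: 1500-byte packets, 10% loss, 2 ms round trip, 54 Mb/s.
print(goodput(payload_bits=12000, loss_prob=0.1, rtt_s=0.002, rate_bps=54e6))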


Deadline-aware Scheduling for Maximizing Information Freshness in Industrial Cyber-Physical System

Age of Information is a metric that captures the freshness of information in the underlying application; it combines packet inter-arrival time and packet transmission delay. Advanced real-time systems increasingly rely on this metric to deliver status updates as timely as possible. This paper aims to find an optimal transmission scheduling policy that maintains the freshness of real-time updates in industrial cyber-physical systems, where the coexistence of cyber and physical units, each with its own quality-of-service requirements, is one of the critical challenges. We propose a greedy scheduling policy called deadline-aware highest-latency-first, give an analytical proof of its optimality, and validate the claim by comparing its performance against other scheduling policies through extensive simulations.
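
A sketch of the greedy rule as we read it (the field names are hypothetical and this is not the paper's pseudocode): among packets that can still meet their deadline, serve the stalest one.

# Deadline-aware highest-latency-first pick, as a one-step decision.
def pick_next(packets, now):
    # packets: list of dicts with assumed fields
    #   "age"      -- time since the last delivered update from that source
    #   "deadline" -- latest time at which the packet is still useful
    feasible = [p for p in packets if p["deadline"] >= now]
    if not feasible:
        return None
    return max(feasible, key=lambda p: p["age"])   # highest latency first

queue = [{"age": 4.0, "deadline": 10.0}, {"age": 7.5, "deadline": 6.0}]
print(pick_next(queue, now=5.0))   # -> the packet with age 7.5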


Debian Package usage profiler for Debian based Systems

Thanks to their CPU and RAM capabilities, today's embedded devices can run various Linux distributions, but in most cases these differ from general-purpose distributions: they are usually lighter and specific to the needs of the particular system. In this project, we share the problems associated with adopting a full, heavy-weight Debian-based system such as Ubuntu on embedded/automotive platforms, and provide solutions for identifying unused/redundant content in the system. This helps developers reduce a hefty general-purpose distribution to an application-specific one. The solution collects usage data in a non-invasive manner (to avoid any drop in performance) and suggests redundant, unused parts of the system that can be safely removed without impacting system functionality.
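
A minimal sketch of the underlying idea (not the project's actual tool): list the files a package owns with dpkg and flag the package if none of its files have been accessed recently. This assumes access times are enabled on the filesystem.

# Flag a Debian package as unused if none of its files were read recently.
import os
import subprocess
import time

def package_unused(pkg, cutoff_days=30):
    out = subprocess.run(["dpkg", "-L", pkg], capture_output=True, text=True)
    cutoff = time.time() - cutoff_days * 86400
    for path in out.stdout.splitlines():
        try:
            if os.path.isfile(path) and os.stat(path).st_atime > cutoff:
                return False        # something in the package was touched
        except OSError:
            continue                # file vanished or is unreadable
    return True

print(package_unused("vim"))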


Deep Learning at 15PF: Supervised and Semi-Supervised Classification for Scientific Data

This paper presents the first 15-PetaFLOP deep learning system for solving scientific pattern classification problems on contemporary HPC architectures. We develop supervised convolutional architectures for discriminating signals in high-energy physics data, as well as semi-supervised architectures for localizing and classifying extreme weather in climate data. Our IntelCaffe-based implementation obtains ∼2 TFLOP/s on a single Cori Phase-II Xeon Phi node. We use a hybrid strategy that employs synchronous node groups with asynchronous communication across groups, and use it to scale the training of a single model to ∼9600 Xeon Phi nodes, obtaining a peak performance of 11.73-15.07 PFLOP/s and sustained performance of 11.41-13.27 PFLOP/s. At scale, our HEP architecture produces state-of-the-art classification accuracy on a dataset with 10M images, exceeding that achieved by selections on high-level physics-motivated features. Our semi-supervised architecture successfully extracts weather patterns from a 15TB climate dataset. Our results demonstrate that deep learning can be optimized and scaled effectively on many-core HPC systems.
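
A toy, single-process illustration of the hybrid strategy (purely for intuition; the paper's implementation runs across thousands of nodes): gradients are averaged synchronously within a group, while groups apply their updates to the shared weights independently of one another.

# Hybrid sync/async SGD in miniature: synchronous averaging inside a
# group, asynchronous (order-independent) updates across groups.
import random

weights = 0.0
lr = 0.1

def group_gradient(worker_grads):
    return sum(worker_grads) / len(worker_grads)   # synchronous all-reduce

groups = [[0.9, 1.1, 1.0], [0.4, 0.6, 0.5]]        # per-worker gradients
random.shuffle(groups)                             # async arrival order
for g in groups:
    weights -= lr * group_gradient(g)              # independent group update
print(weights)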


Defence Efficiency

In order to automate actions such as defences against network attacks, one needs to quantify their efficiency; the result can subsequently be used in post-evaluation, learning, and so on. To quantify defence efficiency as a function of the defence's impact and its total cost, we present several natural requirements for such a definition and provide a natural definition that complies with them. Next, we characterize our definition precisely via the axiomatic approach: we strengthen the original requirements and prove that the given definition is the unique one satisfying them. Finally, we generalize the definition to any number of input variables in two natural ways and compare these generalizations.
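
As an illustrative candidate only (an assumption on our part, not necessarily the definition the paper proves unique): with impact i and total cost c, the simplest function satisfying requirements such as monotonicity in each argument and invariance under common rescaling is the ratio

\[
  E(i, c) = \frac{i}{c}, \qquad i \ge 0,\; c > 0,
\]

which increases with the defence's impact, decreases with its total cost, and is unchanged when both are scaled by the same factor.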


Defining Big Data Analytics Benchmarks for Next Generation Supercomputers

The design and construction of high performance computing (HPC) systems rely on exhaustive performance analysis and benchmarking. Traditionally, this activity has been geared exclusively towards simulation scientists, who, unsurprisingly, have been the primary customers of HPC for decades. However, there is a large and growing volume of data science work that requires these large-scale resources, and calls for inclusion of, and investment in, data for HPC have been increasing. Thus, when designing a next-generation HPC platform, it is necessary to have HPC-amenable big data analytics benchmarks. In this paper, we propose a set of big data analytics benchmarks and sample codes designed to test the capabilities of current and next-generation supercomputers.
