
Publication


Featured research published by Badriddine Khessib.


dependable systems and networks | 2014

Characterizing Application Memory Error Vulnerability to Optimize Datacenter Cost via Heterogeneous-Reliability Memory

Yixin Luo; Sriram Govindan; Bikash Sharma; Mark Santaniello; Justin Meza; Aman Kansal; Jie Liu; Badriddine Khessib; Kushagra Vaid; Onur Mutlu

Memory devices represent a key component of datacenter total cost of ownership (TCO), and techniques used to reduce errors that occur on these devices increase this cost. Existing approaches to providing reliability for memory devices pessimistically treat all data as equally vulnerable to memory errors. Our key insight is that there exists a diverse spectrum of tolerance to memory errors in new data-intensive applications, and that traditional one-size-fits-all memory reliability techniques are inefficient in terms of cost. For example, we found that while traditional error protection increases memory system cost by 12.5%, some applications can achieve 99.00% availability on a single server with a large number of memory errors without any error protection. This presents an opportunity to greatly reduce server hardware cost by provisioning the right amount of memory reliability for different applications. Toward this end, in this paper, we make three main contributions to enable highly reliable servers at low datacenter cost. First, we develop a new methodology to quantify the tolerance of applications to memory errors. Second, using our methodology, we perform a case study of three new data-intensive workloads (an interactive web search application, an in-memory key-value store, and a graph mining framework) to identify new insights into the nature of application memory error vulnerability. Third, based on our insights, we propose several new hardware/software heterogeneous-reliability memory system designs to lower datacenter cost while achieving high reliability, and discuss their trade-offs. We show that our new techniques can reduce server hardware cost by 4.7% while achieving 99.90% single server availability.
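
As an illustration of the cost trade-off the paper exploits, the back-of-envelope Python sketch below compares an all-ECC server against one that keeps error-tolerant data in unprotected memory. The 12.5% protection overhead comes from the abstract; the baseline memory cost and the 60% tolerant fraction are assumed, illustrative values, not figures from the paper.

# Illustrative TCO comparison. ECC_OVERHEAD is quoted in the abstract;
# the baseline cost and the tolerant-data fraction are assumptions.
ECC_OVERHEAD = 0.125           # traditional error protection adds 12.5% to memory cost
baseline_memory_cost = 1000.0  # assumed per-server memory cost without protection ($)

def memory_cost(tolerant_fraction: float) -> float:
    """Cost when only the error-sensitive fraction of data lives in ECC memory."""
    protected = 1.0 - tolerant_fraction
    return baseline_memory_cost * (tolerant_fraction + protected * (1 + ECC_OVERHEAD))

all_ecc = baseline_memory_cost * (1 + ECC_OVERHEAD)
hetero = memory_cost(tolerant_fraction=0.6)  # assume 60% of data tolerates errors
print(f"all-ECC: ${all_ecc:.0f}, heterogeneous: ${hetero:.0f}, "
      f"saving: {100 * (all_ecc - hetero) / all_ecc:.1f}%")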


architectural support for programming languages and operating systems | 2014

Underprovisioning backup power infrastructure for datacenters

Di Wang; Sriram Govindan; Anand Sivasubramaniam; Aman Kansal; Jie Liu; Badriddine Khessib

While there has been prior work to underprovision the power distribution infrastructure for a datacenter to save costs, the ability to underprovision the backup power infrastructure, which contributes significantly to capital costs, is little explored. There are two main components in the backup infrastructure, Diesel Generators (DGs) and UPS units, which can both be underprovisioned (or even removed) in terms of their power and/or energy capacities. However, embarking on such underprovisioning mandates studying several ramifications concurrently: the resulting cost savings, the lower availability, and the performance and state-loss consequences for individual applications. This paper presents the first such study, considering the cost, availability, performance, and application consequences of underprovisioning the backup power infrastructure. We present a framework to quantify the cost of the backup capacity that is provisioned, and implement techniques leveraging existing software and hardware mechanisms to provide as seamless an operation as possible for an application within the provisioned backup capacity during a power outage. We evaluate the cost-performance-availability trade-offs for different levels of backup underprovisioning for applications with diverse reliance on the backup infrastructure. Our results show that one may be able to do away with DGs entirely, compensating with additional UPS energy capacity, to significantly cut costs and still handle power outages lasting as long as 40 minutes (which constitute the bulk of outages). Further, we can push the limits of the outage duration that can be handled cost-effectively if applications are willing to tolerate degraded performance during the outage. Our evaluations also show that different applications react differently to the outage-handling mechanisms, and that the efficacy of the mechanisms is sensitive to the outage duration. The insights from this paper can spur new opportunities for future work on backup power infrastructure optimization.
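
To make the headline result concrete, here is a minimal Python sketch of the sizing question behind it: how long a UPS-only backup (no DGs) can bridge an outage, with and without a performance-degrading power cap. All capacities and loads are assumed, illustrative numbers, not measurements from the paper.

# Minimal UPS-only ride-through calculation; all values are assumptions.
def survivable_minutes(ups_energy_kwh: float, load_kw: float) -> float:
    """Outage duration (minutes) the UPS energy capacity can bridge at a given load."""
    return 60.0 * ups_energy_kwh / load_kw

ups_energy_kwh = 100.0  # assumed total UPS energy capacity
full_load_kw = 200.0    # assumed normal datacenter load
capped_load_kw = 140.0  # assumed load after a performance-degrading power cap

print(f"at full load:   {survivable_minutes(ups_energy_kwh, full_load_kw):.0f} min")
print(f"at capped load: {survivable_minutes(ups_energy_kwh, capped_load_kw):.0f} min")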


tpc technology conference | 2011

Time and cost-efficient modeling and generation of large-scale TPCC/TPCE/TPCH workloads

Christina Delimitrou; Sriram Sankar; Badriddine Khessib; Kushagra Vaid; Christos Kozyrakis

Large-scale TPC workloads are critical for the evaluation of datacenter-scale storage systems. However, these workloads have not previously been characterized in depth and modeled in a DC environment. In this work, we categorize the TPC workloads into storage threads that have unique features and characterize the storage activity of TPCC, TPCE, and TPCH based on I/O traces from real server installations. We also propose a framework for the modeling and generation of large-scale TPC workloads, which allows us to conduct a wide spectrum of storage experiments without requiring knowledge of the structure of the application or the overhead of fully deploying it in different storage configurations. Using our framework, we eliminate the time for TPC setup and reduce the time for experiments by two orders of magnitude, due to the compression of storage activity enforced by the model. We demonstrate the accuracy of the model and the applicability of our method to significant datacenter storage challenges, including the identification of early disk errors, and SSD caching.
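
In the spirit of the proposed framework, the sketch below generates a synthetic storage workload from a handful of modeled parameters (inter-arrival time, read/write mix, request size, sequentiality) instead of deploying the full application. The distribution choices and parameter values are assumptions for illustration, not the paper's fitted model.

# Model-based I/O generation sketch; distributions and defaults are assumed.
import random

random.seed(42)

def generate_requests(n, mean_interarrival_ms=0.5, read_fraction=0.7,
                      block_sizes_kb=(8, 64), seq_fraction=0.3):
    """Yield (time_ms, op, offset_kb, size_kb) tuples drawn from the model."""
    t = 0.0
    offset_kb = 0
    for _ in range(n):
        t += random.expovariate(1.0 / mean_interarrival_ms)  # Poisson arrivals
        op = "R" if random.random() < read_fraction else "W"
        size_kb = random.choice(block_sizes_kb)
        if random.random() < seq_fraction:
            offset_kb += size_kb                      # continue a sequential run
        else:
            offset_kb = random.randrange(0, 1 << 20)  # random jump within ~1 GB
        yield (round(t, 3), op, offset_kb, size_kb)

for req in generate_requests(5):
    print(req)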


data management on new hardware | 2006

Large scale Itanium® 2 processor OLTP workload characterization and optimization

Gerrit Saylor; Badriddine Khessib

Large-scale OLTP workloads on modern database servers are well understood across the industry. Their runtime performance characterizations serve to drive both server-side software features and processor-specific design decisions, but are not well understood outside of the primary industry stakeholders. We provide a rare glimpse into the performance characterizations of processor- and platform-targeted software optimizations running on a large-scale 32-processor, Intel® Itanium® 2-based ccNUMA platform.


tpc technology conference | 2010

Using solid state drives as a mid-tier cache in enterprise database OLTP applications

Badriddine Khessib; Kushagra Vaid; Sriram Sankar; Chengliang Zhang

When originally introduced, flash-based solid state drives (SSDs) exhibited very high random read throughput with low sub-millisecond latencies. However, in addition to their steep prices, SSDs suffered from slow write rates and reliability concerns related to cell wear. For these reasons, they were relegated to a niche status in the consumer and personal computer market. Since then, several architectural enhancements have been introduced that have led to a substantial increase in random write performance as well as a reasonable improvement in reliability. From a purely performance point of view, these high I/O rates and improved reliability make SSDs an ideal choice for enterprise On-Line Transaction Processing (OLTP) applications. However, from a price/performance point of view, the case for SSDs may not be clear. Enterprise-class SSD price per GB continues to be at least 10x higher than that of conventional magnetic hard disk drives (HDDs), despite a considerable drop in Flash chip prices. We show that a complete replacement of traditional HDDs with SSDs is not cost effective. Further, we demonstrate that the most cost-efficient use of SSDs for OLTP workloads is as an intermediate persistent cache that sits between conventional HDDs and memory, thus forming a three-level memory hierarchy. We also discuss two implementations of such a cache: hardware and software. For the software approach, we describe our implementation of such a cache in an in-house database system; we also describe off-the-shelf hardware solutions. We develop a Total Cost of Ownership (TCO) model for All-SSD and All-HDD configurations, along with a modified OLTP benchmark that can scale I/O density to validate this model. Finally, we show how such SSD cache implementations can increase the performance of OLTP applications while reducing overall system cost.
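
A minimal sketch of the read path in such a three-level hierarchy appears below: check the in-memory buffer pool, then the persistent SSD cache, and only then fall back to HDD, admitting missed pages into the SSD tier. The class and method names are hypothetical; the paper's in-house implementation is not public, and the FIFO eviction here stands in for whatever policy a real cache would use.

# Hypothetical three-level read path: DRAM buffer pool -> SSD cache -> HDD.
class ThreeLevelStore:
    def __init__(self, ssd_capacity_pages: int):
        self.buffer_pool = {}   # page_id -> page (DRAM; unbounded here for brevity)
        self.ssd_cache = {}     # page_id -> page (persistent mid-tier cache)
        self.ssd_capacity = ssd_capacity_pages

    def read_page(self, page_id, read_from_hdd):
        if page_id in self.buffer_pool:        # 1. DRAM hit
            return self.buffer_pool[page_id]
        if page_id in self.ssd_cache:          # 2. SSD hit: promote to DRAM
            page = self.ssd_cache[page_id]
        else:                                  # 3. miss: fetch from HDD
            page = read_from_hdd(page_id)
            self._admit_to_ssd(page_id, page)  # cache on SSD for next time
        self.buffer_pool[page_id] = page
        return page

    def _admit_to_ssd(self, page_id, page):
        if len(self.ssd_cache) >= self.ssd_capacity:
            self.ssd_cache.pop(next(iter(self.ssd_cache)))  # naive FIFO eviction
        self.ssd_cache[page_id] = page

store = ThreeLevelStore(ssd_capacity_pages=2)
print(store.read_page(7, read_from_hdd=lambda pid: f"page-{pid}"))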


international conference on distributed computing systems | 2017

Rain or Shine? — Making Sense of Cloudy Reliability Data

Iyswarya Narayanan; Bikash Sharma; Di Wang; Sriram Govindan; Laura Marie Caulfield; Anand Sivasubramaniam; Aman Kansal; Jie Liu; Badriddine Khessib; Kushagra Vaid

Cloud datacenters must ensure high availability for the hosted applications, and failures can be the bane of datacenter operators. Understanding the what, when, and why of failures can help tremendously to mitigate their occurrence and impact. Failures can, however, depend on numerous spatial and temporal factors spanning hardware, workloads, support facilities, and even the environment. One has to rely on failure data from the field to quantify the influence of these factors on failures. Toward this goal, we collect failure data, along with many parameters that might influence failures, from two large production datacenters with very diverse characteristics. We show that multiple factors simultaneously affect failures, and that these factors may interact in non-trivial ways. This makes conventional approaches that study aggregate characteristics or single-parameter influences rather inaccurate. Instead, we build a multi-factor analysis framework to systematically identify influencing factors, quantify their relative impact, and help in more accurate decision making for failure mitigation. We demonstrate this approach for three important decisions: spare capacity provisioning, comparing the reliability of hardware for vendor selection, and quantifying flexibility in datacenter climate control for cost-reliability trade-offs.
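
As a flavor of what a multi-factor analysis looks like, the sketch below jointly regresses failures against several candidate factors, including an interaction term, rather than examining one factor at a time. The data is synthetic and the factor names are assumptions for illustration; the paper's actual framework and datasets are not reproduced here.

# Joint multi-factor failure regression on synthetic data.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 5000
age_months = rng.uniform(0, 48, n)
temp_c = rng.normal(25, 5, n)
writes_tb = rng.exponential(10, n)

# Synthetic ground truth: failures depend on age and writes, with a mild interaction.
logit = -6 + 0.05 * age_months + 0.08 * writes_tb + 0.002 * age_months * writes_tb
failed = rng.random(n) < 1 / (1 + np.exp(-logit))

X = np.column_stack([age_months, temp_c, writes_tb, age_months * writes_tb])
model = LogisticRegression(max_iter=1000).fit(X, failed)

for name, coef in zip(["age", "temp", "writes", "age*writes"], model.coef_[0]):
    print(f"{name:>10}: {coef:+.4f}")  # temp should come out near zero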


Archive | 2011

Power-capping based on UPS capacity

Harry Rogers; Kushagra Vaid; Mark E. Shaw; Badriddine Khessib; Bryan Kelly; Matthew Allen Faist


measurement and modeling of computer systems | 2016

SSD Failures in Datacenters: What, When and Why?

Iyswarya Narayanan; Di Wang; Myeongjae Jeon; Bikash Sharma; Laura Marie Caulfield; Anand Sivasubramaniam; Ben Cutler; Jie Liu; Badriddine Khessib; Kushagra Vaid


Archive | 2011

Power capping based on generator capacity

Mark E. Shaw; Badriddine Khessib; Bryan Kelly


Archive | 2015

Configurable Volatile Memory Data Save Triggers

Bryan Kelly; Sriram Govindan; John Siegler; Badriddine Khessib; Mark A. Shaw; J. Michael Andrewartha
