Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Christos Kozyrakis is active.

Publication


Featured research published by Christos Kozyrakis.


High-Performance Computer Architecture (HPCA) | 2007

Evaluating MapReduce for Multi-core and Multiprocessor Systems

Colby Ranger; Ramanan Raghuraman; Arun Penmetsa; Gary R. Bradski; Christos Kozyrakis

This paper evaluates the suitability of the MapReduce model for multi-core and multiprocessor systems. MapReduce was created by Google for application development on datacenters with thousands of servers. It allows programmers to write functional-style code that is automatically parallelized and scheduled in a distributed system. We describe Phoenix, an implementation of MapReduce for shared-memory systems that includes a programming API and an efficient runtime system. The Phoenix runtime automatically manages thread creation, dynamic task scheduling, data partitioning, and fault tolerance across processor nodes. We study Phoenix with multi-core and symmetric multiprocessor systems and evaluate its performance potential and error recovery features. We also compare MapReduce code to code written in lower-level APIs such as Pthreads. Overall, we establish that, given a careful implementation, MapReduce is a promising model for scalable performance on shared-memory systems with simple parallel code.
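
To make the programming model concrete, the sketch below shows the map/reduce decomposition that Phoenix automates, applied to a word count on a shared-memory machine. This is not the actual Phoenix API (which is a C library with its own scheduler structures); the partitioning, threading, and merge logic here simply illustrates what the runtime handles for the programmer.

// Minimal sketch of the MapReduce decomposition Phoenix automates:
// split input, run map tasks on worker threads, merge (reduce) results.
// This is NOT the real Phoenix API; names here are illustrative only.
#include <algorithm>
#include <iostream>
#include <map>
#include <sstream>
#include <string>
#include <thread>
#include <vector>

using Counts = std::map<std::string, long>;

// Map task: count words in one input chunk into a thread-local table.
static void map_task(const std::vector<std::string>& words,
                     size_t begin, size_t end, Counts& local) {
    for (size_t i = begin; i < end; ++i) ++local[words[i]];
}

int main() {
    std::istringstream input("the quick brown fox jumps over the lazy dog the");
    std::vector<std::string> words;
    for (std::string w; input >> w;) words.push_back(w);

    const size_t nthreads = 4;
    std::vector<Counts> partials(nthreads);
    std::vector<std::thread> workers;

    // Data partitioning + thread creation (Phoenix does this automatically).
    size_t chunk = (words.size() + nthreads - 1) / nthreads;
    for (size_t t = 0; t < nthreads; ++t) {
        size_t b = std::min(t * chunk, words.size());
        size_t e = std::min(b + chunk, words.size());
        workers.emplace_back(map_task, std::cref(words), b, e,
                             std::ref(partials[t]));
    }
    for (auto& w : workers) w.join();

    // Reduce phase: merge per-thread tables by key.
    Counts totals;
    for (const auto& p : partials)
        for (const auto& [word, n] : p) totals[word] += n;

    for (const auto& [word, n] : totals)
        std::cout << word << ": " << n << "\n";
}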


IEEE International Symposium on Workload Characterization (IISWC) | 2008

STAMP: Stanford Transactional Applications for Multi-Processing

Chi Cao Minh; JaeWoong Chung; Christos Kozyrakis; Kunle Olukotun

Transactional Memory (TM) is emerging as a promising technology to simplify parallel programming. While several TM systems have been proposed in the research literature, we are still missing the tools and workloads necessary to analyze and compare the proposals. Most TM systems have been evaluated using microbenchmarks, which may not be representative of any real-world behavior, or individual applications, which do not stress a wide range of execution scenarios. We introduce the Stanford Transactional Applications for Multi-Processing (STAMP), a comprehensive benchmark suite for evaluating TM systems. STAMP includes eight applications and thirty variants of input parameters and data sets in order to represent several application domains and cover a wide range of transactional execution cases (frequent or rare use of transactions, large or small transactions, high or low contention, etc.). Moreover, STAMP is portable across many types of TM systems, including hardware, software, and hybrid systems. In this paper, we provide descriptions and a detailed characterization of the applications in STAMP. We also use the suite to evaluate six different TM systems, identify their shortcomings, and motivate further research on their performance characteristics.
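
For flavor, the sketch below shows the coarse-grain transaction style STAMP applications are written in. STAMP wraps shared-state updates in TM_BEGIN/TM_END macros that each TM system under test maps to its own primitives; as a stand-in so the sketch runs anywhere, the macros here fall back to a single global lock.

// Sketch of the coarse-grain transaction style STAMP applications use.
// STAMP's TM_BEGIN/TM_END macros are mapped by each TM system to its own
// primitives (hardware TX begin/commit, software TM calls, etc.); the
// global-lock fallback below is only so this sketch compiles and runs.
#include <iostream>
#include <mutex>
#include <thread>
#include <vector>

static std::mutex g_tm_fallback;
#define TM_BEGIN() g_tm_fallback.lock()    // e.g., hardware transaction begin
#define TM_END()   g_tm_fallback.unlock()  // e.g., hardware transaction commit

static long shared_counter = 0;

static void worker(int iters) {
    for (int i = 0; i < iters; ++i) {
        TM_BEGIN();           // one coarse transaction per update
        ++shared_counter;     // read-modify-write of shared state
        TM_END();
    }
}

int main() {
    std::vector<std::thread> threads;
    for (int t = 0; t < 4; ++t) threads.emplace_back(worker, 10000);
    for (auto& th : threads) th.join();
    std::cout << shared_counter << "\n";  // 40000
}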


International Symposium on Computer Architecture (ISCA) | 2004

Transactional Memory Coherence and Consistency

Lance Hammond; Vicky Wong; Michael K. Chen; Brian D. Carlstrom; John D. Davis; Ben Hertzberg; Manohar K. Prabhu; Honggo Wijaya; Christos Kozyrakis; Kunle Olukotun

In this paper, we propose a new shared memory model: transactional memory coherence and consistency (TCC). TCC provides a model in which atomic transactions are always the basic unit of parallel work, communication, memory coherence, and memory reference consistency. TCC greatly simplifies parallel software by eliminating the need for synchronization using conventional locks and semaphores, along with their complexities. TCC hardware must combine all writes from each transaction region in a program into a single packet and broadcast this packet to the permanent shared memory state atomically as a large block. This simplifies the coherence hardware because it reduces the need for small, low-latency messages and completely eliminates the need for conventional snoopy cache coherence protocols, as multiple speculatively written versions of a cache line may safely coexist within the system. Meanwhile, automatic, hardware-controlled rollback of speculative transactions resolves any correctness violations that may occur when several processors attempt to read and write the same data simultaneously. The cost of this simplified scheme is higher interprocessor bandwidth. To explore the costs and benefits of TCC, we study the characteristics of an optimal transaction-based memory system, and examine how different design parameters could affect the performance of real systems. Across a spectrum of applications, the TCC model itself did not limit available parallelism. Most applications are easily divided into transactions requiring only small write buffers, on the order of 4-8 KB. The broadcast requirements of TCC are high, but are well within the capabilities of CMPs and small-scale SMPs with high-speed interconnects.
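
A toy software model can make the TCC execution loop concrete: buffer a transaction's writes locally, commit them to shared memory as one atomic block, and roll back any concurrent transaction whose read set overlaps the committed write set. This is illustrative only; real TCC implements these mechanisms in the cache hardware.

// Toy software model of the TCC execution loop. Real TCC buffers writes
// in caches and broadcasts commits in hardware; this sketch only mirrors
// the logic of speculative buffering, atomic commit, and violation checks.
#include <iostream>
#include <map>
#include <mutex>
#include <set>

struct Transaction {
    std::set<int> read_set;          // addresses read speculatively
    std::map<int, int> write_buf;    // buffered speculative writes
};

static std::map<int, int> g_memory;  // "permanent shared memory state"
static std::mutex g_commit;          // serializes commit broadcasts

int tx_read(Transaction& tx, int addr) {
    tx.read_set.insert(addr);
    auto it = tx.write_buf.find(addr);          // read own write first
    return it != tx.write_buf.end() ? it->second : g_memory[addr];
}

void tx_write(Transaction& tx, int addr, int value) {
    tx.write_buf[addr] = value;                 // no global effect yet
}

// Commit: atomically apply the whole write buffer as one block. Returns
// the committed write set so others can check it against their read sets.
std::set<int> tx_commit(Transaction& tx) {
    std::lock_guard<std::mutex> lk(g_commit);
    std::set<int> committed;
    for (auto& [addr, val] : tx.write_buf) {
        g_memory[addr] = val;
        committed.insert(addr);
    }
    return committed;
}

// Violation check: must this transaction roll back and re-execute?
bool tx_violated(const Transaction& tx, const std::set<int>& committed) {
    for (int addr : committed)
        if (tx.read_set.count(addr)) return true;
    return false;
}

int main() {
    Transaction a, b;
    tx_write(a, 100, tx_read(a, 100) + 1);      // A: increment address 100
    (void)tx_read(b, 100);                      // B: reads address 100
    auto committed = tx_commit(a);              // A commits first
    std::cout << "B must rollback: " << tx_violated(b, committed) << "\n";
}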


Communications of the ACM | 2008

Transactional memory

James R. Larus; Christos Kozyrakis

Is TM the answer for improving parallel programming?


ACM SIGOPS Operating Systems Review | 2010

The case for RAMClouds: scalable high-performance storage entirely in DRAM

John K. Ousterhout; Parag Agrawal; David Erickson; Christos Kozyrakis; Jacob Leverich; David Mazières; Subhasish Mitra; Aravind Narayanan; Guru M. Parulkar; Mendel Rosenblum; Stephen M. Rumble; Eric Stratmann; Ryan Stutsman

Disk-oriented approaches to online storage are becoming increasingly problematic: they do not scale gracefully to meet the needs of large-scale Web applications, and improvements in disk capacity have far outstripped improvements in access latency and bandwidth. This paper argues for a new approach to datacenter storage called RAMCloud, where information is kept entirely in DRAM and large-scale systems are created by aggregating the main memories of thousands of commodity servers. We believe that RAMClouds can provide durable and available storage with 100-1000x the throughput of disk-based systems and 100-1000x lower access latency. The combination of low latency and large scale will enable a new breed of data-intensive applications.
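
The core idea can be sketched as a key-value store whose key space is partitioned across the DRAM tables of many servers. The API below is hypothetical; the real RAMCloud adds durability (replication and logging) and a low-latency RPC layer, neither of which this sketch models.

// Minimal sketch of the RAMCloud idea: aggregate the DRAM of many servers
// into one key-value store by partitioning the key space. Hypothetical API;
// durability and networking are deliberately omitted.
#include <functional>
#include <iostream>
#include <string>
#include <unordered_map>
#include <vector>

class RamCloudSketch {
    std::vector<std::unordered_map<std::string, std::string>> servers_;
    size_t shard(const std::string& key) const {
        return std::hash<std::string>{}(key) % servers_.size();
    }
public:
    explicit RamCloudSketch(size_t n) : servers_(n) {}
    void put(const std::string& k, const std::string& v) {
        servers_[shard(k)][k] = v;   // data stays in DRAM; no disk on the access path
    }
    const std::string* get(const std::string& k) const {
        const auto& tbl = servers_[shard(k)];
        auto it = tbl.find(k);
        return it == tbl.end() ? nullptr : &it->second;
    }
};

int main() {
    RamCloudSketch cloud(1000);                  // "thousands of commodity servers"
    cloud.put("user:42", "kozyrakis");
    if (const auto* v = cloud.get("user:42")) std::cout << *v << "\n";
}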


International Symposium on Computer Architecture (ISCA) | 2007

An effective hybrid transactional memory system with strong isolation guarantees

Chi Cao Minh; Martin Trautmann; JaeWoong Chung; Austen McDonald; Nathan Grasso Bronson; Jared Casper; Christos Kozyrakis; Kunle Olukotun

We propose signature-accelerated transactional memory (SigTM), a hybrid TM system that reduces the overhead of software transactions. SigTM uses hardware signatures to track the read-set and write-set for pending transactions and perform conflict detection between concurrent threads. All other transactional functionality, including data versioning, is implemented in software. Unlike previously proposed hybrid TM systems, SigTM requires no modifications to the hardware caches, which reduces hardware cost and simplifies support for nested transactions and multithreaded processor cores. SigTM is also the first hybrid TM system to provide strong isolation guarantees between transactional blocks and non-transactional accesses without additional read and write barriers in non-transactional code. Using a set of parallel programs that make frequent use of coarse-grain transactions, we show that SigTM accelerates software transactions by 30% to 280%. For certain workloads, SigTM can match the performance of a full-featured hardware TM system, while for workloads with large read-sets it can be up to two times slower. Overall, we show that SigTM combines the performance characteristics and strong isolation guarantees of hardware TM implementations with the low cost and flexibility of software TM systems.
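
The hardware signatures at the heart of SigTM behave like Bloom filters: a fixed-width bit vector conservatively summarizes a transaction's read or write set, and conflict detection tests committed addresses against it. The sketch below shows the mechanism in software; the register width and hash functions are illustrative, not taken from the paper.

// Sketch of a signature: a Bloom-filter-style bit vector summarizing a
// transaction's read or write set. False positives (spurious conflicts)
// are possible; false negatives are not, which keeps detection safe.
#include <bitset>
#include <cstdint>
#include <iostream>

class Signature {
    static constexpr size_t kBits = 1024;   // hardware register width (illustrative)
    std::bitset<kBits> bits_;
    static size_t h1(uint64_t a) { return (a * 0x9E3779B97F4A7C15ULL) % kBits; }
    static size_t h2(uint64_t a) { return ((a ^ (a >> 17)) * 0xC2B2AE3D27D4EB4FULL) % kBits; }
public:
    void insert(uint64_t addr) { bits_.set(h1(addr)); bits_.set(h2(addr)); }
    // "May contain": exact for absent addresses, conservative for present ones.
    bool may_contain(uint64_t addr) const {
        return bits_.test(h1(addr)) && bits_.test(h2(addr));
    }
    void clear() { bits_.reset(); }  // on commit or abort
};

int main() {
    Signature read_sig;              // tracks this transaction's read set
    read_sig.insert(0x7fff0010);
    read_sig.insert(0x7fff0020);

    // Another thread commits a write to 0x7fff0010: conflict detected.
    std::cout << "conflict: " << read_sig.may_contain(0x7fff0010) << "\n";  // 1
    std::cout << "conflict: " << read_sig.may_contain(0x12345678) << "\n";  // likely 0
}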


International Conference on Power-Aware Computing and Systems (HotPower) | 2010

On the energy (in)efficiency of Hadoop clusters

Jacob Leverich; Christos Kozyrakis

Distributed processing frameworks, such as Yahoo!'s Hadoop and Google's MapReduce, have been successful at harnessing expansive datacenter resources for large-scale data analysis. However, their effect on datacenter energy efficiency has not been scrutinized. Moreover, the filesystem component of these frameworks effectively precludes scale-down of clusters deploying these frameworks (i.e., operating at reduced capacity). This paper presents our early work on modifying Hadoop to allow scale-down of operational clusters. We find that running Hadoop clusters in fractional configurations can save between 9% and 50% of energy consumption, and that there is a tradeoff between performance and energy consumption. We also outline further research into the energy efficiency of these frameworks.
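
The placement invariant behind scale-down can be stated simply: keep at least one replica of every block on a designated covering subset of nodes, so the remaining nodes can be powered off without losing data availability. The sketch below checks that invariant; it is illustrative logic, not Hadoop code.

// Sketch of the covering-subset invariant for cluster scale-down: a set
// of nodes can stay powered on only if every block keeps a replica there.
// Illustrative logic only; HDFS block placement is far more involved.
#include <iostream>
#include <set>
#include <string>
#include <vector>

struct Block { std::string id; std::set<int> replicas; };  // node ids

// A cluster can scale down to `covering` iff every block keeps a replica there.
bool can_scale_down(const std::vector<Block>& blocks, const std::set<int>& covering) {
    for (const auto& b : blocks) {
        bool covered = false;
        for (int node : b.replicas)
            if (covering.count(node)) { covered = true; break; }
        if (!covered) return false;
    }
    return true;
}

int main() {
    std::set<int> covering = {0, 1};   // nodes that stay powered on
    std::vector<Block> blocks = {
        {"blk_1", {0, 5, 7}},          // one replica pinned to the covering set
        {"blk_2", {1, 4, 9}},
    };
    std::cout << "safe to power down nodes outside {0,1}: "
              << can_scale_down(blocks, covering) << "\n";  // 1
}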


Architectural Support for Programming Languages and Operating Systems (ASPLOS) | 2013

Paragon: QoS-aware scheduling for heterogeneous datacenters

Christina Delimitrou; Christos Kozyrakis

Large-scale datacenters (DCs) host tens of thousands of diverse applications each day. However, interference between colocated workloads and the difficulty of matching applications to one of the many available hardware platforms can degrade performance, violating the quality of service (QoS) guarantees that many cloud workloads require. While previous work has identified the impact of heterogeneity and interference, existing solutions are computationally intensive, cannot be applied online, and do not scale beyond a few applications. We present Paragon, an online and scalable DC scheduler that is heterogeneity- and interference-aware. Paragon is derived from robust analytical methods; instead of profiling each application in detail, it leverages information the system already has about applications it has previously seen. It uses collaborative filtering techniques to quickly and accurately classify an unknown, incoming workload with respect to heterogeneity and interference in multiple shared resources by identifying similarities to previously scheduled applications. The classification allows Paragon to greedily schedule applications in a manner that minimizes interference and maximizes server utilization. Paragon scales to tens of thousands of servers with marginal scheduling overheads in terms of time or state. We evaluate Paragon with a wide range of workload scenarios, on both small- and large-scale systems, including 1,000 servers on EC2. For a 2,500-workload scenario, Paragon enforces performance guarantees for 91% of applications while significantly improving utilization. In comparison, heterogeneity-oblivious, interference-oblivious, and least-loaded schedulers only provide similar guarantees for 14%, 11%, and 3% of workloads. The differences are more striking in oversubscribed scenarios where resource efficiency is more critical.
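
As a rough illustration of the classification step: Paragon uses SVD-based collaborative filtering, but the same intuition can be shown with a simpler stand-in that scores an incoming workload's short profile against previously seen workloads by cosine similarity and adopts the nearest neighbor's class. The profiles and class labels below are made up.

// Simplified stand-in for Paragon's classification: nearest neighbor by
// cosine similarity over short profiling runs. The real system uses
// SVD-based collaborative filtering; profiles here are invented.
#include <cmath>
#include <iostream>
#include <string>
#include <vector>

using Profile = std::vector<double>;  // e.g., slowdown on a few probe benchmarks

double cosine(const Profile& a, const Profile& b) {
    double dot = 0, na = 0, nb = 0;
    for (size_t i = 0; i < a.size(); ++i) {
        dot += a[i] * b[i]; na += a[i] * a[i]; nb += b[i] * b[i];
    }
    return dot / (std::sqrt(na) * std::sqrt(nb) + 1e-12);
}

struct Known { std::string cls; Profile profile; };

// Classify an unknown workload by its most similar previously seen one.
std::string classify(const Profile& incoming, const std::vector<Known>& seen) {
    double best = -1; std::string cls = "unknown";
    for (const auto& k : seen) {
        double s = cosine(incoming, k.profile);
        if (s > best) { best = s; cls = k.cls; }
    }
    return cls;
}

int main() {
    std::vector<Known> seen = {
        {"memory-bound, cache-sensitive", {0.9, 0.2, 0.8}},
        {"cpu-bound, interference-tolerant", {0.1, 0.9, 0.2}},
    };
    Profile incoming = {0.85, 0.25, 0.7};  // short profiling run of new workload
    std::cout << classify(incoming, seen) << "\n";
}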


Architectural Support for Programming Languages and Operating Systems (ASPLOS) | 2014

Quasar: resource-efficient and QoS-aware cluster management

Christina Delimitrou; Christos Kozyrakis

Cloud computing promises flexibility and high performance for users and high cost-efficiency for operators. Nevertheless, most cloud facilities operate at very low utilization, hurting both cost effectiveness and future scalability. We present Quasar, a cluster management system that increases resource utilization while providing consistently high application performance. Quasar employs three techniques. First, it does not rely on resource reservations, which lead to underutilization as users do not necessarily understand workload dynamics and physical resource requirements of complex codebases. Instead, users express performance constraints for each workload, letting Quasar determine the right amount of resources to meet these constraints at any point. Second, Quasar uses classification techniques to quickly and accurately determine the impact of the amount of resources (scale-out and scale-up), type of resources, and interference on performance for each workload and dataset. Third, it uses the classification results to jointly perform resource allocation and assignment, quickly exploring the large space of options for an efficient way to pack workloads on available resources. Quasar monitors workload performance and adjusts resource allocation and assignment when needed. We evaluate Quasar over a wide range of workload scenarios, including combinations of distributed analytics frameworks and low-latency, stateful services, both on a local cluster and a cluster of dedicated EC2 servers. At steady state, Quasar improves resource utilization by 47% in the 200-server EC2 cluster, while meeting performance constraints for workloads of all types.
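
The final step, picking the smallest allocation that satisfies a user's performance constraint, can be sketched as a greedy search over the classifier's scale-out estimates. The diminishing-returns scaling curve below is invented for illustration and is not Quasar's model.

// Sketch of constraint-driven allocation: find the fewest servers whose
// estimated performance meets the user's target. The scaling model is a
// made-up diminishing-returns curve standing in for the classifier output.
#include <iostream>

// Estimated throughput (QPS) for n servers of a given per-server speed.
double estimated_qps(int n, double per_server_qps) {
    return per_server_qps * n * (1.0 - 0.02 * (n - 1));  // mild scale-out loss
}

// Greedy allocation: smallest n whose estimate meets the constraint.
int allocate(double target_qps, double per_server_qps, int max_servers) {
    for (int n = 1; n <= max_servers; ++n)
        if (estimated_qps(n, per_server_qps) >= target_qps) return n;
    return -1;  // constraint cannot be met; flag for operator attention
}

int main() {
    double target = 50000;        // user-expressed performance constraint
    double per_server = 9000;     // estimate for the best-matching platform
    std::cout << "allocate " << allocate(target, per_server, 64)
              << " servers\n";    // fewest servers meeting the target
}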


International Symposium on Computer Architecture (ISCA) | 2007

Raksha: a flexible information flow architecture for software security

Michael Dalton; Hari Kannan; Christos Kozyrakis

High-level semantic vulnerabilities such as SQL injection and cross-site scripting have surpassed buffer overflows as the most prevalent security exploits. The breadth and diversity of software vulnerabilities demand new security solutions that combine the speed and practicality of hardware approaches with the flexibility and robustness of software systems. This paper proposes Raksha, an architecture for software security based on dynamic information flow tracking (DIFT). Raksha provides three novel features that allow for a flexible hardware/software approach to security. First, it supports flexible and programmable security policies that enable software to direct hardware analysis towards a wide range of high-level and low-level attacks. Second, it supports multiple active security policies that can protect the system against concurrent attacks. Third, it supports low-overhead security handlers that allow software to correct, complement, or extend the hardware-based analysis without the overhead associated with operating system traps. We present an FPGA prototype for Raksha that provides a full-featured Linux workstation for security analysis. Using unmodified binaries for real-world applications, we demonstrate that Raksha can detect high-level attacks such as directory traversal, command injection, SQL injection, and cross-site scripting, as well as low-level attacks such as buffer overflows. We also show that low-overhead exception handling is critical for analyses such as memory corruption protection in order to address false positives that occur due to the diverse code patterns in frequently used software.
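
A software analogue of DIFT makes the tag rules concrete: every value carries a taint bit, taint propagates through computation, and a policy check fires at sensitive sinks. Raksha does this transparently in hardware on unmodified binaries; the types and sink names in this sketch are purely illustrative.

// Software analogue of the DIFT mechanism Raksha implements in hardware:
// values carry taint tags, tags propagate through operations, and a
// security policy checks tags at dangerous operations (sinks).
#include <iostream>
#include <stdexcept>
#include <string>

struct Tainted {
    long value;
    bool tainted;  // set for untrusted input (network, user data)
};

// Propagation rule: the result of an operation is tainted if any input is.
Tainted operator+(Tainted a, Tainted b) {
    return {a.value + b.value, a.tainted || b.tainted};
}

// Policy check at a sensitive sink, e.g., using a value as a jump target.
// Raksha would invoke a low-overhead software security handler here.
void check_sink(const Tainted& v, const std::string& sink) {
    if (v.tainted)
        throw std::runtime_error("DIFT policy violation at " + sink);
}

int main() {
    Tainted trusted{0x400000, false};         // code base address
    Tainted from_network{64, true};           // attacker-controlled offset
    Tainted target = trusted + from_network;  // taint propagates
    try {
        check_sink(target, "indirect jump");
    } catch (const std::exception& e) {
        std::cout << e.what() << "\n";        // attack caught before control transfer
    }
}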

Collaboration


Dive into Christos Kozyrakis's collaborations.

Top Co-Authors

Daniel Sanchez

Massachusetts Institute of Technology
