Network


Latest external collaborations at the country level. Dive into the details by clicking on the dots.

Hotspot


Dive into the research topics where Onur Kocberber is active.

Publication


Featured research published by Onur Kocberber.


Architectural Support for Programming Languages and Operating Systems (ASPLOS) | 2012

Clearing the clouds: a study of emerging scale-out workloads on modern hardware

Michael Ferdman; Almutaz Adileh; Onur Kocberber; Stavros Volos; Mohammad Alisafaee; Djordje Jevdjic; Cansu Kaynak; Adrian Popescu; Anastasia Ailamaki; Babak Falsafi

Emerging scale-out workloads require extensive amounts of computational resources. However, data centers using modern server hardware face physical constraints in space and power, limiting further expansion and calling for improvements in the computational density per server and in the per-operation energy. Continuing to improve the computational resources of the cloud while staying within physical constraints mandates optimizing server efficiency to ensure that server hardware closely matches the needs of scale-out workloads. In this work, we introduce CloudSuite, a benchmark suite of emerging scale-out workloads. We use performance counters on modern servers to study scale-out workloads, finding that today's predominant processor micro-architecture is inefficient for running these workloads. We find that inefficiency comes from the mismatch between the workload needs and modern processors, particularly in the organization of instruction and data memory systems and the processor core micro-architecture. Moreover, while today's predominant micro-architecture is inefficient when executing scale-out workloads, we find that continuing the current trends will further exacerbate the inefficiency in the future. In this work, we identify the key micro-architectural needs of scale-out workloads, calling for a change in the trajectory of server processors that would lead to improved computational density and power efficiency in data centers.


International Symposium on Computer Architecture (ISCA) | 2012

Scale-out processors

Pejman Lotfi-Kamran; Boris Grot; Michael Ferdman; Stavros Volos; Onur Kocberber; Javier Picorel; Almutaz Adileh; Djordje Jevdjic; Sachin Satish Idgunji; Emre Özer; Babak Falsafi

Scale-out datacenters mandate high per-server throughput to get the maximum benefit from the large TCO investment. Emerging applications (e.g., data serving and web search) that run in these datacenters operate on vast datasets that are not accommodated by on-die caches of existing server chips. Large caches reduce the die area available for cores and lower performance through long access latency when instructions are fetched. Performance on scale-out workloads is maximized through a modestly-sized last-level cache that captures the instruction footprint at the lowest possible access latency. In this work, we introduce a methodology for designing scalable and efficient scale-out server processors. Based on a metric of performance-density, we facilitate the design of optimal multi-core configurations, called pods. Each pod is a complete server that tightly couples a number of cores to a small last-level cache using a fast interconnect. Replicating the pod to fill the die area yields processors which have optimal performance density, leading to maximum per-chip throughput. Moreover, as each pod is a stand-alone server, scale-out processors avoid the expense of global (i.e., inter-pod) interconnect and coherence. These features synergistically maximize throughput, lower design complexity, and improve technology scalability. In 20nm technology, scale-out chips improve throughput by 5x-6.5x over conventional and by 1.6x-1.9x over emerging tiled organizations.
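As a rough illustration of the performance-density methodology described above, the sketch below (with entirely invented area and throughput numbers) picks the per-pod core count that maximizes throughput per unit area, then replicates that pod to fill the die. It shows only the shape of the methodology, not the paper's actual models or parameters.

```c
/* Illustrative performance-density calculation; all numbers are invented. */

/* Aggregate pod throughput: per-core throughput degrades mildly with
 * contention on the shared last-level cache and interconnect. */
double pod_throughput(int cores) {
    return cores * (1.0 - 0.01 * cores);
}

/* Pod area: per-core area, a fixed modestly-sized LLC, and an
 * interconnect that grows superlinearly with core count (toy mm^2). */
double pod_area(int cores) {
    return 2.0 * cores + 4.0 + 0.1 * cores * cores;
}

/* Pick the pod size that maximizes performance density (throughput/area). */
int best_pod_cores(int max_cores) {
    int best = 1;
    double best_density = 0.0;
    for (int c = 1; c <= max_cores; c++) {
        double d = pod_throughput(c) / pod_area(c);
        if (d > best_density) { best_density = d; best = c; }
    }
    return best;
}

/* Replicate the optimal pod to fill the die; pods share nothing, so no
 * global interconnect or coherence is modeled. */
int pods_per_die(double die_area_mm2, int pod_cores) {
    return (int)(die_area_mm2 / pod_area(pod_cores));
}
```

Under these toy parameters a 5-core pod maximizes density and a 200 mm² die holds 12 such stand-alone pods; the takeaway is the optimize-then-replicate structure of the design flow, not the numbers.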


International Symposium on Microarchitecture (MICRO) | 2013

Meet the walkers: accelerating index traversals for in-memory databases

Onur Kocberber; Boris Grot; Javier Picorel; Babak Falsafi; Kevin T. Lim; Parthasarathy Ranganathan

The explosive growth in digital data and its growing role in real-time decision support motivate the design of high-performance database management systems (DBMSs). Meanwhile, slowdown in supply voltage scaling has stymied improvements in core performance and ushered in an era of power-limited chips. These developments motivate the design of DBMS accelerators that (a) maximize utility by accelerating the dominant operations, and (b) provide flexibility in the choice of DBMS, data layout, and data types. We study data analytics workloads on contemporary in-memory databases and find hash index lookups to be the largest single contributor to the overall execution time. The critical path in hash index lookups consists of ALU-intensive key hashing followed by pointer chasing through a node list. Based on these observations, we introduce Widx, an on-chip accelerator for database hash index lookups, which achieves both high performance and flexibility by (1) decoupling key hashing from the list traversal, and (2) processing multiple keys in parallel on a set of programmable walker units. Widx reduces design cost and complexity through its tight integration with a conventional core, thus eliminating the need for a dedicated TLB and cache. An evaluation of Widx on a set of modern data analytics workloads (TPC-H, TPC-DS) using full-system simulation shows an average speedup of 3.1× over an aggressive OoO core on bulk hash table operations, while reducing the OoO core energy by 83%.
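The critical path the abstract identifies (ALU-intensive key hashing followed by pointer chasing through a node list) can be sketched as follows. The structure layout, hash function, and names are illustrative assumptions, not Widx's actual design:

```c
#include <stddef.h>

/* Hypothetical hash index; each bucket heads a linked node list. */
typedef struct node {
    long key;
    void *payload;
    struct node *next;   /* dependent pointer dereferences happen here */
} node_t;

typedef struct {
    node_t **buckets;
    size_t nbuckets;
} hash_index_t;

/* Stage 1: key hashing (ALU-intensive; a multiplicative hash here). */
static size_t hash_key(long key, size_t nbuckets) {
    return ((unsigned long)key * 2654435761u) % nbuckets;
}

/* Stage 2: list traversal (memory-latency-bound pointer chasing). */
void *index_lookup(const hash_index_t *idx, long key) {
    node_t *n = idx->buckets[hash_key(key, idx->nbuckets)];
    while (n) {
        if (n->key == key) return n->payload;
        n = n->next;   /* each hop is a dependent cache miss at scale */
    }
    return NULL;
}
```

A single such walk serializes on memory latency at every `next` hop, which is why Widx decouples stage 1 from stage 2 and runs many walks in parallel on programmable walker units.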


ACM Transactions on Computer Systems | 2012

Quantifying the Mismatch between Emerging Scale-Out Applications and Modern Processors

Michael Ferdman; Almutaz Adileh; Onur Kocberber; Stavros Volos; Mohammad Alisafaee; Djordje Jevdjic; Cansu Kaynak; Adrian Popescu; Anastasia Ailamaki; Babak Falsafi

Emerging scale-out workloads require extensive amounts of computational resources. However, data centers using modern server hardware face physical constraints in space and power, limiting further expansion and calling for improvements in the computational density per server and in the per-operation energy. Continuing to improve the computational resources of the cloud while staying within physical constraints mandates optimizing server efficiency to ensure that server hardware closely matches the needs of scale-out workloads. In this work, we introduce CloudSuite, a benchmark suite of emerging scale-out workloads. We use performance counters on modern servers to study scale-out workloads, finding that today’s predominant processor microarchitecture is inefficient for running these workloads. We find that inefficiency comes from the mismatch between the workload needs and modern processors, particularly in the organization of instruction and data memory systems and the processor core microarchitecture. Moreover, while today’s predominant microarchitecture is inefficient when executing scale-out workloads, we find that continuing the current trends will further exacerbate the inefficiency in the future. In this work, we identify the key microarchitectural needs of scale-out workloads, calling for a change in the trajectory of server processors that would lead to improved computational density and power efficiency in data centers.


IEEE Micro | 2014

A Case for Specialized Processors for Scale-Out Workloads

Michael Ferdman; Almutaz Adileh; Onur Kocberber; Stavros Volos; Mohammad Alisafaee; Djordje Jevdjic; Cansu Kaynak; Adrian Popescu; Anastasia Ailamaki; Babak Falsafi

Emerging scale-out workloads need extensive amounts of computational resources. However, data centers using modern server hardware face physical constraints in space and power, limiting further expansion and requiring improvements in the computational density per server and in the per-operation energy. Continuing to improve the computational resources of the cloud while staying within physical constraints mandates optimizing server efficiency. In this work, we demonstrate that modern server processors are highly inefficient for running cloud workloads. To address this problem, we investigate the microarchitectural behavior of scale-out workloads and present opportunities to enable specialized processor designs that closely match the needs of the cloud.


Very Large Data Bases (VLDB) | 2015

Asynchronous memory access chaining

Onur Kocberber; Babak Falsafi; Boris Grot

In-memory databases rely on pointer-intensive data structures to quickly locate data in memory. A single lookup operation in such data structures often exhibits long-latency memory stalls due to dependent pointer dereferences. Hiding the memory latency by launching additional memory accesses for other lookups is an effective way of improving performance of pointer-chasing codes (e.g., hash table probes, tree traversals). The ability to exploit such inter-lookup parallelism is beyond the reach of modern out-of-order cores due to the limited size of their instruction window. Instead, recent work has proposed software prefetching techniques that exploit inter-lookup parallelism by arranging a set of independent lookups into a group or a pipeline, and navigate their respective pointer chains in a synchronized fashion. While these techniques work well for highly regular access patterns, they break down in the face of irregularity across lookups. Such irregularity includes variable-length pointer chains, early exit, and read/write dependencies. This work introduces Asynchronous Memory Access Chaining (AMAC), a new approach for exploiting inter-lookup parallelism to hide the memory access latency. AMAC achieves high dynamism in dealing with irregularity across lookups by maintaining the state of each lookup separately from that of other lookups. This feature enables AMAC to initiate a new lookup as soon as any of the in-flight lookups complete. In contrast, the static arrangement of lookups into a group or pipeline in existing techniques precludes such adaptivity. Our results show that AMAC matches or outperforms state-of-the-art prefetching techniques on regular access patterns, while delivering up to 2.3x higher performance under irregular data structure lookups. AMAC fully utilizes the available microarchitectural resources, generating the maximum number of memory accesses allowed by hardware in both single- and multi-threaded execution modes.
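A minimal sketch of the AMAC idea, assuming a simple linked-chain lookup: each in-flight lookup keeps its own state, a software prefetch (the GCC/Clang `__builtin_prefetch` intrinsic here) is issued for the next node, and a finished lookup's slot is refilled immediately. The structure names and window width are assumptions, not the paper's code:

```c
#include <stddef.h>

typedef struct chain_node {
    long key;
    struct chain_node *next;
    void *payload;
} chain_node_t;

enum { AMAC_WIDTH = 8 };   /* in-flight lookup window; a tunable assumption */

typedef struct {
    chain_node_t *cursor;  /* current position in this lookup's chain */
    long key;              /* key this lookup is searching for */
    int slot;              /* output slot to fill on completion */
} amac_state_t;

/* Probe n keys against the chains in heads[]; out[] receives the matching
 * node's payload, or NULL when the key is absent. */
void amac_probe(chain_node_t **heads, const long *keys, void **out, size_t n) {
    amac_state_t st[AMAC_WIDTH];
    size_t next = 0, live = 0;

    /* Fill the in-flight window with the first lookups. */
    while (live < AMAC_WIDTH && next < n) {
        st[live].cursor = heads[next];
        st[live].key = keys[next];
        st[live].slot = (int)next;
        live++; next++;
    }
    while (live > 0) {
        for (size_t i = 0; i < live; i++) {
            chain_node_t *cur = st[i].cursor;
            if (cur && cur->key != st[i].key) {
                /* Not found yet: advance the chain and prefetch the next
                 * node; its miss overlaps the other in-flight lookups. */
                st[i].cursor = cur->next;
                if (cur->next) __builtin_prefetch(cur->next);
                continue;
            }
            /* Lookup finished: either a hit or an exhausted chain. */
            out[st[i].slot] = cur ? cur->payload : NULL;
            if (next < n) {
                /* Refill the slot at once -- the key AMAC property: no
                 * lookup waits on the rest of a group or pipeline. */
                st[i].cursor = heads[next];
                st[i].key = keys[next];
                st[i].slot = (int)next;
                if (heads[next]) __builtin_prefetch(heads[next]);
                next++;
            } else {
                st[i] = st[--live];  /* compact the window */
                i--;                 /* re-examine the moved state */
            }
        }
    }
}
```

Because each state is independent, variable-length chains and early exits in one lookup never stall the others, which is exactly the irregularity that defeats statically grouped or pipelined prefetching schemes.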


High-Performance Computer Architecture (HPCA) | 2014

FADE: A programmable filtering accelerator for instruction-grain monitoring

Sotiria Fytraki; Evangelos Vlachos; Onur Kocberber; Babak Falsafi; Boris Grot

Instruction-grain monitoring is a powerful approach that enables a wide spectrum of bug-finding tools. As existing software approaches incur prohibitive runtime overhead, researchers have focused on hardware support for instruction-grain monitoring. A recurring theme in recent work is the use of hardware-assisted filtering so as to elide costly software analysis. This work generalizes and extends prior point solutions into a programmable filtering accelerator affording vast flexibility and at-speed event filtering. The pipelined microarchitecture of the accelerator affords a peak filtering rate of one application event per cycle, which suffices to keep up with an aggressive OoO core running the monitored application. A unique feature of the proposed design is the ability to dynamically resolve dependencies between unfilterable events and subsequent events, eliminating data-dependent stalls and maximizing the accelerator's performance. Our evaluation results show a monitoring slowdown of just 1.2-1.8x across a diverse set of monitoring tools.


Archive | 2012

2012 39th Annual International Symposium on Computer Architecture (ISCA)

Pejman Lotfi-Kamran; Boris Grot; Michael Ferdman; Stavros Volos; Onur Kocberber; Javier Picorel; Almutaz Adileh; Djordje Jevdjic; Sachin Satish Idgunji; Emre Özer; Babak Falsafi


Archive | 2011

Clearing the Clouds: A Study of Emerging Workloads on Modern Hardware

Michael Ferdman; Almutaz Adileh; Onur Kocberber; Stavros Volos; Mohammad Alisafaee; Djordje Jevdjic; Cansu Kaynak; Adrian Popescu; Anastasia Ailamaki; Babak Falsafi


5th Workshop on Architectures and Systems for Big Data (ASBD 2015) | 2015

Sort vs. Hash Join Revisited for Near-Memory Execution

Nooshin S. Mirzadeh; Onur Kocberber; Babak Falsafi; Boris Grot

Collaboration


Dive into Onur Kocberber's collaboration network.

Top Co-Authors

Babak Falsafi

École Polytechnique Fédérale de Lausanne

Djordje Jevdjic

École Polytechnique Fédérale de Lausanne

Stavros Volos

École Polytechnique Fédérale de Lausanne

Boris Grot

University of Edinburgh

Mohammad Alisafaee

École Polytechnique Fédérale de Lausanne

Adrian Popescu

École Polytechnique Fédérale de Lausanne

Anastasia Ailamaki

École Polytechnique Fédérale de Lausanne

Cansu Kaynak

École Polytechnique Fédérale de Lausanne
