Nehir Sonmez

Barcelona Supercomputing Center

Publication

Featured research published by Nehir Sonmez.

computing frontiers | 2008

The limits of software transactional memory (STM): dissecting Haskell STM applications on a many-core environment

Cristian Perfumo; Nehir Sonmez; Srdjan Stipic; Osman S. Unsal; Adrián Cristal; Tim Harris; Mateo Valero

In this paper, we present a Haskell Transactional Memory benchmark to provide a comprehensive application suite for the use of Software Transactional Memory (STM) researchers. We develop a framework to profile the execution of the benchmark applications and to collect detailed runtime data on their transactional behavior, running them on a 128-core multiprocessor. Using a composite of the collected raw data, we propose new transactional performance metrics. We analyze key statistics related to scalability, atomic sections, transactional events, overall transactional overhead and the relative hardware performance, and draw conclusions from the results. Our findings advance our understanding of the STM runtime and of the characteristics of different applications under the transactional management of the pure functional programming language Haskell.

field programmable logic and applications | 2014

An empirical evaluation of High-Level Synthesis languages and tools for database acceleration

Oriol Arcas-Abella; Geoffrey Ndu; Nehir Sonmez; Mohsen Ghasempour; Adrià Armejach; Javier Navaridas; Wei Song; John Mawer; Adrián Cristal; Mikel Luján

High Level Synthesis (HLS) languages and tools are emerging as the most promising technique to make FPGAs more accessible to software developers. Nevertheless, picking the most suitable HLS for a certain class of algorithms depends on requirements such as area and throughput, as well as on programmer experience. In this paper, we explore the different trade-offs present when using a representative set of HLS tools in the context of Database Management Systems (DBMS) acceleration. More specifically, we conduct an empirical analysis of four representative frameworks (Bluespec SystemVerilog, Altera OpenCL, LegUp and Chisel) that we utilize to accelerate commonly-used database algorithms such as sorting, the median operator, and hash joins. Through our implementation experience and empirical results for database acceleration, we conclude that the selection of the most suitable HLS depends on a set of orthogonal characteristics, which we highlight for each HLS framework.

international parallel and distributed processing symposium | 2009

Taking the heat off transactions: Dynamic selection of pessimistic concurrency control

Nehir Sonmez; Tim Harris; Adrián Cristal; Osman S. Unsal; Mateo Valero

In this paper we investigate feedback-directed dynamic selection between different implementations of atomic blocks. We initially execute atomic blocks using STM with optimistic concurrency control. At runtime, we identify “hot” variables that cause large numbers of transactions to abort. For these variables we selectively switch to using pessimistic concurrency control, in the hope of deferring transactions until they will be able to run to completion. This trades off a reduction in single-threaded speed (since pessimistic concurrency control is not as streamlined as our optimistic implementation), against a reduced amount of wasted work in aborted transactions. We describe our implementation in the Haskell programming language, and examine its performance with a range of micro-benchmarks and larger programs. We show that our technique is effective at reducing the amount of wasted work, but that for current workloads there is often not enough wasted work for an overall improvement to be possible. As we demonstrate, our technique is not appropriate for some workloads: the extra work introduced by lock-induced deadlock is greater than the wasted work saved from aborted transactions. For other workloads, we show that using mutual exclusion locks for “hot” variables could be preferable to multi-reader locks because mutual exclusion avoids deadlocks caused by concurrent attempts to upgrade to write access.
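The feedback loop described in this abstract can be illustrated with a minimal sketch. It is not the authors' Haskell implementation: the class name `HotVariablePolicy`, the method names, and the fixed abort threshold are all hypothetical, chosen only to show how per-variable abort counts could drive a switch from optimistic to pessimistic concurrency control.

```python
from collections import Counter


class HotVariablePolicy:
    """Hypothetical policy: a variable becomes "hot" after enough aborts."""

    def __init__(self, abort_threshold=3):
        # abort_threshold is an assumed tuning knob, not a value from the paper.
        self.abort_threshold = abort_threshold
        self.aborts = Counter()   # aborts attributed to each variable
        self.hot = set()          # variables switched to pessimistic control

    def record_abort(self, variable):
        """Count one aborted transaction caused by a conflict on `variable`."""
        self.aborts[variable] += 1
        if self.aborts[variable] >= self.abort_threshold:
            self.hot.add(variable)

    def mode_for(self, variable):
        """Return the concurrency control mode to use for `variable`."""
        return "pessimistic" if variable in self.hot else "optimistic"


policy = HotVariablePolicy(abort_threshold=2)
policy.record_abort("counter")
policy.record_abort("counter")
print(policy.mode_for("counter"), policy.mode_for("config"))
```

Only the selection step is modeled here; the locking itself, and the deadlock risks the abstract mentions, are outside the scope of the sketch.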

field-programmable custom computing machines | 2015

HATCH: Hash Table Caching in Hardware for Efficient Relational Join on FPGA

Behzad Salami; Oriol Arcas-Abella; Nehir Sonmez

In this paper we present HATCH, a novel hash join engine. We follow a new design point which enables us to effectively cache the hash table entries in fast BRAM resources, while supporting collision resolution in hardware. HATCH enables us to have the best of both worlds: (i) to use the full capacity of the DDR memory to store complete hash tables, and (ii) by employing a cache, to exploit the high access speed of BRAMs. We demonstrate the usefulness of our approach by running hash join operations from 5 TPC-H benchmark queries and report speedups of up to 2.8× over a pipeline-optimized baseline.
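As a software analogy for the idea in this abstract, the sketch below places a small cache in front of a complete hash table during a hash join. It is an illustration under stated assumptions, not HATCH's hardware design: the dictionary stands in for the DDR-resident table, the LRU cache stands in for BRAM, and the names `CachedHashTable` and `hash_join` as well as the LRU replacement policy are hypothetical.

```python
from collections import OrderedDict, defaultdict


class CachedHashTable:
    """Full hash table (stand-in for DDR) fronted by a small LRU cache
    (stand-in for BRAM). The LRU policy is an assumption of this sketch."""

    def __init__(self, cache_size):
        self.backing = defaultdict(list)  # complete table; lists chain collisions
        self.cache = OrderedDict()        # small, fast subset of the table
        self.cache_size = cache_size
        self.hits = 0
        self.misses = 0

    def insert(self, key, row):
        self.backing[key].append(row)
        self.cache.pop(key, None)         # drop any stale cached entry

    def lookup(self, key):
        if key in self.cache:
            self.hits += 1
            self.cache.move_to_end(key)   # mark as most recently used
            return self.cache[key]
        self.misses += 1
        rows = self.backing.get(key, [])  # slow path: consult the full table
        self.cache[key] = rows
        if len(self.cache) > self.cache_size:
            self.cache.popitem(last=False)  # evict the least recently used key
        return rows


def hash_join(build_rows, probe_rows, build_key, probe_key, cache_size=2):
    """Build a table on build_rows, then probe it with probe_rows."""
    table = CachedHashTable(cache_size)
    for row in build_rows:
        table.insert(row[build_key], row)
    joined = []
    for row in probe_rows:
        for match in table.lookup(row[probe_key]):
            joined.append((match, row))
    return joined, table


customers = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
orders = [{"cust": 1}, {"cust": 1}, {"cust": 3}]
joined, table = hash_join(customers, orders, "id", "cust")
print(len(joined), table.hits, table.misses)
```

Repeated probe keys are served from the cache, which is the effect the abstract attributes to keeping hash table entries in fast memory while the complete table stays in the larger, slower one.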

Proceedings of the 12th FPGAworld Conference 2015 on | 2015

High Level Synthesis Based Hardware Accelerator Design for Processing SQL Queries

Gorker Alp Malazgirt; Nehir Sonmez; Arda Yurdakul; Adrián Cristal; Osman S. Unsal

About three exabytes of data are created and stored in databases each day, and this number is doubling approximately every forty months. Querying this enormous amount of data has been a challenge, and new methods have been actively researched. In this paper, we present hardware accelerators designed to speed up database analytics for in-memory databases. Unlike traditional hardware accelerator designs, our accelerators are composed using High Level Synthesis (HLS), which enables high-level descriptions of functionality such as data filtering, sorting and equijoins to be translated directly into RTL. We have simulated TPC-H benchmark queries using Xilinx Vivado HLS managed in our custom simulation software framework. Our results demonstrate the capabilities of HLS in the database acceleration domain: the 200 MHz FPGA accelerator can provide two orders of magnitude performance improvement compared to a full software implementation based on PostgreSQL running on a modern multicore system.

applied reconfigurable computing | 2011

From plasma to beefarm: design experience of an FPGA-based multicore prototype

Nehir Sonmez; Oriol Arcas; Gokhan Sayilar; Osman S. Unsal; Adrian Cristal; Ibrahim Hur; Satnam Singh; Mateo Valero

In this paper, we take a MIPS-based open-source uniprocessor soft core, Plasma, and extend it to obtain the Beefarm infrastructure for FPGA-based multiprocessor emulation, a popular research topic of the last few years both in the FPGA and the computer architecture communities. We discuss various design tradeoffs and we demonstrate superior scalability through experimental results compared to traditional software instruction set simulators. Based on our experience of designing and building a complete FPGA-based multiprocessor emulation system that supports run-time and compiler infrastructure and on the actual executions of our experiments running Software Transactional Memory (STM) benchmarks, we comment on the pros, cons and future trends of using hardware-based emulation for research.

Computing in Science and Engineering | 2016

Hardware Acceleration for Query Processing: Leveraging FPGAs, CPUs, and Memory

Oriol Arcas-Abella; Adrià Armejach; Timothy Hayes; Gorker Alp Malazgirt; Oscar Palomar; Behzad Salami; Nehir Sonmez

Database management systems have become an indispensable tool for industry, government, and academia, and form a significant component of modern datacenters. They can be used in a multitude of scenarios, including online analytical processing, data mining, e-commerce, and scientific analysis. Given the exponential growth in new data produced each year, there is a pressure on software and hardware developers to create datacenters that can cope with increasing requirements. The authors look at the organization of a modern relational database management system and propose optimizations and redesign for the storage access, memory, and CPU.

field programmable gate arrays | 2015

Accelerating Complete Decision Support Queries Through High-Level Synthesis Technology (Abstract Only)

Gorker Alp Malazgirt; Nehir Sonmez; Arda Yurdakul; Osman S. Unsal; Adrián Cristal

Recently, with the rise of the Internet of Things and Big Data, accelerating database analytics to obtain faster query processing has gained significant attention. At the same time, High-Level Synthesis (HLS) technology has matured and is now a promising approach to designing such hardware accelerators. In this work, we use a modern HLS tool, Vivado, to design high-performance database accelerators for filtering, aggregation, sorting, merging and join operations. We then use these as building blocks to implement an acceleration system for in-memory databases on a Virtex-7 FPGA, detailed enough to run full TPC-H benchmarks completely in hardware. Presenting performance, area and memory requirements, we show up to 140x speedup compared to a software DBMS, and demonstrate that HLS technology is indeed a very appropriate match for database acceleration.

field-programmable custom computing machines | 2011

TMbox: A Flexible and Reconfigurable 16-Core Hybrid Transactional Memory System

Nehir Sonmez; Oriol Arcas; Otto Pflucker; Osman S. Unsal; A. Cristal; Ibrahim Hur; Satnam Singh; Mateo Valero

In this paper we present the design and implementation of TMbox: an MPSoC built to explore trade-offs in the multicore design space and to evaluate parallel programming proposals such as Transactional Memory (TM). Our flexible system, comprised of MIPS R3000-compatible cores, is easily modifiable to study different architecture, library and operating system extensions. For this paper we evaluate a 16-core Hybrid Transactional Memory implementation based on the TinySTM-ASF proposal on a Virtex-5 FPGA and we accelerate three benchmarks written to investigate TM.

Microprocessors and Microsystems | 2017

AxleDB: A novel programmable query processing platform on FPGA

Behzad Salami; Gorker Alp Malazgirt; Oriol Arcas-Abella; Arda Yurdakul; Nehir Sonmez

With the rise of Big Data, providing high-performance query processing capabilities through the acceleration of database analytics has gained significant attention. Leveraging Field Programmable Gate Array (FPGA) technology, this approach can lead to clear benefits. In this work, we present the design and implementation of AxleDB: an FPGA-based platform that enables fast query processing for database systems by melding novel database-specific accelerators with commercial-off-the-shelf (COTS) storage using modern interfaces, in a novel, unified, and programmable environment. AxleDB can perform a large subset of SQL queries through its set of instructions that can map compute-intensive database operations, such as filter, arithmetic, aggregate, group by, table join, or sort, onto the specialized high-throughput accelerators. To minimize the amount of SSD I/O operations required, AxleDB also supports hardware MinMax indexing for databases. We evaluated AxleDB with five decision support queries from the TPC-H benchmark suite and achieved a speedup from 1.8X to 34.2X and energy efficiency from 2.8X to 62.1X, in comparison to the state-of-the-art DBMS, i.e., PostgreSQL and MonetDB.
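The abstract mentions MinMax indexing as a way to reduce SSD I/O. The general technique can be sketched in software as follows; this is not AxleDB's hardware implementation, and the function names `build_minmax_index` and `range_scan`, along with the fixed block size, are hypothetical. The index records the minimum and maximum value of each block, so a range query can skip blocks whose value range cannot contain a match.

```python
def build_minmax_index(values, block_size):
    """Record (start offset, min, max) for each fixed-size block of a column."""
    index = []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        index.append((start, min(block), max(block)))
    return index


def range_scan(values, index, block_size, low, high):
    """Return values in [low, high] and the number of blocks actually read."""
    matches = []
    blocks_read = 0
    for start, block_min, block_max in index:
        # Skip the block when its value range cannot overlap [low, high].
        if block_max < low or block_min > high:
            continue
        blocks_read += 1
        for value in values[start:start + block_size]:
            if low <= value <= high:
                matches.append(value)
    return matches, blocks_read


column = [1, 2, 3, 10, 11, 12, 20, 21, 22]
index = build_minmax_index(column, block_size=3)
print(range_scan(column, index, 3, low=10, high=12))
```

The benefit depends on data layout: when values are clustered, as in the sorted column above, most blocks are skipped, whereas a randomly ordered column would leave few blocks to prune.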