Kaan Kara | Researchain

Publication

Featured researches published by Kaan Kara.

field programmable custom computing machines | 2017

Centaur: A Framework for Hybrid CPU-FPGA Databases

Muhsen Owaida; David Sidler; Kaan Kara; Gustavo Alonso

Accelerating relational databases in general and SQL in particular has become an important topic given the challenges arising from large data collections and increasingly complex workloads. Most existing work, however, has been focused on either accelerating a single operator (e.g., a join) or in data reduction along the data path (e.g., from disk to CPU). In this paper we focus instead on the system aspects of accelerating a relational engine in hybrid CPU-FPGA architectures. In particular, we present Centaur, a framework running on the FPGA that allows the dynamic allocation of FPGA operators to query plans, pipelining these operators among themselves when needed, and the hybrid execution of operator pipelines running on the CPU and the FPGA. Centaur is fully compatible with relational engines as we demonstrate through its seamless integration with MonetDB, a popular column store database. In the paper, we describe how this integration is achieved, and empirically demonstrate the advantages of such an approach. The main contribution of the paper is to provide a realistic solution for accelerating SQL that is compatible with existing database architectures, thereby opening up the possibilities for further exploration of FPGA-based data processing.

international conference on management of data | 2017

FPGA-based Data Partitioning

Kaan Kara; Jana Giceva; Gustavo Alonso

Implementing parallel operators in multi-core machines often involves a data partitioning step that divides the data into cache-size blocks and arranges them so as to allow concurrent threads to process them in parallel. Data partitioning is expensive, in some cases up to 90% of the cost of, e.g., a parallel hash join. In this paper we explore the use of an FPGA to accelerate data partitioning. We do so in the context of new hybrid architectures where the FPGA is located as a co-processor residing on a socket and with coherent access to the same memory as the CPU residing on the other socket. Such an architecture reduces data transfer overheads between the CPU and the FPGA, enabling hybrid operator execution where the partitioning happens on the FPGA and the build and probe phases of a join happen on the CPU. Our experiments demonstrate that FPGA-based partitioning is significantly faster and more robust than CPU-based partitioning. The results open interesting options as FPGAs are gradually integrated more tightly with the CPU.
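As a rough software-only illustration of the partition, build, and probe phases mentioned in this abstract, the following sketch partitions both join inputs and then joins matching partitions independently. It is not the FPGA design from the paper: the function names, the choice of radix partitioning on low-order key bits, and the `num_bits` parameter are all assumptions made for the example.

```python
from collections import defaultdict


def radix_partition(keys, num_bits):
    """Split integer keys into 2**num_bits partitions by their low-order bits.

    Hypothetical helper: the abstract does not say how partitions are chosen.
    """
    mask = (1 << num_bits) - 1
    partitions = [[] for _ in range(1 << num_bits)]
    for key in keys:
        partitions[key & mask].append(key)
    return partitions


def partitioned_hash_join(r_keys, s_keys, num_bits=2):
    """Join two key lists partition by partition and return the matching keys."""
    r_parts = radix_partition(r_keys, num_bits)
    s_parts = radix_partition(s_keys, num_bits)
    matches = []
    # Equal keys always land in the same partition index, so each pair of
    # partitions can be joined independently (and, in principle, in parallel).
    for r_part, s_part in zip(r_parts, s_parts):
        # Build phase: count occurrences of each key in the R partition.
        table = defaultdict(int)
        for key in r_part:
            table[key] += 1
        # Probe phase: look up each S key and emit one match per R occurrence.
        for key in s_part:
            matches.extend([key] * table.get(key, 0))
    return matches


if __name__ == "__main__":
    print(sorted(partitioned_hash_join([1, 2, 3, 4, 5], [4, 5, 6, 4])))
```

In the hybrid setting the abstract describes, only the partitioning step would move to the FPGA, while the build and probe loops would stay on the CPU.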

field programmable custom computing machines | 2017

FPGA-Accelerated Dense Linear Machine Learning: A Precision-Convergence Trade-Off

Kaan Kara; Dan Alistarh; Gustavo Alonso; Onur Mutlu; Ce Zhang

Stochastic gradient descent (SGD) is a commonly used algorithm for training linear machine learning models. Based on vector algebra, it benefits from the inherent parallelism available in an FPGA. In this paper, we first present a single-precision floating-point SGD implementation on an FPGA that provides performance similar to that of a 10-core CPU. We then adapt the design to make it capable of processing low-precision data. The low-precision data is obtained from a novel compression scheme called stochastic quantization, which is specifically designed for machine learning applications. We test both full-precision and low-precision designs on various regression and classification data sets. We achieve up to an order of magnitude training speedup when using low-precision data compared to a full-precision SGD on the same FPGA and a state-of-the-art multi-core solution, while maintaining the quality of training. We open source the designs presented in this paper.
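The abstract does not spell out the quantization scheme, so the sketch below shows one common form of stochastic quantization, namely unbiased randomized rounding onto an evenly spaced grid. The function name, the assumption that inputs are normalized to [0, 1], and the `levels` parameter are illustrative choices, not details taken from the paper.

```python
import random


def stochastic_quantize(x, levels, rng):
    """Randomly round x in [0, 1] to a multiple of 1/levels.

    The value is rounded up with probability equal to its fractional position
    between the two neighbouring grid points, so the expected output equals x.
    """
    scaled = x * levels
    lower = int(scaled)          # floor, since scaled is non-negative
    frac = scaled - lower        # distance to the lower grid point, in [0, 1)
    rounded = lower + (1 if rng.random() < frac else 0)
    return rounded / levels


if __name__ == "__main__":
    rng = random.Random(0)
    samples = [stochastic_quantize(0.3, 4, rng) for _ in range(20000)]
    # Every sample is either 0.25 or 0.5; their average stays close to 0.3.
    print(sorted(set(samples)), sum(samples) / len(samples))
```

Because the rounding is unbiased, gradient estimates computed from the quantized data keep the right expectation, which is one plausible reason training quality can be maintained at lower precision.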

international conference on management of data | 2017

doppioDB: A Hardware Accelerated Database

David Sidler; Zsolt István; Muhsen Owaida; Kaan Kara; Gustavo Alonso

Relational databases provide a wealth of functionality to a wide range of applications. Yet, there are tasks for which they are less than optimal, for instance when processing becomes more complex (e.g., matching regular expressions) or the data is less structured (e.g., text or long strings). In this demonstration we show the benefit of using specialized hardware for such tasks and highlight the importance of a flexible, reusable mechanism for extending database engines with hardware-based operators. We present doppioDB, which consists of MonetDB, a main-memory column store, extended with Hardware User Defined Functions (HUDFs). In our demonstration the HUDFs are used to provide seamless acceleration of two string operators, LIKE and REGEXP_LIKE, and two analytics operators, SKYLINE and SGD (stochastic gradient descent). We evaluate doppioDB on an emerging hybrid multicore architecture, the Intel Xeon+FPGA platform, where the CPU and FPGA have cache-coherent access to the same memory, such that the hardware operators can directly access the database tables. For integration we rely on HUDFs as a unit of scheduling and management on the FPGA. In the demonstration we show the acceleration benefits of hardware operators, as well as their flexibility in accommodating changing workloads.

field programmable logic and applications | 2016

Fast and robust hashing for database operators

Kaan Kara; Gustavo Alonso

Hashing is an essential part of many database operators, such as joins or aggregation, especially when executed in parallel. Often, database engines resort to using easily computed hash functions like modulo to prevent hashing from becoming a bottleneck. The disadvantage of simple hash functions is that they produce imperfect data distributions, particularly when the data is skewed. Robust hash functions produce balanced distributions but they are computationally expensive. Our purpose in this paper is to break the present trade-off between robustness and performance. We achieve this by showing how to implement robust hash functions suitable for database operators on an FPGA. Our target platform (Intel QuickAssist QPI-FPGA) provides a shared memory architecture between the CPU and the FPGA, enabling database engines to use the hardware hashing without any modifications to their memory layout. Depending on the hash function, we achieve a 6.6× improvement over pure software implementations. We also show how to integrate hardware hashing in a hybrid hash table without any acceleration overhead.
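To make the robustness side of this trade-off concrete, the sketch below compares plain modulo hashing with a stronger bit-mixing function on a skewed key set. The abstract does not name the robust hash functions it evaluates, so the MurmurHash3 32-bit finalizer is used here purely as a stand-in, and the key pattern, bucket count, and helper names are assumptions made for the example.

```python
def modulo_hash(key, num_buckets):
    """Cheap hash: the bucket is simply the key modulo the bucket count."""
    return key % num_buckets


def fmix32(h):
    """MurmurHash3 32-bit finalizer, used here as an example of a mixing hash."""
    h &= 0xFFFFFFFF
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & 0xFFFFFFFF
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & 0xFFFFFFFF
    h ^= h >> 16
    return h


def bucket_counts(keys, num_buckets, hash_fn):
    """Count how many keys fall into each bucket under the given hash."""
    counts = [0] * num_buckets
    for key in keys:
        counts[hash_fn(key, num_buckets)] += 1
    return counts


if __name__ == "__main__":
    # Skewed input: every key is a multiple of the bucket count.
    keys = range(0, 16 * 1000, 16)
    modulo = bucket_counts(keys, 16, modulo_hash)
    mixed = bucket_counts(keys, 16, lambda k, n: fmix32(k) % n)
    print("modulo:", modulo)
    print("mixed: ", mixed)
```

With this input, modulo hashing sends every key to bucket 0, while the mixing function spreads keys across buckets at the price of several extra multiply and shift operations per key.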

field programmable logic and applications | 2017

doppioDB: A hardware accelerated database

David Sidler; Muhsen Owaida; Zsolt István; Kaan Kara; Gustavo Alonso

Relational databases provide a wealth of functionality to a wide range of applications. Yet, there are tasks for which they are less than optimal, for instance when processing becomes more complex (e.g., regular expression evaluation, data analytics) or the data is less structured (e.g., text or long strings). With the increasing amount of user-generated data stored in relational databases, there is a growing need to analyze unstructured text data. At the same time, more complex analytical operators are required to extract useful information from the vast amount of collected data. However, many analytical operators incur a significant compute complexity that is not suited to database engines where multiple queries share the available resources. In this demonstration we show the benefit of using specialized hardware for such tasks and highlight the importance of a flexible, reusable mechanism for extending database engines with hardware-based operators. Our hybrid database engine, doppioDB, is deployed on an emerging Xeon+FPGA multicore architecture where the CPU and FPGA have cache-coherent access to the same memory, such that the hardware operators can directly access the database tables. The demonstration illustrates the acceleration benefits of hardware operators, as well as doppioDB's flexibility in accommodating changing workloads.

international conference on machine learning | 2017