David Sidler | Researchain

Publication

Featured research published by David Sidler.

field-programmable custom computing machines | 2015

Scalable 10Gbps TCP/IP Stack Architecture for Reconfigurable Hardware

David Sidler; Gustavo Alonso; Michaela Blott; Kimon Karras; Kees A. Vissers; Raymond Carley

TCP/IP is the predominant communication protocol in modern networks but also one of the most demanding. Consequently, TCP/IP offload is becoming increasingly popular with standard network interface cards. TCP/IP Offload Engines have also emerged for FPGAs, and are being offered by vendors such as Intilop, Fraunhofer HHI, PLDA and Dini Group. With the target application being high-frequency trading, these implementations focus on low latency and support a limited session count. However, many more applications beyond high-frequency trading can potentially be accelerated inside an FPGA once TCP with high session count is available inside the fabric. This way, a network-attached FPGA on ingress and egress to a CPU can accelerate functions such as encryption, compression, memcached and many others in addition to running the complete network stack. This paper introduces a novel architecture for a 10Gbps line-rate TCP/IP stack for FPGAs that can scale with the number of sessions and thereby addresses these new applications. We prototyped the design on a VC709 development board, demonstrating compatibility with existing network infrastructure, operating at full 10Gbps throughput full-duplex while supporting 10,000 sessions. Finally, the design has been described primarily using high-level synthesis, which accelerates development time and improves maintainability.

international conference on management of data | 2017

Accelerating Pattern Matching Queries in Hybrid CPU-FPGA Architectures

David Sidler; Zsolt István; Muhsen Owaida; Gustavo Alonso

Taking advantage of recently released hybrid multicore architectures, such as Intel's Xeon+FPGA machine, where the FPGA has coherent access to the main memory through the QPI bus, we explore the benefits of specializing operators to hardware. We focus on two commonly used SQL operators for strings, LIKE and REGEXP_LIKE, and provide a novel and efficient implementation of these operators in reconfigurable hardware. We integrate the hardware accelerator into MonetDB, a main-memory column store, and demonstrate a significant improvement in response time and throughput. Our Hardware User Defined Function (HUDF) can speed up complex pattern matching by an order of magnitude in comparison to the database running on a 10-core CPU. The insights gained from integrating hardware-based string operators into MonetDB should also be useful for future designs combining hardware specialization and databases.

field programmable custom computing machines | 2017

Centaur: A Framework for Hybrid CPU-FPGA Databases

Muhsen Owaida; David Sidler; Kaan Kara; Gustavo Alonso

Accelerating relational databases in general and SQL in particular has become an important topic given the challenges arising from large data collections and increasingly complex workloads. Most existing work, however, has been focused on either accelerating a single operator (e.g., a join) or in data reduction along the data path (e.g., from disk to CPU). In this paper we focus instead on the system aspects of accelerating a relational engine in hybrid CPU-FPGA architectures. In particular, we present Centaur, a framework running on the FPGA that allows the dynamic allocation of FPGA operators to query plans, pipelining these operators among themselves when needed, and the hybrid execution of operator pipelines running on the CPU and the FPGA. Centaur is fully compatible with relational engines as we demonstrate through its seamless integration with MonetDB, a popular column store database. In the paper, we describe how this integration is achieved, and empirically demonstrate the advantages of such an approach. The main contribution of the paper is to provide a realistic solution for accelerating SQL that is compatible with existing database architectures, thereby opening up the possibilities for further exploration of FPGA based data processing.

field programmable custom computing machines | 2016

Runtime Parameterizable Regular Expression Operators for Databases

Zsolt István; David Sidler; Gustavo Alonso

Relational databases execute user queries through operator trees, where each operator has a well-defined interface and a specific task (e.g., arithmetic function, pattern matching, aggregation, etc.). Hardware acceleration of compute-intensive operators is a promising prospect but it comes with challenges. Databases execute tens of thousands of different queries per second. Thus, if only one specific instantiation of an operator is supported by the accelerator, it will have little effect on the overall workload. In this paper we explore the tradeoff between resource efficiency and expression complexity for an FPGA accelerator targeting string-matching operators (LIKE and REGEXP_LIKE in SQL). This tradeoff is complex. For instance, the FPGA does not always win: simple queries that can be answered from indexes run faster on the CPU. On complex regular expressions, the FPGA is faster but needs to be parametrized at runtime to be able to support different queries. For very long patterns, the entire expression might not fit into the FPGA circuit and a combined CPU-FPGA mode must be chosen. We evaluate our design on a heterogeneous multi-core machine in which the FPGA has cache-coherent access to the CPU memory. In addition to the string-matching circuit, we also show how to implement database page parsing logic so as to be able to work directly on the same memory data structures as the database engine.
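
As a reading aid for the two operators named in this abstract, the following is a minimal software sketch of why LIKE can be treated as a restricted form of regular-expression matching. It is not the paper's FPGA design: the function names `like_to_regex` and `like` are hypothetical, and the sketch assumes only the standard SQL wildcards (`%` for any run of characters, `_` for exactly one character) with no ESCAPE clause.

```python
import re


def like_to_regex(pattern: str) -> re.Pattern:
    """Translate a SQL LIKE pattern into a compiled Python regex.

    Assumption: only '%' and '_' are wildcards; every other character
    is matched literally. ESCAPE clauses are not handled.
    """
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")           # any run of characters, possibly empty
        elif ch == "_":
            parts.append(".")            # exactly one character
        else:
            parts.append(re.escape(ch))  # literal character
    return re.compile("".join(parts), re.DOTALL)


def like(value: str, pattern: str) -> bool:
    """Return True if the whole value matches the LIKE pattern."""
    return like_to_regex(pattern).fullmatch(value) is not None


if __name__ == "__main__":
    print(like("hardware accelerator", "%accel%"))
    print(like("FPGA", "F_GA"))
    print(like("FPGA", "F_A"))
```

Because the pattern is translated at call time rather than fixed in advance, the same matcher serves different queries, which is the software analogue of the runtime parameterization the abstract argues for.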

international conference on management of data | 2017

doppioDB: A Hardware Accelerated Database

David Sidler; Zsolt István; Muhsen Owaida; Kaan Kara; Gustavo Alonso

Relational databases provide a wealth of functionality to a wide range of applications. Yet, there are tasks for which they are less than optimal, for instance when processing becomes more complex (e.g., matching regular expressions) or the data is less structured (e.g., text or long strings). In this demonstration we show the benefit of using specialized hardware for such tasks and highlight the importance of a flexible, reusable mechanism for extending database engines with hardware-based operators. We present doppioDB which consists of MonetDB, a main-memory column store, extended with Hardware User Defined Functions (HUDFs). In our demonstration the HUDFs are used to provide seamless acceleration of two string operators, LIKE and REGEXP_LIKE, and two analytics operators, SKYLINE and SGD (stochastic gradient descent). We evaluate doppioDB on an emerging hybrid multicore architecture, the Intel Xeon+FPGA platform, where the CPU and FPGA have cache-coherent access to the same memory, such that the hardware operators can directly access the database tables. For integration we rely on HUDFs as a unit of scheduling and management on the FPGA. In the demonstration we show the acceleration benefits of hardware operators, as well as their flexibility in accommodating changing workloads.

field programmable logic and applications | 2016

Low-latency TCP/IP stack for data center applications

David Sidler; Zsolt István; Gustavo Alonso

TCP/IP is widely used both on the Internet and in data centers. The protocol makes very few assumptions about the underlying network and provides useful guarantees such as reliable transmission, in-order delivery, and flow control. The price for this functionality is complexity, latency, and computational overhead, which is especially pronounced in software implementations. While for Internet communication this is acceptable, the overhead is too high in data centers. In this paper, we explore how to optimize a TCP/IP stack running on an FPGA for data center applications with an emphasis on data processing (e.g., key-value stores). Using a key-value store and a low-latency consensus protocol implemented on an FPGA as an example of the requirements that arise in data centers, we provide an extensive analysis of the overheads of TCP/IP and the solutions that can be adopted to minimize such an overhead. The proposed optimized TCP/IP stack minimizes tail latencies (a key metric in distributed data processing) and is efficiently implemented so as to be able to share the FPGA with application logic.
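
To make the term "tail latency" in this abstract concrete, here is a small sketch that compares the mean of a latency sample with its high percentiles. The sample values are invented for illustration and are not measurements from the paper; the helper name `percentile` and the nearest-rank definition are assumptions of this sketch.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


# Hypothetical request latencies in microseconds: most requests are fast,
# and a few stragglers dominate the tail.
latencies_us = [10] * 98 + [500, 900]

mean_us = sum(latencies_us) / len(latencies_us)
p50_us = percentile(latencies_us, 50)
p99_us = percentile(latencies_us, 99)

if __name__ == "__main__":
    print(f"mean={mean_us} p50={p50_us} p99={p99_us}")
```

In this made-up sample the median stays at the fast-path value while the 99th percentile is far larger, which is why a system that waits on many parallel requests is governed by the tail rather than by the average.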

very large data bases | 2017

Caribou: intelligent distributed storage

Zsolt István; David Sidler; Gustavo Alonso

The ever increasing amount of data being handled in data centers causes an intrinsic inefficiency: moving data around is expensive in terms of bandwidth, latency, and power consumption, especially given the low computational complexity of many database operations. In this paper we explore near-data processing in database engines, i.e., the option of offloading part of the computation directly to the storage nodes. We implement our ideas in Caribou, an intelligent distributed storage layer incorporating many of the lessons learned while building systems with specialized hardware. Caribou provides access to DRAM/NVRAM storage over the network through a simple key-value store interface, with each storage node providing high-bandwidth near-data processing at line rate and fault tolerance through replication. The result is a highly efficient, distributed, intelligent data storage that can be used to both boost performance and reduce power consumption and real estate usage in the data center thanks to the micro-server architecture adopted.

field programmable logic and applications | 2017

doppioDB: A hardware accelerated database

David Sidler; Muhsen Owaida; Zsolt István; Kaan Kara; Gustavo Alonso

Relational databases provide a wealth of functionality to a wide range of applications. Yet, there are tasks for which they are less than optimal, for instance when processing becomes more complex (e.g., regular expression evaluation, data analytics) or the data is less structured (e.g., text or long strings). With the increasing amount of user-generated data stored in relational databases, there is a growing need to analyze unstructured text data. At the same time, more complex analytical operators are required to extract useful information from the vast amount of collected data. However, many analytical operators incur a significant computational complexity that is not suited to database engines where multiple queries share the available resources. In this demonstration we show the benefit of using specialized hardware for such tasks and highlight the importance of a flexible, reusable mechanism for extending database engines with hardware-based operators. Our hybrid database engine, doppioDB, is deployed on an emerging Xeon+FPGA multicore architecture where the CPU and FPGA have cache-coherent access to the same memory, such that the hardware operators can directly access the database tables. The demonstration illustrates the acceleration benefits of hardware operators, as well as doppioDB's flexibility in accommodating changing workloads.

field-programmable technology | 2016

Debugging framework for FPGA-based soft processors

David Sidler; Ken Eguro

Soft processors are one way to raise the computational abstraction of FPGAs while keeping the advantages of reconfigurable hardware, such as adaptability, deterministic performance and high performance/watt. Software developers can quickly build, test and deploy applications using familiar tools while still leveraging important optimizations such as application-specific custom instructions. However, soft processors also present unique debugging problems. For example, the higher-level programming abstraction runs counter to classical low-level debugging tools like logic analyzers. In this work we present a debugging framework for FPGA-based soft processors that enables step-by-step debugging at the level of all soft processor instructions, time-travel debugging, post-mortem memory dumps, and performance metrics. By using knowledge about the soft processor's internals, our framework can capture execution traces that are up to 60× more space-efficient than those of traditional embedded logic analyzers.

field programmable logic and applications | 2015

Building a distributed key-value store with FPGA-based microservers

Zsolt István; David Sidler; Gustavo Alonso

Energy efficiency is one of the major challenges in datacenters, and microservers are a promising way to tackle it. These scaled-down machines with smaller CPUs, fewer peripherals and tighter integration improve energy efficiency, but often at the expense of lower performance. In this work we explore the tailoring of standard software components to specialized hardware as a way to get the energy efficiency of microservers without compromising performance. Our specialized microserver implements memcached, a common component in many web stacks, on a cluster of FPGAs. The design explores aspects such as pipelining techniques, tight integration with the network stack, and dealing with the memory bottleneck, and shows how to build a complete system out of individual microservers. To our knowledge this is the first stand-alone FPGA-based solution that can be used as a drop-in replacement for the software version. Beyond the per-node performance, in this demo we focus on the replication and the integration aspects of our system. We run a common benchmark on a PostgreSQL database with a two-node deployment of FPGAs acting as a cache for query results.
