Arda Yurdakul | Researchain

Publication

Featured research published by Arda Yurdakul.

IEEE Transactions on Circuits and Systems | 2008

An Algorithm for the Design of Low-Power Hardware-Efficient FIR Filters

Mustafa Aktan; Arda Yurdakul; Günhan Dündar

A novel algorithm for designing low-power and hardware-efficient linear-phase finite-impulse response (FIR) filters is presented. Given the filter frequency response characteristics, the algorithm finds filter coefficients with a reduced number of signed-power-of-two (SPT) terms. The algorithm is branch-and-bound based and fixes a coefficient to a certain value, which is determined by finding the boundary values of the coefficient using linear programming. Although the worst-case run time of the algorithm is exponential, its capability to find appreciably good solutions in a reasonable amount of time makes it a desirable CAD tool for designing low-power and hardware-efficient filters. The superiority of the algorithm over existing methods in terms of SPT term count, design time, hardware complexity, and power performance is shown with several design examples. Up to 30% reduction in the number of SPT terms is achieved over unoptimized Remez coefficients, which is 20% better than the compared optimization methods. The average power saving is 20% over unoptimized coefficients, which is up to 14% better than optimized coefficients obtained with existing methods.
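
To make the SPT term count concrete, the sketch below converts an integer coefficient to canonical signed-digit (CSD) form, a standard representation that uses the fewest nonzero digits from {-1, 0, +1}, and counts its nonzero terms. It is not the branch-and-bound algorithm of the paper; the function names and the sample coefficient are assumptions chosen for illustration, and the coefficient is assumed to be already quantized to an integer.

```python
def to_csd(n: int) -> list[int]:
    """Canonical signed-digit form of n, least significant digit first.

    Each digit is -1, 0, or +1, and no two adjacent digits are nonzero.
    """
    digits = []
    while n != 0:
        if n & 1:
            # n mod 4 == 1 gives +1, n mod 4 == 3 gives -1, so the digit
            # that follows is guaranteed to be zero after the subtraction.
            d = 2 - (n & 3)
            n -= d
        else:
            d = 0
        digits.append(d)
        n >>= 1
    return digits


def spt_count(n: int) -> int:
    """Number of nonzero signed-power-of-two terms in the CSD form of n."""
    return sum(1 for d in to_csd(n) if d != 0)


# Hypothetical quantized coefficient: 23 = 0b10111 has four binary ones,
# but its CSD form 32 - 8 - 1 needs only three SPT terms.
coefficient = 23
print(to_csd(coefficient), spt_count(coefficient))
```

Each SPT term beyond the first costs one adder or subtractor in a shift-and-add multiplier, which is why reducing the term count reduces hardware and power.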

signal processing systems | 1999

Multiplierless Realization of Linear DSP Transforms by Using Common Two-Term Expressions

Arda Yurdakul; Günhan Dündar

Recently, most DSP systems have used multirate signal processing techniques or transforms to reduce computational complexity without compromising system quality. In these techniques, realizing each constant separately is redundant because some constants appear more than once, and it increases the area and power consumption of the system. This paper introduces the concept of handling all coefficients in the system at the same time. To do this, two-term expressions of the constants in a system are presented for adder and shifter minimization.
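
The idea of sharing two-term expressions across constants can be sketched as follows. The code writes each constant in canonical signed-digit form and counts how often each pair of nonzero digits, described by its shift distance and relative sign, occurs over all constants. This is only an assumed reading of "common two-term expressions", not the paper's procedure; the helper names and the sample constants are hypothetical.

```python
from collections import Counter
from itertools import combinations


def csd_terms(n: int) -> list[tuple[int, int]]:
    """Nonzero canonical signed-digit terms of n as (bit_position, sign)."""
    terms, pos = [], 0
    while n != 0:
        if n & 1:
            sign = 2 - (n & 3)  # +1 if n mod 4 == 1, -1 if n mod 4 == 3
            terms.append((pos, sign))
            n -= sign
        n >>= 1
        pos += 1
    return terms


def two_term_patterns(constants: list[int]) -> Counter:
    """Count candidate two-term patterns over all constants.

    A pattern is (shift distance between the two terms, same sign?).
    Occurrences that share a digit are all counted, although they cannot
    all be realized at the same time.
    """
    counts = Counter()
    for c in constants:
        for (p1, s1), (p2, s2) in combinations(csd_terms(c), 2):
            counts[(p2 - p1, s1 == s2)] += 1
    return counts


# Hypothetical constants: 5 = 4 + 1 and 21 = 16 + 4 + 1.
patterns = two_term_patterns([5, 21])
print(patterns.most_common(1))
```

For these two constants the most frequent pattern is "two terms of equal sign, two bit positions apart", which corresponds to the product 5x = x + (x << 2). Computing 5x once lets 21x be formed as 5x + (x << 4) with a single additional adder.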

conference on ph.d. research in microelectronics and electronics | 2008

Self-reconfiguration on Spartan-III FPGAs with compressed partial bitstreams via a parallel configuration access port (cPCAP) core

Salih Bayar; Arda Yurdakul

This paper presents an alternative approach for dynamic partial self-reconfiguration that enables a field programmable gate array (FPGA) to partially reconfigure itself at run-time through a parallel configuration access port (cPCAP) under the control of the stand-alone cPCAP core within the FPGA instead of using an embedded processor. The cPCAP core with bitstream decompression module needs only 361 slices, which is approximately 18% of a Spartan-3S200 FPGA. The dynamic partial self-reconfiguration via the cPCAP core works at up to 50 Mbyte/s. The compressed partial bitstream is stored in BlockRAM within the FPGA and decompressed via the cPCAP core at the time of reconfiguration of the FPGA. This approach has been implemented on a pure Spartan-3 FPGA from Xilinx, but it can also be used for other FPGA architectures, such as Virtex-II(Pro), Virtex-4, Virtex-5, etc.

digital systems design | 2013

Efficient Implementations of Multi-pumped Multi-port Register Files in FPGAs

Hasan Erdem Yantir; Salih Bayar; Arda Yurdakul

Existing implementation methods of multi-port register files (MPo-RF) in FPGAs are not scalable enough to deal with an increased number of ports due to higher logic area and power. While the usage of dedicated block RAMs (BRAMs) limits the designer to only a single read and a single write port, the slice-based approach causes large resource occupation and degrades design performance significantly. Similarly, the conventional multi-pumping (MPu) approaches are not efficient enough due to the increased combinational delay and area of huge multiplexers. In this paper, we propose a new design which exploits the banking and replication of BRAMs with an efficient shift register based multi-pumping (SR-MPu) approach. While an increased port number causes internal frequency drops in conventional multiplexer based MPu approaches, it does not affect the internal operating frequency of our SR-MPu methodology. Test results on a Xilinx Virtex-5 XC5VLX110T FPGA show that our 32-bit 12-read & 6-write (12R&6W) RF can operate internally at up to 429 MHz, while the 64-bit version can operate at up to 408 MHz. The speed of our RF is independent of the MPu factor, and it occupies up to 47% fewer logic resources than other design methods. In terms of energy consumption, our RF design saves up to 26% energy according to the Xilinx Power Analyzer (XPA) results.
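
The replication part of the approach can be pictured with a small behavioral model. The sketch below covers only the generic replication technique for adding read ports with a single write port: every write is broadcast to all copies, and each read port reads its own copy. Banking for multiple write ports and the shift register based multi-pumping are not modeled, and the class and its parameters are hypothetical.

```python
class ReplicatedRegisterFile:
    """Behavioral model of a 1-write, N-read register file.

    Each replica stands in for one memory with a single read port and a
    single write port. All replicas receive every write, so any read port
    can be served from its private replica without port conflicts.
    """

    def __init__(self, depth: int, read_ports: int) -> None:
        self.replicas = [[0] * depth for _ in range(read_ports)]

    def write(self, addr: int, value: int) -> None:
        # Broadcast the write so that all replicas stay identical.
        for replica in self.replicas:
            replica[addr] = value

    def read(self, port: int, addr: int) -> int:
        # Each read port owns exactly one replica.
        return self.replicas[port][addr]


# Hypothetical sizing: 32 registers with 12 read ports.
rf = ReplicatedRegisterFile(depth=32, read_ports=12)
rf.write(5, 99)
print([rf.read(port, 5) for port in range(12)])
```

The cost of replication is visible in the model: storage grows linearly with the number of read ports, which is the pressure that multi-pumping is meant to relieve by serving several ports from one memory within a single external clock cycle.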

reconfigurable communication centric systems on chip | 2011

A self-reconfigurable platform for general purpose image processing systems on low-cost spartan-6 FPGAs

Salih Bayar; Arda Yurdakul; Mehmet Tukel

There is still no partial reconfiguration tool support for low-cost Field Programmable Gate Arrays (FPGAs) such as the old-fashioned Spartan-3 and state-of-the-art Spartan-6 FPGA families by Xilinx. This forces designers and engineers who use the partial reconfiguration capability of FPGAs to use expensive families such as Virtex-4, Virtex-5 and Virtex-6, which are officially supported by partial reconfiguration (PR) software. Moreover, Xilinx still does not offer a portable, dedicated self-reconfiguration engine for all of its FPGAs. Self-reconfiguration is achieved with general-purpose processors such as MicroBlaze and PowerPC, which are overqualified for this purpose. In this study, we propose a new self-reconfiguration mechanism for Spartan-6 FPGAs. This mechanism can be used to implement large and complex designs on small FPGAs, as chip area can be dramatically reduced by exploiting the dynamic partial reconfiguration feature for on-demand functionality loading and maximal utilization of the hardware. This approach is highly attractive for designing low-cost compute-intensive applications such as high performance image processing systems. For Spartan-6 FPGAs, we have developed hard-macros and exploited the self-reconfiguration engine, compressed Parallel Configuration Access Port (cPCAP) [1], that we designed for Spartan-3. The modified cPCAP core with block RAM controller, bitstream decompressor unit and Internal Configuration Access Port (ICAP) Finite State Machine (FSM) occupies only about 82 of 6,822 slices (1.2% of the whole device) on a Spartan-XC6SLX45 FPGA, and it achieves the maximum theoretical reconfiguration speed of 200MB/s (ICAP, 16-bit at 100MHz) proposed by Xilinx. We have also implemented a Reconfigurable Processing Element (RPE) whose arithmetic unit can be reconfigured on-the-fly. Multiple RPEs can be utilized to design a General Purpose Image Processing System (GPIPS) that can implement a number of different algorithms during runtime. As an illustrative example, we programmed the GPIPS on Spartan-6 to switch on demand between two applications, two-dimensional filtering and block-matching.
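
The 200MB/s figure follows directly from the port width and clock quoted in the abstract. The short sketch below reproduces that arithmetic and the slice utilization; the function name is hypothetical, and one full-width transfer per clock cycle is assumed.

```python
def port_throughput_mb_per_s(width_bits: int, clock_mhz: float) -> float:
    """Peak throughput of a parallel port in MB/s.

    Assumes one full-width transfer per clock cycle:
    (bits per cycle / 8) bytes * (clock in MHz) = megabytes per second.
    """
    return width_bits / 8 * clock_mhz


# Figures quoted in the abstract: 16-bit ICAP at 100 MHz.
icap_peak = port_throughput_mb_per_s(width_bits=16, clock_mhz=100)

# Slice utilization quoted in the abstract: about 82 of 6,822 slices.
utilization_percent = 82 / 6822 * 100

print(f"{icap_peak:.0f} MB/s, {utilization_percent:.1f}% of slices")
```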

Proceedings of the 12th FPGAworld Conference 2015 on | 2015

High Level Synthesis Based Hardware Accelerator Design for Processing SQL Queries

Gorker Alp Malazgirt; Nehir Sonmez; Arda Yurdakul; Adrián Cristal; Osman S. Unsal

About three exabytes of data is created and stored in databases each day, and this number is doubling approximately every forty months. Querying this enormous amount of data has been a challenge, and new methods have been actively researched. In this paper, we present hardware accelerators which are designed to speed up database analytics for in-memory databases. Unlike traditional hardware accelerator designs, our hardware accelerators are composed using High Level Synthesis (HLS), which enables high level descriptions of functionality such as data filtering, sorting, and equijoins to be targeted directly into RTL. We have simulated TPC-H benchmark queries using Xilinx Vivado HLS managed in our custom simulation software framework. Our results demonstrate the capabilities of HLS in the database acceleration domain: the 200MHz FPGA accelerator can provide a two-orders-of-magnitude performance improvement over a PostgreSQL-based full software implementation running on a modern multicore system.

field programmable logic and applications | 2014

Application specific multi-port memory customization in FPGAs

Gorker Alp Malazgirt; Hasan Erdem Yantir; Arda Yurdakul; Smail Niar

FPGA block RAMs (BRAMs) offer speed advantages compared to LUT-based memory designs, but a BRAM has only one read and one write port. Designers need to use multiple BRAMs in order to create multi-port memory structures, which is more difficult than designing with LUT-based multi-port memories. Multi-port memory designs increase overall performance but come at an area cost. In this paper, we present a fully automated methodology that tailors our multi-port memory to a given application. We present our performance improvements and area tradeoffs on state-of-the-art string matching algorithms.

IEEE Transactions on Very Large Scale Integration Systems | 2015

PFMAP: Exploitation of Particle Filters for Network-on-Chip Mapping

Salih Bayar; Arda Yurdakul

In this paper, we propose a mapping algorithm called particle filter mapping (PFMAP). PFMAP is able to map task nodes onto the cores of tile-based network-on-chip (NoC) architectures, such as regular, irregular, and custom 2-D or 3-D topologies. PFMAP is inspired by the systematic resampling algorithm for particle filters, in which all particles can run in parallel and independently of each other. Based upon the experimental results from applying PFMAP to various real-life and synthetic applications on different topologies and architectures, the performance of the 2-D mesh architectures in terms of communication cost improved by up to 51% for irregular topologies, and by up to 31% for custom architectures. Similarly, the total travel distance obtained by PFMAP is reduced by up to 45% for custom 2-D mesh architectures. In addition, average clock cycles per flit and total network power are reduced by up to 17% and 15%, respectively, for regular 2-D mesh architectures. Finally, communication cost is reduced by up to 34% for 3-D regular NoC architectures.
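
For readers unfamiliar with the resampling step that inspired PFMAP, the sketch below shows generic systematic resampling for particle filters: one random offset is drawn, and N evenly spaced pointers select particles in proportion to their weights. It is the textbook resampling step only, not the PFMAP mapping algorithm; the function name and the sample weights are assumptions.

```python
import random


def systematic_resample(weights, rng=random):
    """Return indices of the particles selected by systematic resampling.

    A single random offset u0 in [0, 1/N) is drawn, and the pointers
    u0 + k/N (k = 0..N-1) are matched against the cumulative weights.
    """
    n = len(weights)
    total = sum(weights)
    u0 = rng.random() / n
    indices = []
    i = 0
    cumulative = weights[0] / total
    for k in range(n):
        u = u0 + k / n
        # Advance to the first particle whose cumulative weight exceeds u.
        # The bound on i guards against floating-point round-off at the end.
        while u >= cumulative and i < n - 1:
            i += 1
            cumulative += weights[i] / total
        indices.append(i)
    return indices


# Hypothetical weights: particle 2 carries most of the probability mass.
print(systematic_resample([0.1, 0.1, 0.7, 0.1], random.Random(0)))
```

Because only one random number is needed and each pointer is a fixed offset from it, the selection is cheap and low-variance, and particles with large weights are duplicated while particles with small weights tend to be dropped.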

Journal of Systems Architecture | 2012

A dynamically reconfigurable communication architecture for multicore embedded systems

Salih Bayar; Arda Yurdakul

To deal with the communication bottleneck of multiprocessor systems, several communication architectures have been proposed in the last decade. Yet, none of them has demonstrated the performance of direct connections between two communicating units. In this paper, we propose dynamically reconfigurable point-to-point (DRP2P) interconnects for setting up a direct connection between two communicating units before the communication starts. DRP2P is neither point-to-point (P2P) nor Network-on-Chip (NoC); it stands between these two on-chip communication architectures. It is as fast as P2P and as scalable as NoC. Instead of using routers as in NoC, we utilize the partial reconfiguration ability of FPGAs for routing data packets. Furthermore, DRP2P can work on both regular and irregular topologies. The only drawback of our approach is the reconfiguration latency. This drawback is completely hidden when the reconfiguration of the communication links is performed during the computation times of the cores. DRP2P solves the scalability issue of P2P by setting up on-demand communication-specific links between cores. As a result, the occupied area and the total power consumption of the communication architecture can be reduced significantly. We designed an on-chip self-reconfiguration core, c^2PCAP, to set up DRP2P interconnects as fast as possible. The c^2PCAP core is designed for Xilinx FPGAs and can partially reconfigure the FPGA at the highest rate proposed by the manufacturer (e.g. up to 400MB/s for Virtex-4).

computer and information technology | 2010

Introducing Hardware-in-Loop Concept to the Hardware/Software Co-design of Real-time Embedded Systems

Dogan Fennibay; Arda Yurdakul; Alper Sen

As the need for embedded systems to interact with other systems is growing fast, we see great opportunities in introducing the hardware-in-the-loop technique to the field of hardware/software co-design of embedded systems. This technique reduces the need to develop models for existing hardware and increases the accuracy of the overall system. This work is especially important now that complexity and time-to-market constraints demand early simulation, verification, and architectural exploration of systems. We introduce the hardware-in-the-loop technique to the field of hardware/software co-design of industrial embedded systems using SystemC as the modeling environment. We conceptualize the hybrid channel to clearly define the communication between real and virtual (modeled) subsystems. We patch the SystemC kernel for hard real-time execution, and we improve the underlying operating system to guarantee an upper bound for the overall system latency. We have performed tests to measure the performance of our method in terms of response time and determinism. We have achieved a stable operating frequency of 10 kHz and an I/O performance of sub-millisecond round-trip time over Ethernet. Moreover, we have developed a non-timed transaction-level model of a BACnet Broadcast Management Device (BBMD) and connected it with real devices to see our method's performance in a real-life environment. Our model outperformed the competing real system by up to 80 times in maximum response time. We deem the results very promising for the future of our method.