
Publication


Featured research published by Aysegul Dundar.


Computer Vision and Pattern Recognition | 2014

A 240 G-ops/s Mobile Coprocessor for Deep Neural Networks

Vinayak Gokhale; Jonghoon Jin; Aysegul Dundar; Berin Martini; Eugenio Culurciello

Deep networks are state-of-the-art models used for understanding the content of images, videos, audio, and raw input data. Current computing systems are not able to run deep network models in real time with low power consumption. In this paper we present nn-X: a scalable, low-power coprocessor for enabling real-time execution of deep neural networks. nn-X is implemented on programmable logic devices and comprises an array of configurable processing elements called collections. These collections perform the most common operations in deep networks: convolution, subsampling, and non-linear functions. The nn-X system includes four high-speed direct memory access interfaces to DDR3 memory and two ARM Cortex-A9 processors. Each port is capable of a sustained throughput of 950 MB/s in full duplex. nn-X achieves a peak performance of 227 G-ops/s and a measured performance in deep learning applications of up to 200 G-ops/s, while consuming less than 4 watts of power. This translates to a performance-per-power improvement of 10 to 100 times over conventional mobile and desktop processors.
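The performance-per-power claim in the abstract follows directly from the quoted throughput and power figures; a quick back-of-the-envelope check (illustrative arithmetic only, with a hypothetical CPU baseline, not figures from the paper):

```python
# Efficiency check using the figures quoted in the abstract above.
measured_gops = 200.0   # measured deep-learning throughput, G-ops/s
power_watts = 4.0       # stated upper bound on power consumption

efficiency = measured_gops / power_watts  # G-ops/s per watt
print(f"nn-X efficiency: {efficiency:.0f} G-ops/s/W")  # prints 50

# A desktop CPU delivering, say, 100 G-ops/s at 100 W (hypothetical numbers)
# sits at 1 G-ops/s/W, consistent with the claimed 10-100x improvement.
cpu_efficiency = 100.0 / 100.0
print(f"improvement factor: {efficiency / cpu_efficiency:.0f}x")  # prints 50x
```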


Midwest Symposium on Circuits and Systems | 2014

An efficient implementation of deep convolutional neural networks on a mobile coprocessor

Jonghoon Jin; Vinayak Gokhale; Aysegul Dundar; Bharadwaj Krishnamurthy; Berin Martini; Eugenio Culurciello

In this paper we present a hardware-accelerated real-time implementation of deep convolutional neural networks (DCNNs). DCNNs are becoming popular because of advances in the processing capabilities of general-purpose processors. However, DCNNs produce hundreds of intermediate results whose constant memory accesses result in inefficient use of general-purpose processor hardware. By using an efficient routing strategy, we are able not only to maximize utilization of available hardware resources but also to obtain high performance in real-world applications. Our system, consisting of an ARM Cortex-A9 processor and a coprocessor, is capable of a peak performance of 40 G-ops/s while consuming less than 4 W of power. The entire platform is in a small form factor which, combined with its high performance at low power consumption, makes it feasible to use this hardware in applications like micro-UAVs, surveillance systems, and autonomous robots.


Conference on Information Sciences and Systems | 2013

Tracking with deep neural networks

Jonghoon Jin; Aysegul Dundar; Jordan Bates; Clément Farabet; Eugenio Culurciello

We present deep neural network models applied to tracking objects of interest. Deep neural networks trained for general-purpose use are introduced to conduct long-term tracking, which requires scale-invariant feature extraction even when the object dramatically changes shape as it moves in the scene. We use two-layer networks trained using either supervised or unsupervised learning techniques. The networks, augmented with a radial basis function classifier, are able to track objects based on a single example. We tested the networks' tracking capability on the TLD dataset, one of the most difficult sets of tracking tasks; real-time tracking is achieved at 0.074 seconds per frame for a 320×240-pixel image on a 2-core 2.7 GHz Intel i7 laptop.
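For reference, the reported per-frame latency converts to a frame rate as follows (a trivial check on the abstract's numbers, not a result from the paper):

```python
# Convert the reported per-frame latency into an approximate frame rate.
seconds_per_frame = 0.074              # reported latency on 320x240 input
fps = 1.0 / seconds_per_frame
print(f"{fps:.1f} frames per second")  # prints 13.5
```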


IEEE High Performance Extreme Computing Conference | 2014

Memory access optimized routing scheme for deep networks on a mobile coprocessor

Aysegul Dundar; Jonghoon Jin; Vinayak Gokhale; Berin Martini; Eugenio Culurciello

In this paper, we present a memory-access-optimized routing scheme for a hardware-accelerated real-time implementation of deep convolutional neural networks (DCNNs) on a mobile platform. DCNNs consist of multiple layers of 3D convolutions, each comprising tens to hundreds of filters; these convolutions are the most expensive operations in DCNNs. Systems that run DCNNs need to pass 3D input maps to the hardware accelerators for convolutions, and they face the limitation of streaming data in and out of the hardware accelerator. Such bandwidth-limited systems require data reuse to utilize computational resources efficiently. We propose a new routing scheme for 3D convolutions that takes advantage of the characteristics of DCNNs to fully utilize all the resources in the hardware accelerator. This routing scheme is implemented on the Xilinx Zynq-7000 All Programmable SoC. The system fully exploits weight-level and node-level parallelization of DCNNs and achieves a peak performance 2x better than the previous routing scheme while running DCNNs.
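The data-reuse motivation can be made concrete with a toy model of a DCNN layer (an illustrative NumPy sketch, not the paper's hardware implementation; all function names here are invented for the example). A layer's "3D convolution" is a bank of 2D convolutions, so every input map is needed by every output filter; the outer loop below re-reads each input map once per output filter, which is exactly the redundant data traffic that reuse-aware routing avoids:

```python
import numpy as np

def conv2d_valid(x, k):
    """Naive valid-mode 2D convolution (correlation), for illustration only."""
    H, W = x.shape
    kh, kw = k.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

def layer(inputs, weights):
    """inputs: list of 2D input maps; weights[o][i]: kernel for output o, input i.
    The outer loop touches every input map once per output filter; a reuse-aware
    scheme keeps a streamed-in map on chip for all filters that need it."""
    outputs = []
    for out_kernels in weights:
        acc = sum(conv2d_valid(x, k) for x, k in zip(inputs, out_kernels))
        outputs.append(acc)
    return outputs
```

With one 3×3 input map of ones and one 2×2 filter of ones, `layer` returns a single 2×2 output map whose entries are all 4.0.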


IEEE Transactions on Neural Networks | 2017

Embedded Streaming Deep Neural Networks Accelerator With Applications

Aysegul Dundar; Jonghoon Jin; Berin Martini; Eugenio Culurciello

Deep convolutional neural networks (DCNNs) have become a very powerful tool in visual perception. DCNNs have applications in autonomous robots, security systems, mobile phones, and automobiles, where high throughput of the feedforward evaluation phase and power efficiency are important. Because of this increased usage, many field-programmable gate array (FPGA)-based accelerators have been proposed. In this paper, we present an optimized streaming method for DCNNs' hardware accelerator on an embedded platform. The streaming method acts as a compiler, transforming a high-level representation of DCNNs into operation codes to execute applications in a hardware accelerator. The proposed method utilizes the maximum computational resources available based on a novel scheduled routing topology that combines data reuse and data concatenation. It is tested with a hardware accelerator implemented on the Xilinx Kintex-7 XC7K325T FPGA. The system fully exploits weight-level and node-level parallelizations of DCNNs and achieves a peak performance of 247 G-ops/s while consuming less than 4 W of power. We test our system with applications on object classification and object detection in real-world scenarios. Our results indicate high performance efficiency, outperforming all other presented platforms while running these applications.


arXiv: Neural and Evolutionary Computing | 2014

Flattened Convolutional Neural Networks for Feedforward Acceleration

Jonghoon Jin; Aysegul Dundar; Eugenio Culurciello


arXiv: Learning | 2015

Convolutional Clustering for Unsupervised Learning

Aysegul Dundar; Jonghoon Jin; Eugenio Culurciello


arXiv: Learning | 2015

Robust Convolutional Neural Networks under Adversarial Noise

Jonghoon Jin; Aysegul Dundar; Eugenio Culurciello


arXiv: Computer Vision and Pattern Recognition | 2013

An Analysis of the Connections Between Layers of Deep Neural Networks

Eugenio Culurciello; Jonghoon Jin; Aysegul Dundar; Jordan Bates


International Conference on Computer Vision Theory and Applications | 2012

Visual Tracking with Similarity Matching Ratio

Aysegul Dundar; Jonghoon Jin; Eugenio Culurciello

Collaboration


Dive into Aysegul Dundar's collaborations.

Top Co-Authors

Ali Borji (University of Central Florida)

Ting-Chun Wang (University of California)