Berin Martini | Researchain

Publication

Featured research published by Berin Martini.

computer vision and pattern recognition | 2011

NeuFlow: A runtime reconfigurable dataflow processor for vision

Clément Farabet; Berin Martini; Benoit Corda; Polina Akselrod; Eugenio Culurciello; Yann LeCun

In this paper we present a scalable dataflow hardware architecture optimized for the computation of general-purpose vision algorithms — neuFlow — and a dataflow compiler — luaFlow — that transforms high-level flow-graph representations of these algorithms into machine code for neuFlow. This system was designed with the goal of providing real-time detection, categorization and localization of objects in complex scenes, while consuming 10 Watts when implemented on a Xilinx Virtex 6 FPGA platform, or about ten times less than a laptop computer, and producing speedups of up to 100 times in real-world applications. We present an application of the system on street scene analysis, segmenting 20 categories on 500 × 375 frames at 12 frames per second on our custom hardware neuFlow.

international symposium on circuits and systems | 2010

Hardware accelerated convolutional neural networks for synthetic vision systems

Clément Farabet; Berin Martini; Polina Akselrod; Selçuk Talay; Yann LeCun; Eugenio Culurciello

In this paper we present a scalable hardware architecture to implement large-scale convolutional neural networks and state-of-the-art multi-layered artificial vision systems. This system is fully digital and is a modular vision engine with the goal of performing real-time detection, recognition and segmentation of mega-pixel images. We present a performance comparison between software, FPGA and ASIC implementations that shows a speedup in the custom hardware implementations.

computer vision and pattern recognition | 2014

A 240 G-ops/s Mobile Coprocessor for Deep Neural Networks

Vinayak Gokhale; Jonghoon Jin; Aysegul Dundar; Berin Martini; Eugenio Culurciello

Deep networks are state-of-the-art models used for understanding the content of images, videos, audio and raw input data. Current computing systems are not able to run deep network models in real-time with low power consumption. In this paper we present nn-X: a scalable, low-power coprocessor for enabling real-time execution of deep neural networks. nn-X is implemented on programmable logic devices and comprises an array of configurable processing elements called collections. These collections perform the most common operations in deep networks: convolution, subsampling and non-linear functions. The nn-X system includes 4 high-speed direct memory access interfaces to DDR3 memory and two ARM Cortex-A9 processors. Each port is capable of a sustained throughput of 950 MB/s in full duplex. nn-X is able to achieve a peak performance of 227 G-ops/s, a measured performance in deep learning applications of up to 200 G-ops/s while consuming less than 4 watts of power. This translates to a performance per power improvement of 10 to 100 times that of conventional mobile and desktop processors.

Archive | 2011

Large-Scale FPGA-based Convolutional Networks

Clément Farabet; Yann LeCun; Koray Kavukcuoglu; Berin Martini; Polina Akselrod; Selçuk Talay; Eugenio Culurciello

Micro-robots, unmanned aerial vehicles, imaging sensor networks, wireless phones, and other embedded vision systems all require low-cost and high-speed implementations of synthetic vision systems capable of recognizing and categorizing objects in a scene. Many successful object recognition systems use dense features extracted on regularly spaced patches over the input image. The majority of the feature extraction systems have a common structure composed of a filter bank (generally based on oriented edge detectors or 2D Gabor functions), a nonlinear operation (quantization, winner-take-all, sparsification, normalization, and/or pointwise saturation), and finally a pooling operation (max, average, or histogramming). For example, the scale-invariant feature transform (SIFT) (Lowe, 2004) operator applies oriented edge filters to a small patch and determines the dominant orientation through a winner-take-all operation. Finally, the resulting sparse vectors are added (pooled) over a larger patch to form a local orientation histogram. Some recognition systems use a single stage of feature extractors (Lazebnik, Schmid, and Ponce, 2006; Dalal and Triggs, 2005; Berg, Berg, and Malik, 2005; Pinto, Cox, and DiCarlo, 2008). Other models such as HMAX-type models (Serre, Wolf, and Poggio, 2005; Mutch and Lowe, 2006) and convolutional networks use two or more layers of successive feature extractors. Different training algorithms have been used for learning the parameters of convolutional networks. In LeCun et al. (1998b) and Huang and LeCun (2006), pure supervised learning is used to update the parameters. However, recent works have focused on training with an auxiliary task (Ahmed et al., 2008) or using unsupervised objectives (Ranzato et al., 2007b; Kavukcuoglu et al., 2009; Jarrett et al., 2009; Lee et al., 2009).
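
The common structure described in this abstract (filter bank, nonlinear operation, pooling) can be illustrated with a minimal Python sketch. The function names, the toy image, and the choice of rectification and max pooling are assumptions made for illustration only; they are not taken from the paper or from its hardware implementation.

```python
def convolve2d_valid(image, kernel):
    """Slide the kernel over the image without padding (no kernel flip)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def rectify(fmap):
    """Pointwise nonlinearity: keep positive responses, zero the rest."""
    return [[max(0.0, v) for v in row] for row in fmap]


def max_pool(fmap, size):
    """Non-overlapping max pooling over size x size blocks."""
    out_h = len(fmap) // size
    out_w = len(fmap[0]) // size
    return [
        [
            max(fmap[r * size + i][c * size + j]
                for i in range(size) for j in range(size))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def feature_stage(image, filter_bank, pool_size=2):
    """One stage: filter bank -> nonlinearity -> pooling, one map per filter."""
    return [
        max_pool(rectify(convolve2d_valid(image, kernel)), pool_size)
        for kernel in filter_bank
    ]


# Toy 5x5 image with a vertical edge between columns 1 and 2 (hypothetical data).
image = [[0, 0, 1, 1, 1] for _ in range(5)]
vertical_edge = [[-1, 1], [-1, 1]]
horizontal_edge = [[-1, -1], [1, 1]]

features = feature_stage(image, [vertical_edge, horizontal_edge])
for fmap in features:
    print(fmap)
```

In this toy case only the vertical-edge filter responds, since the image contains no horizontal edge. Stacking two or more such stages gives the multi-layer arrangement the abstract attributes to HMAX-type models and convolutional networks.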

international midwest symposium on circuits and systems | 2012

NeuFlow: Dataflow vision processing system-on-a-chip

Phi Hung Pham; Darko Jelaca; Clément Farabet; Berin Martini; Yann LeCun; Eugenio Culurciello

This paper presents a bio-inspired vision system-on-a-chip, the neuFlow SoC, implemented in the IBM 45 nm SOI process. The neuFlow SoC was designed to accelerate neural networks and other complex vision algorithms based on large numbers of convolutions and matrix-to-matrix operations. Post-layout characterization shows that the system delivers up to 320 GOPS with an average power consumption of 0.6 W. The power efficiency and portability of this system make it ideal for embedded vision-based devices, such as driver assistance and robotic vision.

IEEE Transactions on Circuits and Systems I-regular Papers | 2013

Continuous Time Level Crossing Sampling ADC for Bio-Potential Recording Systems

Wei Tang; Ahmad Osman; Dongsoo Kim; Brian Goldstein; Chenxi Huang; Berin Martini; Vincent A. Pieribone; Eugenio Culurciello

In this paper we present a fixed window level crossing sampling analog-to-digital converter for bio-potential recording sensors. This is the first proposed and fully implemented fixed window level crossing ADC without local DACs and clocks. The circuit is designed to reduce data size, power, and silicon area in future wireless neurophysiological sensor systems. We built a testing system to measure bio-potential signals and used it to evaluate the performance of the circuit. The bio-potential amplifier offers a gain of 53 dB within a bandwidth of 200 Hz-20 kHz. The input-referred rms noise is 2.8 μV. In the asynchronous level crossing ADC, the minimum delta resolution is 4 mV. The input signal frequency of the ADC is up to 5 kHz. The system was fabricated using the AMI 0.5 μm CMOS process. The chip size is 1.5 mm by 1.5 mm. The power consumption of the 4-channel system from a 3.3 V supply is 118.8 μW in the static state and 501.6 μW with a 240 kS/s sampling rate. The conversion efficiency is 1.6 nJ/conversion.
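
The general idea of level crossing sampling, which this abstract relies on, can be sketched in Python. This is an illustrative software model under stated assumptions, not the authors' circuit: the real ADC operates in continuous time without clocks, whereas this sketch walks over already-sampled integer values in millivolts. The function names and the test signal are hypothetical; only the 4 mV delta comes from the abstract.

```python
def level_crossing_sample(samples_mv, delta_mv):
    """Emit (index, direction) events each time the input moves one delta
    away from the current reference level; direction is +1 or -1."""
    events = []
    reference = samples_mv[0]
    for i, x in enumerate(samples_mv[1:], start=1):
        # A fast input may cross several levels between two samples.
        while x - reference >= delta_mv:
            reference += delta_mv
            events.append((i, +1))
        while reference - x >= delta_mv:
            reference -= delta_mv
            events.append((i, -1))
    return events


def reconstruct(initial_mv, events, delta_mv, length):
    """Rebuild a staircase approximation of the input from the events."""
    out = []
    level = initial_mv
    pending = iter(events)
    nxt = next(pending, None)
    for i in range(length):
        while nxt is not None and nxt[0] == i:
            level += nxt[1] * delta_mv
            nxt = next(pending, None)
        out.append(level)
    return out


# Hypothetical input in millivolts; delta of 4 mV as quoted in the abstract.
signal_mv = [0, 3, 5, 9, 8, 3, 0]
events = level_crossing_sample(signal_mv, delta_mv=4)
approx = reconstruct(signal_mv[0], events, 4, len(signal_mv))
print(events)
print(approx)
```

Only changes of at least one delta produce output, so a slowly varying or quiet input generates few events. That is the sense in which this style of sampling can reduce data size.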

midwest symposium on circuits and systems | 2014

An efficient implementation of deep convolutional neural networks on a mobile coprocessor

Jonghoon Jin; Vinayak Gokhale; Aysegul Dundar; Bharadwaj Krishnamurthy; Berin Martini; Eugenio Culurciello

In this paper we present a hardware accelerated real-time implementation of deep convolutional neural networks (DCNNs). DCNNs are becoming popular because of advances in the processing capabilities of general purpose processors. However, DCNNs produce hundreds of intermediate results whose constant memory accesses result in inefficient use of general purpose processor hardware. By using an efficient routing strategy, we are able not only to maximize utilization of available hardware resources but also to obtain high performance in real-world applications. Our system, consisting of an ARM Cortex-A9 processor and a coprocessor, is capable of a peak performance of 40 G-ops/s while consuming less than 4 W of power. The entire platform is in a small form factor which, combined with its high performance at low power consumption, makes it feasible to use this hardware in applications like micro-UAVs, surveillance systems and autonomous robots.

international symposium on circuits and systems | 2010

4-Channel asynchronous bio-potential recording system

Wei Tang; Chenxi Huang; Dongsoo Kim; Berin Martini; Eugenio Culurciello

We present a 4-channel bio-potential recording system using an asynchronous delta analog-to-digital converter. The system is designed to reduce the amount of data in neuro-physiological sensors. The circuit includes low-power low-noise amplifiers with asynchronous delta modulators. The analog front end offers a gain of 53 dB within a bandwidth of 200 Hz-20 kHz. In this asynchronous data converter, the minimum input-referred delta resolution is 4 μV. The input-referred rms noise is 2 μV. The system was fabricated with AMI 0.5 μm CMOS technology. The chip size is 1.5 mm by 1.5 mm. The power consumption of the 4-channel system with a 3.3 V supply is 118.8 μW in the static state and 501.6 μW with a 240 kbps data conversion rate. A graphical user interface was developed to monitor the transmitted signal in real time.

ieee high performance extreme computing conference | 2014

Memory access optimized routing scheme for deep networks on a mobile coprocessor

Aysegul Dundar; Jonghoon Jin; Vinayak Gokhale; Berin Martini; Eugenio Culurciello

In this paper, we present a memory access optimized routing scheme for a hardware accelerated real-time implementation of deep convolutional neural networks (DCNNs) on a mobile platform. DCNNs consist of multiple layers of 3D convolutions, each comprising between tens and hundreds of filters, and they generate the most expensive operations in DCNNs. Systems that run DCNNs need to pass 3D input maps to the hardware accelerators for convolutions, and they face the limitation of streaming data in and out of the hardware accelerator. Bandwidth-limited systems require data reuse to utilize computational resources efficiently. We propose a new routing scheme for 3D convolutions that takes advantage of the characteristics of DCNNs to fully utilize all the resources in the hardware accelerator. This routing scheme is implemented on the Xilinx Zynq-7000 All Programmable SoC. The system fully explores weight level and node level parallelization of DCNNs and achieves a peak performance 2x better than the previous routing scheme while running DCNNs.

international symposium on circuits and systems | 2009

A bio-inspired event-based size and position invariant human posture recognition algorithm

Shoushun Chen; Berin Martini; Eugenio Culurciello

This paper proposes a new approach to recognize human postures in real-time video sequences. The algorithm employs temporal difference imaging between video sequences as input and then decomposes the contour of the active object into vectorial line segments. A scheme based on simplified Line Segment Hausdorff Distance combined with projection histograms is proposed to achieve size- and position-invariant recognition. Consistent with the hierarchical model of the human visual system, sub-sampling techniques are used to represent the object by line segments at multiple resolution levels. The whole classification is described as a coarse-to-fine procedure. An average real-time recognition rate of 88% is achieved in the experiment. Compared to a conventional convolution method, the proposed algorithm reduces the computation cycles by 10 to 100 times. This work sets the foundation for size- and position-invariant object recognition for the implementation of event-based vision systems.
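
Two of the building blocks named in this abstract, temporal difference imaging and projection histograms, can be illustrated with a short Python sketch. The threshold, the frame data, and the function names are assumptions chosen for illustration. The line segment decomposition and the simplified Line Segment Hausdorff Distance are not shown, since the abstract does not give enough detail to reproduce them.

```python
def temporal_difference(prev_frame, next_frame, threshold):
    """Mark pixels whose intensity changed by more than the threshold."""
    return [
        [1 if abs(b - a) > threshold else 0
         for a, b in zip(row_prev, row_next)]
        for row_prev, row_next in zip(prev_frame, next_frame)
    ]


def projection_histograms(mask):
    """Count active pixels along each row and each column of the mask."""
    row_hist = [sum(row) for row in mask]
    col_hist = [sum(col) for col in zip(*mask)]
    return row_hist, col_hist


# Two hypothetical 3x3 grayscale frames; only the moving object changes.
prev_frame = [[0, 0, 0],
              [0, 0, 0],
              [0, 0, 0]]
next_frame = [[0, 9, 0],
              [0, 9, 9],
              [0, 0, 0]]

mask = temporal_difference(prev_frame, next_frame, threshold=5)
row_hist, col_hist = projection_histograms(mask)
print(mask)
print(row_hist, col_hist)
```

Static background pixels produce no activity in the difference mask, so later stages only have to process the pixels belonging to the moving object.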
