Patricio Bulić | Researchain

Publication

Featured researches published by Patricio Bulić.

Microprocessors and Microsystems | 2011

An iterative logarithmic multiplier

Zdenka Babic; Aleksej Avramovic; Patricio Bulić

Digital signal processing algorithms often rely heavily on a large number of multiplications, which are both time and power consuming. However, there are many practical ways to simplify multiplication, such as truncated and logarithmic multipliers. These methods consume less time and power but introduce errors. Nevertheless, they can be used in situations where a shorter time delay is more important than accuracy. In digital signal processing, these conditions are often met, especially in video compression and tracking, where integer arithmetic gives satisfactory results. This paper presents a simple and efficient multiplier that can achieve arbitrary accuracy through an iterative procedure before reaching the exact result. The multiplier is based on the same form of number representation as Mitchell's algorithm, but it uses different error correction circuits than those proposed by Mitchell. In this way, the error correction can be done almost in parallel with the basic multiplication (in practice, this is achieved through pipelining). The hardware solution involves adders and shifters, so it is not gate or power consuming. The error summary for operands ranging from 8 bits to 16 bits indicates a very low relative error percentage with only two iterations. For the hardware implementation assessment, the proposed multiplier is implemented on the Spartan 3 FPGA chip. For 16-bit operands, the time delay estimation indicates that a multiplier with two iterations can work at a clock frequency of more than 150 MHz, with the maximum relative error being less than 2%.
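
The iterative idea in this abstract can be illustrated with a small software sketch. It is not taken from the paper: it assumes each operand is written as 2^k + r (a leading power of two plus a residue, as in Mitchell's representation), that the basic approximation keeps every term except the product of the two residues, and that each correction step applies the same block to those residues. The function names and the `corrections` parameter are hypothetical.

```python
def leading_one(n: int) -> int:
    """Index of the most significant set bit, i.e. floor(log2(n)), for n > 0."""
    return n.bit_length() - 1


def basic_block(n1: int, n2: int) -> tuple:
    """One approximation step built from shifts and additions only.

    Assumption (not stated in the abstract): each operand is split as
    2**k + r, and the step returns 2**(k1+k2) + r1*2**k2 + r2*2**k1
    together with the residues r1 and r2. The omitted term r1*r2 is
    exactly the error of this step.
    """
    k1, k2 = leading_one(n1), leading_one(n2)
    r1, r2 = n1 - (1 << k1), n2 - (1 << k2)
    approx = (1 << (k1 + k2)) + (r1 << k2) + (r2 << k1)
    return approx, r1, r2


def iterative_log_multiply(n1: int, n2: int, corrections: int) -> int:
    """Approximate n1 * n2 with one basic step plus `corrections` correction steps."""
    total = 0
    for _ in range(corrections + 1):
        if n1 == 0 or n2 == 0:
            break  # the remaining error term is zero, so the sum is exact
        approx, n1, n2 = basic_block(n1, n2)
        total += approx
    return total
```

Under these assumptions, `iterative_log_multiply(13, 11, c)` gives 128, 142 and 143 for `c` equal to 0, 1 and 2, so the approximation approaches the exact product 143 from below. In hardware the correction steps could be pipelined rather than looped; the loop here only mirrors the arithmetic.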

Neurocomputing | 2012

Applicability of approximate multipliers in hardware neural networks

Uroš Lotrič; Patricio Bulić

In recent years there has been a growing interest in hardware neural networks, which offer many benefits over conventional software models, mainly in applications where speed, cost, reliability, or energy efficiency are of great importance. These hardware neural networks require many resource-, power- and time-consuming multiplication operations, so special care must be taken during their design. Since the neural network processing can be performed in parallel, there is usually a requirement for designs with as many concurrent multiplication circuits as possible. One option to achieve this goal is to replace the complex exact multiplying circuits with simpler, approximate ones. The present work demonstrates the application of approximate multiplying circuits in the design of a feed-forward neural network model with on-chip learning ability. The experiments performed on a heterogeneous Proben1 benchmark dataset show that the adaptive nature of the neural network model successfully compensates for the calculation errors of the approximate multiplying circuits. At the same time, the proposed designs also profit from more computing power and increased energy efficiency.

International Journal of Parallel Programming | 2003

An extended ANSI C for processors with a multimedia extension

Patricio Bulić; Veselko Gustin

This paper presents the Multimedia C language, which is designed for the multimedia extensions included in all modern microprocessors. The paper discusses the language syntax, the implementation of its compiler and its use in developing multimedia applications. The goal was to provide programmers with the most natural way of using multimedia processing facilities in the C language. The MMC language has been used to develop some of the most frequently used multimedia kernels. The presented experiments on these scientific and multimedia applications have yielded good performance improvements.

Computer Applications in Engineering Education | 2006

Learning computer architecture concepts with the FPGA-based “Move” microprocessor

Veselko Gustin; Patricio Bulić

In this article we introduce the use of a programmable logic device (PLD) in an application‐oriented study as an example of designing a microprocessor based on reduced instruction set computer (RISC) architecture. Since the concept of an in‐system configurable logic circuit is becoming increasingly popular, we now use it for the purpose of logic design. We suggest that students use PLDs when constructing a central processing unit (CPU) with their own configured functions that are directly implemented in the logic. Such an approach could greatly increase the understanding of the architectural concept of the CPU.

International Conference on Computer Design | 2010

A simple pipelined logarithmic multiplier

Patricio Bulić; Zdenka Babic; Aleksej Avramovic

Digital signal processing algorithms often rely heavily on a large number of multiplications, which are both time and power consuming. However, there are many practical ways to simplify multiplication, such as truncated and logarithmic multipliers. These methods consume less time and power but introduce errors. Nevertheless, they can be used in situations where a shorter time delay is more important than accuracy. In digital signal processing, these conditions are often met, especially in video compression and tracking, where integer arithmetic gives satisfactory results. This paper presents and compares different multipliers in a logarithmic number system. For the hardware implementation assessment, the multipliers are implemented on the Spartan 3 FPGA chip and are compared in terms of speed, resources required for implementation, power consumption and error rate. We also propose a simple and efficient logarithmic multiplier that can achieve arbitrary accuracy through an iterative procedure. In this way, the error correction can be done almost in parallel with the basic multiplication (in practice, this is achieved through pipelining). The hardware solution involves adders and shifters, so it is not gate or power consuming. The error of the proposed multiplier for operands ranging from 8 bits to 16 bits indicates a very low relative error percentage.

The Journal of Supercomputing | 2013

A GPU implementation of a structural-similarity-based aerial-image classification

Rok Češnovar; Vladimir Risojevic; Zdenka Babic; Tomaž Dobravec; Patricio Bulić

There is an increasing need for fast and efficient algorithms for the automatic analysis of remote-sensing images. In this paper we address the implementation of the semantic classification of aerial images with general-purpose graphics-processing units (GPGPUs). We propose the calculation of a local Gabor-based structural texture descriptor and a structural texture similarity metric combined with a nearest-neighbor classifier and image-to-class similarity on CUDA supported graphics-processing units. We first present the algorithm and then describe the GPU implementation and optimization with the CUDA programming model. We then evaluate the results of the algorithm on a dataset of aerial images and present the execution times for the sequential and parallel implementations of the whole algorithm as well as measurements only for the selected steps of the algorithm. We show that the algorithms for the image classification can be effectively implemented on the GPUs. In our case, the presented algorithm is around 39 times faster on the Tesla C1060 unit than on the Core i5 650 CPU, while keeping the same success rate of classification.

International Conference on Computer Design | 2011

A simple pipelined squaring circuit for DSP

Vladimir Risojevic; Aleksej Avramovic; Zdenka Babic; Patricio Bulić

There are many digital signal processing applications where a shorter time delay of algorithms and efficient implementations are more important than accuracy. Since squaring is one of the fundamental operations widely used in digital signal processing algorithms, approximate squaring is proposed. We present a simple method of approximate squaring that can achieve a desired accuracy. The proposed method uses the same simple combinational logic for the first approximation and the correction terms. An analysis for operands of various bit lengths and levels of approximation showed that the maximum and average relative errors decrease significantly as more correction terms are added. The proposed squaring method can be implemented with a high level of parallelism. A pipelined implementation is also proposed in this paper. The proposed squarer achieved significant savings in area and power when compared to a multiplier-based squarer. As an example, an analysis of the impact of Euclidean distance calculation by approximate squaring on image retrieval is performed.
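
A rough software sketch of squaring with correction terms may help here. It is an illustration rather than the circuit from the paper: it assumes the operand is split as 2^k + r, that the first approximation is 2^(2k) + r·2^(k+1), and that each correction term applies the same step to the residue r, whose square is the omitted error. All function names, and the squared-distance helper at the end, are hypothetical.

```python
def square_step(n: int) -> tuple:
    """Approximate n**2 for n > 0 using shifts and one addition.

    Assumption: n = 2**k + r, so n**2 = 2**(2k) + r*2**(k+1) + r**2.
    The step keeps the first two terms and returns the residue r,
    whose square is the error that was left out.
    """
    k = n.bit_length() - 1
    r = n - (1 << k)
    return (1 << (2 * k)) + (r << (k + 1)), r


def approx_square(n: int, corrections: int) -> int:
    """First approximation of n**2 plus `corrections` correction terms."""
    total = 0
    for _ in range(corrections + 1):
        if n == 0:
            break  # nothing left to correct, the sum is exact
        term, n = square_step(n)
        total += term
    return total


def approx_squared_distance(a, b, corrections: int) -> int:
    """Squared Euclidean distance between integer vectors using approx_square."""
    return sum(approx_square(abs(x - y), corrections) for x, y in zip(a, b))
```

With these assumptions, `approx_square(13, c)` yields 144, 168 and 169 for `c` equal to 0, 1 and 2, which shows the error shrinking as correction terms are added. The distance helper only hints at how such a squarer might feed a Euclidean distance computation of the kind mentioned in the abstract.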

IEEE International Conference on High Performance Computing, Data, and Analytics | 2002

Introducing the vector C

Patricio Bulić; Veselko Gustin

This paper presents the vector C (VC) language, which is designed for the multimedia extensions included in all modern microprocessors. The paper discusses the language syntax, the implementation of its compiler and its use in developing multimedia applications. The goal was to provide programmers with the most natural way of using multimedia processing facilities in the C language. The VC language has been used to develop some of the most frequently used multimedia kernels. The experiments on these scientific and multimedia applications have yielded good performance improvements.

European Conference on Parallel Processing | 2001

Macro Extension for SIMD Processing

Patricio Bulić; Veselko Gustin

The need for multimedia applications has prompted the addition of a SIMD instruction set. On the one hand we have modern multimedia execution hardware, and on the other we have the software and the general compilers, which are not able to automatically exploit the multimedia instruction set. Our solution to these problems is to find statement candidates in a program written in C/C++ (as we mainly use this language) and to employ the SIMD instruction set in the easiest possible way. We propose an algorithm for identifying candidates for parallel processing (ICPP), which is based on syntax and semantic checking of statements. We define the macro library MacroVect.c as the substitution for the discovered statement candidates.

Microelectronics Journal | 2014

An approximate logarithmic squaring circuit with error compensation for DSP applications

Aleksej Avramovic; Zdenka Babic; Dušan Raič; Drago Strle; Patricio Bulić

The squaring function is one of the frequently used arithmetic functions in DSP, so an approximation of the squaring function is acceptable as long as this approximation corrupts the bits that are already corrupted by noise, and does not degrade the application's performance significantly. Approximation of the squaring function can lead to significant savings in hardware and processing time. Previously proposed approximations of the squaring function include LUT-based solutions, linear interpolation of the squaring function and minimization of combinational logic. This paper proposes an approximation based on a simple logarithmic interpolation of the squaring function with a simple logic block, which can be reused for the error compensation. The proposed block performs approximation of the squaring function with a shift operation and a carry-free subtraction. The proposed approximate squarer with one compensation block achieves an average relative error below 1.5% for any bit length, while maintaining low power consumption. In order to evaluate the device utilization, the propagation delay and the power consumption, and to compare it with the existing solutions, we have synthesized the proposed squarer and the existing solutions for the standard cell library and 0.25 µm CMOS process parameters.
