Ernest Jamro
AGH University of Science and Technology
Publication
Featured research published by Ernest Jamro.
Proceedings of the 26th Euromicro Conference. EUROMICRO 2000. Informatics: Inventing the Future | 2000
Kazimierz Wiatr; Ernest Jamro
Investigates different architectures implementing bit-parallel constant-coefficient multiplication in FPGA structures. First, multiplierless multiplication (MM) architectures employing canonic signed digit (CSD) and sub-structure sharing methods are addressed, and a novel algorithm for the conversion from two's-complement to CSD representation is presented. In the second part of this paper, lookup table-based multiplication (LM) is investigated. Correspondingly, the usage of different memory modules and finding the optimal combination of the memory and adders are considered. The LM architecture also considers reduction of the address width for each memory cell and the possibility of memory sub-structure sharing. Finally, implementation results for the Xilinx XC4000 and Virtex families are presented. As a result, MM generally surpasses the LM architecture. However, the actual choice between these two architectures is coefficient- and input parameter-dependent.
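The two's-complement-to-CSD conversion mentioned above can be sketched in software. The digit recurrence below is a standard textbook formulation for non-negative values, not necessarily the paper's novel algorithm:

```python
def to_csd(value):
    """Convert a non-negative integer to canonic signed digit (CSD) form.

    Returns digits in {-1, 0, 1}, least-significant first, with no two
    adjacent non-zero digits; minimising non-zero digits minimises the
    adder count of a multiplierless constant multiplier.
    """
    assert value >= 0
    digits = []
    v = value
    while v:
        if v & 1:
            d = 2 - (v & 3)   # +1 if v mod 4 == 1, -1 if v mod 4 == 3
            digits.append(d)
            v -= d            # now divisible by 2 (in fact by 4)
        else:
            digits.append(0)
        v >>= 1
    return digits
```

For example, 7 = 8 - 1 becomes `[-1, 0, 0, 1]`: two non-zero digits instead of the three in plain binary, i.e. one subtractor instead of two adders.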
International Conference on Information Technology: Coding and Computing | 2000
Kazimierz Wiatr; Ernest Jamro
In this paper, different architectures for real-time image constant-coefficient convolution are considered. Accordingly, look-up-table (LUT) based multiplication/convolution, LUT-based distributed arithmetic (DA) convolution and multiplierless convolution (MC) implementations in FPGA structures have been investigated. As a result, the choice between these architectures depends on the given coefficient values; however, in most cases the MC is preferable. Furthermore, the change of coefficient values in real-time systems is also considered. This work is a contribution to worldwide intense research on developing reconfigurable and user-dedicated custom computing machines (CCMs).
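The LUT-based multiplication idea can be illustrated in a few lines: the constant coefficient is folded into a precomputed table addressed by short slices of the input, and the partial products are summed by an adder tree. The digit width `k = 4` below is an illustrative choice, not taken from the paper:

```python
def build_lut(coeff, k=4):
    """Precompute coeff * d for every k-bit digit d (the LUT contents
    a constant-coefficient multiplier would hold in memory)."""
    return [coeff * d for d in range(1 << k)]

def lut_multiply(lut, x, width=8, k=4):
    """Multiply x (an unsigned width-bit input) by the constant baked
    into lut: split x into k-bit digits, look each digit up, and
    accumulate the shifted partial products (the adder tree)."""
    acc = 0
    for shift in range(0, width, k):
        digit = (x >> shift) & ((1 << k) - 1)
        acc += lut[digit] << shift
    return acc
```

Changing the coefficient at run time, as the abstract discusses, then amounts to rewriting the LUT contents rather than resynthesising logic.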
Applied Reconfigurable Computing | 2008
Maciej Wielgosz; Ernest Jamro; Kazimierz Wiatr
This paper presents an implementation of the double-precision exponential function. A novel table-based architecture, together with a short Taylor expansion, provides low latency (30 clock cycles), which is comparable to 32-bit implementations. The low area consumption of a single exp() module (roughly 4% of an XC4LX200) allows implementation of several parallel modules on a single FPGA. The exp() function was implemented on the SGI RASC platform, where external memory interface limitations allowed only twin-module parallelism. Each module is capable of processing at 200 MHz with a maximum error of 1 ulp and an RMSE of 0.62. This implementation aims primarily to meet quantum chemistry's huge and strict requirements of precision and speed.
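The 1-ulp accuracy metric quoted above can be checked in software. A minimal sketch, assuming Python 3.9+ for `math.ulp`:

```python
import math

def error_in_ulps(approx, exact):
    """Express the error of an approximation in units in the last
    place (ulp) of the exact double-precision value -- the metric the
    paper uses to report its exp() module's accuracy."""
    return abs(approx - exact) / math.ulp(exact)
```

A hardware module's outputs would be compared against a reference `math.exp` this way; the RMSE figure is then the root mean square of these per-sample ulp errors.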
Field-Programmable Logic and Applications | 2007
Ernest Jamro; Kazimierz Wiatr; Maciej Wielgosz
Most presented implementations of the exponential function are confined to the single-precision format. Increasing the data width to the double-precision format requires a different approach. The presented novel architecture employs three independent Look-Up Tables (LUTs) together with a short Taylor expansion exp(x) ≈ 1 + x. Implementation results show that the double-precision exp() implementation achieves high performance with satisfactory accuracy, latency and FPGA area consumption.
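The three-table idea can be sketched as a multiplicative decomposition: split the (reduced) argument as x = x_h + x_m + x_l, take exp(x_h) and exp(x_m) from precomputed tables, and approximate the tiny remainder by exp(x_l) ≈ 1 + x_l. The table address widths below are illustrative assumptions, not the paper's parameters:

```python
import math

H_BITS, M_BITS = 6, 6   # illustrative table address widths (assumed)

def table_exp(x):
    """Sketch of a table-plus-Taylor exp() for x in [0, 1):
    exp(x) = exp(xh) * exp(xm) * exp(xl), with exp(xl) ~ 1 + xl.
    In hardware the two exp() factors are LUT reads, not math.exp calls."""
    scale_h = 1 << H_BITS
    scale_m = 1 << (H_BITS + M_BITS)
    ih = int(x * scale_h)                    # high-part table index
    im = int((x - ih / scale_h) * scale_m)   # mid-part table index
    xl = x - ih / scale_h - im / scale_m     # remainder, |xl| < 2**-12
    return math.exp(ih / scale_h) * math.exp(im / scale_m) * (1.0 + xl)
```

Since |x_l| < 2⁻¹², the one-term Taylor truncation contributes a relative error of at most about x_l²/2 ≈ 2⁻²⁵, which is why only the two table factors need full precision.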
IFAC Proceedings Volumes | 2006
Ernest Jamro; Maciej Wielgosz; Kazimierz Wiatr
In the first part of this paper, the architecture of a quasi-static Huffman encoder is described, whose main part is a Look-Up Table (LUT). In order to reduce the hardware requirements, the maximum length of the encoded word is limited. This reduces the compression ratio only insignificantly, which is proved in this paper. Dynamic encoding is achieved by changing the LUT contents and a hardware-software co-design approach. Consequently, counting the input word statistics (histogram) and sorting the resultant histogram are implemented in hardware. The final calculation of the new LUT contents and control of the whole system are handled by the MicroBlaze soft processor.
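The software side of such a co-design can be sketched as follows: build the histogram, derive Huffman codes from it, and load the resulting symbol-to-codeword LUT. This sketch only asserts the length limit instead of reshaping the tree as a real length-limited coder would (an assumption for brevity):

```python
import heapq
from collections import Counter

def build_huffman_lut(data, max_len=12):
    """Build a symbol -> codeword-string LUT from the input statistics
    (histogram), as the quasi-static encoder's software side would
    before loading it into the hardware LUT."""
    freq = Counter(data)
    # heap entries: (weight, unique tiebreaker, {symbol: partial code})
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    i = len(heap)
    while len(heap) > 1:
        w0, _, c0 = heapq.heappop(heap)
        w1, _, c1 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c0.items()}
        merged.update({s: "1" + c for s, c in c1.items()})
        heapq.heappush(heap, (w0 + w1, i, merged))
        i += 1
    lut = heap[0][2]
    assert all(len(code) <= max_len for code in lut.values())
    return lut

def encode(data, lut):
    """Encoding itself is one LUT access per input word."""
    return "".join(lut[s] for s in data)
```

On the input "aaaabbc" the most frequent symbol gets a 1-bit code and the two rarer ones 2-bit codes, so the 7 symbols encode into 10 bits.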
Digital Systems Design | 2001
Ernest Jamro; Kazimierz Wiatr
Addition is a fundamental operation for convolution (FIR filters). In FPGAs, addition should be carried out in a standard way employing ripple-carry adders (rather than carry-save adders), which complicates the search for an optimal adder structure, as routing order has a substantial influence on the addition cost. Further, complex parameters of the inputs to the adder tree have been considered, e.g. correlation between inputs. These parameters are specified in different ways for different convolver architectures: Multiplierless Multiplication, Look-Up Table based Multiplication, and Distributed Arithmetic. Furthermore, two optimization techniques, Exhaustive Search and the Greedy Algorithm, have been implemented; as a result, the Greedy Algorithm is the best solution when computation time is of great importance. Otherwise, Exhaustive Search should be employed for a number of addition inputs n ≤ 8. This paper is a part of the research on AuToCon, an automated tool for generating convolution in FPGAs.
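A greedy adder-tree heuristic of the kind compared here can be sketched with a toy cost model: repeatedly add the two narrowest operands first, so that wide (costly) ripple-carry adders appear as late as possible. Using the result width as the adder cost is a simplification of the paper's FPGA cost model, assumed for illustration:

```python
import heapq

def greedy_adder_tree(widths):
    """Greedily pair the two narrowest operands at each step and
    return the total adder cost, where one adder's cost is taken to
    be the bit width of its result (a stand-in for ripple-carry
    LUT/carry-chain cost)."""
    heap = list(widths)
    heapq.heapify(heap)
    total_cost = 0
    while len(heap) > 1:
        a = heapq.heappop(heap)
        b = heapq.heappop(heap)
        result = max(a, b) + 1      # a sum can grow by one bit
        total_cost += result
        heapq.heappush(heap, result)
    return total_cost
```

An exhaustive search would instead enumerate all pairings and keep the cheapest, which is why the abstract recommends it only for small input counts (n ≤ 8).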
Information Technology Interfaces | 2001
Ernest Jamro; Kazimierz Wiatr
Addition is an essential operation for convolution (or FIR filters). In FPGAs, addition should be carried out in a standard way employing ripple-carry adders (rather than carry-save adders), which complicates the search for an optimal adder structure, as routing order has a substantial influence on the addition cost. Further, complex parameters of the addition inputs have been considered, e.g. correlation between inputs. These parameters are specified in different ways for different convolver architectures: multiplierless multiplication, look-up table based multiplication, and distributed arithmetic. Furthermore, different optimisation techniques, exhaustive search and simulated annealing, have been implemented. As a result, exhaustive search should be employed for a number of addition inputs n ≤ 8, and simulated annealing for n > 8. Employing simulated annealing gives about 10-20% area reduction in comparison to the greedy algorithm. This paper is a part of the research on AuToCon, an automated tool for generating convolution in FPGAs.
Digital Systems Design | 2001
Ernest Jamro; Kazimierz Wiatr
In FPGAs, addition should be carried out in the standard way employing ripple-carry adders (rather than carry-save adders), which complicates the search for an optimal adder structure, as routing order has a substantial influence on the addition cost. Further, complex parameters of the inputs to the adder block have been considered, e.g. correlation between inputs. These parameters are specified in different ways for different convolver architectures. Consequently, optimisation of the adder tree is the key issue addressed in this paper. Simulated Annealing and Genetic Programming have been proposed, and the obtained results are compared with the Greedy Algorithm (GrA) and Exhaustive Search (ES). As a result, the GrA is the best solution when computation time is of great importance. Otherwise, Simulated Annealing should be employed for a number of addition inputs N > 8, and ES is recommended for N ≤ 8. Employing Simulated Annealing gives about 10-20% area reduction in comparison to the GrA.
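The simulated-annealing search can be illustrated with a toy version of the problem: search over operand orderings of a sequential adder chain, again using result width as a stand-in for the FPGA cost model (both the cost function and the cooling schedule are assumptions for illustration):

```python
import math
import random

def chain_cost(order, widths):
    """Cost of adding operands one by one in the given order; each
    adder's cost is the bit width of its result."""
    acc = widths[order[0]]
    cost = 0
    for i in order[1:]:
        acc = max(acc, widths[i]) + 1
        cost += acc
    return cost

def anneal(widths, steps=2000, t0=5.0, seed=1):
    """Simulated annealing over operand orderings: propose a random
    swap, always accept improvements, and accept worsenings with
    probability exp(-delta/T) under a linearly cooling temperature."""
    rng = random.Random(seed)
    order = list(range(len(widths)))
    cur = best = chain_cost(order, widths)
    best_order = order[:]
    for step in range(steps):
        t = t0 * (1 - step / steps) + 1e-9
        i, j = rng.sample(range(len(order)), 2)
        order[i], order[j] = order[j], order[i]
        new = chain_cost(order, widths)
        if new <= cur or rng.random() < math.exp((cur - new) / t):
            cur = new
            if new < best:
                best, best_order = new, order[:]
        else:
            order[i], order[j] = order[j], order[i]   # undo the swap
    return best, best_order
```

On mixed widths the annealer discovers what the greedy rule also suggests (add narrow operands first), but unlike the greedy algorithm it can escape orderings that are only locally optimal, which is where the reported 10-20% area gain comes from.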
Automation, Robotics and Control Systems | 2013
Marcin Pietron; Maciej Wielgosz; Dominik Zurek; Ernest Jamro; Kazimierz Wiatr
This paper presents preliminary implementation results of the SVM (Support Vector Machine) algorithm. SVM is a dedicated mathematical formula which allows us to extract selected objects from a picture and assign them to an appropriate class. Consequently, a black-and-white image reflecting occurrences of the desired feature is derived from the original picture fed into the classifier. This work is primarily focused on the FPGA and GPU implementation aspects of the algorithm as well as on a comparison of the hardware and software performance. A human skin classifier was used as an example and implemented on an Intel Xeon E5645 2.40 GHz, a Xilinx Virtex-5 LX220 and an Nvidia Tesla M2090. It is worth emphasizing that in the case of the FPGA implementation the critical hardware components were designed using an HDL (Hardware Description Language), whereas the less demanding or standard ones, such as communication interfaces, FIFOs and FSMs, were implemented in Impulse C. Such an approach allowed us both to cut design time and to preserve the high performance of the hardware classification module. In the case of the GPU, the whole algorithm is implemented in CUDA.
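The per-pixel classification such a module performs can be sketched as evaluating a trained SVM's decision function. The linear kernel and all weights below are illustrative placeholders, not the paper's trained skin model:

```python
def svm_decide(x, support_vectors, alphas, bias):
    """Evaluate f(x) = sum_i alpha_i * K(sv_i, x) + bias with a linear
    kernel K(u, v) = <u, v>, and classify the pixel as positive (e.g.
    'skin') when f(x) > 0. One such evaluation per pixel yields the
    black-and-white feature image the abstract describes."""
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))
    score = sum(a * dot(sv, x)
                for a, sv in zip(alphas, support_vectors)) + bias
    return score > 0
```

The dot products are independent across pixels, which is exactly the parallelism the FPGA and GPU implementations exploit.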
Design and Diagnostics of Electronic Circuits and Systems | 2007
Ernest Jamro; Maciej Wielgosz; Kazimierz Wiatr
A highly parallel architecture for local histogram equalisation is studied. Three different approaches to the parallel architecture are considered in this paper: (1) module-level, which focuses on processing as much data as possible within a single module; (2) 1D, in which several modules simultaneously conduct histogram equalisation on partially overlapping (either horizontally or vertically) frames; (3) 2D, which uses the same approach as 1D but in two dimensions. Besides the above-mentioned solutions, differential processing of overlapping frames was also considered. At the end of this paper the optimal proportion of the above-mentioned solutions is studied and implementation results are given.
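The work each module performs on one frame can be sketched in a few lines: build the histogram, form the cumulative distribution, and remap each pixel through it. This is the textbook equalisation formula, applied per tile as the local variant requires:

```python
def equalize_tile(tile, levels=256):
    """Histogram-equalise one frame/tile of pixel values in
    [0, levels): histogram -> cumulative distribution -> remap.
    In the parallel architectures above, several such modules run
    simultaneously on (partially overlapping) frames."""
    hist = [0] * levels
    for p in tile:
        hist[p] += 1
    cdf, total = [], 0
    for h in hist:
        total += h
        cdf.append(total)
    n = len(tile)
    cdf_min = next(c for c in cdf if c > 0)
    if n == cdf_min:                    # flat tile: nothing to stretch
        return tile[:]
    return [round((cdf[p] - cdf_min) * (levels - 1) / (n - cdf_min))
            for p in tile]
```

Since each tile's histogram and remapping are independent, the 1D and 2D schemes simply instantiate this pipeline once per (overlapping) frame.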