Karl Leboeuf | Researchain

Publication

Featured researches published by Karl Leboeuf.

international symposium on circuits and systems | 2009

Efficient hardware implementation of the hyperbolic tangent sigmoid function

Ashkan Hosseinzadeh Namin; Karl Leboeuf; Roberto Muscedere; Huapeng Wu; Majid Ahmadi

Efficient implementation of the activation function is important in the hardware design of artificial neural networks. The sigmoid and hyperbolic tangent sigmoid functions are the most widely used activation functions for this purpose. In this paper, we present a simple and efficient architecture for digital hardware implementation of the hyperbolic tangent sigmoid function. The proposed method employs a piecewise linear approximation as a foundation, and further improves the results using a lookup table. Our design proves to be more efficient than similar proposals when area × delay is used as the performance metric. A VLSI implementation of the proposed design using a 0.18 µm CMOS process is also presented, which shows a 35% improvement over similar recently published architectures.
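
The two-stage idea described in this abstract (a piecewise linear base refined by a lookup table) can be illustrated in software. The sketch below is not the authors' architecture: the segment count, table size, saturation point, and the use of floating point instead of fixed-point hardware arithmetic are all assumptions chosen for illustration.

```python
import math

X_MAX = 4.0   # assumed saturation point: tanh(x) is treated as 1 beyond this
N_SEG = 8     # assumed number of linear segments on [0, X_MAX)
N_LUT = 64    # assumed number of correction-table entries on [0, X_MAX)

SEG_W = X_MAX / N_SEG
LUT_W = X_MAX / N_LUT

# Stage 1: each segment is the chord joining tanh at its two endpoints.
SEGMENTS = []
for i in range(N_SEG):
    x0, x1 = i * SEG_W, (i + 1) * SEG_W
    slope = (math.tanh(x1) - math.tanh(x0)) / (x1 - x0)
    SEGMENTS.append((slope, math.tanh(x0) - slope * x0))


def pwl(x):
    """Piecewise linear estimate of tanh(x) for 0 <= x < X_MAX."""
    slope, intercept = SEGMENTS[min(int(x / SEG_W), N_SEG - 1)]
    return slope * x + intercept


# Stage 2: table of residuals, sampled at the midpoint of each table cell.
CORRECTION = [
    math.tanh((j + 0.5) * LUT_W) - pwl((j + 0.5) * LUT_W) for j in range(N_LUT)
]


def tanh_approx(x, use_lut=True):
    """Approximate tanh(x) using odd symmetry, the PWL base, and the table."""
    sign = -1.0 if x < 0 else 1.0
    x = abs(x)
    if x >= X_MAX:
        return sign
    y = pwl(x)
    if use_lut:
        y += CORRECTION[min(int(x / LUT_W), N_LUT - 1)]
    return sign * y


def max_error(use_lut, samples=4001):
    """Largest absolute error on a uniform grid over [-5, 5]."""
    xs = [-5.0 + 10.0 * k / (samples - 1) for k in range(samples)]
    return max(abs(tanh_approx(x, use_lut) - math.tanh(x)) for x in xs)
```

Calling `max_error(False)` and `max_error(True)` compares the base approximation with the table-corrected one; in this sketch the correction table lowers the maximum error at the cost of extra storage.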

international conference on hybrid information technology | 2008

High Speed VLSI Implementation of the Hyperbolic Tangent Sigmoid Function

Karl Leboeuf; Ashkan Hosseinzadeh Namin; Roberto Muscedere; Huapeng Wu; Majid Ahmadi

The hyperbolic tangent function is commonly used as the activation function in artificial neural networks. In this work, two different hardware implementations of the hyperbolic tangent function are proposed. Both methods approximate the function rather than calculate it directly, since it is exponential in nature. The first method uses a lookup table to approximate the function, while the second method reduces the size of the table by using range addressable decoding instead of the classic decoding scheme. Hardware synthesis results show that the proposed methods are significantly faster and use less area than other similar methods with the same amount of error.
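
To make the difference between classic and range addressable decoding concrete, the sketch below models both in software. It is an illustration under assumptions, not the circuit from the paper: the error bound, the way ranges are chosen (equal-width output bands), and the helper names are all hypothetical.

```python
import bisect
import math


def build_ralut(max_err):
    """Build a range-addressed table for tanh on x >= 0.

    Returns (bounds, values). Inputs with bounds[i-1] <= |x| < bounds[i]
    map to values[i]. Each stored value sits in the middle of an output
    band of width 2 * max_err, so the error stays within max_err.
    """
    bounds, values = [], []
    k = 0
    while True:
        level = (2 * k + 1) * max_err
        if level + max_err >= 1.0:
            break
        bounds.append(math.atanh(level + max_err))  # input where the band ends
        values.append(level)
        k += 1
    # The last range extends to infinity and covers the saturated region.
    bounds.append(math.inf)
    values.append(min(level, 1.0))
    return bounds, values


def ralut_tanh(x, bounds, values):
    """Look up tanh(x) by finding the range that contains |x|."""
    i = bisect.bisect_right(bounds, abs(x))
    y = values[i]
    return -y if x < 0 else y


def classic_lut_size(max_err):
    """Entries needed by a uniformly addressed table with the same bound.

    Assumes the input step must be 2 * max_err because the slope of tanh
    reaches 1 at the origin, and that the table stops once tanh(x) is
    within max_err of 1.
    """
    return math.ceil(math.atanh(1.0 - max_err) / (2 * max_err))
```

With `bounds, values = build_ralut(1 / 64)`, comparing `len(values)` against `classic_lut_size(1 / 64)` shows why addressing by range shrinks the table: wide input ranges in the flat part of the curve share a single entry.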

international symposium on circuits and systems | 2013

A GPU implementation of the Montgomery multiplication algorithm for elliptic curve cryptography

Karl Leboeuf; Roberto Muscedere; Majid Ahmadi

This work presents a GPU implementation of the Montgomery multiplication algorithm that is heavily optimized for the GPU's SIMD architecture, as well as the field sizes and constraints required for elliptic curve cryptography. We present and compare the throughput results of our proposed algorithm for 10 commonly used field sizes from 112 to 521 bits. When executed on our NVIDIA GTX-480 GPU device, the proposed algorithm's measured throughput in multiplication operations per second is 1.24 to 1.72 times greater than that of the next fastest GPU-based algorithm running on the same device, and is significantly greater than that of all other published CPU and GPU-based implementations. The proposed work could be used as a component of an elliptic curve cryptography acceleration appliance, or for cryptanalysis.
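
For readers unfamiliar with Montgomery multiplication, the following is a minimal reference sketch using Python's arbitrary-precision integers. It shows only the arithmetic idea (replacing division by the modulus with shifts and masks); it says nothing about the GPU-specific optimizations in the paper, and the function names are hypothetical. The modular inverse via `pow(n, -1, r)` assumes Python 3.8 or later.

```python
def montgomery_setup(n, bits):
    """Precompute n' with n * n' = -1 (mod R), where R = 2**bits and n is odd."""
    r = 1 << bits
    return (-pow(n, -1, r)) % r


def redc(t, n, bits, n_prime):
    """Montgomery reduction: return t * R^-1 mod n for 0 <= t < n * R."""
    mask = (1 << bits) - 1
    m = ((t & mask) * n_prime) & mask   # m = (t mod R) * n' mod R
    u = (t + m * n) >> bits             # exact division by R
    return u - n if u >= n else u


def to_mont(a, n, bits):
    """Map a into the Montgomery domain: a * R mod n."""
    return (a << bits) % n


def mont_mul(a_bar, b_bar, n, bits, n_prime):
    """Multiply two Montgomery-domain values; the result stays in the domain."""
    return redc(a_bar * b_bar, n, bits, n_prime)


def from_mont(a_bar, n, bits, n_prime):
    """Map a Montgomery-domain value back to an ordinary residue."""
    return redc(a_bar, n, bits, n_prime)
```

The conversions into and out of the Montgomery domain cost extra work, so the technique pays off when many multiplications are chained, as in elliptic curve point arithmetic.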

international midwest symposium on circuits and systems | 2011

Performance analysis of table-based approximations of the hyperbolic tangent activation function

Karl Leboeuf; Roberto Muscedere; Majid Ahmadi

When designing an artificial neural network system in hardware, the implementation of the activation function is an important consideration. The hyperbolic tangent activation function is the most popular, and many approaches exist to approximate it, with varying trade-offs between area utilization and delay. Unfortunately, there is little data available reporting the minimum accuracy required of the activation function approximation in order to obtain good system-level performance; this is particularly the case for table-based approximation methods. In this paper, we demonstrate that table-based approximation methods are very well suited for implementing the tanh activation function, as well as its derivative, in a variety of feed-forward artificial neural network topologies which employ the popular RPROP or Levenberg-Marquardt training algorithms. It is shown that when these training methods are used, an activation function possessing a relatively high maximum error can be used to obtain results comparable to floating point. This discovery suggests that these table-based methods can be employed with extreme efficiency in terms of area and speed, rendering them a promising option for any VLSI or FPGA artificial neural network hardware design.

IET Circuits, Devices & Systems | 2010

High-speed hardware implementation of a serial-in parallel-out finite field multiplier using reordered normal basis

Ashkan Hosseinzadeh Namin; Karl Leboeuf; Roberto Muscedere; Huapeng Wu; Majid Ahmadi

A high-speed VLSI implementation of a 233-bit serial-in parallel-out finite field multiplier is presented. The proposed design performs multiplication using a reordered normal basis; a permutation of a type II optimal normal basis. The multiplier was realised in a 0.18-µm CMOS technology using multiples of a domino logic block. The multiplier was simulated, and functioned correctly up to a clock rate of 1.587 GHz, achieving greater performance while occupying less area compared to similar designs. The presented design methodology can also be used for other finite field multipliers possessing regular architectures. This multiplier's size of 233 bits is currently recommended by the National Institute of Standards and Technology (NIST) in their elliptic curve digital signature standard (ECDSS), and is used in practice for binary field multiplication in Elliptic Curve Cryptography (ECC).

electro information technology | 2009

Artificial neural networks activation function HDL coder

Ashkan Hosseinzadeh Namin; Karl Leboeuf; Huapeng Wu; Majid Ahmadi

The sigmoid and hyperbolic tangent functions are usually used as the activation functions in Artificial Neural Networks (ANNs). The exponential nature of these functions makes them difficult to implement in hardware. Hence, several different methods for approximating them in hardware have been proposed. In this work, we present a MATLAB toolbox called the “SigTan HDL Coder” that generates synthesizable HDL code which approximates these functions in hardware according to specific user requirements. The HDL code is platform independent and can be used for FPGA as well as ASIC implementations. Input parameters to the system are the approximation error, input range, and the approximation method. Three different user-selectable methods for approximating the functions are programmed in the toolbox. All implemented approximation methods avoid the use of multipliers, as multipliers are expensive hardware components in terms of area and speed.

international symposium on circuits and systems | 2012

High performance prime field multiplication for GPU

Karl Leboeuf; Roberto Muscedere; Majid Ahmadi

This paper presents a high performance algorithm for modular multiplication on a graphics processing unit (GPU) implemented in assembler. The proposed algorithm carries out finite field multiplication over the NIST prime fields of size 192, 224, 256 and 384 bits. Included is a detailed explanation of our algorithm, an instruction count analysis, and a comparison to recently published work; compared to the next fastest design, the proposed algorithm's execution time is 27 to 71 times faster.
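
The abstract does not describe how the modular reduction is performed. As background on why the NIST prime fields are attractive for fast multiplication, the sketch below shows the well-known word-level reduction for the 192-bit NIST prime p = 2^192 - 2^64 - 1, which replaces division with a few additions. It is an assumption that this is relevant to the paper's method, and the function names are hypothetical.

```python
P192 = 2**192 - 2**64 - 1
MASK64 = (1 << 64) - 1


def _join(hi, mid, lo):
    """Assemble three 64-bit words into one 192-bit integer."""
    return (hi << 128) | (mid << 64) | lo


def reduce_p192(c):
    """Reduce 0 <= c < 2**384 modulo P192 using 2**192 = 2**64 + 1 (mod P192)."""
    w = [(c >> (64 * i)) & MASK64 for i in range(6)]  # w[0] is least significant
    s = (
        _join(w[2], w[1], w[0])    # low 192 bits
        + _join(0, w[3], w[3])     # w[3] * 2**192 folded down
        + _join(w[4], w[4], 0)     # w[4] * 2**256 folded down
        + _join(w[5], w[5], w[5])  # w[5] * 2**320 folded down
    )
    while s >= P192:               # at most a few subtractions are needed
        s -= P192
    return s


def mul_p192(a, b):
    """Multiply two field elements (0 <= a, b < P192) and reduce the product."""
    return reduce_p192(a * b)
```

Each of the other NIST primes has its own folding pattern derived the same way from the sparse form of the modulus.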

international symposium on circuits and systems | 2008

A dynamic address decode circuit for implementing range addressable look-up tables

Roberto Muscedere; Karl Leboeuf

A Range Addressable Look Up Table (RALUT) is a non-linear memory storage element that has been shown to significantly reduce hardware requirements for matching data in particular applications. More generally, its ability to perform parallel pattern matching on large words can be applied in many areas. Most of the RALUT circuits presented in the literature thus far are built with logic gates and tri-state buffers so that they are easily synthesizable and implemented with other components of the overall design. These circuits are not competitive with modern memory in terms of area, timing, power and functionality. The only significant difference between a RALUT and a standard LUT is the address decoding system. In this paper, we show a preliminary dynamic address decode circuit which can be used to build scalable full-custom read-only RALUT implementations. We show significant reductions in area, timing and power compared to a previously published synthesized version.

international conference on image analysis and recognition | 2011

Wavelet domain blur invariants for 1D discrete signals

Iman Makaremi; Karl Leboeuf; Majid Ahmadi

Wavelet domain blur invariants, which were proposed for the first time in [10] by the authors, are modified in order to suit a wider range of applications. With the modified blur invariants, it is possible to address the applications in which the blur systems are not necessarily energy-preserving. Also, for a simpler implementation of the wavelet decomposition for discrete signals, we use a method which preserves an important property of the invariants: shift invariance. The modified invariants are utilized in two different experiments in order to evaluate their performance.

international midwest symposium on circuits and systems | 2010

Efficient VLSI implementation of a finite field multiplier using reordered normal basis

Karl Leboeuf; Ashkan Hosseinzadeh Namin; Huapeng Wu; Roberto Muscedere; Majid Ahmadi

A new VLSI implementation for a finite field multiplier using reordered normal basis is presented. The hardware architecture uses domino logic building blocks as well as True Single Phase Clock (TSPC) flip-flops to achieve exceptional performance. The multiplier has been realized in a 0.18 µm CMOS process and can perform multiplication correctly up to a clock rate of 1.789 GHz, requiring 62048 µm² of silicon area. Compared to similar implementations, the new design yields a 43% reduction in area utilization, and a 12% increase in maximum operating speed. The size of the multiplier, 233 bits, is recommended by the National Institute of Standards and Technology (NIST) for elliptic curve cryptography. Finite field multipliers such as the proposed one have applications in public key cryptography for constrained devices such as smart cards or handheld devices.
