L. de Macedo Mourelle

Publication

Featured research published by L. de Macedo Mourelle.

symposium on integrated circuits and systems design | 2002

Two hardware implementations for the Montgomery modular multiplication: sequential versus parallel

Nadia Nedjah; L. de Macedo Mourelle

Modular multiplication is the most dominant part of the computation performed in public-key cryptography systems such as the RSA cryptosystem. The operation is time consuming for large operands. This paper describes the characteristics of two architectures designed to implement modular multiplication using the fast Montgomery algorithm: the first FPGA prototype has an iterative sequential architecture while the second has a systolic array-based architecture. The paper compares both prototypes using the classic time × area factor.
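
As a rough software illustration of the Montgomery algorithm named in this abstract, the sketch below uses the textbook radix-2 (bit-serial) formulation. It is not the paper's hardware design; the function names `montgomery_multiply` and `modular_multiply` and the parameter `n_bits` are assumptions made for this example.

```python
def montgomery_multiply(a, b, m, n_bits):
    """Return a * b * 2^(-n_bits) mod m.

    Assumes m is odd and 0 <= a, b < m < 2**n_bits (textbook radix-2 form).
    """
    r = 0
    for i in range(n_bits):
        a_i = (a >> i) & 1      # i-th bit of a, least significant first
        r += a_i * b            # conditional addition of b
        if r & 1:               # make r even by adding the odd modulus
            r += m
        r >>= 1                 # exact division by 2
    if r >= m:                  # r < 2m here, so one subtraction is enough
        r -= m
    return r


def modular_multiply(a, b, m, n_bits):
    """Return a * b mod m using two Montgomery products.

    The second product with R^2 mod m (R = 2**n_bits) cancels the
    2^(-n_bits) factor left by the first one.
    """
    r_squared = (1 << (2 * n_bits)) % m
    return montgomery_multiply(montgomery_multiply(a, b, m, n_bits),
                               r_squared, m, n_bits)
```

Each loop iteration needs only an addition, a parity test and a shift, which is why the algorithm suits both an iterative datapath and an array of identical cells.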

digital systems design | 2002

Reconfigurable hardware implementation of Montgomery modular multiplication and parallel binary exponentiation

Nadia Nedjah; L. de Macedo Mourelle

Modular exponentiation and modular multiplication are the cornerstone computations performed in public-key cryptography systems such as the RSA cryptosystem. The operations are time consuming for large operands. Much research effort is directed towards an efficient hardware implementation of both operations. This paper describes the characteristics of two architectures: the first implements modular multiplication using a systolic version of the fast Montgomery algorithm and the second implements the parallel binary exponentiation algorithm. The latter uses two Montgomery modular multipliers. Results in terms of space and time requirements for an FPGA prototype are given.
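
The abstract does not spell out the exponentiation algorithm. A minimal sketch, assuming the usual right-to-left binary method, shows why two multipliers can work side by side: in every iteration the product that updates the result and the squaring are independent of each other. The function name `binary_exp_rtl` is hypothetical, and plain `%` stands in for the Montgomery multipliers.

```python
def binary_exp_rtl(base, exponent, modulus):
    """Right-to-left binary exponentiation: base**exponent mod modulus."""
    result = 1 % modulus
    square = base % modulus
    while exponent > 0:
        # The two products below use only values from the previous
        # iteration, so two multipliers could compute them concurrently.
        if exponent & 1:
            new_result = (result * square) % modulus   # multiplier 1
        else:
            new_result = result
        new_square = (square * square) % modulus       # multiplier 2
        result, square = new_result, new_square
        exponent >>= 1
    return result
```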

international conference on control applications | 2003

A reconfigurable recursive and efficient hardware for Karatsuba-Ofman's multiplication algorithm

Nadia Nedjah; L. de Macedo Mourelle

Multiplication of long integers is a cornerstone primitive in most public-key cryptosystems. Multiplication of big numbers can be performed best using Karatsuba-Ofman's divide-and-conquer approach. We propose a recursive and efficient hardware for Karatsuba-Ofman's multiplication algorithm. The hardware is efficient in terms of response time, and its description in the hardware description language VHDL is fairly compact. The performance of the synthesised hardware in terms of time and area requirements is compared with that of the Synopsys™ library multiplier as well as two different multipliers that implement Booth's algorithm. The proposed hardware multiplies faster than the other three. However, it requires more hardware area. Nevertheless, our design improves the area × time product as well as the time requirement, while the other three improve area at the expense of both the time requirement and the area × time factor.
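
The divide-and-conquer idea can be sketched in software as follows. This is the textbook Karatsuba-Ofman recursion on non-negative integers, not the VHDL design described above; the name `karatsuba` and the `threshold_bits` cut-off are assumptions for illustration.

```python
def karatsuba(x, y, threshold_bits=32):
    """Multiply non-negative integers with three half-size products per level."""
    if x < 0 or y < 0:
        raise ValueError("operands must be non-negative")
    n = max(x.bit_length(), y.bit_length())
    if n <= threshold_bits:
        return x * y                      # small operands: multiply directly
    half = n // 2
    mask = (1 << half) - 1
    x_hi, x_lo = x >> half, x & mask      # x = x_hi * 2^half + x_lo
    y_hi, y_lo = y >> half, y & mask
    hi = karatsuba(x_hi, y_hi, threshold_bits)
    lo = karatsuba(x_lo, y_lo, threshold_bits)
    # One extra product replaces the two cross products x_hi*y_lo + x_lo*y_hi.
    mid = karatsuba(x_hi + x_lo, y_hi + y_lo, threshold_bits) - hi - lo
    return (hi << (2 * half)) + (mid << half) + lo
```

Replacing four half-size products with three is what trades extra adders (area) for a shorter overall multiplication time.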

latin american symposium on circuits and systems | 2013

Parallel GPU-based implementation of high dimension Particle Swarm Optimizations

Rogério de Moraes Calazan; Nadia Nedjah; L. de Macedo Mourelle

Particle Swarm Optimization (PSO) is an evolutionary heuristics-based method used for continuous function optimization. Compared to existing stochastic methods, PSO is very robust. Nevertheless, for real-world optimizations, it requires a high computational effort. In general, parallel implementations of PSO provide better performance. However, this depends heavily on the number and characteristics of the exploited processors. With the advent and wide availability of Graphics Processing Units (GPUs) and the development and straightforward applicability of the Compute Unified Device Architecture (CUDA) platform, several applications have benefited from reduced execution time by exploiting massive parallelism. In this paper, we propose an alternative algorithm to massively parallelize the PSO algorithm and map it onto a GPU-based architecture. The algorithm focuses on the work done with respect to each problem dimension and does it in parallel.
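
To make the per-dimension work concrete, here is a sequential sketch of a standard global-best PSO. It is not the paper's GPU algorithm: the function names, the parameter values (`w`, `c1`, `c2`, `bound`) and the sphere test function are all assumptions. The innermost loop marks the unit of work that a dimension-level parallelization would hand to separate threads.

```python
import random


def sphere(x):
    """Simple test objective with its minimum 0 at the origin."""
    return sum(v * v for v in x)


def pso(objective, dims, particles=30, iterations=200, seed=0,
        w=0.7298, c1=1.49618, c2=1.49618, bound=5.0):
    rng = random.Random(seed)
    pos = [[rng.uniform(-bound, bound) for _ in range(dims)]
           for _ in range(particles)]
    vel = [[0.0] * dims for _ in range(particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    history = [gbest_val]                 # best value after each iteration
    for _ in range(iterations):
        for i in range(particles):
            # Each dimension d is updated independently of the others;
            # this loop is the candidate for one-thread-per-dimension work.
            for d in range(dims):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
        history.append(gbest_val)
    return gbest, gbest_val, history
```

Calling `pso(sphere, dims=5)` returns the best position found, its objective value, and the history of best values per iteration.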

symposium on computer architecture and high performance computing | 2009

High-performance hardware of the sliding-window method for parallel computation of modular exponentiations

Nadia Nedjah; L. de Macedo Mourelle

Modular exponentiation is a basic operation in various applications, such as cryptography. Generally, the performance of this operation has a tremendous impact on the efficiency of the whole application. Therefore, many researchers have devoted special interest to providing smart methods and efficient implementations for modular exponentiation. One of these methods is the sliding-window method, which pre-processes the exponent into zero and non-zero partitions. Zero partitions allow for a reduction of the number of modular multiplications required in the exponentiation process. In this paper, we devise a novel hardware for computing modular exponentiation using the sliding-window method. The partitioning strategy used allows variable-length non-zero partitions, which increases the average number of zero partitions and so decreases that of non-zero partitions. It performs the partitioning process in parallel with the pre-computation step of the exponent so no overhead is introduced. The implementation is efficient when compared against related existing hardware implementations.
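
The following sketch shows the textbook left-to-right sliding-window method in software. It is not the paper's partitioning hardware; the name `sliding_window_exp` and the `window` parameter (maximum non-zero partition length) are assumptions. Zero bits cost only a squaring, while each non-zero partition costs one multiplication by a pre-computed odd power.

```python
def sliding_window_exp(base, exponent, modulus, window=4):
    """Compute base**exponent mod modulus with the sliding-window method."""
    if exponent == 0:
        return 1 % modulus
    base %= modulus
    # Pre-compute the odd powers base^1, base^3, ..., base^(2^window - 1).
    base_sq = (base * base) % modulus
    odd_powers = {1: base}
    for k in range(3, 1 << window, 2):
        odd_powers[k] = (odd_powers[k - 2] * base_sq) % modulus
    result = 1 % modulus
    i = exponent.bit_length() - 1
    while i >= 0:
        if not (exponent >> i) & 1:
            # Zero partition: a squaring only, no multiplication.
            result = (result * result) % modulus
            i -= 1
        else:
            # Non-zero partition: the longest run of at most `window`
            # bits that starts at bit i and ends in a 1 bit.
            j = max(i - window + 1, 0)
            while not (exponent >> j) & 1:
                j += 1
            width = i - j + 1
            value = (exponent >> j) & ((1 << width) - 1)   # always odd
            for _ in range(width):
                result = (result * result) % modulus
            result = (result * odd_powers[value]) % modulus
            i = j - 1
    return result
```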

international conference on information technology new generations | 2006

Four Hardware Implementations for the M-ary Modular Exponentiation

Nadia Nedjah; L. de Macedo Mourelle

Modular exponentiation is a cornerstone operation in several public-key cryptosystems. It is performed using successive modular multiplications. Clearly, one needs to reduce the total number of modular multiplications required. In this paper, we propose four hardware implementations for computing modular exponentiations using the m-ary method. The main difference between the first two implementations resides in the pre-processing step. During this step, the first implementation pre-computes all powers while the second computes only those that are necessary. However, the first implementation requires less hardware area than the second. The last two do not require any pre-processing of the exponent. One of these two implementations is hardware only and the second uses the co-design methodology. We compare these two implementations using the performance factor, which takes into account both space and time requirements.
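
For orientation, a minimal software sketch of the m-ary method with m = 2^d is given below. It pre-computes all powers, as the first implementation in the abstract is said to do, but it is otherwise a generic textbook version; the name `m_ary_exp` and the parameter `d` are assumptions.

```python
def m_ary_exp(base, exponent, modulus, d=4):
    """Compute base**exponent mod modulus, scanning d exponent bits at a time."""
    base %= modulus
    m = 1 << d
    # Pre-processing step: table of base^0 .. base^(m-1).
    powers = [1 % modulus]
    for _ in range(1, m):
        powers.append((powers[-1] * base) % modulus)
    # Split the exponent into base-m digits, least significant first.
    digits = []
    e = exponent
    while e > 0:
        digits.append(e & (m - 1))
        e >>= d
    result = 1 % modulus
    for digit in reversed(digits):        # most significant digit first
        for _ in range(d):                # result = result^m
            result = (result * result) % modulus
        if digit:                         # zero digits need no multiplication
            result = (result * powers[digit]) % modulus
    return result
```

With d = 1 this reduces to the plain binary method; larger d trades a bigger pre-computed table for fewer multiplications in the main loop.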

international conference on information technology new generations | 2006

A Compact Piplined Hardware Implementation of the AES-128 Cipher

Nadia Nedjah; L. de Macedo Mourelle; M.P. Cardoso

The Advanced Encryption Standard (AES) is the new encryption standard. In this paper, we propose a very efficient pipelined hardware implementation of the AES-128 cipher. It has a competitive throughput of more than 2 Gbits per second. Besides improving the encryption throughput, the pipeline can be taken advantage of if the number of rounds (currently 10) must increase for security reasons.

digital systems design | 2003

Stochastic reconfigurable hardware for neural networks

Nadia Nedjah; L. de Macedo Mourelle

In this paper, we propose a reconfigurable, low-cost and readily available hardware architecture for an artificial neuron. This is used to build a feed-forward artificial neural network. For this purpose, we use field-programmable gate arrays (FPGAs). However, as state-of-the-art FPGAs still lack the gate density necessary for the implementation of large neural networks of thousands of neurons, we use a stochastic process to implement the computation performed by a neuron. The multiplication and addition of stochastic values are simply implemented by an ensemble of XNOR and AND gates respectively.
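
The XNOR multiplication mentioned above can be illustrated with a small simulation. The sketch assumes the common bipolar coding of stochastic computing, where a value v in [-1, 1] is carried by a random bitstream with P(bit = 1) = (v + 1) / 2; the abstract does not state the coding used, and all function names here are hypothetical. Only the multiplication is shown.

```python
import random


def encode_bipolar(value, length, rng):
    """Encode value in [-1, 1] as a bitstream with P(1) = (value + 1) / 2."""
    p = (value + 1) / 2
    return [1 if rng.random() < p else 0 for _ in range(length)]


def decode_bipolar(bits):
    """Recover the encoded value from the fraction of 1 bits."""
    return 2 * sum(bits) / len(bits) - 1


def xnor_multiply(a_bits, b_bits):
    """Bitwise XNOR of two independent bipolar streams encodes their product."""
    return [1 - (a ^ b) for a, b in zip(a_bits, b_bits)]


rng = random.Random(0)
a_stream = encode_bipolar(0.5, 20000, rng)
b_stream = encode_bipolar(-0.6, 20000, rng)
product_estimate = decode_bipolar(xnor_multiply(a_stream, b_stream))
# product_estimate approximates 0.5 * -0.6 = -0.3, up to sampling noise
```

A single gate per multiplication is what keeps the area small; the price is precision, which improves only slowly as the bitstreams get longer.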

international conference on information technology coding and computing | 2005

Reconfigurable hardware for addition chains based modular exponentiation

L. de Macedo Mourelle; Nadia Nedjah

In several public-key cryptosystems, the main operation is modular exponentiation, which is performed using successive modular multiplications. The operands used in these cryptosystems are large (1024 bits), so the operation consumes a considerable amount of time. This affects the performance of the cryptosystem, especially in real-time applications. In order to reduce the execution time in these cryptosystems, the total number of modular multiplications must be reduced. There are several methods that attempt to reduce this number either by partitioning the exponent in windows or by reducing the number of elements to be multiplied. In this paper, we propose a fast and compact reconfigurable hardware for computing modular exponentiation using the addition-chain methods.
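
An addition chain is a sequence that starts at 1 and in which every later element is the sum of two earlier ones; raising a base to the last element then costs one modular multiplication per chain step. The sketch below evaluates a power along a chain supplied by the caller. It does not show how the paper finds its chains, and the name `addition_chain_exp` is an assumption.

```python
def addition_chain_exp(base, chain, modulus):
    """Compute base**chain[-1] mod modulus along the given addition chain."""
    if not chain or chain[0] != 1:
        raise ValueError("an addition chain must start with 1")
    powers = {1: base % modulus}          # exponent -> base^exponent mod modulus
    for target in chain[1:]:
        for a in list(powers):
            if target - a in powers:
                # target = a + (target - a): one modular multiplication.
                powers[target] = (powers[a] * powers[target - a]) % modulus
                break
        else:
            raise ValueError(f"{target} is not a sum of two earlier elements")
    return powers[chain[-1]]


# The chain 1, 2, 3, 6, 12, 15 reaches exponent 15 in 5 multiplications,
# one fewer than the 6 needed by the plain binary method.
value = addition_chain_exp(7, [1, 2, 3, 6, 12, 15], 1000003)
```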

digital systems design | 2005

Massively parallel hardware architecture for genetic algorithms

Nadia Nedjah; L. de Macedo Mourelle

In this paper, we propose a massively parallel architecture for hardware implementation of genetic algorithms. This design is quite innovative as it provides a viable solution to the fitness computation problem, which depends heavily on problem-specific knowledge. The proposed architecture is completely independent of such specifics. It implements the fitness computation using a neural network. The hardware implementation of the neural network used is stochastic and thus minimises the required hardware area without much increase in response time. Finally, we compare the proposed hardware with existing ones.
