Daniel M. Muñoz
University of Brasília
Publication
Featured research published by Daniel M. Muñoz.
2010 VI Southern Programmable Logic Conference (SPL) | 2010
Daniel M. Muñoz; Diego F. Sánchez; Carlos H. Llanos; Mauricio Ayala-Rincón
Computing floating-point transcendental functions is important in a wide variety of scientific applications, where area cost, error, and latency are key requirements. This paper describes a flexible FPGA implementation of a parameterizable floating-point library for computing sine, cosine, arctangent, and exponential functions using the CORDIC algorithm. The novelty of the proposed architecture is that, by sharing the same resources, the CORDIC algorithm can be used in two operation modes, allowing it to compute the sine, cosine, or arctangent functions. Additionally, for the exponential function, the architecture switches automatically between CORDIC and a Taylor approach, which improves the precision of the circuit, specifically for small input values after argument reduction. Synthesis of the circuits and an experimental analysis of the errors demonstrate the correctness and effectiveness of the implemented cores and allow the designer to choose, for general-purpose applications, a suitable bit-width representation and number of CORDIC iterations.
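The two operation modes mentioned in the abstract can be illustrated in software. The following is a minimal sketch, not the authors' hardware design: it assumes the standard circular CORDIC iteration, uses Python floats instead of a parameterizable floating-point format, and all function names are hypothetical.

```python
import math

def cordic(x, y, z, iterations=24, vectoring=False):
    """Shared circular CORDIC iteration used by both modes."""
    for i in range(iterations):
        if vectoring:
            d = -1.0 if y > 0 else 1.0   # vectoring mode drives y toward 0
        else:
            d = 1.0 if z >= 0 else -1.0  # rotation mode drives z toward 0
        shift = 2.0 ** -i
        x, y, z = x - d * y * shift, y + d * x * shift, z - d * math.atan(shift)
    return x, y, z

def cordic_gain(iterations):
    """Accumulated scale factor of the micro-rotations."""
    k = 1.0
    for i in range(iterations):
        k *= math.sqrt(1.0 + 2.0 ** (-2 * i))
    return k

def sin_cos(theta, iterations=24):
    """Rotation mode; converges for |theta| up to about 1.74 rad."""
    x, y, _ = cordic(1.0 / cordic_gain(iterations), 0.0, theta, iterations)
    return y, x

def arctan(t, iterations=24):
    """Vectoring mode; the angle accumulates in z."""
    _, _, z = cordic(1.0, t, 0.0, iterations, vectoring=True)
    return z
```

Both `sin_cos` and `arctan` call the same `cordic` loop, which mirrors the idea of resource sharing; the number of iterations sets the attainable accuracy.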
Symposium on Integrated Circuits and Systems Design | 2009
Diego F. Sánchez; Daniel M. Muñoz; Carlos H. Llanos; Mauricio Ayala-Rincón
Floating-point operations are essential in a wide range of computational and engineering applications that need good performance and high precision. Advances in VLSI technology have raised integration density fast enough to allow designers to implement directly in hardware several floating-point operations commonly implemented in software. Until now, most research has not focused on the tradeoff between the need for high performance and the cost in logic area associated with the level of precision, parameters that are very important in a wide variety of applications such as robotics and image and digital signal processing. This paper describes an FPGA implementation of a parameterizable floating-point library for addition/subtraction, multiplication, division, and square root operations. Architectures based on the Goldschmidt algorithm were implemented for computing floating-point division and square root. The library is parameterizable by bit-width and number of iterations. An analysis of the mean square error and the area cost is carried out in order to find, for general-purpose applications, a feasible bit-width representation, number of iterations, and number of addressable words for storing the initial seeds of the Goldschmidt algorithm.
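The interplay between seed table size and iteration count can be sketched for division. This is an illustrative software model only: it assumes the divisor is a normalized mantissa in [1, 2), and the seed table layout (reciprocal of each interval midpoint) and the names `goldschmidt_divide` and `seed_bits` are hypothetical, not taken from the paper.

```python
def goldschmidt_divide(n, d, iterations=3, seed_bits=4):
    """Approximate n / d for a normalized divisor d in [1, 2)."""
    if not 1.0 <= d < 2.0:
        raise ValueError("divisor must be normalized to [1, 2)")
    # Hypothetical seed table with 2**seed_bits addressable words:
    # each word holds the reciprocal of its interval midpoint.
    words = 2 ** seed_bits
    index = int((d - 1.0) * words)
    seed = 1.0 / (1.0 + (index + 0.5) / words)
    n, d = n * seed, d * seed      # d is now close to 1
    for _ in range(iterations):
        f = 2.0 - d                # correction factor
        n, d = n * f, d * f        # d converges to 1, n to the quotient
    return n
```

Each iteration roughly squares the remaining error in `d`, so a larger seed table allows fewer iterations for the same precision, which is the kind of tradeoff the abstract analyzes.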
Bio-Inspired Computing: Theories and Applications | 2010
Daniel M. Muñoz; Carlos H. Llanos; Leandro dos Santos Coelho; Mauricio Ayala-Rincón
Particle Swarm Optimization (PSO) algorithms have been proposed to solve engineering problems that require finding an optimal operating point. Several embedded applications require solving online optimization problems with high performance. However, PSO suffers from long execution times, which becomes evident on Reduced Instruction Set Computer (RISC) microprocessors, whose operating frequencies are low compared with those of traditional personal computers (PCs). This paper compares two hardware implementations of the parallel PSO algorithm that use efficient floating-point arithmetic, performing computations with large dynamic range and high precision. The fully parallel and partially parallel PSO architectures exploit the parallel capabilities of PSO in order to decrease the running time. Three well-known benchmark test functions were used to validate the hardware architectures, and a comparison in terms of logic area cost, solution quality, and performance is reported. In addition, the execution time of the hardware is compared with that of two C-code software implementations, one running on an Intel Core Duo at 1.6GHz and one on an embedded Microblaze microprocessor at 50MHz.
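For reference, a plain sequential software version of global-best PSO might look like the sketch below. It is not the hardware architecture from the paper; the inertia and acceleration coefficients, the swarm size, and the function name `pso` are assumptions chosen for illustration.

```python
import random

def pso(f, dim, bounds, particles=20, iterations=200,
        w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize f over a box using a basic global-best PSO."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(particles)]
    vel = [[0.0] * dim for _ in range(particles)]
    best_pos = [p[:] for p in pos]            # personal bests
    best_val = [f(p) for p in pos]
    g = min(range(particles), key=best_val.__getitem__)
    g_pos, g_val = best_pos[g][:], best_val[g]  # global best
    for _ in range(iterations):
        for i in range(particles):
            for k in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][k] = (w * vel[i][k]
                             + c1 * r1 * (best_pos[i][k] - pos[i][k])
                             + c2 * r2 * (g_pos[k] - pos[i][k]))
                # Clamp the new position to the search box.
                pos[i][k] = min(hi, max(lo, pos[i][k] + vel[i][k]))
            val = f(pos[i])
            if val < best_val[i]:
                best_pos[i], best_val[i] = pos[i][:], val
                if val < g_val:
                    g_pos, g_val = pos[i][:], val
    return g_pos, g_val
```

The loop over particles is the part a parallel implementation would distribute across processing units; here it runs one particle at a time.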
Symposium on Integrated Circuits and Systems Design | 2006
Daniel M. Muñoz; Carlos H. Llanos; Mauricio Ayala-Rincón; Rudi H. van Els; Renato P. Almeida
Elevator Group Control Systems (EGCSs) manage multiple elevators in a building in order to transport passengers efficiently. The performance of an EGCS is measured by several metrics, such as the average passenger waiting time, the percentage of passengers waiting longer than a predetermined time, and power consumption, among others. Four elevator dispatching algorithms are analyzed and implemented using reconfigurable architectures based on FPGAs. The system is based on Local Controller Systems (LCSs), one for each elevator, and a protocol based on an RS485 network for interconnecting the LCSs. The FPGAs implement the LCSs. A Java interface was implemented for testing and monitoring the system and the EGCS function. The novelty of this approach is that the LCSs are capable of running the different dispatching algorithms, which are suitable for different passenger traffic situations, while the EGCS must only determine the best algorithm to run in each LCS. Data traffic in the network is reduced because the EGCS is not directly involved in calculating the next floors to be visited. The algorithms were described in VHDL and implemented on Spartan3 FPGA-based boards.
Southern Conference on Programmable Logic | 2011
Daniel M. Muñoz; Carlos H. Llanos; Leandro dos Santos Coelho; Mauricio Ayala-Rincón
Achieving high-performance optimization algorithms for embedded applications can be very challenging, particularly when several requirements such as high-accuracy computations, short elapsed time, area cost, low power consumption, and portability must be met. This paper proposes a hardware implementation of the Particle Swarm Optimization algorithm with passive congregation (HPPSOpc), which was developed using several floating-point arithmetic libraries. Passive congregation is a biological behavior that allows the swarm to preserve its integrity, balancing global and local search. The HPPSOpc architecture was implemented on a Virtex5 FPGA device and validated using two multimodal benchmark problems. Synthesis, simulation, and execution time results demonstrate that the passive congregation approach is a low-cost solution for solving embedded optimization problems with high performance.
IEEE Transactions on Very Large Scale Integration Systems | 2009
Daniel M. Muñoz; Diego F. Sánchez; Carlos H. Llanos; Mauricio Ayala-Rincón
Several scientific applications need high-precision computation of transcendental functions. This paper presents a hardware implementation of a parameterizable floating-point library for computing sine, cosine, and arctangent functions using both the CORDIC algorithm and Taylor series expansion for different bit-width representations. The results include accuracy as a design criterion of the proposed hardware architectures; therefore, a tradeoff analysis between the area cost and the number of iterations against the associated error is carried out in order to choose a suitable format for computing transcendental functions. The proposed architectures were validated using Matlab results as a statistical estimator in order to compute the Mean Square Error (MSE). Synthesis and simulation results demonstrate the correctness and effectiveness of the implemented hardware transcendental functions.
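The kind of error analysis described here can be reproduced in miniature. The sketch below is an assumption-laden illustration: it uses Python's `math.sin` as the reference in place of Matlab, a truncated Taylor series for sine, and hypothetical helper names `taylor_sin` and `mse`.

```python
import math

def taylor_sin(x, terms=5):
    """Truncated Taylor series of sin(x) around 0 with `terms` terms."""
    term, total = x, x
    for n in range(1, terms):
        # Next term: multiply by -x^2 / ((2n)(2n+1)).
        term *= -x * x / ((2 * n) * (2 * n + 1))
        total += term
    return total

def mse(approx, reference, samples):
    """Mean square error of approx against reference over the samples."""
    return sum((approx(s) - reference(s)) ** 2 for s in samples) / len(samples)

# Sample the reduced argument range [-pi/4, pi/4].
samples = [-math.pi / 4 + i * (math.pi / 2) / 100 for i in range(101)]
errors = {t: mse(lambda x: taylor_sin(x, t), math.sin, samples)
          for t in (2, 3, 5)}
```

Plotting `errors` against the number of terms (or, for CORDIC, the number of iterations) gives the error side of the area versus accuracy tradeoff.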
Latin American Symposium on Circuits and Systems | 2013
Sérgio Cruz; Daniel M. Muñoz; Milton E. Conde; Carlos H. Llanos; Geovany Araujo Borges
This work describes a hardware architecture for implementing a sequential approach to the Extended Kalman Filter (EKF) that is suitable for mobile robotics tasks such as self-localization, mapping, and navigation. Because this algorithm is computationally intensive, it is commonly implemented on Personal Computer (PC)-based platforms for use on larger robots. To allow the development of small robotic platforms, such as those required in much current state-of-the-art research (for instance, in microrobotics), small size, low power, and high floating-point computing capability are required, as well as specific architectures designed for these targets. The proposed architecture was therefore developed for the self-localization task using single-precision floating-point arithmetic operators, allowing the fusion of data coming from different sensors such as ultrasonic (Sonar) and Laser Range Finder (LRF) sensors. The system has been adapted to a reconfigurable platform and applied to a Pioneer 3-AT mobile robot.
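One common reading of a "sequential" EKF is that scalar measurements are processed one at a time, so the update needs only a scalar division instead of a matrix inversion. The sketch below illustrates that reading under stated assumptions: uncorrelated measurement noise, a symmetric covariance `P`, and a hypothetical function name and measurement tuple layout. It is not confirmed to be the formulation used in the paper.

```python
def sequential_ekf_update(x, P, measurements):
    """Apply scalar EKF measurement updates one at a time.

    Each measurement is (z, h, jac, r): the measured value, a function
    returning the predicted measurement h(x), a function returning the
    Jacobian row at x, and the scalar noise variance r.
    """
    n = len(x)
    for z, h, jac, r in measurements:
        H = jac(x)                                   # Jacobian row at the current estimate
        PHt = [sum(P[i][j] * H[j] for j in range(n)) for i in range(n)]
        s = sum(H[i] * PHt[i] for i in range(n)) + r  # scalar innovation variance
        K = [v / s for v in PHt]                     # gain: one division, no inversion
        innovation = z - h(x)
        x = [x[i] + K[i] * innovation for i in range(n)]
        # P <- P - K (H P); H P equals PHt transposed because P is symmetric.
        P = [[P[i][j] - K[i] * PHt[j] for j in range(n)] for i in range(n)]
    return x, P
```

In a sensor fusion setting, readings from different sensors could simply be appended to `measurements`, each with its own `h`, `jac`, and `r`.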
Southern Conference on Programmable Logic | 2011
Jones Y. Mori; Camilo Sánchez-Ferreira; Daniel M. Muñoz; Carlos H. Llanos; Pedro de Azevedo Berger
Both the market and the academic community currently demand image and video processing applications with several real-time constraints. To provide an alternative design that allows the rapid development of real-time image processing systems, this paper proposes a unified hardware architecture for some image filtering algorithms in the spatial domain, such as windowing-based operations, implemented on FPGAs (Field Programmable Gate Arrays). To achieve this, six different filters have been implemented in a parallel approach, separating them into simple hardware structures and allowing the algorithms to exploit their parallel capabilities by using a simple systolic architecture. In this system, all implemented algorithms run in parallel, allowing the user to select a given output to be shown on a display. Both image processing and synthesis results demonstrate the feasibility of FPGAs for implementing the proposed filtering algorithms in a fully parallel approach.
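Windowing-based filtering with several filters fed from the same neighborhood can be modeled in a few lines. This is a software sketch only: the abstract does not name the six filters, so the four in `FILTERS` are hypothetical stand-ins, and border pixels are simply left unchanged.

```python
def windows3x3(image):
    """Yield (row, col, window) for every interior pixel of a 2D list."""
    rows, cols = len(image), len(image[0])
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            yield r, c, [image[r + dr][c + dc]
                         for dr in (-1, 0, 1) for dc in (-1, 0, 1)]

# Hypothetical filter bank; every filter consumes the same 9-pixel window.
FILTERS = {
    "mean": lambda w: sum(w) // 9,
    "median": lambda w: sorted(w)[4],
    "max": max,
    "min": min,
}

def filter_bank(image):
    """Run all filters on each window and return one output image per filter."""
    out = {name: [row[:] for row in image] for name in FILTERS}
    for r, c, window in windows3x3(image):
        for name, fn in FILTERS.items():
            out[name][r][c] = fn(window)
    return out
```

Selecting `filter_bank(image)["median"]`, for example, plays the role of choosing which of the simultaneously computed outputs to display.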
International Journal of Reconfigurable Computing | 2010
Diego F. Sánchez; Daniel M. Muñoz; Carlos H. Llanos; José Mauricio S. T. Motta
Hardware acceleration in high-performance computer systems is of particular interest for many engineering and scientific applications in which a large number of arithmetic operations and transcendental functions must be computed. This paper presents a hardware architecture for computing the direct kinematics of robot manipulators with 5 degrees of freedom (5 DOF) using floating-point arithmetic for 32, 43, and 64 bit-width representations, implemented in Field Programmable Gate Arrays (FPGAs). The proposed architecture has been developed using several floating-point libraries for arithmetic and transcendental function operators, allowing the designer to select (pre-synthesis) a suitable bit-width representation according to the accuracy and dynamic range, as well as the area, elapsed time, and power consumption requirements of the application. Synthesis results demonstrate the effectiveness and high performance of the implemented cores on commercial FPGAs. Simulation results were used to compute the Mean Square Error (MSE), using Matlab as a statistical estimator, validating the correct behavior of the implemented cores. Additionally, the processing time of the hardware architecture was compared with that of the same formulation implemented in software on the PowerPC (FPGA embedded processor), demonstrating that the hardware architecture speeds up the software implementation by a factor of 1298.
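Direct kinematics is dominated by sines, cosines, and multiply-accumulate operations, which is why it benefits from hardware floating-point cores. The sketch below assumes the standard Denavit-Hartenberg convention, which the abstract does not confirm as the formulation used; the link parameters and function names are hypothetical.

```python
import math

def dh_matrix(theta, d, a, alpha):
    """Homogeneous transform of one joint in the standard DH convention."""
    ct, st = math.cos(theta), math.sin(theta)
    ca, sa = math.cos(alpha), math.sin(alpha)
    return [[ct, -st * ca, st * sa, a * ct],
            [st, ct * ca, -ct * sa, a * st],
            [0.0, sa, ca, d],
            [0.0, 0.0, 0.0, 1.0]]

def matmul(A, B):
    """4x4 matrix product."""
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def forward_kinematics(joint_angles, links):
    """Chain the joint transforms; links is a list of (d, a, alpha)."""
    T = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
    for theta, (d, a, alpha) in zip(joint_angles, links):
        T = matmul(T, dh_matrix(theta, d, a, alpha))
    return T  # end-effector position is (T[0][3], T[1][3], T[2][3])
```

A 5 DOF arm would pass five joint angles and five `(d, a, alpha)` tuples; the same code works for any chain length.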
Symposium on Integrated Circuits and Systems Design | 2012
Jones Yudi Mori; Janier Arias-García; Camilo Sánchez-Ferreira; Daniel M. Muñoz; Carlos H. Llanos; José Mauricio S. T. Motta
This work presents the development of an integrated hardware/software sensor system for moving object detection and distance calculation, based on a background subtraction algorithm. The sensor comprises a catadioptric system composed of a camera and a convex mirror that reflects the environment to the camera from all directions, obtaining a panoramic view. The sensor is used as an omnidirectional vision system for localization and navigation tasks of mobile robots. Several image processing operations, such as filtering, segmentation, and morphology, have been included in the processing architecture. To achieve distance measurement, an algorithm to determine the center of mass of a detected object was implemented. The overall architecture has been mapped onto a commercial low-cost FPGA device using a hardware/software co-design approach, which comprises a Nios II embedded microprocessor and specific image processing blocks implemented in hardware. The background subtraction algorithm was also used to calibrate the system, allowing for accurate results. Synthesis results show that the system can achieve a throughput of 26.6 processed frames per second, and the performance analysis shows that the overall architecture achieves a speedup factor of 13.78 in comparison with a PC-based solution running on the real-time operating system xPC Target.
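The detection step can be sketched as a thresholded frame difference followed by a centroid computation. This is a simplified illustration under assumptions: grayscale frames stored as nested lists, a fixed hypothetical threshold, a single object, and no filtering or morphology stages.

```python
def foreground_mask(frame, background, threshold=30):
    """Mark pixels whose absolute difference from the background exceeds the threshold."""
    return [[abs(p - b) > threshold for p, b in zip(frame_row, bg_row)]
            for frame_row, bg_row in zip(frame, background)]

def center_of_mass(mask):
    """Return the (row, col) centroid of the foreground pixels, or None if empty."""
    points = [(r, c) for r, row in enumerate(mask)
              for c, on in enumerate(row) if on]
    if not points:
        return None
    n = len(points)
    return (sum(r for r, _ in points) / n, sum(c for _, c in points) / n)
```

In an omnidirectional image, the radial distance of the centroid from the mirror center would then be mapped to a physical distance through a calibration curve, which is outside this sketch.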