Julien Lamoureux | Researchain

Publication

Featured research published by Julien Lamoureux.

International Conference on Computer Aided Design | 2003

On the Interaction Between Power-Aware FPGA CAD Algorithms

Julien Lamoureux; Steven J. E. Wilton

As Field-Programmable Gate Array (FPGA) power consumption continues to increase, lower power FPGA circuitry, architectures, and Computer-Aided Design (CAD) tools need to be developed. Before designing low-power FPGA circuitry, architectures, or CAD tools, we must first determine where the biggest savings (in terms of energy dissipation) are to be made and whether these savings are cumulative. In this paper, we focus on FPGA CAD tools. Specifically, we describe a new power-aware CAD flow for FPGAs that was developed to answer the above questions. Estimating energy using very detailed post-route power and delay models, we determine the energy savings obtained by our power-aware technology mapping, clustering, placement, and routing algorithms and investigate how the savings behave when the algorithms are applied concurrently. The individual savings of the power-aware technology-mapping, clustering, placement, and routing algorithms were 7.6%, 12.6%, 3.0%, and 2.6%, respectively. The majority of the overall savings were achieved during the technology mapping and clustering stages of the power-aware FPGA CAD flow. In addition, the savings were mostly cumulative when the individual power-aware CAD algorithms were applied concurrently, with an overall energy reduction of 22.6%.
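One way to read "mostly cumulative" is to compare the reported overall reduction against two simple reference points built from the four individual figures: their plain sum, and the result of compounding them as if each stage removed its fraction of whatever energy remained. The compounding model and the function names below are assumptions made for illustration; the abstract does not say how the authors define cumulative.

```python
# Individual energy savings quoted in the abstract, as fractions.
savings = {
    "technology mapping": 0.076,
    "clustering": 0.126,
    "placement": 0.030,
    "routing": 0.026,
}

def additive(fractions):
    """Upper reference point: savings simply add up."""
    return sum(fractions)

def compounded(fractions):
    """Assumed model: each stage saves its fraction of the remaining energy."""
    remaining = 1.0
    for f in fractions:
        remaining *= 1.0 - f
    return 1.0 - remaining

additive_total = additive(savings.values())
compounded_total = compounded(savings.values())
reported_total = 0.226  # overall reduction quoted in the abstract

print(f"additive:   {additive_total:.1%}")    # 25.8%
print(f"compounded: {compounded_total:.1%}")  # 23.7%
print(f"reported:   {reported_total:.1%}")    # 22.6%
```

Under these assumptions the reported 22.6% sits just below the compounded figure, which is consistent with the abstract's statement that the savings are mostly, but not entirely, cumulative.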

Field-Programmable Logic and Applications | 2006

Activity Estimation for Field-Programmable Gate Arrays

Julien Lamoureux; Steven J. E. Wilton

This paper examines various activity estimation techniques in order to determine which are most appropriate for use in the context of field-programmable gate arrays (FPGAs). Specifically, the paper compares how different activity estimation techniques affect the accuracy of FPGA power models and the ability of power-aware FPGA CAD tools to minimize power. After comparing various existing techniques, the most suitable existing techniques are combined with two novel enhancements to create a new activity estimation tool called ACE-2.0. Finally, the new publicly available tool is compared to existing tools to validate the improvements. Using activities estimated by ACE-2.0, the power estimates and power savings were both within 1% of the results obtained using simulated activities.

Field-Programmable Logic and Applications | 2007

Clock-Aware Placement for FPGAs

Julien Lamoureux; Steven J. E. Wilton

The programmable clock networks in FPGAs have a significant impact on overall power, area, and delay. Not only does the clock network itself dissipate a significant amount of power, since it connects to every latch on the FPGA and toggles every cycle, but the design of the clock network also affects how efficiently the rest of the application can be implemented, since it imposes constraints on the CAD tools that map the application onto the FPGA. To examine this tradeoff, this paper describes and compares new clock-aware placement techniques and then examines how the clock network architecture affects overall power, area, and delay. Our results show that the placement techniques used to make placement clock-aware have a significant influence on power and delay. On average, circuits placed using the most effective techniques dissipate 9.9% less energy and are 2.4% faster than circuits placed using the least effective techniques. Moreover, the results show that the clock network architecture is also important. On average, FPGAs with an efficient clock network are up to 12.5% more energy efficient and 7.2% faster than other FPGAs.

IEEE Transactions on Very Large Scale Integration Systems | 2008

GlitchLess: Dynamic Power Minimization in FPGAs Through Edge Alignment and Glitch Filtering

Julien Lamoureux; Guy Lemieux; Steven J. E. Wilton

This paper describes GlitchLess, a circuit-level technique for reducing power in field-programmable gate arrays (FPGAs) by eliminating unnecessary logic transitions called glitches. This is done by adding programmable delay elements to the logic blocks of the FPGA. After routing a circuit and performing static timing analysis, these delay elements are programmed to align the arrival times of the inputs of each lookup table (LUT), thereby preventing new glitches from being generated. Moreover, the delay elements also behave as filters that eliminate other glitches generated by upstream logic or off-chip circuitry. On average, the proposed implementation eliminates 87% of the glitching, which reduces overall FPGA power by 17%. The added circuitry increases the overall FPGA area by 6% and critical-path delay by less than 1%. Furthermore, since it is applied after routing, the proposed technique requires little or no modifications to the routing architecture or computer-aided design (CAD) flow.
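The edge-alignment step described above can be sketched as follows: given the input arrival times that static timing analysis reports for one LUT, each input is delayed toward the latest arrival. This is a minimal illustration, not the authors' implementation. The function name, the picosecond units, and the `step_ps` and `max_delay_ps` parameters (the resolution and range of a hypothetical programmable delay element) are all assumptions.

```python
from typing import Dict

def align_lut_inputs(arrival_ps: Dict[str, int],
                     step_ps: int,
                     max_delay_ps: int) -> Dict[str, int]:
    """Pick a programmable delay per LUT input so that inputs arrive together.

    Delays are rounded down to a multiple of step_ps and clamped to
    max_delay_ps, so no input is pushed past the latest arrival.
    """
    latest = max(arrival_ps.values())
    delays = {}
    for pin, arrival in arrival_ps.items():
        needed = latest - arrival
        quantized = (needed // step_ps) * step_ps
        delays[pin] = min(quantized, max_delay_ps)
    return delays

# Hypothetical arrival times for a 3-input LUT, in picoseconds.
arrivals = {"a": 1000, "b": 1600, "c": 2500}
delays = align_lut_inputs(arrivals, step_ps=250, max_delay_ps=2000)
aligned = {pin: arrivals[pin] + delays[pin] for pin in arrivals}
residual_skew = max(aligned.values()) - min(aligned.values())

print(delays)         # {'a': 1500, 'b': 750, 'c': 0}
print(residual_skew)  # 150
```

In this sketch, rounding down keeps the latest-arriving input as the reference, so the LUT's own arrival time is unchanged and the remaining skew stays below one delay step whenever the delay range is large enough.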

Field Programmable Gate Arrays | 2006

FPGA clock network architecture: flexibility vs. area and power

Julien Lamoureux; Steven J. E. Wilton

This paper examines the tradeoffs between flexibility, area, and power dissipation of programmable clock networks for Field-Programmable Gate Arrays (FPGAs). The paper begins by describing a parameterized clock network model that describes a broad range of programmable clock network architectures. Specifically, the model supports architectures with multiple local and global clock domains and varying amounts of flexibility at various levels of the clock network. Using the model, the architectural parameters that control the flexibility of the clock network are varied to determine the cost of this flexibility in terms of area and power dissipation. From these experiments, the study finds that area and power costs are highest for networks with flexibility close to the logic blocks. Furthermore, it finds that clock networks with local clock domains have little overhead and are significantly more efficient than clock networks without local clock domains for applications with multiple clocks.

Adaptive Hardware and Systems | 2008

An Overview of Low-Power Techniques for Field-Programmable Gate Arrays

Julien Lamoureux; Wayne Luk

This paper provides an overview of low-power techniques for field-programmable gate arrays (FPGAs). It covers system-level design techniques and device-level design techniques that have targeted current commercial devices. It also describes current research on circuit-level and architecture-level design techniques. Recent studies on power modelling and on low-power computer-aided design (CAD) are also reported. Finally, it proposes future work that would enable the use of FPGA technology in applications where power and energy consumption is critical, such as mobile devices.

Field Programmable Gate Arrays | 2007

GlitchLess: an active glitch minimization technique for FPGAs

Julien Lamoureux; Guy Lemieux; Steven J. E. Wilton

This paper describes a technique that reduces dynamic power in FPGAs by reducing the number of glitches in the global routing resources. The technique involves adding programmable delay elements within the logic blocks of an FPGA to programmably align the arrival times of early-arriving signals to the inputs of the lookup tables and to filter out glitches generated by earlier circuitry. On average, the proposed technique eliminates 91% of the glitching, which reduces overall FPGA power by 18%. The added circuitry increases overall area by 5% and critical-path delay by less than 1%. Furthermore, since it is applied after routing, the proposed technique requires no modifications to the existing FPGA routing architecture or CAD flow.

ACM Transactions on Reconfigurable Technology and Systems | 2008

On the trade-off between power and flexibility of FPGA clock networks

Julien Lamoureux; Steven J. E. Wilton

FPGA clock networks consume a significant amount of power, since they toggle every clock cycle and must be flexible enough to implement the clocks for a wide range of different applications. The efficiency of FPGA clock networks can be improved by reducing this flexibility; however, reducing the flexibility introduces stricter constraints during the clustering and placement stages of the FPGA CAD flow. These constraints can reduce the overall efficiency of the final implementation. This article examines the trade-off between the power consumption and flexibility of FPGA clock networks. Specifically, this article makes three contributions. First, it presents a new parameterized clock-network framework for describing and comparing FPGA clock networks. Second, it describes new clock-aware placement techniques that are needed to find a legal placement satisfying the constraints imposed by the clock network. Finally, it performs an empirical study to examine the trade-off between the power consumption of the clock network and the impact of the CAD constraints for a number of different clock networks with varying amounts of flexibility. The results show that the techniques used to produce a legal placement can have a significant influence on power and the ability of the placer to find a legal solution. On average, circuits placed using the most effective techniques dissipate 5% less overall energy and are significantly more likely to be legal than circuits placed using other techniques. Moreover, the results show that the architecture of the clock network is also important. On average, FPGAs with an efficient clock network are up to 14.6% more energy efficient compared to other FPGAs.

International Symposium on Circuits and Systems | 2001

Fast and low-power inner product processor

Natalia Kazakova; R. Sung; Nelson G. Durdle; Martin Margala; Julien Lamoureux

This paper describes a novel fast and low-power two's complement fixed-point inner product processor targeted for 3D volume rendering applications. The design was simulated and fabricated in a 0.18 μm, six-metal, single-poly CMOS process. Speed and power optimizations were achieved at both the architecture and circuit levels. The inner product operation, consisting of multiplications and additions, is merged into a composite function in order to minimize circuit area, improve speed and minimize power dissipation. Individual functional blocks including the partial product encoder, reduction tree and final two operand adder were built entirely using complementary pass transistor logic (CPL) with reduced swing internal nodes to minimize the energy required during switching. Simulated results for a (8×8×8)-b implementation showed an average evaluation time of 2.6 ns at 1.8 V. The average power dissipation at 200 MHz was 2.53 mW.

Southern Conference on Programmable Logic | 2008

The Coarse-Grained / Fine-Grained Logic Interface in FPGAs with Embedded Floating-Point Arithmetic Units

Chi Wai Yu; Julien Lamoureux; Steven J. E. Wilton; Philip Heng Wai Leong; Wayne Luk

This paper examines the interface between fine-grained and coarse-grained programmable logic in FPGAs. Specifically, it presents an empirical study that covers the location, pin arrangement, and interconnect between embedded floating point units (FPUs) and the fine-grained logic fabric in FPGAs. The results show that (1) FPUs should be square, (2) FPUs should be positioned tightly near the center of the FPGA, and (3) the FPU pins should be arranged on four sides of the FPU.
