Publication


Featured research published by Safeen Huda.


International Solid-State Circuits Conference | 2010

Negative-resistance read and write schemes for STT-MRAM in 0.13µm CMOS

David Halupka; Safeen Huda; William Y. Song; Ali Sheikholeslami; Koji Tsunoda; Chikako Yoshida; Masaki Aoki

Spin-torque-transfer (STT) magnetoresistive random-access memory (MRAM) [1–3], a successor to field-induced magnetic switching MRAM [4,5], is an emerging non-volatile memory technology that is CMOS-compatible, scalable, and allows for high-speed access. However, two circuit-level challenges remain for STT-MRAM: potentially destructive read access due to device variation and a high-power write access. This paper presents two STT-MRAM access schemes: a negative-resistance read scheme (NRRS) that guarantees non-destructive read by design, and a negative-resistance write scheme (NRWS) that, on average, reduces the write power consumption by 10.5%. A fabricated and measured test-chip in 0.13µm CMOS confirms both properties.


Field-Programmable Logic and Applications | 2009

Clock gating architectures for FPGA power reduction

Safeen Huda; Muntasir Mallick; Jason Helge Anderson

Clock gating is a power reduction technique that has been used successfully in the custom ASIC domain. Clock and logic signal power are saved by temporarily disabling the clock signal on registers whose outputs do not affect circuit outputs. We consider and evaluate FPGA clock network architectures with built-in clock gating capability and describe a flexible placement algorithm that can operate with various gating granularities (various sizes of device regions containing clock loads that can be gated together). Results show that depending on the clock gating architecture and the fraction of time clock signals are enabled, clock power can be reduced by over 50%, and results suggest that a fine granularity gating architecture yields significant power benefits.


IEEE Transactions on Circuits and Systems | 2013

A Novel STT-MRAM Cell With Disturbance-Free Read Operation

Safeen Huda; Ali Sheikholeslami

This paper presents a three-terminal Magnetic Tunnel Junction (MTJ) and its associated two transistor cell structure for use as a Spin Torque Transfer Magnetoresistive Random Access Memory (STT-MRAM) cell. The proposed cell is shown to have guaranteed read-disturbance immunity; during a read operation, the net torque acting on the storage cell always acts in a direction to refresh the data stored in the cell. A simulation study is then performed to compare the merits of the proposed device against a conventional 1-Transistor-1-MTJ (1T1MTJ) cell, as well as a differential 2-Transistor 2-MTJ (2T2MTJ) cell. We also investigate In-Plane Anisotropy (IPA) and Perpendicular-to-Plane Anisotropy (PPA) versions of the proposed device. Simulation results confirm that the proposed device offers disturbance-free read operation while still offering significant performance advantages over the conventional 1T1MTJ cell in terms of average access time. The proposed cell also shows superior performance to the 2T2MTJ cell, particularly when the cells are targeted for read-mostly applications.


IEEE Transactions on Circuits and Systems | 2014

A Survey on Circuit Modeling of Spin-Transfer-Torque Magnetic Tunnel Junctions

Aynaz Vatankhahghadim; Safeen Huda; Ali Sheikholeslami

Accurate modeling of the magnetic tunnel junction (MTJ) is critical for the design of memories such as spin-transfer-torque magnetoresistive random access memory (STT-MRAM) and of spin logic circuits such as spin flip-flops. This paper reviews several static and dynamic models for the MTJ and compares their capabilities and limitations. Furthermore, a Verilog-A model is developed to predict dynamic characteristics of the MTJ. These models are used in simulating a prototype circuit to illustrate their strengths and weaknesses.


Field Programmable Gate Arrays | 2014

Optimizing effective interconnect capacitance for FPGA power reduction

Safeen Huda; Jason Helge Anderson; Hirotaka Tamura

We propose a technique to reduce the effective parasitic capacitance of interconnect routing conductors in a bid to simultaneously reduce power consumption and improve delay. The parasitic capacitance reduction is achieved by ensuring that routing conductors adjacent to those used by timing-critical or high-activity nets are left floating, that is, disconnected from both VDD and GND. In doing so, the effective coupling capacitance between the conductors is reduced, because the original coupling capacitance between the conductors is placed in series with other capacitances in the circuit (series combinations of capacitors correspond to lower effective capacitance). Allowing unused conductors to float requires tri-state routing buffers, and to that end we also propose low-cost tri-state buffer circuitry. We also introduce CAD techniques to maximize the likelihood that unused routing conductors are adjacent to those used by nets with high activity or low slack, improving both power and speed. Results show that interconnect dynamic power reductions of up to ~15.5% are expected, with a critical path degradation of ~1% and a total area overhead of ~2.1%.
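
The series-capacitance argument in the abstract above can be illustrated with a small calculation. This is only a sketch under stated assumptions, not the authors' model: it considers one active wire with a single neighbor, and the function names and the capacitance values (in femtofarads) are hypothetical, chosen purely for illustration.

```python
def series(c1, c2):
    """Equivalent capacitance of two capacitors connected in series."""
    return (c1 * c2) / (c1 + c2)


def effective_load(c_self, c_couple, c_neighbor_gnd, neighbor_floating):
    """Capacitance seen by the driver of an active wire with one neighbor.

    Simplified, assumed model:
    - neighbor held at VDD or GND: the full coupling capacitance loads the wire;
    - neighbor floating: the coupling capacitance is in series with the
      neighbor's own capacitance to ground.
    """
    if neighbor_floating:
        return c_self + series(c_couple, c_neighbor_gnd)
    return c_self + c_couple


# Hypothetical values in femtofarads (not taken from the paper).
c_self, c_couple, c_neighbor_gnd = 10.0, 6.0, 3.0

driven = effective_load(c_self, c_couple, c_neighbor_gnd, neighbor_floating=False)
floating = effective_load(c_self, c_couple, c_neighbor_gnd, neighbor_floating=True)

print(f"neighbor driven:   {driven:.1f} fF")
print(f"neighbor floating: {floating:.1f} fF")
```

With these assumed numbers the load drops from 16.0 fF to 12.0 fF. Because a series combination is always smaller than either capacitor alone, the floating case can never be worse than the driven case in this simplified model.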


Field-Programmable Custom Computing Machines | 2014

On Hard Adders and Carry Chains in FPGAs

Jason Luu; Conor McCullough; Sen Wang; Safeen Huda; Bo Yan; Charles Chiasson; Kenneth B. Kent; Jason Helge Anderson; Jonathan Rose; Vaughn Betz

Hardened adder and carry logic is widely used in commercial FPGAs to improve the efficiency of arithmetic functions. There are many design choices and complexities associated with such hardening, including circuit design, FPGA architectural choices, and the CAD flow. There has been very little study, however, on these choices and hence we explore a number of possibilities for hard adder design. We also highlight optimizations during front-end elaboration that help ameliorate the restrictions placed on logic synthesis by hardened arithmetic. We show that hard adders and carry chains, when used for simple adders, increase performance by a factor of four or more, but on larger benchmark designs that contain arithmetic, improve overall performance by roughly 15%. We measure an average area increase of 5% for architectures with carry chains but believe that better logic synthesis should reduce this penalty. Interestingly, we show that adding dedicated inter-logic-block carry links or fast carry look-ahead hardened adders result in only minor delay improvements for complete designs.


Field-Programmable Logic and Applications | 2013

Charge recycling for power reduction in FPGA interconnect

Safeen Huda; Jason Helge Anderson; Hirotaka Tamura

We propose charge recycling (CR) to reduce power consumption in FPGAs. We take advantage of the property that many routing conductors are left unused in any FPGA implementation of an application. Charge recycling via the unused conductors reduces the amount of charge drawn from the supply, lowering energy consumption. We present a routing switch that operates in two modes: normal and CR, and describe the CAD tool changes needed to support CR at the routing and post-routing stages of the flow. Results show that dynamic power in the FPGA interconnect can be reduced by up to ~15-18.4% by the proposed techniques, depending on the performance constraints.


IEEE Transactions on Very Large Scale Integration Systems | 2016

Hybrid LUT/Multiplexer FPGA Logic Architectures

Stephen Alexander Chin; Jason Luu; Safeen Huda; Jason Helge Anderson

Hybrid configurable logic block architectures for field-programmable gate arrays that contain a mixture of lookup tables and hardened multiplexers are evaluated toward the goal of higher logic density and area reduction. Multiple hybrid configurable logic block architectures, both nonfracturable and fracturable, with varying MUX:LUT logic element ratios are evaluated across two benchmark suites (VTR and CHStone) using a custom tool flow consisting of LegUp-HLS, Odin-II front-end synthesis, ABC logic synthesis and technology mapping, and VPR for packing, placement, routing, and architecture exploration. Technology mapping optimizations that target the proposed architectures are also implemented within ABC. Experimentally, we show that for nonfracturable architectures, without any mapper optimizations, we naturally save up to ~8% area post-place-and-route, accounting for both complex logic block and routing area, while maintaining mapping depth. With architecture-aware technology mapper optimizations in ABC, additional area is saved post-place-and-route. For fracturable architectures, experiments show that only marginal gains, up to ~2%, are seen after place-and-route. For both nonfracturable and fracturable architectures, we see minimal impact on timing performance for the architectures with the best area efficiency.


IEEE Transactions on Very Large Scale Integration Systems | 2017

Leveraging Unused Resources for Energy Optimization of FPGA Interconnect

Safeen Huda; Jason Helge Anderson

Conventional field-programmable gate arrays are typically overprovisioned with routing resources to ensure that they meet routability targets, which results in increased routing static and dynamic power. In this paper, we leverage the excess routing conductors to reduce dynamic and static power. To reduce dynamic power, we propose to ensure that used routing conductors are adjacent to unused routing conductors, which are left floating to reduce the effective capacitance seen by active nets. To reduce static power, we observe that leakage in routing multiplexers is dominated by specific paths; if the routing conductors that connect to the input pins on these paths are unused and left floating, the leakage of the multiplexer may be significantly reduced. Allowing unused conductors to float requires tristate routing buffers, and thus we propose two low-cost tristate buffer topologies with different power and area-overhead tradeoffs. We also introduce CAD techniques to optimize the overall energy dissipation in the routing network using the proposed techniques. Results show that interconnect dynamic power reductions of up to 25%, interconnect static power reductions of up to 81%, and overall interconnect energy reductions ranging between 14.9% and 42.7% are expected, with a critical path degradation of <1.8% and an area overhead of 2.6%–4.8%.


International Symposium on Physical Design | 2016

Power Optimization of FPGA Interconnect Via Circuit and CAD Techniques

Safeen Huda; Jason Helge Anderson

We target power dissipation in field-programmable gate array (FPGA) interconnect and present three approaches that leverage a unique property of FPGAs, namely, the presence of unused routing conductors. A first technique attacks dynamic power by placing unused conductors, adjacent to used conductors, into a high-impedance state, reducing the effective capacitance seen by used conductors. A second technique, charge recycling, re-purposes unused conductors as charge reservoirs to reduce the supply current drawn for a positive transition on a used conductor. A third approach reduces leakage current in interconnect buffers by pulse-based signalling, allowing a driving buffer to be placed into a high-impedance state after a logic transition. All three techniques require CAD support in the routing stage to encourage specific positionings of unused conductors relative to used conductors.

Collaboration


Dive into Safeen Huda's collaborations.

Top Co-Authors


Jason Luu

University of Toronto


Bo Yan

University of New Brunswick


Kenneth B. Kent

University of New Brunswick
