Publication


Featured research published by Uming Ko.


IEEE Transactions on Very Large Scale Integration Systems | 1995

Low-power design techniques for high-performance CMOS adders

Uming Ko; T. Balsara; Wai Lee

A high-performance adder is one of the most critical components of a processor and determines its throughput, as it is used in the ALU, the floating-point unit, and for address generation in cache or memory accesses. In this paper, low-power design techniques for various digital circuit families are studied for implementing high-performance adders, with the objective of optimizing performance per watt, or energy efficiency, as well as silicon area efficiency. While the investigation is done using 100 MHz, 32-b carry lookahead (CLA) adders in a 0.6 μm CMOS technology, most techniques presented here can also be applied to other parallel adder algorithms such as carry-select adders (CSA) and to other energy-efficient CMOS circuits. Among the techniques presented here, the double pass-transistor logic (DPL) is found to be the most energy efficient, while the single-rail domino and complementary pass-transistor logic (CPL) result in the best-performing and the most area-efficient adders, respectively. The impact of transistor threshold voltage scaling on energy efficiency is also examined when the supply voltage is scaled from 3.5 V down to 1.0 V.
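For readers unfamiliar with the carry lookahead scheme named in this abstract, the following is a minimal software sketch of the idea, not the circuits studied in the paper. It assumes 4-bit lookahead blocks chained together to cover 32 bits; the function names `cla_block` and `cla_add` and the block size are illustrative choices, not taken from the source.

```python
def cla_block(g, p, c0):
    """Compute all carries of a 4-bit block directly from generate/propagate bits.

    g[i] = a[i] AND b[i] (bit i generates a carry),
    p[i] = a[i] XOR b[i] (bit i propagates an incoming carry).
    Each carry is a flat function of g, p and the block carry-in, which is
    the "lookahead": no carry waits on the previous carry inside the block.
    """
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0])
          | (p[3] & p[2] & p[1] & p[0] & c0))
    return [c0, c1, c2, c3], c4


def cla_add(a, b, width=32):
    """Add two unsigned integers using chained 4-bit lookahead blocks.

    Assumption: width is a multiple of 4. Blocks are chained by their
    carry-out here for brevity; a hardware CLA would typically add a second
    lookahead level across blocks.
    """
    if width % 4 != 0:
        raise ValueError("width must be a multiple of 4")
    mask = (1 << width) - 1
    a &= mask
    b &= mask
    total = 0
    carry = 0
    for base in range(0, width, 4):
        g = [((a & b) >> (base + i)) & 1 for i in range(4)]
        p = [((a ^ b) >> (base + i)) & 1 for i in range(4)]
        carries, carry = cla_block(g, p, carry)
        for i in range(4):
            # Sum bit i is the propagate bit XOR the carry into position i.
            total |= (p[i] ^ carries[i]) << (base + i)
    return total, carry
```

Calling `cla_add(x, y)` returns the 32-bit sum and the final carry-out as a pair.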


IEEE Transactions on Very Large Scale Integration Systems | 2000

High-performance energy-efficient D-flip-flop circuits

Uming Ko; Poras T. Balsara

This paper investigates performance, power, and energy efficiency of several CMOS master-slave D-flip-flops (DFFs). To improve performance and energy efficiency, a push-pull DFF and a push-pull isolation DFF are proposed. Among the five DFFs compared, the proposed push-pull isolation circuit is found to be the fastest with the best energy efficiency. Effects of using a double-pass-transistor logic (DPL) circuit and a tri-state push-pull driver are also studied. Last, metastability characteristics of the five DFFs are analyzed.


IEEE Transactions on Very Large Scale Integration Systems | 1998

Energy optimization of multilevel cache architectures for RISC and CISC processors

Uming Ko; Poras T. Balsara; Ashwini K. Nanda

In this paper, we present the characterization and design of energy-efficient, on-chip cache memories. The characterization of power dissipation in on-chip cache memories reveals that the memory peripheral interface circuits and bit array dissipate comparable power. To optimize performance and power in a processor's cache, a multidivided module (MDM) cache architecture is proposed to conserve energy in the bit array as well as the memory peripheral circuits. Compared to a conventional, nondivided, 16-kB cache, the latency and power of the MDM cache are reduced by a factor of 1.9 and 4.6, respectively. Based on the MDM cache architecture, the energy efficiency of the complete memory hierarchy is analyzed with respect to cache parameters in a multilevel processor cache design. This analysis was conducted by executing the SPECint92 benchmark programs with the miss ratios for reduced instruction set computer (RISC) and complex instruction set computer (CISC) machines.
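The abstract does not give the energy model used for the memory hierarchy. As a rough illustration of how miss ratios enter such an analysis, the sketch below uses a common first-order model in which each level is accessed only when every level above it misses. The function name, the per-access energies, and the miss ratios are all hypothetical placeholders, not values from the paper.

```python
def average_energy_per_access(levels, memory_energy):
    """First-order average energy per processor access in a cache hierarchy.

    levels: list of (energy_per_access, local_miss_ratio) tuples, L1 first.
    memory_energy: energy of one main-memory access.
    Assumption: a level is accessed only when all levels above it miss,
    and energies are in the same (arbitrary) unit.
    """
    energy = 0.0
    reach_probability = 1.0  # probability that an access reaches this level
    for level_energy, miss_ratio in levels:
        energy += reach_probability * level_energy
        reach_probability *= miss_ratio
    # Whatever misses in the last cache level goes to main memory.
    energy += reach_probability * memory_energy
    return energy


# Hypothetical two-level hierarchy: (energy, local miss ratio) per level.
example = average_energy_per_access([(1.0, 0.1), (5.0, 0.2)], memory_energy=50.0)
```

With these placeholder numbers, the L2 and main-memory terms together contribute more than the L1 term, which is the kind of trade-off a multilevel energy analysis has to weigh against cache size and miss ratio.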


International Symposium on Low Power Electronics and Design | 1995

Energy optimization of multi-level processor cache architectures

Uming Ko; Poras T. Balsara; Ashwini K. Nanda

To optimize performance and power of a processor’s cache, a multiple-divided module (MDM) cache architecture is proposed to save power at memory peripherals as well as the bit array. For an M×B-divided MDM cache, latency is equivalent to that of the smallest module and power consumption is only 1/(M×B) of the regular, non-divided cache. Based on the architecture and given transistor budgets for on-chip processor caches, this paper extends the investigation to analyze the energy effects of cache parameters in a multi-level cache design. The analysis is based on execution of SPECint92 benchmark programs with miss ratios of a RISC processor.
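The idealized scaling stated in this abstract can be written as a one-line calculation. The sketch below only restates that relationship; the function name and the example numbers are hypothetical, and real designs would not reach the ideal factor because of decoding and routing overhead.

```python
def mdm_ideal_power(base_power, m, b):
    """Ideal power of an M x B divided MDM cache per the abstract's claim.

    base_power: power of the regular, non-divided cache (any unit).
    m, b: the two division factors (positive integers).
    Assumption: only one of the M*B modules is active per access and
    overhead is ignored, so power scales as 1/(M*B).
    """
    if m < 1 or b < 1:
        raise ValueError("division factors must be at least 1")
    return base_power / (m * b)


# Hypothetical example: a cache dissipating 8.0 units, divided 2 x 4.
divided_power = mdm_ideal_power(8.0, 2, 4)
```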


International Conference on Computer Design | 1997

A repeater optimization methodology for deep sub-micron, high-performance processors

David Li; Andrew Pua; Pranjal Srivastava; Uming Ko

As process technology scales down to deep sub-micron and the frequency of a high-performance processor increases beyond 300 MHz, coupling induced signal integrity problems become more severe. Ignoring coupling effects can lead to functional failures or speed degradation. As a result, the traditional approach of repeater insertion driven by propagation delay and slew rate optimization becomes inadequate. The authors propose a design methodology to select optimal repeaters for high-performance processors by considering not only the delay and slew rate, but also crosstalk effects. A concurrent decision diagram (CDD) is further suggested to achieve crosstalk constraints with various trade-offs.


IEEE Transactions on Very Large Scale Integration Systems | 1995

Short-circuit power driven gate sizing technique for reducing power dissipation

Uming Ko; Poras T. Balsara

One major challenge in low-power technology is how to reduce overall power dissipation of a given subsystem without impacting its performance. In this paper we present a technique that can be applied to the non-speed-critical nets in a circuit in order to reduce overall power dissipation. This technique involves a study of short-circuit power dissipation as a function of input signal slews and output load conditions, to aid in making a judicious choice of drive strengths for various gates in a circuit. The resulting low-power solution does not degrade the original performance and yields a circuit which occupies less silicon area. The technique described here can be incorporated into any power optimization or synthesis tool. Lastly, we present the savings in power and area for a 32-b carry lookahead adder which was designed using the technique described here.


International Symposium on Low Power Electronics and Design | 1996

Design techniques for high performance, energy efficient control logic

Uming Ko; Anthony M. Hill; Poras T. Balsara

This paper investigates delay, power and area of critical components in designing energy-efficient control logic. To improve performance and energy efficiency, a split-slave dual-path (SSDP) register is proposed which improves the energy efficiency of the prior art by 30%. For multiplexers (MUXes), three designs are proposed and compared to existing solutions. The proposed MUXes improve performance by 50% or power by 22%. The impact of scaling supply voltage alone and scaling threshold voltage with supply voltage on delay and power is also examined.


International Symposium on Low Power Electronics and Design | 1997

Hybrid dual-threshold design techniques for high-performance processors with low-power features

Uming Ko; Andrew Pua; Anthony M. Hill; Pranjal Srivastava

This paper investigates delay, power and area of several critical library components for high-performance, low-power microprocessor designs. To improve performance of a 0.18-μm technology at a supply voltage of 1.8 V, the proposed hybrid dual-Vt (HDVT) circuit architectures enhance speed of low-Vt by 21% while reducing leakage power dissipation of low-Vt by an order of magnitude for combinatorial logic. For sequential elements, HDVT split-slave dual-path (HSSDP) and push-pull isolation (HPPI) registers are proposed, improving performance by 29-92% over HDVT conventional registers with 20-89% less energy consumption. For the datapath, an HDVT hierarchical, reduced-swing, dual-Vt logic (HHRSL) comparator is proposed to improve the delay of the prior art by up to 50%.


1995 IEEE Symposium on Low Power Electronics. Digest of Technical Papers | 1995

High performance, energy efficient master-slave flip-flop circuits

Uming Ko; Poras T. Balsara

This paper investigates performance, power and energy efficiency of several CMOS master-slave D-flip-flops (DFFs). To improve performance and energy efficiency, a push-pull DFF and a push-pull isolation DFF are proposed. Among the five DFFs compared, the proposed push-pull isolation circuit is found to be the fastest with the highest energy efficiency and a minimum data pulse width property. Effects of using a DPL circuit and a tri-state push-pull driver are studied. The impact of scaling supply voltage alone and scaling transistor threshold voltage with supply voltage on speed and power consumption of these circuits is also examined.


International Symposium on Low Power Electronics and Design | 1997

High-performance, low-power design techniques for dynamic to static logic interface

June Jiang; Kan Lu; Uming Ko

To optimize performance and power of a processor with both precharged and static circuit styles, a self-timed modified cascode latch (MCL) is proposed for the dual-rail domino to static logic interface. Compared to conventional self-timed cascode and cross-coupled NAND latches, the MCL achieves the highest performance and lowest power dissipation with reasonable noise immunity. Ease of embedding logic functions in these self-timed latches is also studied. For interfacing single-rail domino to static logic, the pseudo-inverter latch (PIL) is the most power-efficient latch when compared with the conventional transparent and cross-coupled NAND latches. Based on a 0.18 μm CMOS nominal process with a 1.6 V supply voltage, the effects of scaling supply voltage and output load on the power dissipation and delay of these latches are presented.

Collaboration


Dive into Uming Ko's collaborations.

Top Co-Authors

Poras T. Balsara

University of Texas at Dallas