Publication


Featured research published by Johnson Kin.


International Symposium on Microarchitecture | 1997

The filter cache: an energy efficient memory structure

Johnson Kin; Munish Gupta; William H. Mangione-Smith

Most modern microprocessors employ one or two levels of on-chip caches in order to improve performance. These caches are typically implemented with static RAM cells and often occupy a large portion of the chip area. Not surprisingly, these caches often consume a significant amount of power. In many applications, such as portable devices, low power is more important than performance. We propose to trade performance for power consumption by filtering cache references through an unusually small L1 cache. An L2 cache, which is similar in size and structure to a typical L1 cache, is positioned behind the filter cache and serves to reduce the performance loss. Experimental results across a wide range of embedded applications show that the filter cache results in improved memory-system energy efficiency. For example, a direct-mapped 256-byte filter cache achieves a 58% power reduction while reducing performance by 21%, corresponding to a 51% reduction in the energy-delay product over a conventional design.
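
To make the mechanism concrete, here is a minimal Python sketch of the filter-cache idea under assumed parameters: a tiny direct-mapped cache in front of a larger cache, where a hit in the small cache avoids activating the larger, more power-hungry one. The cache sizes, line size, and relative per-access energies below are illustrative assumptions, not figures from the paper.

```python
# Minimal sketch (hypothetical parameters): a tiny direct-mapped "filter cache"
# sits in front of a larger cache; a hit in the small cache avoids activating
# the larger, more power-hungry one.

class DirectMappedCache:
    def __init__(self, size_bytes, line_bytes=16):
        self.line_bytes = line_bytes
        self.num_lines = size_bytes // line_bytes
        self.tags = [None] * self.num_lines

    def access(self, addr):
        """Return True on a hit; on a miss, install the line and return False."""
        line_addr = addr // self.line_bytes
        index = line_addr % self.num_lines
        tag = line_addr // self.num_lines
        if self.tags[index] == tag:
            return True
        self.tags[index] = tag
        return False

def memory_energy(trace, filter_bytes=256, backing_bytes=8192,
                  e_filter=0.1, e_backing=1.0):
    """Estimate relative energy for an address trace; energy costs are assumed."""
    filter_cache = DirectMappedCache(filter_bytes)
    backing_cache = DirectMappedCache(backing_bytes)
    energy = 0.0
    for addr in trace:
        energy += e_filter                    # the filter cache is always probed
        if not filter_cache.access(addr):     # only filter misses reach the larger cache
            energy += e_backing
            backing_cache.access(addr)
    return energy
```

Running the same address trace through memory_energy with and without the filter stage gives a rough feel for the power/performance trade-off the abstract quantifies.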


IEEE Transactions on Computers | 2000

Filtering memory references to increase energy efficiency

Johnson Kin; Munish Gupta; William H. Mangione-Smith

Most modern microprocessors employ one or two levels of on-chip caches in order to improve performance. These caches are typically implemented with static RAM cells and often occupy a large portion of the chip area. Not surprisingly, they can consume a significant amount of power. In many applications, such as portable devices, energy efficiency is more important than performance. We propose sacrificing some performance in exchange for energy efficiency by filtering cache references through an unusually small first-level cache. We refer to this structure as the filter cache. A second-level cache, similar in size and structure to a conventional first-level cache, is positioned behind the filter cache and serves to mitigate the performance loss. Extensive experiments indicate that a small filter cache can still achieve a high hit rate and good performance. This approach allows the second-level cache to be in a low-power mode most of the time, thus resulting in power savings. The filter cache is particularly attractive in low-power applications, such as the embedded processors used for communication and multimedia applications. For example, experimental results across a wide range of embedded applications show that a direct-mapped 256-byte filter cache achieves a 58 percent power reduction while reducing performance by 21 percent. This trade-off results in a 51 percent reduction in the energy-delay product when compared to a conventional design.


International Symposium on Microarchitecture | 1997

Procedure based program compression

Darko Kirovski; Johnson Kin; William H. Mangione-Smith

Cost and power consumption are two of the most important design factors for many embedded systems, particularly consumer devices. Products such as Personal Digital Assistants, pagers with integrated data services, and smart phones have fixed performance requirements but unlimited appetites for reduced cost and increased battery life. Program compression is one technique that can be used to attack both of these problems. Compressed programs require less memory, thus reducing the cost of both direct materials and manufacturing. Furthermore, by relying on compressed memory, the total number of memory references is reduced. This reduction saves power by lowering the traffic on high-capacitance buses. This paper discusses a new approach to implementing transparent program compression that requires little or no hardware support. Procedures are compressed individually, and a directory structure is used to bind them together at runtime. Decompressed procedures are explicitly cached in ordinary RAM as complete units, thus resolving references within each procedure. This approach has been evaluated on a set of 25 embedded multimedia and communications applications, and results in an average memory reduction of 40% with a runtime performance overhead of 10%.
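
As a rough illustration of the scheme, the sketch below compresses each procedure individually, keeps a directory from procedure name to compressed image, and decompresses whole procedures into a small cache on call. The zlib codec, the LRU policy, and the cache size are stand-ins chosen for this example; the paper's actual encoding and caching mechanism may differ.

```python
# Rough illustration using zlib as a stand-in codec. Each procedure is
# compressed separately, a directory maps names to compressed images, and
# complete procedures are decompressed into a small LRU cache on call.

import zlib
from collections import OrderedDict

class CompressedProgram:
    def __init__(self, procedures, cache_slots=4):
        # procedures: {name: bytes of that procedure's code image}
        self.directory = {name: zlib.compress(code)
                          for name, code in procedures.items()}
        self.cache = OrderedDict()              # decompressed-procedure cache (LRU)
        self.cache_slots = cache_slots

    def call(self, name):
        """Return the decompressed procedure, decompressing only on a cache miss."""
        if name in self.cache:
            self.cache.move_to_end(name)        # refresh LRU position
            return self.cache[name]
        body = zlib.decompress(self.directory[name])
        if len(self.cache) >= self.cache_slots:
            self.cache.popitem(last=False)      # evict the least recently used procedure
        self.cache[name] = body
        return body
```

Because each cached entry is a complete procedure, references within a procedure need no relocation after decompression, which is the property the abstract highlights.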


Design Automation Conference | 1999

Power efficient mediaprocessors: design space exploration

Johnson Kin; Chunho Lee; William H. Mangione-Smith; Miodrag Potkonjak

We present a framework for rapidly exploring the design space of low-power application-specific programmable processors (ASPPs), in particular mediaprocessors. We focus on a category of processors that are programmable yet optimized to reduce power consumption for a specific set of applications. The key components of the framework presented in this paper are a retargetable instruction-level parallelism (ILP) compiler, processor simulators, a set of complete media applications written in a high-level language, and an architectural component selection algorithm. The fundamental idea behind the framework is that, with the aid of a retargetable ILP compiler and simulators, it is possible to tune architectural parameters (e.g., the issue width, the size of cache memory units, and the number of execution units) to meet low-power design goals under area constraints.
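
The exploration loop can be pictured roughly as follows: enumerate candidate values for the issue width, cache size, and number of execution units, discard configurations that exceed the area budget, and keep the lowest-energy survivor. The area and energy formulas in this sketch are invented placeholders standing in for the paper's compiler- and simulator-driven estimates.

```python
# Toy exploration loop: enumerate architectural parameters, drop configurations
# that exceed the area budget, and keep the lowest-energy survivor.

from itertools import product

def area_mm2(issue_width, cache_kb, exec_units):
    return 2.0 * issue_width + 0.5 * cache_kb + 1.5 * exec_units   # assumed model

def energy(issue_width, cache_kb, exec_units):
    # Assumed trend: larger caches cut memory traffic, while wider issue and
    # more execution units add power; a real flow would query simulation.
    return 100.0 / (1.0 + 0.05 * cache_kb) + 3.0 * issue_width + 2.0 * exec_units

def explore(area_budget=40.0):
    best = None
    for iw, ckb, eu in product([1, 2, 4, 8], [1, 2, 4, 8, 16, 32], [1, 2, 4]):
        if area_mm2(iw, ckb, eu) > area_budget:
            continue
        candidate = {"issue_width": iw, "cache_kb": ckb, "exec_units": eu,
                     "energy": energy(iw, ckb, eu)}
        if best is None or candidate["energy"] < best["energy"]:
            best = candidate
    return best

print(explore())
```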


Design Automation Conference | 1998

Media architecture: general purpose vs. multiple application-specific programmable processor

Chunho Lee; Johnson Kin; Miodrag Potkonjak; William H. Mangione-Smith

In this paper we report a framework that makes it possible for a designer to rapidly explore the application-specific programmable processor design space under area constraints. The framework uses a production-quality compiler and simulation tools to synthesize a high-performance machine for an application. Using the framework, we evaluate the validity of the fundamental assumption behind the development of application-specific programmable processors: that applications differ from each other in key architectural parameters, such as the available instruction-level parallelism, the demand on various hardware components (e.g., cache memory units and register files), and the need for different numbers of functional units. We found that the framework introduced in this paper can be valuable in making early design decisions, such as the trade-off between area and architecture, the trade-off between cache size and instruction issue width under an area constraint, and the choice of the number of branch units and the issue width.


Asia and South Pacific Design Automation Conference | 2000

A technique for QoS-based system partitioning

Johnson Kin; Chunho Lee; William H. Mangione-Smith; Miodrag Potkonjak

Quality of service (QoS) has been an important topic in many research communities. Combined with an advanced and retargetable compiler, the configurability of application-specific very long instruction word (VLIW) processors has been shown to provide an excellent platform for optimizing performance under area constraints. Media servers and other natural candidates for QoS management are often embedded systems, and implementing QoS management on the application-specific VLIW platform could open a new avenue for efficient and effective system design, yet until now no such effort has been reported. In this paper, we introduce a QoS-based system partitioning scheme that addresses the need to maximize the net benefit of a system under resource constraints by exploiting application-specific VLIW processors. The approach utilizes modern advances in compiler technology and architectural enhancements that are well matched to that compiler technology. Experimental results indicate that the approach presented in this paper is very effective in partitioning a system for QoS given a set of heterogeneous processors.
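
One way to picture the partitioning problem is as benefit maximization under a resource budget: each task offers several quality levels with different costs and benefits, and one level is chosen per task. The brute-force search and the numbers below are only a sketch of this formulation, not the algorithm or data from the paper.

```python
# Simplified formulation: each task offers several quality levels with
# different resource costs and benefits; choose one level per task to maximize
# total benefit within the budget. Brute force is used only to make the
# formulation explicit; all numbers are hypothetical.

from itertools import product

def partition(tasks, budget):
    """tasks: one list of (cost, benefit) quality levels per task."""
    best_choice, best_benefit = None, float("-inf")
    for choice in product(*tasks):                  # one quality level per task
        cost = sum(c for c, _ in choice)
        if cost > budget:
            continue
        benefit = sum(b for _, b in choice)
        if benefit > best_benefit:
            best_choice, best_benefit = choice, benefit
    return best_choice, best_benefit

# Hypothetical example: three tasks, each with low/medium/high quality levels.
tasks = [[(1, 2), (2, 5), (4, 7)],
         [(1, 1), (3, 4), (5, 9)],
         [(2, 3), (3, 6), (6, 8)]]
print(partition(tasks, budget=8))
```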


International Symposium on Low Power Electronics and Design | 1999

Designing power efficient hypermedia processors

Chunho Lee; Johnson Kin; Miodrag Potkonjak; William H. Mangione-Smith

Distributed hypermedia systems that support collaboration are an emerging platform for the creation, discovery, management, and delivery of information. We present an approach to low-power system design space exploration for distributed hypermedia applications. Traditionally, low-power design and synthesis of application-specific programmable processors has been done in the context of a given number of operations required to complete a task. Our approach utilizes modern advances in compiler technology and architectural enhancements that are well matched to that compiler technology. This work is, to the best of our knowledge, the first attempt to address the need for synthesis of low-power hypermedia processors. It is also the first work to address power efficiency by exploiting the instruction-level parallelism (ILP) found in hypermedia tasks with a production-quality ILP compiler. Using the developed framework, we conduct an extensive exploration of the low-power system design space for a hypermedia application under area and throughput constraints. The framework introduced in this paper is valuable in making early low-power design decisions, such as architectural configuration trade-offs, including the cache versus issue-width trade-off under area and throughput constraints, and the choice of the number of branch units and the issue width.


IEEE Transactions on Very Large Scale Integration (VLSI) Systems | 2001

Exploring the diversity of multimedia systems

Johnson Kin; Chunho Lee; William H. Mangione-Smith; Miodrag Potkonjak

We evaluate the validity of the fundamental assumption behind application-specific programmable processors: that applications differ from each other in key parameters which are exploitable, such as the available instruction-level parallelism (ILP), demand on various hardware resources, and the desired mix of function units. Following the tradition of the CAD community, we develop an accurate chip area estimate and a set of aggressive hardware optimization algorithms. We follow the tradition of the architecture community by using comprehensive real-life benchmarks and production-quality tools. This combination enables us to build a unique framework for system-level synthesis and to gain valuable insights about the design and use of application-specific programmable processors for modern applications. We explore the application-specific programmable processor (ASPP) design space to understand the relationship between performance and area. The architecture model we use is the Hewlett-Packard PA-RISC with single-level caches. The system, including all memory and bus latencies, is simulated, and no other specialized ALU or memory structures are used. The experimental results reveal a number of important characteristics of the ASPP design space. For example, we found that in most cases a single programmable architecture performs similarly to a set of architectures that are tuned to individual applications. A notable exception is highly cost-sensitive designs, which we observe benefit from a small number of specialized architectures with smaller areas. It is also clear that there is enough parallelism in typical media and communication applications to justify the use of a large number of function units. We found that the framework introduced in this paper can be very valuable in making early design decisions, such as the trade-off between area and architectural configuration, the trade-off between cache size and issue width under an area constraint, and the choice of the number of branch units and the issue width.
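
The single-versus-specialized comparison can be sketched schematically: find each application's best configuration, find the one configuration that is best across all applications, and compare the two totals. The performance table below is invented for illustration; in the paper such numbers come from simulating real benchmarks on candidate configurations.

```python
# Schematic comparison: per-application tuning vs. one shared configuration.
# All performance values are hypothetical placeholders for simulator results.

configs = ["2-issue/8KB", "4-issue/16KB", "8-issue/32KB"]
apps = ["jpeg", "mpeg2", "g721"]
perf = {                                     # hypothetical relative performance
    "jpeg":  {"2-issue/8KB": 1.0, "4-issue/16KB": 1.6, "8-issue/32KB": 1.8},
    "mpeg2": {"2-issue/8KB": 1.0, "4-issue/16KB": 1.7, "8-issue/32KB": 2.1},
    "g721":  {"2-issue/8KB": 1.0, "4-issue/16KB": 1.3, "8-issue/32KB": 1.4},
}

# Per-application tuning: every application gets its own best configuration.
tuned_total = sum(max(perf[a].values()) for a in apps)

# Single shared architecture: pick the configuration that is best overall.
shared_total = max(sum(perf[a][c] for a in apps) for c in configs)

print(shared_total / tuned_total)   # a ratio near 1.0 means little is lost by sharing
```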


Multimedia Signal Processing | 2001

Exploring Hypermedia Processor Design Space

Chunho Lee; Johnson Kin; Miodrag Potkonjak; William H. Mangione-Smith

Distributed hypermedia systems that support collaboration are important emerging tools for the creation, discovery, management, and delivery of information. These systems are becoming increasingly practical and sought after as other areas of information technology advance. A framework is developed for efficiently exploring the hypermedia design space while intelligently capitalizing on trade-offs between performance and area. We focus on a category of processors that are programmable yet optimized to a hypermedia application. The key components of the framework presented in this paper are a retargetable instruction-level parallelism compiler, instruction-level simulators, a set of complete media applications written in a high-level language, and a media processor synthesis algorithm. The framework addresses the need for efficient use of silicon by exploiting the instruction-level parallelism found in media applications with compilers that target multiple-instruction-issue processors. Using the developed framework, we conduct an extensive exploration of the design space for a hypermedia application. We find that there is enough instruction-level parallelism in typical media and communication applications to achieve highly concurrent execution when throughput requirements are high. On the other hand, when throughput requirements are low, there is little value in multiple-instruction-issue processors: the added area does not improve performance enough to justify their use. The framework introduced in this paper is valuable in making early architecture design decisions, such as the cache versus issue-width trade-off when area is constrained, and the choice of the number of branch units and the instruction issue width.


Multimedia Signal Processing | 1998

Hypermedia processors: design space exploration

Johnson Kin; Chunho Lee; William H. Mangione-Smith; Miodrag Potkonjak

We present a framework for area-optimal system design space exploration for hypermedia applications. We focus on a category of processors that are programmable yet optimized to a hypermedia application. The key components of the framework presented in this paper are a retargetable instruction-level parallelism compiler, instruction-level simulators, a set of complete media applications written in a high-level language, and a media processor synthesis algorithm. The framework addresses the need for area-optimal system design by exploiting the instruction-level parallelism found in media applications with compilers that target multiple-instruction-issue processors. Using the framework, we conduct an extensive exploration of the area-optimal system design space for a hypermedia application. We found that there is enough ILP in typical media and communication applications to achieve highly concurrent execution when throughput requirements are high. On the other hand, when throughput requirements are low, there is no need to use multiple-instruction-issue processors.
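
The throughput argument reduces to simple arithmetic: the operations required per frame times the required frame rate, divided by the clock rate, gives the sustained operations per cycle the processor must deliver, which indicates whether multiple-instruction issue is needed. All numbers in the sketch below are hypothetical.

```python
# Back-of-the-envelope version of the throughput argument; every number here
# is hypothetical. Sustained operations per cycle well above 1 call for
# multiple-instruction issue; well below 1, a single-issue machine suffices.

def required_ops_per_cycle(ops_per_frame, frames_per_sec, clock_hz):
    return (ops_per_frame * frames_per_sec) / clock_hz

# Low throughput requirement: a single-issue processor keeps up.
print(required_ops_per_cycle(2e6, 15, 100e6))   # 0.3 ops/cycle
# High throughput requirement: concurrent execution is needed.
print(required_ops_per_cycle(2e6, 60, 100e6))   # 1.2 ops/cycle
```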

Collaboration


Dive into Johnson Kin's collaborations.

Top Co-Authors

Chunho Lee (University of California)

Munish Gupta (International Rectifier)

Darko Kirovski (University of California)