Chu Shik Jhon | Researchain

Publication

Featured researches published by Chu Shik Jhon.

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems | 1996

COP: a Crosstalk OPtimizer for gridded channel routing

Kyoung-Son Jhang; Soonhoi Ha; Chu Shik Jhon

The interwire spacing in a VLSI chip shrinks as VLSI fabrication technology rapidly evolves. Accordingly, it becomes important to consider crosstalk caused by the coupling capacitance between adjacent wires when designing the layout of fast and safe VLSI circuits. The upper bounds of the allowable crosstalk for nets, called crosstalk constraints, are usually given in the design specification. This paper proposes a crosstalk minimization technique based on segment rearrangement for gridded channel routing. The technique repeatedly rearranges horizontal wire segments and/or increases the number of tracks to satisfy the crosstalk constraints. In experiments, we observed that the presented technique is more effective than the track permutation technique.

Information Processing Letters | 1998

Relaxed barrier synchronization for the BSP model of computation on message-passing architectures

Jin-Soo Kim; Soonhoi Ha; Chu Shik Jhon

annual conference on computers | 1993

A new write-invalidate snooping cache coherence protocol for split transaction bus-based multiprocessor systems

Seong Tae Jhang; Chu Shik Jhon

We present a new write-invalidate snooping cache coherence protocol, called the MMESSII cache protocol, which addresses several significant drawbacks of existing write-invalidate snooping protocols in a split transaction bus based multiprocessor environment. In this protocol, each cache block maintains ID information that identifies the processor module that invalidated the block most recently. It also maintains seven cache states, which consist of two updated states, one exclusive state, two shared states and two invalidated states. By using these states and the ID information, our protocol can significantly reduce contention for both the memory modules and the system bus. We also present simulation results which show that our protocol performs better than existing write-invalidate protocols.

merged international parallel processing symposium and symposium on parallel and distributed processing | 1998

Efficient barrier synchronization mechanism for the BSP model on message-passing architectures

Jin-Soo Kim; Soonhoi Ha; Chu Shik Jhon

The Bulk Synchronous Parallel (BSP) model of computation can be used to develop efficient and portable programs for a range of machines and applications. However, the cost of the barrier synchronization used in the BSP model is relatively high on message-passing architectures. In this paper we relax the barrier synchronization constraint in the BSP model to allow an efficient implementation on message-passing architectures. In our relaxed barrier synchronization, synchronization occurs only between the producer and the consumer processors at the time non-local data is accessed, eliminating the exchange of global information. From experimental evaluations on an IBM SP2, we have observed that the relaxed barrier synchronization reduces the total synchronization time by 45.2% to 61.5% in FT, and by 28.6% to 49.0% in LU, with 32 processors.
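
The producer/consumer idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it uses Python threads and a shared counter in place of message passing, and the names `RelaxedSync`, `finish_superstep`, `wait_for`, and `demo` are hypothetical. A consumer waits only for the producer whose data it reads, so a slow, unrelated process does not hold it up.

```python
import threading

class RelaxedSync:
    """Per-producer synchronization instead of a global barrier (illustrative only)."""

    def __init__(self, nprocs):
        self.done = [0] * nprocs            # supersteps completed by each process
        self.cond = threading.Condition()

    def finish_superstep(self, pid):
        # Publish that process `pid` has completed one more superstep.
        with self.cond:
            self.done[pid] += 1
            self.cond.notify_all()

    def wait_for(self, producer, superstep):
        # Block until `producer` has completed `superstep`; other processes are ignored.
        with self.cond:
            self.cond.wait_for(lambda: self.done[producer] >= superstep)

def demo():
    sync = RelaxedSync(3)
    data, log = {}, []
    release_slow = threading.Event()

    def producer():                         # process 0 writes the data
        data[0] = 42
        sync.finish_superstep(0)

    def slow():                             # process 1 is unrelated and slow
        release_slow.wait()
        sync.finish_superstep(1)

    def consumer():                         # process 2 reads only from process 0
        sync.wait_for(producer=0, superstep=1)
        log.append(data[0])
        sync.finish_superstep(2)

    threads = [threading.Thread(target=f) for f in (producer, slow, consumer)]
    for t in threads:
        t.start()
    threads[0].join()
    threads[2].join()                       # consumer ends while process 1 is still blocked
    finished_early = sync.done[1] == 0
    release_slow.set()
    threads[1].join()
    return log, finished_early
```

With a global barrier, the consumer in `demo()` would have to wait for process 1 as well; here it completes as soon as process 0 has published its superstep.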

international symposium on circuits and systems | 1994

Direct synthesis of efficient speed-independent circuits from deterministic signal transition graphs

Sung Tae Jung; Chu Shik Jhon

This paper presents a new algorithm which synthesizes a speed-independent circuit from a deterministic Signal Transition Graph (STG) representation which satisfies the safeness, liveness and unique state coding properties. Existing works translate a higher level description such as a nondeterministic STG representation into a State Graph (SG) representation and then translate the SG representation into a speed-independent circuit. In contrast, our algorithm synthesizes a speed-independent circuit directly from the STG representation by analyzing the relations between the signal transitions in the representation. It turns out that our algorithm yields more efficient circuits than the existing works. Also, it turns out that the run time of our algorithm is, in most cases, shorter than that of the existing works.

Microprocessors and Microsystems | 2006

PP-cache: A partitioned power-aware instruction cache architecture

Cheol Hong Kim; Sung Woo Chung; Chu Shik Jhon

Microarchitects should consider energy consumption, together with performance, when designing an instruction cache architecture, especially in embedded processors. This paper proposes a new instruction cache architecture, named Partitioned Power-aware instruction cache (PP-cache), for reducing dynamic energy consumption in the instruction cache by partitioning it into small sub-caches. When a request comes into the PP-cache, only one sub-cache is accessed by utilizing the locality of applications. In the meantime, the other sub-caches are not activated. The PP-cache reduces dynamic energy consumption by reducing the activated cache size and eliminating the energy consumed in tag matching. Simulation results show that the PP-cache reduces dynamic energy consumption by 34–56%. This paper also proposes a technique to reduce leakage energy consumption in the PP-cache, which dynamically turns off the lines that do not hold valid data. Simulation results show that the proposed technique reduces leakage energy consumption in the PP-cache by 74–85%.

conference on high performance computing (supercomputing) | 1998

Reducing Coherence Overhead of Barrier Synchronization in Software DSMs

Jae Bum Lee; Chu Shik Jhon

Software Distributed Shared Memory (SDSM) systems usually have a large coherence granularity that is imposed by the underlying virtual memory page size. To alleviate coherence overheads such as the network traffic needed to preserve coherence, or page misses caused by false sharing, relaxed memory models are widely accepted for SDSM systems. In the relaxed memory models, when a shared page is modified, invalidation requests to other copies are deferred until a synchronization point and, in addition, the requests are transferred only to the processor acquiring the synchronization variable. On a barrier, however, the invalidation requests must be transferred to all the processors that participate in the barrier. As a result, a barrier tends to induce heavy network traffic, and may also lead to useless page misses caused by false sharing. In this paper, we propose a method to alleviate the coherence overheads of barrier synchronization in shared-memory parallel programs. It performs static analysis to examine data dependency between processors across global barriers, and then inserts special primitives into the program in order to exploit the dependency information at run time. The static analysis finds code regions where a processor modifies data that will be used only by some of the other processors. At run time, the coherence messages for the data are transferred only to those processors with the help of the inserted primitives. In particular, if the modified data will not be used by any other processor, the primitives ensure that the coherence messages are delivered only to the master processor when the parallel execution of the program is finished. We evaluated the performance of this method in a 16-node software DSM system supporting the AURC protocol. Program-driven simulation was performed with five benchmark programs: Jacobi, Red-black SOR, Expl, LU, and Water-nsquared. For these applications, the experimental results show that our method can reduce the coherence messages by up to about 98%, and can also improve the execution time by up to about 26%.

asia pacific conference on circuits and systems | 1994

A segment rearrangement approach to channel routing under the crosstalk constraints

Kyoung-Son Jhang; Soonhoi Ha; Chu Shik Jhon

The inter-wire spacing in a VLSI chip shrinks as VLSI fabrication technology rapidly evolves. Accordingly, it becomes important to consider crosstalk caused by the coupling capacitance between adjacent wires when designing the layout of fast and safe VLSI circuits. The upper bounds of allowable crosstalk, called crosstalk constraints, are usually given for each net in the design specification. This paper presents a segment rearrangement approach to channel routing to satisfy all the crosstalk constraints. Starting from the given routing, the proposed technique repeatedly rearranges the horizontal wire segments around the nets that violate the crosstalk constraints to reduce crosstalk. Our objective is to find a routing with the minimum number of tracks under the crosstalk constraints. In experiments, we observed that the presented technique is more effective than the track permutation technique.
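
As a rough illustration of what checking crosstalk constraints on a gridded channel might look like, the sketch below assumes a simplified model that is not taken from the paper: a net's crosstalk is the total column overlap between its horizontal segments and segments of other nets on directly adjacent tracks. The segment fields and the functions `overlap`, `net_crosstalk`, and `violations` are hypothetical names.

```python
def overlap(a, b):
    """Length of the common column span of two horizontal segments."""
    return max(0, min(a["right"], b["right"]) - max(a["left"], b["left"]))

def net_crosstalk(segments):
    """Per net, sum the overlap with other nets' segments on adjacent tracks.

    Assumed model: coupling is proportional to the parallel run length
    between segments whose track numbers differ by exactly one.
    """
    total = {s["net"]: 0 for s in segments}
    for s in segments:
        for t in segments:
            if s["net"] != t["net"] and abs(s["track"] - t["track"]) == 1:
                total[s["net"]] += overlap(s, t)
    return total

def violations(segments, limits):
    """Nets whose crosstalk exceeds their bound; nets without a bound never violate."""
    xt = net_crosstalk(segments)
    return sorted(n for n, v in xt.items() if v > limits.get(n, float("inf")))

# Three nets in a four-track channel; A and B run side by side for 6 columns.
routing = [
    {"net": "A", "track": 0, "left": 0, "right": 10},
    {"net": "B", "track": 1, "left": 4, "right": 12},
    {"net": "C", "track": 3, "left": 0, "right": 10},
]
limits = {"A": 5, "B": 8}

# Moving B's segment one track down removes the A/B coupling (and couples B with C).
rearranged = [dict(s, track=2) if s["net"] == "B" else s for s in routing]
```

Under this model, `violations(routing, limits)` reports net A, and the rearranged routing satisfies both bounds, which is the kind of local move a segment rearrangement procedure would search for.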

international conference on parallel and distributed systems | 1998

PANDA: ring-based multiprocessor system using new snooping protocol

Sung Woo Chung; Seong Tae Jhang; Chu Shik Jhon

PANDA is a ring-based Cache Coherent Non-Uniform Memory Access (CC-NUMA) multiprocessor system under implementation at Seoul National University. Its main goal is to reduce the data miss latency by using a unidirectional point-to-point interconnection network. We introduce the PANDA architecture and present a new snooping protocol for this system. We evaluate the performance of PANDA for small to medium scale multiprocessor systems using analytical models and a program-driven simulator. We compare the proposed system to other point-to-point connected machines, such as the Express Ring and a full map directory based system. The simulation results show up to 29% performance improvement over the Express Ring. They also show that PANDA performs no worse than the full map directory based system, which incurs additional hardware costs for directory management.

custom integrated circuits conference | 1992

A Branch-and-bound Method For The Optimal Scheduling

Seong Yong Ohm; Chu Shik Jhon

This paper presents a new approach to the scheduling problem in high level synthesis. In this approach, iterative rescheduling processes are performed in a branch-and-bound manner, starting with the as-soon-as-possible (ASAP) scheduling result, so as to find the scheduling result of the lowest hardware cost under the given timing constraint. At each iteration step, only the candidate nodes are selected to be considered for rescheduling, and lower bound estimation is performed to help increase the number of cut-offs in the search space, thus reducing the run time. Our algorithm also supports mutually exclusive operations, multiple operations per cycle, multi-cycling operations, and pipelined data paths. Experimental results are given to show that our algorithm derives an optimal scheduling result within a reasonably short CPU time.
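
The as-soon-as-possible schedule that the abstract names as its starting point is a standard construction, sketched below. This covers only that initial step, not the paper's branch-and-bound rescheduling. The function name `asap_schedule`, the `deps` dictionary format, and the default one-cycle latency are assumptions made for illustration.

```python
from collections import deque

def asap_schedule(deps, latency=None):
    """Return the earliest start step of each operation.

    deps    maps every operation to the list of operations it depends on
            (every operation must appear as a key).
    latency maps an operation to its duration in cycles; missing entries
            default to 1, so multi-cycle operations are optional.
    """
    latency = latency or {}
    succs = {op: [] for op in deps}
    indeg = {op: len(preds) for op, preds in deps.items()}
    for op, preds in deps.items():
        for p in preds:
            succs[p].append(op)

    start = {op: 0 for op in deps}
    ready = deque(op for op, d in indeg.items() if d == 0)
    while ready:                                   # topological traversal
        op = ready.popleft()
        finish = start[op] + latency.get(op, 1)
        for s in succs[op]:
            start[s] = max(start[s], finish)       # wait for the latest predecessor
            indeg[s] -= 1
            if indeg[s] == 0:
                ready.append(s)
    return start

# A small data-flow graph: c needs a and b, d needs c.
graph = {"a": [], "b": [], "c": ["a", "b"], "d": ["c"]}
```

For `graph`, every operation is placed at the first step where its inputs are available; giving `a` a two-cycle latency pushes `c` and `d` one step later. A branch-and-bound search of the kind described would then move operations to later steps to lower hardware cost while staying within the timing constraint.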
