Chu Shik Jhon
Seoul National University
Publication
Featured research published by Chu Shik Jhon.
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems | 1996
Kyoung-Son Jhang; Soonhoi Ha; Chu Shik Jhon
The interwire spacing in VLSI chips shrinks as VLSI fabrication technology rapidly evolves. Accordingly, it becomes important to consider crosstalk caused by the coupling capacitance between adjacent wires during layout design to obtain fast and safe VLSI circuits. The upper bounds on the allowable crosstalk for nets, called crosstalk constraints, are usually given in the design specification. This paper proposes a crosstalk minimization technique based on segment rearrangement for gridded channel routing. The technique repeatedly rearranges horizontal wire segments and/or increases the number of tracks to satisfy the crosstalk constraints. Experiments show that the presented technique is more effective than the track permutation technique.
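As a rough illustration of the rearrange-or-add-a-track loop described above, the Python sketch below uses a deliberately simplified model that is an assumption, not the paper's: the crosstalk between two nets is taken to be the length over which their horizontal segments run in parallel on adjacent tracks, vertical constraints and doglegs are ignored, and each step greedily moves one segment of a violating net to an existing or new track if that lowers the total constraint excess. The names `Segment`, `crosstalk`, `excess`, `fits`, and `rearrange` are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Segment:
    net: str
    track: int   # track index, 1 = bottom of the channel
    left: int    # leftmost column
    right: int   # rightmost column


def coupling(a: Segment, b: Segment) -> int:
    """Parallel-run length of two segments (used here as a crosstalk proxy)."""
    return max(0, min(a.right, b.right) - max(a.left, b.left))


def crosstalk(segs):
    """Per-net crosstalk: coupling with other nets' segments on adjacent tracks."""
    xt = defaultdict(int)
    for i, a in enumerate(segs):
        for b in segs[i + 1:]:
            if a.net != b.net and abs(a.track - b.track) == 1:
                c = coupling(a, b)
                xt[a.net] += c
                xt[b.net] += c
    return xt


def excess(segs, bounds):
    """Total amount by which nets exceed their crosstalk constraints."""
    xt = crosstalk(segs)
    return sum(max(0, xt[net] - bound) for net, bound in bounds.items())


def fits(segs, i, track):
    """True if segment i can move to `track` without touching another segment there."""
    s = segs[i]
    return all(not (o.track == track and o.left <= s.right and s.left <= o.right)
               for j, o in enumerate(segs) if j != i)


def rearrange(segments, bounds, max_iters=100):
    segs = list(segments)
    for _ in range(max_iters):
        current = excess(segs, bounds)
        if current == 0:
            break
        xt = crosstalk(segs)
        violating = {n for n, b in bounds.items() if xt[n] > b}
        ntracks = max(s.track for s in segs)
        best = None
        for i, s in enumerate(segs):
            if s.net not in violating:
                continue
            for t in range(1, ntracks + 2):      # ntracks + 1 opens a new track
                if t == s.track or not fits(segs, i, t):
                    continue
                trial = segs[:i] + [replace(s, track=t)] + segs[i + 1:]
                # Prefer the lowest excess, then the fewest tracks.
                key = (excess(trial, bounds), max(x.track for x in trial))
                if best is None or key < best[0]:
                    best = (key, trial)
        if best is None or best[0][0] >= current:
            break                                # no single move helps; stop
        segs = best[1]
    return segs


bounds = {"a": 0, "b": 0}
routed = rearrange([Segment("a", 1, 0, 10), Segment("b", 2, 0, 10)], bounds)
print([(s.net, s.track) for s in routed])  # [('a', 1), ('b', 3)]
```

In the two-track example no reassignment within two tracks separates `a` and `b`, so the sketch opens a third track and moves `b` there; the paper's actual rearrangement rules and cost model may differ.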
Information Processing Letters | 1998
Jin-Soo Kim; Soonhoi Ha; Chu Shik Jhon
Relaxed barrier synchronization for the BSP model of computation on message-passing architectures
annual conference on computers | 1993
Seong Tae Jhang; Chu Shik Jhon
We present a new write-invalidate snooping cache coherence protocol, called the MMESSII cache protocol, which addresses several significant drawbacks of existing write-invalidate snooping protocols in split-transaction-bus-based multiprocessor environments. In this protocol, each cache block maintains ID information identifying the processor module that most recently invalidated the block. It also maintains seven cache states: two updated states, one exclusive state, two shared states, and two invalidated states. By using these states and the ID information, our protocol can significantly reduce contention for both the memory modules and the system bus. We also present simulation results showing that our protocol performs better than existing write-invalidate protocols.
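The abstract specifies only the per-block metadata, so the Python sketch below models just that: seven states grouped as two updated, one exclusive, two shared, and two invalidated, plus the ID of the module that most recently invalidated the block. The state names `M1`, `M2`, `E`, `S1`, `S2`, `I1`, `I2` are placeholders suggested by the protocol's name, and `miss_target` is a hypothetical policy showing one way the stored ID could steer a miss toward another module instead of memory; the paper's actual state transitions are not given here.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional


class State(Enum):
    # Placeholder names; the abstract only gives how many states of each kind exist.
    M1 = auto()
    M2 = auto()
    E = auto()
    S1 = auto()
    S2 = auto()
    I1 = auto()
    I2 = auto()


KIND = {
    State.M1: "updated", State.M2: "updated",
    State.E: "exclusive",
    State.S1: "shared", State.S2: "shared",
    State.I1: "invalidated", State.I2: "invalidated",
}


@dataclass
class CacheBlock:
    tag: int
    state: State
    invalidator: Optional[int] = None  # module that most recently invalidated this block

    def invalidate(self, by_module: int, to: State = State.I1) -> None:
        if KIND[to] != "invalidated":
            raise ValueError(f"{to} is not an invalidated state")
        self.state = to
        self.invalidator = by_module

    def miss_target(self) -> Optional[int]:
        """Hypothetical policy: on a miss to an invalidated block, direct the request
        to the module that invalidated it; None means fall back to memory."""
        if KIND[self.state] == "invalidated":
            return self.invalidator
        return None


print(Counter(KIND.values()))
blk = CacheBlock(tag=0x40, state=State.S1)
blk.invalidate(by_module=3)
print(blk.miss_target())  # 3
```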
merged international parallel processing symposium and symposium on parallel and distributed processing | 1998
Jin-Soo Kim; Soonhoi Ha; Chu Shik Jhon
The Bulk Synchronous Parallel (BSP) model of computation can be used to develop efficient and portable programs for a range of machines and applications. However, the barrier synchronization used in the BSP model is relatively expensive on message-passing architectures. In this paper, we relax the barrier synchronization constraint of the BSP model for efficient implementation on message-passing architectures. In our relaxed barrier synchronization, synchronization occurs only between the producer and consumer processors, at the time non-local data is accessed, eliminating the exchange of global information. In experimental evaluations on the IBM SP2, we observed that relaxed barrier synchronization reduces the total synchronization time by 45.2% to 61.5% in FT and by 28.6% to 49.0% in LU with 32 processors.
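The difference between a global barrier and the relaxed, producer-consumer synchronization described above can be sketched with Python threads. The communication pattern (each processor reads the value its left neighbour produced in the same superstep) and the function names `run_barrier` and `run_relaxed` are illustrative assumptions, not the paper's FT or LU kernels. Both versions compute the same result, but in `run_relaxed` a processor waits only for the one producer whose data it reads.

```python
import threading


def _run(worker, nprocs: int) -> None:
    threads = [threading.Thread(target=worker, args=(p,)) for p in range(nprocs)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


def run_barrier(nprocs: int, steps: int) -> list:
    """BSP style: every superstep ends with a global barrier."""
    mailbox = [[0] * nprocs for _ in range(steps)]   # mailbox[s][p]: p's data in step s
    barrier = threading.Barrier(nprocs)
    final = [0] * nprocs

    def worker(p: int) -> None:
        value = p
        for s in range(steps):
            mailbox[s][p] = value
            barrier.wait()                           # wait for every processor
            value += mailbox[s][(p - 1) % nprocs]
        final[p] = value

    _run(worker, nprocs)
    return final


def run_relaxed(nprocs: int, steps: int) -> list:
    """Relaxed: a consumer waits only for the producer whose data it reads."""
    mailbox = [[0] * nprocs for _ in range(steps)]
    ready = [[threading.Event() for _ in range(nprocs)] for _ in range(steps)]
    final = [0] * nprocs

    def worker(p: int) -> None:
        value = p
        for s in range(steps):
            mailbox[s][p] = value
            ready[s][p].set()                        # "my data for step s is available"
            src = (p - 1) % nprocs                   # assumed pattern: left neighbour
            ready[s][src].wait()                     # pairwise sync, no global exchange
            value += mailbox[s][src]
        final[p] = value

    _run(worker, nprocs)
    return final


print(run_barrier(4, 3))  # [16, 12, 8, 12]
print(run_relaxed(4, 3))  # [16, 12, 8, 12]
```

Giving every superstep its own mailbox row means a fast producer never overwrites data a slow consumer has not read yet, which is what lets the relaxed version drop the global barrier safely in this sketch.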
international symposium on circuits and systems | 1994
Sung Tae Jung; Chu Shik Jhon
This paper presents a new algorithm that synthesizes a speed-independent circuit from a deterministic Signal Transition Graph (STG) representation satisfying the safeness, liveness, and unique state coding properties. Existing methods translate a higher-level description, such as a nondeterministic STG representation, into a State Graph (SG) representation and then translate the SG representation into a speed-independent circuit. In contrast, our algorithm synthesizes a speed-independent circuit directly from the STG representation by analyzing the relations between the signal transitions in the representation. Our algorithm yields more efficient circuits than the existing methods, and in most cases it also runs faster.
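The synthesis algorithm itself is not described in enough detail to reproduce, but the input conditions it relies on can be illustrated. The Python sketch below, a simplified model rather than the authors' method, treats an STG as a marked graph whose places are arcs between signal transitions such as `a+` and `b-`, explores the reachable markings, and checks safeness (no place ever holds two tokens) and unique state coding (no two reachable markings share the same signal-value vector). Liveness is not checked, and `analyze_stg` is a hypothetical name.

```python
from collections import deque


def analyze_stg(places, marked, initial):
    """places  -- list of (producer, consumer) transition labels, e.g. ("a+", "b+")
    marked  -- indices of places holding a token initially
    initial -- initial 0/1 value of every signal
    Explores reachable states; reports their count, safeness, and USC."""
    signals = sorted(initial)
    transitions = {t for place in places for t in place}
    pre = {t: [i for i, (_, dst) in enumerate(places) if dst == t] for t in transitions}
    post = {t: [i for i, (src, _) in enumerate(places) if src == t] for t in transitions}

    m0 = tuple(1 if i in marked else 0 for i in range(len(places)))
    v0 = tuple(initial[s] for s in signals)
    seen = {(m0, v0)}
    queue = deque(seen)
    codes = {}   # signal-value vector -> set of markings carrying it
    safe = True

    while queue:
        m, v = queue.popleft()
        codes.setdefault(v, set()).add(m)
        for t in transitions:
            if not all(m[i] for i in pre[t]):
                continue                      # t is not enabled in this marking
            k = signals.index(t[:-1])
            new_value = 1 if t[-1] == "+" else 0
            if v[k] == new_value:
                raise ValueError(f"inconsistent transition {t}")
            m2 = list(m)
            for i in pre[t]:
                m2[i] -= 1
            for i in post[t]:
                m2[i] += 1
            if max(m2) > 1:
                safe = False                  # a place would hold two tokens
                continue                      # do not explore unsafe markings
            state = (tuple(m2), v[:k] + (new_value,) + v[k + 1:])
            if state not in seen:
                seen.add(state)
                queue.append(state)

    usc = all(len(ms) == 1 for ms in codes.values())
    return {"states": len(seen), "safe": safe, "usc": usc}


handshake = [("req+", "ack+"), ("ack+", "req-"), ("req-", "ack-"), ("ack-", "req+")]
print(analyze_stg(handshake, {3}, {"req": 0, "ack": 0}))
# {'states': 4, 'safe': True, 'usc': True}

ambiguous = [("a+", "b+"), ("b+", "b-"), ("b-", "a-"), ("a-", "a+")]
print(analyze_stg(ambiguous, {3}, {"a": 0, "b": 0}))
# {'states': 4, 'safe': True, 'usc': False}
```

The four-phase handshake passes both checks, while the second graph fails USC because the code `a=1, b=0` occurs both before `b+` and after `b-`.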
Microprocessors and Microsystems | 2006
Cheol Hong Kim; Sung Woo Chung; Chu Shik Jhon
Microarchitects should consider energy consumption, together with performance, when designing instruction cache architectures, especially for embedded processors. This paper proposes a new instruction cache architecture, named the Partitioned Power-aware instruction cache (PP-cache), which reduces dynamic energy consumption in the instruction cache by partitioning it into small sub-caches. When a request arrives at the PP-cache, only one sub-cache is accessed, exploiting the locality of applications; the other sub-caches are not activated. The PP-cache reduces dynamic energy consumption by reducing the activated cache size and eliminating the energy consumed in tag matching. Simulation results show that the PP-cache reduces dynamic energy consumption by 34–56%. This paper also proposes a technique to reduce leakage energy consumption in the PP-cache, which dynamically turns off lines that do not hold valid data. Simulation results show that the proposed technique reduces leakage energy consumption in the PP-cache by 74–85%.
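A toy model can make the two energy mechanisms concrete. The Python sketch below assumes, hypothetically, that the sub-cache is chosen by the page number of the fetch address and that each sub-cache is direct-mapped. Every access touches only one sub-cache's array, and lines that have never been filled count as powered off, so `leakage_ratio` reports the average fraction of lines that would leak. The class name, parameters, and selection rule are illustrative assumptions; the model does not reproduce the paper's tag-matching elimination or its energy numbers.

```python
class PartitionedICache:
    """Toy partitioned instruction cache (hypothetical model, not the PP-cache design)."""

    def __init__(self, n_sub=4, lines_per_sub=64, line_bytes=32, page_bytes=4096):
        self.n_sub = n_sub
        self.lines_per_sub = lines_per_sub
        self.line_bytes = line_bytes
        self.page_bytes = page_bytes
        # One tag array per sub-cache; None means the line holds no valid data (powered off).
        self.tags = [[None] * lines_per_sub for _ in range(n_sub)]
        self.valid_lines = 0
        self.accesses = 0
        self.misses = 0
        self._powered_sum = 0   # sum over accesses of the number of powered-on lines

    def access(self, addr: int) -> bool:
        sub = (addr // self.page_bytes) % self.n_sub   # assumed selection rule
        line_addr = addr // self.line_bytes
        idx = line_addr % self.lines_per_sub
        tag = line_addr // self.lines_per_sub
        array = self.tags[sub]                         # only this sub-cache is activated
        self.accesses += 1
        hit = array[idx] == tag
        if not hit:
            self.misses += 1
            if array[idx] is None:
                self.valid_lines += 1                  # line powered on when first filled
            array[idx] = tag
        self._powered_sum += self.valid_lines
        return hit

    def activated_fraction(self) -> float:
        """Share of the total data array read per access under this model."""
        return 1 / self.n_sub

    def leakage_ratio(self) -> float:
        """Average fraction of lines powered on, relative to keeping every line on."""
        if self.accesses == 0:
            return 0.0
        total = self.n_sub * self.lines_per_sub
        return self._powered_sum / (self.accesses * total)


cache = PartitionedICache(n_sub=2, lines_per_sub=4, line_bytes=16, page_bytes=64)
for addr in [0, 16, 32, 48] * 2:     # a small loop that fits in one page
    cache.access(addr)
print(cache.misses, cache.activated_fraction(), cache.leakage_ratio())  # 4 0.5 0.40625
```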
conference on high performance computing (supercomputing) | 1998
Jae Bum Lee; Chu Shik Jhon
Software Distributed Shared Memory (SDSM) systems usually have a large coherence granularity imposed by the underlying virtual memory page size. To alleviate coherence overheads, such as the network traffic needed to preserve coherence or the page misses caused by false sharing, relaxed memory models are widely adopted in SDSM systems. In relaxed memory models, when a shared page is modified, invalidation requests to other copies are deferred until a synchronization point, and the requests are transferred only to the processor acquiring the synchronization variable. On a barrier, however, the invalidation requests must be transferred to all processors participating in the barrier. As a result, barriers tend to induce heavy network traffic and may also lead to useless page misses caused by false sharing. In this paper, we propose a method to alleviate the coherence overheads of barrier synchronization in shared-memory parallel programs. It performs static analysis to examine data dependencies between processors across global barriers, and then inserts special primitives into the program to exploit the dependency information at run time. The static analysis identifies code regions where a processor modifies data that will be used by only some of the other processors. At run time, with the help of the inserted primitives, the coherence messages for that data are transferred only to those processors. In particular, if the modified data will not be used by any other processor, the primitives ensure that the coherence messages are delivered only to the master processor when the parallel execution of the program finishes. We evaluated the performance of this method on a 16-node software DSM system supporting the AURC protocol. Program-driven simulation was performed with five benchmark programs: Jacobi, Red-black SOR, Expl, LU, and Water-nsquared.
For these applications, the experimental results show that our method can reduce coherence messages by up to about 98% and improve execution time by up to about 26%.
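The effect of sending barrier coherence messages only to consumers can be sketched as a set computation. In the Python sketch below, the per-processor write and read sets, which the paper obtains by static analysis, are simply given as input; the page numbers, the one-message-per-destination counting, and the function name `barrier_messages` are assumptions for illustration. A writer whose data no other processor reads is deferred to the master processor at the end of the run, as the abstract describes.

```python
def barrier_messages(writes, reads_next, nprocs):
    """writes[p]: pages p modifies before the barrier;
    reads_next[q]: pages q reads after it (both assumed known from static analysis).
    Returns (broadcast_count, targeted_count, targets, deferred)."""
    # Baseline: every writer sends coherence messages to every other participant.
    broadcast = sum(nprocs - 1 for pages in writes.values() if pages)
    targets, deferred = {}, []
    for p, pages in writes.items():
        if not pages:
            continue
        consumers = {q for q, used in reads_next.items() if q != p and pages & used}
        if consumers:
            targets[p] = consumers       # send coherence messages only here
        else:
            deferred.append(p)           # nobody else reads it: deliver to master at the end
    targeted = sum(len(c) for c in targets.values())
    return broadcast, targeted, targets, sorted(deferred)


# Hypothetical 4-processor example: each processor writes one page.
writes = {0: {0}, 1: {1}, 2: {2}, 3: {3}}
reads_next = {0: {0, 1}, 1: {0, 1, 2}, 2: {1, 2}, 3: {3}}
print(barrier_messages(writes, reads_next, 4))
# (12, 4, {0: {1}, 1: {0, 2}, 2: {1}}, [3])
```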
asia pacific conference on circuits and systems | 1994
Kyoung-Son Jhang; Soonhoi Ha; Chu Shik Jhon
The inter-wire spacing in VLSI chips shrinks as VLSI fabrication technology rapidly evolves. Accordingly, it becomes important to consider crosstalk caused by the coupling capacitance between adjacent wires during layout design to obtain fast and safe VLSI circuits. The upper bounds on allowable crosstalk, called crosstalk constraints, are usually given for each net in the design specification. This paper presents a segment rearrangement approach to channel routing that satisfies all the crosstalk constraints. Starting from a given routing, the proposed technique repeatedly rearranges the horizontal wire segments around the nets that violate the crosstalk constraints in order to reduce crosstalk. Our objective is to find a routing with the minimum number of tracks under the crosstalk constraints. Experiments show that the presented technique is more effective than the track permutation technique.
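For comparison, the track permutation baseline mentioned in the abstract can be approximated by brute force, assuming it means moving the entire contents of each track to a new track position, without splitting tracks or adding new ones. The Python sketch below uses a simplified crosstalk model as an assumption (parallel run length between different nets on adjacent tracks); `best_track_permutation` is a hypothetical name, and exhaustive enumeration is practical only for a handful of tracks.

```python
from itertools import permutations


def crosstalk(segs):
    """segs: list of (net, track, left, right); parallel-run model on adjacent tracks."""
    xt = {}
    for i, (n1, t1, l1, r1) in enumerate(segs):
        for n2, t2, l2, r2 in segs[i + 1:]:
            if n1 != n2 and abs(t1 - t2) == 1:
                c = max(0, min(r1, r2) - max(l1, l2))
                xt[n1] = xt.get(n1, 0) + c
                xt[n2] = xt.get(n2, 0) + c
    return xt


def excess(segs, bounds):
    """Total amount by which nets exceed their crosstalk bounds."""
    xt = crosstalk(segs)
    return sum(max(0, xt.get(n, 0) - b) for n, b in bounds.items())


def best_track_permutation(segs, bounds):
    """Try every reordering of whole tracks; return (excess, routing) of the best one."""
    tracks = sorted({t for _, t, _, _ in segs})
    best = None
    for perm in permutations(tracks):
        mapping = dict(zip(tracks, perm))          # old track -> new position
        trial = [(n, mapping[t], l, r) for n, t, l, r in segs]
        e = excess(trial, bounds)
        if best is None or e < best[0]:
            best = (e, trial)
    return best


three = [("a", 1, 0, 10), ("b", 2, 0, 10), ("c", 3, 20, 30)]
print(best_track_permutation(three, {"a": 0, "b": 0}))
# (0, [('a', 1, 0, 10), ('b', 3, 0, 10), ('c', 2, 20, 30)])
two = [("a", 1, 0, 10), ("b", 2, 0, 10)]
print(best_track_permutation(two, {"a": 0, "b": 0})[0])  # 20
```

With three tracks, placing `c` between `a` and `b` removes the violation; with only the two tracks holding `a` and `b`, every permutation leaves them adjacent, which is the kind of case where moving individual segments or adding a track is needed.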
international conference on parallel and distributed systems | 1998
Sung Woo Chung; Seong Tae Jhang; Chu Shik Jhon
The PANDA is a ring-based Cache Coherent Non-Uniform Memory Access (CC-NUMA) multiprocessor system under implementation at Seoul National University. Its main goal is to reduce data miss latency by using a unidirectional point-to-point interconnection network. We introduce the PANDA architecture and present a new snooping protocol for this system. We evaluate the performance of the PANDA as a small- to medium-scale multiprocessor system using analytical models and a program-driven simulator, and compare it with other point-to-point connected machines, such as the Express Ring and a full-map directory-based system. The simulation results show up to 29% performance improvement over the Express Ring. They also show that the PANDA performs no worse than the full-map directory-based system, which incurs additional hardware cost for directory management.
custom integrated circuits conference | 1992
Seong Yong Ohm; Chu Shik Jhon
This paper presents a new approach to the scheduling problem in high-level synthesis. In this approach, iterative rescheduling is performed in a branch-and-bound manner, starting with the as-soon-as-possible (ASAP) scheduling result, to find the schedule with the lowest hardware cost under the given timing constraint. At each iteration step, only candidate nodes are considered for rescheduling, and lower bound estimation is performed to increase the number of cut-offs in the search space, thus reducing the run time. Our algorithm also supports mutually exclusive operations, multiple operations per cycle, multi-cycling operations, and pipelined data paths. Experimental results show that our algorithm derives an optimal scheduling result within a reasonably short CPU time.
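The branch-and-bound idea can be sketched on a much smaller problem than the paper addresses. The Python sketch below assumes single-cycle operations, a data-flow graph listed in topological order, and a hardware cost equal to the unit cost of each operation type times the peak number of such operations in any control step. It uses the ASAP schedule as the initial incumbent and prunes any partial schedule whose cost already reaches the incumbent, which is a valid bound because peak usage can only grow as more operations are placed. Mutually exclusive, chained, multi-cycle, and pipelined operations are not modeled, and the function names are hypothetical.

```python
def asap(ops, preds):
    start = {}
    for op in ops:                                   # ops are in topological order
        start[op] = max((start[p] + 1 for p in preds[op]), default=0)
    return start


def alap(ops, succs, T):
    start = {}
    for op in reversed(ops):
        start[op] = min((start[s] - 1 for s in succs[op]), default=T - 1)
    return start


def cost(schedule, kind, unit_cost):
    """Sum over operation types of unit cost times peak per-step usage."""
    usage = {}
    for op, step in schedule.items():
        usage[(kind[op], step)] = usage.get((kind[op], step), 0) + 1
    need = {}
    for (k, _), n in usage.items():
        need[k] = max(need.get(k, 0), n)
    return sum(unit_cost[k] * n for k, n in need.items())


def schedule_bnb(ops, preds, kind, unit_cost, T):
    succs = {op: [] for op in ops}
    for op in ops:
        for p in preds[op]:
            succs[p].append(op)
    lo, hi = asap(ops, preds), alap(ops, succs, T)
    if any(lo[op] > hi[op] for op in ops):
        raise ValueError("timing constraint shorter than the critical path")
    best = [cost(lo, kind, unit_cost), dict(lo)]     # incumbent: the ASAP schedule

    def search(i, partial):
        if cost(partial, kind, unit_cost) >= best[0]:
            return                                   # bound: cannot beat the incumbent
        if i == len(ops):
            best[0], best[1] = cost(partial, kind, unit_cost), dict(partial)
            return
        op = ops[i]
        earliest = max((partial[p] + 1 for p in preds[op]), default=0)
        for step in range(earliest, hi[op] + 1):     # branch on the control step
            partial[op] = step
            search(i + 1, partial)
            del partial[op]

    search(0, {})
    return best[0], best[1]


ops = ["m1", "m2", "a1"]
preds = {"m1": [], "m2": [], "a1": ["m1", "m2"]}
kind = {"m1": "mul", "m2": "mul", "a1": "add"}
unit_cost = {"mul": 10, "add": 1}
print(schedule_bnb(ops, preds, kind, unit_cost, T=3))  # (11, {'m1': 0, 'm2': 1, 'a1': 2})
print(schedule_bnb(ops, preds, kind, unit_cost, T=2))  # (21, {'m1': 0, 'm2': 0, 'a1': 1})
```

With three steps available, delaying one multiplication lets a single multiplier serve both, cutting the cost from 21 to 11; with only two steps the ASAP schedule is forced.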