Publication


Featured research published by Ilhyun Kim.


international symposium on computer architecture | 2003

Half-price architecture

Ilhyun Kim; Mikko H. Lipasti

Current-generation microprocessors are designed to process instructions with one and two source operands at equal cost. Handling two source operands requires multiple ports per instruction in structures such as the register file and wakeup logic, which often lie on the processor's critical timing paths. We argue that these structures are overdesigned, since only a small fraction of instructions require two source operands to be processed simultaneously. In this paper, we propose the half-price architecture, which judiciously removes this overdesign by restricting the processor's capability to handle two source operands in certain timing-critical cases. Two techniques are proposed and evaluated. The first, sequential wakeup, decouples half of the tag-matching logic from the wakeup bus to reduce the bus's load capacitance. The second, sequential register access, halves the number of register read ports by sequentially accessing two values through a single port when needed. We show that a pipeline that optimizes scheduling and register access for a single operand achieves nearly the same performance as an ideal base machine that fully handles two operands, with 2.2% (worst case 4.8%) IPC degradation.
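
As a rough illustration of the sequential wakeup idea, the sketch below models issue-queue entries whose left source tag is compared against the wakeup bus in the broadcast cycle while the right tag is checked one cycle later. The class and method names (IQEntry, SequentialWakeupQueue) are invented for this sketch, which assumes one broadcast per cycle; it is not the paper's design.

from dataclasses import dataclass
from typing import List, Optional

@dataclass
class IQEntry:
    left_tag: Optional[int]                 # source tag wired to the wakeup bus
    right_tag: Optional[int]                # source tag checked one cycle later
    left_ready: bool = False
    right_ready: bool = False
    latched_tag: Optional[int] = None       # broadcast tag latched for the delayed check

class SequentialWakeupQueue:
    def __init__(self) -> None:
        self.entries: List[IQEntry] = []

    def broadcast(self, dest_tag: int) -> None:
        # Cycle N: only the left comparators observe the wakeup bus,
        # halving the load capacitance on the bus.
        for e in self.entries:
            if e.left_tag == dest_tag:
                e.left_ready = True
            e.latched_tag = dest_tag        # right side sees the tag next cycle

    def tick(self) -> None:
        # Cycle N+1: deferred tag match for the right source operand.
        for e in self.entries:
            if e.latched_tag is not None and e.right_tag == e.latched_tag:
                e.right_ready = True
            e.latched_tag = None

    def ready(self) -> List[IQEntry]:
        return [e for e in self.entries
                if (e.left_tag is None or e.left_ready)
                and (e.right_tag is None or e.right_ready)]

# Example: an entry waiting on tag 3 (left) and tag 7 (right).
q = SequentialWakeupQueue()
q.entries.append(IQEntry(left_tag=3, right_tag=7))
q.broadcast(7); q.tick()     # right operand wakes up one cycle after its broadcast
q.broadcast(3)               # left operand wakes up in the broadcast cycle itself
assert len(q.ready()) == 1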


international symposium on microarchitecture | 2003

Macro-op scheduling: relaxing scheduling loop constraints

Ilhyun Kim; Mikko H. Lipasti

Ensuring back-to-back execution of dependent instructions in a conventional out-of-order processor requires scheduling logic that wakes up and selects instructions at the same rate as they are executed. To sustain high performance, integer ALU instructions typically have single-cycle latency, consequently requiring scheduling logic with the same single-cycle latency. Prior proposals have advocated the use of speculation in either the wakeup or select phase to enable pipelining of the scheduling logic for higher clock frequency. In contrast, this paper proposes macro-op scheduling, which systematically removes single-cycle-latency instructions from the machine by combining them into macro-ops and performs nonspeculative, pipelined scheduling of multi-cycle operations. Macro-op scheduling also increases the effective size of the scheduling window by enabling multiple instructions to occupy a single issue queue entry. We demonstrate that pipelined 2-cycle macro-op scheduling performs comparably to, or even better than, atomic scheduling and prior proposals for select-free scheduling.
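
A minimal sketch of the grouping step follows, assuming decoded micro-ops carry their latency and producer IDs; the MacroOp container and the greedy pairing rule are illustrative simplifications, not the paper's grouping algorithm.

from dataclasses import dataclass
from typing import List, Optional

@dataclass
class MicroOp:
    uid: int
    latency: int                 # execution latency in cycles
    srcs: List[int]              # uids of producer instructions

@dataclass
class MacroOp:
    head: MicroOp
    tail: Optional[MicroOp] = None

    @property
    def latency(self) -> int:
        # A fused pair of single-cycle ops is presented to the scheduler as one
        # multi-cycle operation, so 2-cycle pipelined scheduling logic suffices.
        return self.head.latency + (self.tail.latency if self.tail else 0)

def group_macro_ops(uops: List[MicroOp]) -> List[MacroOp]:
    # Greedily fuse a single-cycle op with a later single-cycle dependent;
    # each MacroOp would occupy a single issue-queue entry.
    fused = set()
    groups: List[MacroOp] = []
    for i, u in enumerate(uops):
        if u.uid in fused:
            continue
        mop = MacroOp(head=u)
        if u.latency == 1:
            for cand in uops[i + 1:]:
                if cand.latency == 1 and u.uid in cand.srcs and cand.uid not in fused:
                    mop.tail = cand
                    fused.add(cand.uid)
                    break
        groups.append(mop)
    return groups

# Example: three dependent single-cycle ALU ops; the first two fuse into one
# 2-cycle macro-op, the third stays alone.
uops = [MicroOp(0, 1, []), MicroOp(1, 1, [0]), MicroOp(2, 1, [1])]
print([(m.head.uid, m.tail.uid if m.tail else None, m.latency)
       for m in group_macro_ops(uops)])    # [(0, 1, 2), (2, None, 1)]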


high-performance computer architecture | 2004

Understanding scheduling replay schemes

Ilhyun Kim; Mikko H. Lipasti

Modern microprocessors adopt speculative scheduling techniques in which instructions are scheduled several clock cycles before they actually execute. Because of this scheduling delay, scheduling misses must be recovered across multiple levels of dependence chains in order to prevent further unnecessary execution. We explore the design space of scheduling replay schemes that prevent the propagation of scheduling misses, and find that, because current and proposed replay schemes track dependences among instructions within the instruction window as part of the scheduling or execution process, they do not scale well and require instructions to execute in correct data-dependence order. We propose token-based selective replay, which moves the dependence-information propagation loop out of the scheduler, enabling lower-complexity scheduling logic and support for data-speculation techniques at the expense of a marginal IPC degradation compared to an ideal selective replay scheme.
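
The following is a highly simplified behavioral sketch of the token idea, assuming a small fixed token pool; the names (Uop, TokenPool) and the fallback policy for instructions without token coverage are assumptions made only for illustration.

from dataclasses import dataclass, field
from typing import List, Optional, Set

NUM_TOKENS = 4      # assumed size of the token pool

@dataclass
class Uop:
    uid: int
    srcs: List["Uop"] = field(default_factory=list)
    token: Optional[int] = None                       # token held by a latency-unpredictable op
    token_vec: Set[int] = field(default_factory=set)  # tokens inherited from producers

class TokenPool:
    def __init__(self) -> None:
        self.free = set(range(NUM_TOKENS))

    def allocate(self, u: Uop) -> None:
        # Give an op that may mis-schedule (e.g. a load) its own token, if one is free.
        if self.free:
            u.token = self.free.pop()
            u.token_vec.add(u.token)

    def propagate(self, u: Uop) -> None:
        # Dependence information is propagated outside the critical scheduling
        # loop: a consumer simply inherits the union of its producers' tokens.
        for s in u.srcs:
            u.token_vec |= s.token_vec

    def release(self, u: Uop) -> None:
        if u.token is not None:
            self.free.add(u.token)
            u.token = None

def replay_set(window: List[Uop], offending: Uop) -> List[Uop]:
    # On a scheduling miss, selectively replay only the ops carrying the
    # offender's token; ops without token coverage would fall back to a
    # coarser recovery, modeled here as replaying the whole window.
    if offending.token is None:
        return list(window)
    return [u for u in window if offending.token in u.token_vec]

# Example: an add depending on a load whose latency was mispredicted.
pool = TokenPool()
load, add = Uop(0), Uop(1)
add.srcs.append(load)
pool.allocate(load)
pool.propagate(add)
print([u.uid for u in replay_set([load, add], load)])   # [0, 1]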


high-performance computer architecture | 2006

An approach for implementing efficient superscalar CISC processors

Shiliang Hu; Ilhyun Kim; Mikko H. Lipasti; James E. Smith

An integrated, hardware/software co-designed CISC processor is proposed and analyzed. The objectives are high performance and reduced complexity. Although the x86 ISA is targeted, the overall approach is applicable to other CISC ISAs. To provide high performance on frequently executed code sequences, fully transparent dynamic translation software decomposes CISC superblocks into RISC-style micro-ops. Then, pairs of dependent micro-ops are reordered and fused into macro-ops held in a large, concealed code cache. The macro-ops are fetched from the code cache and processed throughout the pipeline as single units. Consequently, instruction-level communication and management are reduced, and processor resources such as the issue buffer and register file ports are better utilized. Moreover, fused instructions lead naturally to pipelined instruction scheduling (issue) logic, and collapsed 3-1 ALUs can be used, resulting in much simplified result-forwarding logic. Steady-state performance is evaluated for the SPEC2000 benchmarks, and a proposed x86 implementation with complexity similar to that of a two-wide superscalar processor is shown to provide performance (instructions per cycle) equivalent to that of a conventional four-wide superscalar processor.
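
To make the translation-time fusing step concrete, here is a rough sketch assuming micro-ops have already been cracked from an x86 superblock; the nearest-consumer pairing rule and the code_cache dictionary are simplifications for illustration, not the actual co-designed VM software.

from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

@dataclass
class MicroOp:
    op: str
    dst: Optional[str]
    srcs: Tuple[str, ...]

@dataclass
class FusedPair:
    head: MicroOp
    tail: Optional[MicroOp]      # None means the micro-op stays un-fused

def fuse_superblock(uops: List[MicroOp]) -> List[FusedPair]:
    # Pair each producer with its nearest later consumer so the pair can be
    # fetched, scheduled, and issued as a single unit (one issue-buffer entry,
    # a candidate for a collapsed 3-1 ALU). Real fusing heuristics also check
    # latency, operand counts, and reordering legality, omitted here.
    consumed = [False] * len(uops)
    out: List[FusedPair] = []
    for i, prod in enumerate(uops):
        if consumed[i]:
            continue
        tail = None
        if prod.dst is not None:
            for j in range(i + 1, len(uops)):
                if not consumed[j] and prod.dst in uops[j].srcs:
                    tail = uops[j]
                    consumed[j] = True
                    break
        out.append(FusedPair(head=prod, tail=tail))
    return out

# Concealed code cache keyed by superblock entry PC.
code_cache: Dict[int, List[FusedPair]] = {}

def translate(entry_pc: int, uops: List[MicroOp]) -> None:
    code_cache[entry_pc] = fuse_superblock(uops)

# Example: two dependent adds fuse; the store stays un-fused because its
# producer is already consumed as a tail.
translate(0x1000, [
    MicroOp("add", "r1", ("r2", "r3")),
    MicroOp("add", "r4", ("r1", "r5")),
    MicroOp("store", None, ("r4",)),
])
print([(p.head.op, p.tail.op if p.tail else None) for p in code_cache[0x1000]])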


international symposium on computer architecture | 2002

Implementing optimizations at decode time

Ilhyun Kim; Mikko H. Lipasti

The number of pipeline stages separating dynamic instruction scheduling from instruction execution has increased considerably in recent out-of-order microprocessor implementations, forcing the scheduler to allocate functional units and other execution resources several cycles before they are actually used. Unfortunately, several proposed microarchitectural optimizations become less desirable or even impossible in such an environment, since they require instantaneous or near-instantaneous changes in execution behavior and resource usage in response to dynamic events that occur during instruction execution. Because these events are detected several cycles after scheduling decisions have already been made, such dynamic responses are infeasible. To overcome this limitation, we propose implementing optimizations through what we call speculative decode. Speculative decode alters the mapping between user-visible instructions and the implemented core instructions based on observed runtime characteristics, generating speculative instruction sequences in which optimizations are pre-scheduled in a manner compatible with realistic pipelines that have multicycle scheduling latency. We present case studies on memory reference combining and silent store squashing, and demonstrate that speculative decode performs comparably to, or even better than, impractical in-core implementations that require zero-cycle scheduling latency.
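
A small sketch of the decode-time remapping idea follows, using silent store squashing as the example; the micro-op spellings and the predictor table are hypothetical and not the paper's actual encoding.

from typing import Dict, List

# Hypothetical predictor table: PCs of stores observed at runtime to be silent
# (i.e. they write the value that memory already holds).
silent_store_pcs: Dict[int, bool] = {}

def decode(pc: int, insn: str) -> List[str]:
    # Map one user-visible instruction to an implemented micro-op sequence.
    # A store predicted to be silent is expanded at decode time into a
    # load / compare / conditional-store sequence, so the squashing decision
    # is already pre-scheduled when the store reaches the execution core.
    if insn.startswith("store") and silent_store_pcs.get(pc, False):
        return [
            "load    tmp, [addr]",     # read the value currently in memory
            "cmp     tmp, src",        # compare it with the value being stored
            "st.cond [addr], src",     # perform the store only if they differ
        ]
    return [insn]                      # default one-to-one mapping in this sketch

# Example: the store at PC 0x40 has been observed to be silent.
silent_store_pcs[0x40] = True
print(decode(0x40, "store [addr], src"))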


Archive | 2015

Method, apparatus, and system for speculative abort control mechanisms

Martin G. Dixon; Ravi Rajwar; Konrad K. Lai; Robert S. Chappell; Rajesh S. Parthasarathy; Alexandre J. Farcy; Ilhyun Kim; Prakash Math; Matthew C. Merten; Vijaykumar B. Kadgi


Archive | 2010

Providing thread fairness in a hyper-threaded microprocessor

Morris Marden; Matthew C. Merten; Alexandre J. Farcy; Avinash Sodani; James D. Hadley; Ilhyun Kim


Archive | 2007

Tracking temporal use associated with cache evictions

Peter J. Smith; Mongkol Ekpanyapong; Harikrishna B. Baliga; Ilhyun Kim


Archive | 2006

In-use bits for efficient instruction fetch operations

Ilhyun Kim; Stephan Jourdan; Alexandre J. Farcy; Bret L. Toll


Archive | 2007

Providing thread fairness by biasing selection away from a stalling thread using a stall-cycle counter in a hyper-threaded microprocessor

Morris Marden; Matthew C. Merten; Alexandre J. Farcy; Avinash Sodani; James D. Hadley; Ilhyun Kim
