Henk Neefs
Ghent University
Publication
Featured research published by Henk Neefs.
international symposium on performance analysis of systems and software | 2000
Lieven Eeckhout; K. De Bosschere; Henk Neefs
Most research in the area of microarchitectural performance analysis is done using trace-driven simulations. Although trace-driven simulations are fairly accurate, they are both time- and space-consuming, which sometimes makes them impractical. Modeling the execution of a computer program by a statistical profile and generating a synthetic benchmark trace from this statistical profile can be used to accelerate the design process. Thanks to the statistical nature of this technique, performance characteristics quickly converge to a steady-state solution during simulation, which makes this technique suitable for fast design space explorations. In this paper, it is shown how more detailed statistical profiles can be obtained and how the synthetic trace generation mechanism should be designed to generate syntactically correct benchmark traces. As a result, the performance predictions in this paper are far more accurate than those reported in previous research.
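The idea of profiling a trace and then sampling a synthetic one from the profile can be illustrated with a minimal sketch. It assumes a deliberately simplified profile, namely first-order transition counts between instruction types; the abstract does not specify the profile contents, so this is not the authors' method. The names `build_profile` and `generate_synthetic_trace` and the toy instruction types are hypothetical.

```python
import random
from collections import Counter, defaultdict


def build_profile(trace):
    """Count how often each instruction type follows another (assumed first-order profile)."""
    profile = defaultdict(Counter)
    for prev, cur in zip(trace, trace[1:]):
        profile[prev][cur] += 1
    return profile


def generate_synthetic_trace(profile, start, length, seed=0):
    """Sample a synthetic trace whose transitions follow the profiled frequencies."""
    rng = random.Random(seed)
    trace = [start]
    while len(trace) < length:
        successors = profile.get(trace[-1])
        if not successors:
            # No recorded successor: restart from the initial type (an assumption).
            trace.append(start)
            continue
        types = list(successors)
        weights = [successors[t] for t in types]
        trace.append(rng.choices(types, weights=weights)[0])
    return trace


# Toy "real" trace of instruction types (hypothetical data).
real_trace = ["load", "add", "store", "load", "add", "branch", "load", "add", "store"]
profile = build_profile(real_trace)
synthetic = generate_synthetic_trace(profile, start="load", length=20)
```

A synthetic trace produced this way contains only transitions that were observed in the profiled trace, which is a much weaker property than the syntactic correctness the abstract refers to.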
field programmable logic and applications | 1994
Jo Depreitere; Henk Neefs; Herwig Van Marck; Jan Van Campenhout; Roel Baets; Bart Dhoedt; Hugo Thienpont; Irina Veretennicoff
Traditional Field-Programmable Gate Arrays suffer from a lack of routing resources when implementing complex logic designs. This paper proposes two possible improvements to the FPGA structure that could alleviate these problems. We suggest extending the FPGA class to 3-D architectures. The 3-D architectures could be constructed of a stack of optically interconnected 2-D planes. Furthermore, we suggest a hierarchical distribution of routing resources that closely matches the wire length distributions of the intended class of applications.
high performance computer architecture | 2000
Henk Neefs; Hans Vandierendonck; K. De Bosschere
One of the problems in future processors will be the resource conflicts caused by several load/store units competing to access the same cache bank. The traditional approach to handling this case is to introduce buffers combined with a cross-bar. This approach suffers from (i) the non-deterministic latency of a load/store and (ii) the extra latency caused by the cross-bar and the buffer management. A deterministic latency is of the utmost importance for the forwarding mechanism of out-of-order processors because it enables back-to-back operation of instructions. We propose a technique by which we eliminate the buffers and cross-bars from the critical path of the load/store execution. This results in a latency that is both low and deterministic. Our solution consists of predicting which bank is to be accessed. A penalty is incurred only in the case of a wrong prediction.
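Bank prediction can be illustrated with a minimal sketch. The abstract does not say how the prediction is made, so the sketch assumes a simple last-value predictor indexed by the address of the load/store instruction. The class `BankPredictor`, its parameters, and the access pattern are all hypothetical.

```python
class BankPredictor:
    """Last-value bank predictor indexed by instruction address (assumed scheme)."""

    def __init__(self, num_banks=4, line_size=32, table_size=256):
        self.num_banks = num_banks
        self.line_size = line_size
        self.table = [0] * table_size  # last bank seen per table entry

    def actual_bank(self, address):
        # Assumed interleaving: consecutive cache lines map to consecutive banks.
        return (address // self.line_size) % self.num_banks

    def access(self, pc, address):
        """Return True if the predicted bank was correct, then update the table."""
        index = pc % len(self.table)
        predicted = self.table[index]
        actual = self.actual_bank(address)
        self.table[index] = actual
        return predicted == actual


# One load instruction walking through memory with a stride of 4 bytes.
predictor = BankPredictor()
outcomes = [predictor.access(pc=0x40, address=addr) for addr in range(0, 64, 4)]
mispredictions = outcomes.count(False)
```

In this access pattern the prediction is wrong only when the stride crosses from one cache line into the next, so the penalty is paid once in sixteen accesses.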
Proceedings of Optics in Computing, Vol. 3490, SPIE, Chavel, P., Miller, D., Thienpont, H. (ed.), Brugge, July | 1998
Henk Neefs; Pim Van Heuven; Jan Van Campenhout
It is the ideal of a computer designer to have a huge, yet very fast memory connected to a uniprocessor core. But in reality, these two requirements, fast and huge, are not reconcilable. For this reason, a memory hierarchy was introduced that consists of very fast and small memory close to the processor core (the registers) but slower and larger memory further away from the processor (Figure 1). When data is needed, it is fetched from the slower memory into faster memory, from which it can be quickly accessed.
Journal of Systems Architecture | 1999
Henk Neefs; Koen De Bosschere; Jan Van Campenhout
Through simulations, the effect of several microarchitectural parameters on the performance of a dynamic out-of-order executing microprocessor is shown. Next, we show that memory instructions, especially stores, considerably limit the available instruction-level parallelism (ILP). Three techniques are proposed to mitigate the effect of memory instructions: a static, a mixed static/dynamic, and a fully dynamic technique. We focus on the fully dynamic technique, which enables the out-of-order execution of loads and stores. If a memory dependence fault is detected, the traditional branch misprediction recovery hardware is used for recovery. Since this scheme does not perform well on its own, a dependence-fault predicting cache is introduced.
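The role of a dependence-fault predictor can be sketched as follows. The abstract does not describe how the cache is organized, so the sketch assumes an unbounded set of load addresses that have faulted before. The names `DependenceFaultPredictor` and `count_recoveries` and the input data are hypothetical.

```python
class DependenceFaultPredictor:
    """Remembers loads that previously caused a memory dependence fault (assumed design)."""

    def __init__(self):
        self.faulting_loads = set()  # unbounded here; a real cache would have limited capacity

    def should_speculate(self, load_pc):
        return load_pc not in self.faulting_loads

    def record_fault(self, load_pc):
        self.faulting_loads.add(load_pc)


def count_recoveries(loads, predictor):
    """loads: (load_pc, conflicts_with_earlier_store) pairs; returns the number of recoveries."""
    recoveries = 0
    for load_pc, conflicts in loads:
        if predictor.should_speculate(load_pc) and conflicts:
            # The load ran ahead of a conflicting store: recover and remember it.
            recoveries += 1
            predictor.record_fault(load_pc)
    return recoveries


# The load at 0x10 conflicts every time; the load at 0x20 never does.
loads = [(0x10, True), (0x10, True), (0x20, False), (0x10, True)]
recoveries = count_recoveries(loads, DependenceFaultPredictor())
```

Under these assumptions the conflicting load triggers one recovery and is then held back, while the non-conflicting load keeps executing out of order.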
workshop on computer architecture education | 1998
Jan Van Campenhout; P. Verplaetse; Henk Neefs
We have developed ESCAPE, an easy-to-use, highly interactive, portable PC-based simulation environment aimed at the support of computer architecture education. The environment can simulate both a microprogrammed architecture and a pipelined architecture with a single pipeline. Both architectures are custom-made, with a certain amount of configurability. Other tools, such as a memory monitor, an assembler/disassembler, and analysis tools such as on-the-fly generation of pipeline activity and usage diagrams, are integrated with the environment. Based upon our limited experience with the material so far, we can state that the results are excellent. Students invariably respond very positively, and the evaluations indicate a far deeper understanding than was previously attainable by using only the traditional textbook-and-paper-problems approach.
annual computer security applications conference | 2000
Lieven Eeckhout; K. De Bosschere; Henk Neefs
Scaling contemporary superscalar microarchitectures to higher levels of parallelism in future technologies seems to be impractical due to the increasing complexity. In this paper, we show that a fixed-length block structured instruction set architecture (BSA) is capable of reducing the hardware complexity and is therefore feasible as an alternative architectural paradigm for traditional architectures with large virtual window sizes for future technologies. This is achieved through two major interventions. First, statically grouping instructions from various basic blocks into larger atomic units of work with a fixed length, called blocks, makes fetching easier. Second, a decentralized microarchitecture significantly reduces the processor core logic, resulting in higher clock frequencies. The performance evaluation methodology used in this paper considers both IPC (number of useful instructions retired per clock cycle) and clock cycle period. In addition, a broad design space is explored by quantifying the influence of various microarchitectural parameters on overall performance.
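Why both IPC and clock cycle period matter can be shown with a small worked example. It assumes that overall performance is measured as instructions retired per nanosecond, that is, IPC divided by the clock period. The function name and the two design points are hypothetical and are not results from the paper.

```python
def instructions_per_ns(ipc, clock_period_ns):
    """Throughput in useful instructions retired per nanosecond."""
    return ipc / clock_period_ns


# Hypothetical design points: A has the higher IPC, B has the shorter clock period.
design_a = instructions_per_ns(ipc=3.0, clock_period_ns=2.0)
design_b = instructions_per_ns(ipc=2.0, clock_period_ns=1.0)
faster = "B" if design_b > design_a else "A"
```

With these numbers design B retires more instructions per nanosecond despite its lower IPC, which is the kind of trade-off an IPC-only evaluation would miss.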
Proceedings 25th EUROMICRO Conference. Informatics: Theory and Practice for the New Millennium | 1999
Lieven Eeckhout; Henk Neefs; K. De Bosschere; J. Van Campenhout
When designing a new micro-architecture, it is difficult to estimate the influence of the architectural parameters on clock period and chip area. In this paper, we use automatic synthesis to investigate the implementation of a novel processor architecture, namely a block structured instruction set architecture (BSA). In a BSA, instructions are statically grouped into fixed-length blocks by the compiler and the execution policy within a block is data-flow. The use of automatic synthesis is forced by the fact that a broad design space is investigated in an early design stage. The three parameters that we specifically focus on are blocksize, instruction selection window size and issue width. Various pipeline configurations are investigated. Moreover, we investigate the effect of technology scaling on the selection of the best architecture and pipeline configuration; we consider both a 0.8 μm 2-metal layer CMOS technology and a more advanced 0.25 μm 6-metal layer CMOS technology. From this paper we can conclude that a BSA has several implementational benefits over traditional architectures due to the partitioned design and the reduced wiring delays.
Journal of Systems Architecture | 2000
Lieven Eeckhout; Henk Neefs; Koen De Bosschere
An important challenge concerning the design of future microprocessors is that current design methodologies are becoming impractical due to long simulation runs and due to the fact that chip layout considerations are not incorporated in early design stages. In this paper, we show that statistical modeling can be used to speed up the architectural simulations and is thus viable for early design stage explorations of new microarchitectures. In addition, we argue that processor layouts should be considered in early design stages in order to tackle the growing importance of interconnects in future technologies. In order to show the applicability of our methodology, which combines statistical modeling and processor layout considerations in an early design stage, we have applied our method to a novel architectural paradigm, namely a fixed-length block structured architecture. A fixed-length block structured architecture is an answer to the scalability problem of current architectures. Two important factors prevent contemporary out-of-order architectures from being scalable to higher levels of parallelism in future deep-submicron technologies: the increased complexity and the growing domination of interconnect delays. In this paper, we show, by using statistical modeling and processor layout considerations, that a fixed-length block structured architecture is a viable architectural paradigm for future microprocessors in future technologies thanks to the introduction of decentralization and a reduced register file pressure.
parallel computing | 2000
Lieven Eeckhout; Henk Neefs; Koen De Bosschere