Kozo Kimura | Researchain

Publication

Featured research published by Kozo Kimura.

International Symposium on Computer Architecture | 1992

An elementary processor architecture with simultaneous instruction issuing from multiple threads

Hiroaki Hirata; Kozo Kimura; Satoshi Nagamine; Yoshiyuki Mochizuki; Akio Nishimura; Yoshimori Nakase; Teiji Nishizawa

In this paper, we propose a multithreaded processor architecture which improves machine throughput. In our processor architecture, instructions from different threads (not a single thread) are issued simultaneously to multiple functional units, and these instructions can begin execution unless there are functional unit conflicts. This parallel execution scheme greatly improves the utilization of the functional units. Simulation results show that executing two and four threads in parallel on a nine-functional-unit processor achieves speed-ups of 2.02 and 3.72 times, respectively, over a conventional RISC processor. Our architecture is also applicable to the efficient execution of a single loop. In order to control functional unit conflicts between loop iterations, we have developed a new static code scheduling technique. Another loop execution scheme, which uses the multiple control flow mechanism of our architecture, makes it possible to parallelize loops which are difficult to parallelize on vector or VLIW machines.
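The issue rule described in this abstract (instructions from several threads issue in the same cycle unless they conflict over a functional unit) can be illustrated with a toy model. The sketch below is not the authors' simulator: the instruction encoding, the single-cycle units, the one-instruction-per-thread-per-cycle limit, and the rotating priority are all assumptions made for illustration.

```python
from collections import Counter

def simulate(threads, fu_counts):
    """Return the number of cycles needed to issue every instruction.

    threads:   list of instruction streams; each instruction is just the
               name of the functional-unit type it needs (hypothetical).
    fu_counts: dict mapping unit type to the number of units available.
    Assumptions: every unit takes one cycle, and each thread issues at
    most one instruction per cycle, in program order.
    """
    for stream in threads:
        for unit in stream:
            if fu_counts.get(unit, 0) < 1:
                raise ValueError(f"no functional unit of type {unit!r}")
    n = len(threads)
    pcs = [0] * n                      # next instruction index per thread
    cycles = 0
    while any(pc < len(t) for pc, t in zip(pcs, threads)):
        free = Counter(fu_counts)      # all units are free at cycle start
        # Rotate thread priority each cycle (an assumed fairness policy).
        for i in range(n):
            tid = (cycles + i) % n
            if pcs[tid] >= len(threads[tid]):
                continue               # this thread has finished
            unit = threads[tid][pcs[tid]]
            if free[unit] > 0:         # no functional-unit conflict: issue
                free[unit] -= 1
                pcs[tid] += 1
        cycles += 1
    return cycles

t0 = ["alu", "mem", "alu"]
t1 = ["mem", "alu", "alu"]
units = {"alu": 1, "mem": 1}
one_at_a_time = simulate([t0], units) + simulate([t1], units)
together = simulate([t0, t1], units)
print(one_at_a_time, together)
```

In this small example the two threads interleave without conflict for two cycles and then contend for the single ALU, so running them together still takes fewer cycles than running them one after the other.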

International Symposium on Industrial Electronics | 1994

Evaluation method of microarchitecture for multithreaded processor

Kozo Kimura; Hiroaki Hirata; Tokuzo Kiyohara; S. Asahara; Takayuki Sagishima; Takao Onoye; Isao Shirakawa

A multithreaded processor is a good approach to increasing performance by exploiting coarse grain parallelism. Executing multiple threads in parallel produces complicated behavior, which makes performance prediction difficult, so instruction-level simulation is necessary for performance evaluation. In practice, it is very difficult to select the optimum microarchitecture configuration by simulating a wide variety of candidates, because simulation times are long. This paper presents a method for evaluating microarchitectures for multithreaded processors. The method consists of three steps: first, the characteristics of the application are analysed; second, candidate microarchitectures are selected in light of those characteristics; last, the selected architectures are evaluated through instruction-level simulation using a practical application program. Experimental results using a computer graphics application show that the proposed evaluation method is very effective for increasing the performance of multithreaded processors.

International Symposium on Circuits and Systems | 1994

Multithreaded processor for image generation

Takayuki Sagishima; Kozo Kimura; Hiroaki Hirata; Tokuzo Kiyohara; Shigeo Asahara; Takao Onoye; Isao Shirakawa

Multiple instruction execution is a major approach to designing high-performance processors. Attention usually focuses on superscalar and VLIW processors, which exploit instruction level parallelism. The multithreaded processor, on the other hand, can be expected to achieve a high degree of multiple instruction execution by exploiting coarse grain parallelism. Many computer graphics applications (such as the radiosity method and the ray-tracing method) can be optimized by reorganizing the code to take advantage of coarse grain parallelism, but their degree of instruction level parallelism is not sufficient for a superscalar processor. Experimental results using the radiosity method show that the 4-thread multithreaded processor achieves a 2.9 times speedup over a single thread, while the 4-issue superscalar processor manages around 1.5 times. By duplicating two kinds of function units, the performance of the multithreaded processor increases to 3.7 times, but the performance of the superscalar processor saturates at around 1.5 times. Therefore, for computer graphics applications, the multithreaded processor is a better approach than the superscalar processor.
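One way to compare the reported speedups is as efficiency, meaning speedup divided by the number of threads or issue slots. This metric does not appear in the abstract; the short sketch below only rearranges the figures quoted above, and the labels and the `efficiency` helper are illustrative names.

```python
def efficiency(speedup, width):
    """Speedup divided by the number of threads or issue slots."""
    return speedup / width

# Figures quoted in the abstract; the superscalar value is "around 1.5".
reported = {
    "multithreaded, 4 threads": (2.9, 4),
    "multithreaded, 4 threads, duplicated units": (3.7, 4),
    "superscalar, 4-issue": (1.5, 4),
}

for name, (speedup, width) in reported.items():
    print(f"{name}: {efficiency(speedup, width):.1%}")
```

Read this way, the multithreaded configurations use roughly 0.725 and 0.925 of their nominal width, against roughly 0.375 for the 4-issue superscalar configuration.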

The Journal of The Institute of Image Information and Television Engineers | 1998

Control method of Data Cache for Multithreaded Processor

Kozo Kimura; Hiroyuki Okuhata; Takao Onoye; Isao Shirakawa; Tokuzo Kiyohara; Takayuki Sagishima

In this paper, we present a control method of data cache for a multithreaded processor and its evaluation. A multithreaded processor is effective for 3D-CG; however, the increase in working set size is unavoidable, and this limits the effectiveness of the data cache. Usually, the size and/or the associativity of the cache are increased in order to achieve a higher cache hit rate. This increases the chip size, but the performance remains limited. An inter-thread non-blocking cache control method is proposed for reducing cache miss penalties. This control method achieves higher performance than the blocking cache method and also requires much less hardware cost than a traditional non-blocking cache method. With the proposed cache control method, the performance degradation is halved and performance reaches 80-90% of the ideal-cache case.
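The abstract does not spell out the mechanism, so the following toy model rests on one plausible reading of "inter-thread non-blocking": a miss stalls only the thread that caused it, while other threads keep using the cache. The single cache port, the fixed miss penalty, the round-robin arbitration, and the `H`/`M` trace encoding are all assumptions, not details taken from the paper.

```python
def run(threads, miss_penalty, inter_thread_nonblocking):
    """Return the cycle at which the last access completes.

    threads: list of strings over 'H' (hit) and 'M' (miss), one per thread.
    Assumed model: one cache port, so at most one access starts per cycle;
    a hit completes in 1 cycle, a miss in 1 + miss_penalty cycles.
    """
    n = len(threads)
    pcs = [0] * n          # next access index per thread
    ready_at = [0] * n     # cycle at which each thread may access again
    cache_free_at = 0      # only advanced by the blocking cache
    cycle = 0
    finish = 0
    while any(pc < len(t) for pc, t in zip(pcs, threads)):
        if cycle >= cache_free_at:
            for i in range(n):                 # round-robin arbitration
                tid = (cycle + i) % n
                if pcs[tid] < len(threads[tid]) and ready_at[tid] <= cycle:
                    is_miss = threads[tid][pcs[tid]] == "M"
                    pcs[tid] += 1
                    done = cycle + 1 + (miss_penalty if is_miss else 0)
                    ready_at[tid] = done       # the issuing thread waits
                    if is_miss and not inter_thread_nonblocking:
                        cache_free_at = done   # blocking: everyone waits
                    finish = max(finish, done)
                    break
        cycle += 1
    return finish

traces = ["MHH", "HHH"]
blocking = run(traces, miss_penalty=4, inter_thread_nonblocking=False)
nonblocking = run(traces, miss_penalty=4, inter_thread_nonblocking=True)
print(blocking, nonblocking)
```

With these two short traces the blocking variant finishes in 10 cycles and the inter-thread non-blocking variant in 7, because the second thread's hits overlap the first thread's miss. The numbers illustrate the idea only and are unrelated to the 80-90% figure reported in the abstract.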

Archive | 1997