Shorin Kyo
NEC
Publication
Featured research published by Shorin Kyo.
international symposium on computer architecture | 2005
Shorin Kyo; Shin’ichiro Okazaki; Tamio Arai
Embedded processors for video image recognition must address the trade-off between cost (die size and power) and real-time performance, and must also achieve high flexibility because of the immense diversity of recognition targets, situations, and applications. This paper describes IMAP, a highly parallel SIMD linear processor and memory array architecture that addresses these conflicting requirements. By using parallel and systolic algorithmic techniques, IMAP, despite its simple architecture, exploits not only the straightforward per-image-row data level parallelism (DLP) but also the inherent DLP of other memory access patterns frequently found in various image recognition tasks, with programs written in an explicitly parallel C language (1DC). We describe and evaluate IMAP-CE, the latest IMAP processor, which integrates 128 8-bit 4-way VLIW PEs running at 100 MHz, 128 2 KByte RAMs, and one 16-bit RISC control processor on a single chip. The PE instruction set is enhanced to support 1DC code. IMAP-CE is evaluated mainly by comparing its performance running 1DC code with that of a 2.4 GHz Intel P4 running optimized C code. Using parallelizing techniques, benchmark results show a speedup of up to 20 for image filter kernels and of 4 for a full image recognition application.
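The per-image-row DLP mentioned above can be illustrated with a small simulation. This is a hypothetical sketch, not IMAP or 1DC code: it assumes each PE of a linear array owns the image columns whose index is congruent to its PE number modulo the array size, and that all PEs apply the same 3-tap horizontal filter to one image row at a time. The function name `simd_row_filter` and the column-to-PE mapping are illustrative assumptions.

```python
def simd_row_filter(image, num_pes):
    """Simulate row-parallel SIMD execution of a 3-tap horizontal mean filter.

    Hypothetical model: PE number `pe` owns columns pe, pe + num_pes, ...
    Rows are processed one after another; within a row, every PE executes the
    same operation on its own columns (the loop over `pe` stands in for
    lockstep hardware execution).
    """
    height, width = len(image), len(image[0])
    out = [[0] * width for _ in range(height)]
    for y in range(height):  # row-level data parallelism: one row per step
        row = image[y]
        for pe in range(num_pes):  # conceptually simultaneous on real hardware
            for x in range(pe, width, num_pes):
                # Neighbour pixels live on adjacent PEs; clamp at image borders.
                left = row[max(x - 1, 0)]
                right = row[min(x + 1, width - 1)]
                out[y][x] = (left + row[x] + right) // 3
    return out


if __name__ == "__main__":
    img = [[0, 3, 6], [9, 9, 9]]
    print(simd_row_filter(img, num_pes=2))  # prints [[1, 3, 5], [9, 9, 9]]
```

The result does not depend on `num_pes`; on SIMD hardware the point is that the two inner loops collapse into a single broadcast instruction step per row.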
international conference on intelligent transportation systems | 1999
Shorin Kyo; Takuya Koga; Kazuyuki Sakurai; Shin'ichiro Okazaki
We present a robust vehicle detection and tracking system for highway scenes in both dry and wet weather conditions, captured by a forward-looking vehicle-mounted camera. The system comprises three processes: potential vehicle search, vehicle validation, and vehicle tracking. To cope with reduced visibility, image normalization is performed automatically according to the input image contrast, and a weak-edge grouping technique is used to prevent mass detections during the potential vehicle search process. The system runs at 15 frames/sec on a PC equipped with the IMAP-VISION real-time image processing board. Results of experiments on image sequences of various highway scenes are presented.
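The abstract does not specify how the normalization depends on input contrast. As a purely illustrative assumption, a linear min/max contrast stretch is one simple way to map a low-contrast (for example, wet-weather) image onto the full 8-bit range; the function name `normalize_contrast` is hypothetical and is not the authors' method.

```python
def normalize_contrast(pixels, out_max=255):
    """Linearly stretch pixel values so that min -> 0 and max -> out_max.

    Hypothetical stand-in for contrast-dependent normalization; a flat image
    (no contrast) is mapped to all zeros to avoid division by zero.
    """
    lo, hi = min(pixels), max(pixels)
    if hi == lo:
        return [0] * len(pixels)
    return [(p - lo) * out_max // (hi - lo) for p in pixels]


if __name__ == "__main__":
    print(normalize_contrast([50, 100, 150]))  # prints [0, 127, 255]
```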
IEEE Transactions on Computers | 2007
Shorin Kyo; Shin'ichiro Okazaki; Tamio Arai
Embedded processors for video image recognition in most cases not only need to address the conventional cost (die size and power) versus real-time performance issue, but must also maintain high flexibility due to the immense diversity of recognition targets, situations, and applications. This paper describes IMAP, a highly parallel SIMD linear processor and memory array architecture that addresses these trade-off requirements. Using parallel and systolic algorithmic techniques on a simple linear array architecture, IMAP successfully exploits not only the straightforward per-image-row data level parallelism (DLP) but also the inherent DLP of other memory access patterns frequently found in various image recognition tasks, while allowing programs to be written in an explicitly parallel C language (1DC). We describe and evaluate IMAP-CE, one of the latest IMAP processors, which integrates 128 8-bit 4-way VLIW PEs running at 100 MHz, 128 2 KByte RAMs, and one 16-bit RISC control processor on a single chip. The PE instruction set is enhanced to support 1DC code. The die size of IMAP-CE is 11 × 11 mm², integrating 32.7 M transistors, and its power consumption is, on average, approximately 2 watts. IMAP-CE is evaluated mainly by comparing its performance running 1DC code with that of a 2.4 GHz Intel P4 running optimized C code. Using parallelizing techniques, benchmark results show a speed increase of up to 20 times for image filter kernels and of 4 times for a full image recognition application.
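A few derived figures follow directly from the numbers quoted in this abstract: total on-chip PE memory, die area, and transistor density. The helper below (hypothetical name `imap_ce_figures`) only performs that arithmetic; it does not model the chip.

```python
def imap_ce_figures(num_pes=128, ram_per_pe_kb=2, die_side_mm=11.0,
                    transistors_millions=32.7):
    """Return (total PE RAM in KB, die area in mm^2, million transistors per mm^2)."""
    total_ram_kb = num_pes * ram_per_pe_kb          # 128 x 2 KB
    die_area_mm2 = die_side_mm * die_side_mm        # 11 mm x 11 mm
    density = transistors_millions / die_area_mm2   # millions per mm^2
    return total_ram_kb, die_area_mm2, density


if __name__ == "__main__":
    ram, area, density = imap_ce_figures()
    print(ram, area, round(density, 3))  # prints 256 121.0 0.27
```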
international conference on supercomputing | 2007
Shorin Kyo; Takuya Koga; Lieske Hanno; Shouhei Nomoto; Shin'ichiro Okazaki
A scalable SIMD/MIMD mixed-mode parallel processor architecture called XC core is proposed to meet the high and diverse performance requirements of embedded multimedia applications. XC core supports both the SIMD and MIMD computing models at low hardware cost by dynamically reconfiguring itself into datapath circuits or control circuits, i.e., trading off between performance and flexibility. A control processor broadcasts instructions to the whole SIMD PE (Processing Element) array or to a part of it, while a separate program is assigned to each PU (Processing Unit), which is mainly composed of the hardware resources of several PEs. RTL synthesis results show that the area overhead for reconfiguration is only 10% of the total area. Benchmark results show that the SIMD mode effectively achieves high performance on the regular, massively data-parallel portions of applications, while the MIMD mode accelerates the remaining portions, which would otherwise be impossible to implement on a purely highly parallel SIMD architecture. The results show that the XC core design is competitive with more complex processors across a wide range of applications, in terms of both its cost efficiency as a highly parallel SIMD processor and its flexibility as a multicore MIMD processor.
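The difference between the two execution modes can be sketched as follows. This is a hypothetical toy model, not the XC core instruction set: in SIMD mode one operation is applied to every PE's data, while in MIMD mode consecutive PEs are grouped into PUs and each PU runs its own program. The names `simd_step` and `mimd_step` and the `pes_per_pu` grouping parameter are illustrative assumptions.

```python
from typing import Callable, List

Lanes = List[int]


def simd_step(lanes: Lanes, op: Callable[[int], int]) -> Lanes:
    """SIMD mode: the same broadcast operation is applied to every PE's value."""
    return [op(v) for v in lanes]


def mimd_step(lanes: Lanes, pes_per_pu: int,
              programs: List[Callable[[Lanes], Lanes]]) -> Lanes:
    """MIMD mode: group consecutive PEs into PUs; PU i runs programs[i] on its data."""
    pus = [lanes[i:i + pes_per_pu] for i in range(0, len(lanes), pes_per_pu)]
    if len(pus) != len(programs):
        raise ValueError("exactly one program per PU is required")
    result: Lanes = []
    for chunk, program in zip(pus, programs):
        result.extend(program(chunk))
    return result


if __name__ == "__main__":
    data = [1, 2, 3, 4]
    print(simd_step(data, lambda v: v * 2))  # prints [2, 4, 6, 8]
    # PU 0 broadcasts its sum; PU 1 sorts its values in descending order.
    print(mimd_step(data, 2, [lambda c: [sum(c)] * len(c),
                              lambda c: sorted(c, reverse=True)]))  # prints [3, 3, 4, 3]
```

The SIMD step suits regular, data-parallel work; the MIMD step lets irregular, branch-heavy tasks proceed independently on each PU.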
international conference on image processing | 2001
Shorin Kyo; Takuya Koga; Shin'ichiro Okazaki
IMAP-CE is the fourth generation of a series of SIMD linear processor arrays based on the IMAP (integrated memory array processor) architecture. The aim of IMAP-CE is to provide a compact, cost-effective, and yet high-performance solution for various embedded real-time vision applications, especially vision-based driving assistance applications in the ITS (intelligent transportation system) field. IMAP-CE integrates 128 VLIW processing elements and a RISC control processor that provides the single instruction stream for the processor array. IMAP-CE achieves a peak performance of 51.2 GOPS at 100 MHz. This paper describes the design features of IMAP-CE, its enhanced instruction set for image processing, and its estimated performance.
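The quoted 51.2 GOPS peak is consistent with a simple product of PE count, operations per cycle, and clock rate, assuming each processing element is the 4-way VLIW PE described in the other IMAP-CE abstracts on this page and issues at most four operations per cycle. The function name `peak_gops` is hypothetical.

```python
def peak_gops(num_pes: int, ops_per_cycle: int, clock_mhz: float) -> float:
    """Peak throughput in GOPS = PEs x operations per cycle x clock (MHz) / 1000."""
    return num_pes * ops_per_cycle * clock_mhz / 1000.0


if __name__ == "__main__":
    print(peak_gops(128, 4, 100))  # prints 51.2
```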
asia and south pacific design automation conference | 2008
Shorin Kyo; Shin’ichiro Okazaki
This paper describes existing designs and future design trends of in-vehicle vision processors for driver assistance systems. First, the requirements of vision processors for driver assistance systems are summarized. Next, the characteristics of vision tasks for safety are described. Then several in-vehicle vision processor LSI implementations are reviewed, and the design approach of one of them, the IMAPCAR highly parallel processor, is described in detail. Finally, future trends of in-vehicle vision processors are discussed, focusing on their architectures and on expanding their application coverage, such as integrating vision for safety with the Digital TV codec and 3D graphics functions of future car navigation systems.
symposium on vlsi circuits | 2008
Shorin Kyo; Shin'ichiro Okazaki; Takuya Koga; Fumiyuki Hidano
A 100 GOPS vision processor LSI (IMAPCAR) for in-vehicle image recognition that consumes less than 2 watts of power has been developed. Its 128 4-way VLIW processing elements (PEs) with MAC (multiply-accumulate) units, to which data are assigned efficiently by a DMA companion scaling capability, achieve high performance at low cost. Compared with a previous design, performance on major vision tasks is improved by a factor of 2.5 while power consumption is reduced by 50%.
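If the 2.5x performance gain and the 50% power reduction are measured against the same previous design and workloads (an assumption; the abstract does not state this explicitly), they combine into a performance-per-watt improvement of 2.5 / 0.5 = 5x. The helper name `perf_per_watt_gain` is hypothetical.

```python
def perf_per_watt_gain(perf_ratio: float, power_ratio: float) -> float:
    """Relative change in performance per watt, given new/old performance and power ratios."""
    if power_ratio <= 0:
        raise ValueError("power_ratio must be positive")
    return perf_ratio / power_ratio


if __name__ == "__main__":
    # 2.5x performance at 50% of the previous power
    print(perf_per_watt_gain(2.5, 0.5))  # prints 5.0
```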
IEICE Transactions on Electronics | 2007
Ichiro Kuroda; Shorin Kyo
This paper presents media processor architectures for automotive applications. Media processing applications and their requirements for LSI implementation are first described, both for vision-based driver assistance and for graphical user interfaces for car navigation using 3D graphics. Then, parallel processing architectures for vision and graphics in these applications are reviewed along with their performance and cost. After that, future trends in automotive media processing, such as the integration of vision and 3D graphics functions, are shown together with their applications and required performance. Moreover, parallel processing architectures for integrating vision and graphics are discussed. Finally, a prospect of a next-generation media processing LSI for automotive applications is provided.
international conference on intelligent transportation systems | 2003
Shorin Kyo; Takuya Koga; Shin'ichiro Okazaki; Ichiro Kuroda
This paper describes a fully programmable parallel processor LSI that integrates 128 SIMD RISC microprocessors, each operating at 100 MHz. The LSI runs multiple driver assistance video recognition applications simultaneously and in real time in software, while satisfying the power efficiency requirements of an in-vehicle LSI. Four basic parallel methods and a software development environment, including an optimizing compiler for an extended C language and video-based GUI tools, facilitate the efficient development of real-time video recognition applications that make effective use of the 128 microprocessors. The results of a benchmark test using a robust lane-mark and vehicle detection application written in a high-level language show that the LSI delivers four times the performance of a 2.4 GHz general-purpose processor.
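Because the LSI runs at 100 MHz while the reference processor runs at 2.4 GHz, the reported 4x speedup corresponds to a much larger advantage per clock cycle. The sketch below (hypothetical helper name `per_cycle_advantage`) performs only that normalization; it ignores differences in memory systems and instruction sets, so it is a rough indicator rather than a measured figure.

```python
def per_cycle_advantage(speedup: float, own_clock_mhz: float, ref_clock_mhz: float) -> float:
    """Speedup normalized to clock frequency: speedup x (reference clock / own clock)."""
    return speedup * (ref_clock_mhz / own_clock_mhz)


if __name__ == "__main__":
    print(per_cycle_advantage(4, 100, 2400))  # prints 96.0
```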
IPSJ Transactions on System LSI Design Methodology | 2010
Shouhei Nomoto; Shorin Kyo; Shin'ichiro Okazaki
We have developed an “XC core” processor that achieves low cost, high performance, and low power consumption through a highly parallel SIMD architecture (the SIMD mode), and that achieves high flexibility by morphing into a MIMD architecture (the MIMD mode). In this paper, we evaluate the effectiveness of the MIMD mode using a white line detection algorithm for open roads. Our evaluation shows that the algorithm can be processed in real time (in less than 33 ms) by using the MIMD mode to verify white line segments, the part of the algorithm that is not suited to execution in the SIMD mode. We also show that this verification runs five times faster when region of interest (ROI) transfer instructions are used to transfer the ROI of an image efficiently. Furthermore, we measured the execution time in the MIMD mode while varying the number of processing units (PUs) among 2, 4, 8, 16, and 32. The results show that the rate of performance improvement slows down beyond 16 PUs, mainly because of insufficient parallelism in the verification process. Overall, a 10.68 times speedup was achieved using 32 PUs in the MIMD mode, compared with using the SIMD mode alone.
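The abstract attributes the diminishing returns beyond 16 PUs to insufficient parallelism in the verification step. Amdahl's law is one standard way to model this effect; the sketch below uses an arbitrary, hypothetical parallel fraction of 0.9 and is not fitted to the measured 10.68 times figure, which is relative to SIMD-only execution rather than to a single PU.

```python
def amdahl_speedup(parallel_fraction: float, num_units: int) -> float:
    """Amdahl's law: 1 / ((1 - f) + f / n) for parallel fraction f on n units."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if num_units < 1:
        raise ValueError("num_units must be at least 1")
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / num_units)


if __name__ == "__main__":
    # Hypothetical parallel fraction; the PU counts match those in the abstract.
    for n in (2, 4, 8, 16, 32):
        print(n, round(amdahl_speedup(0.9, n), 2))
```

With f = 0.9, doubling from 16 to 32 units raises the modeled speedup only from 6.4 to about 7.8, the same qualitative flattening the measurements show.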