
Publication


Featured research published by Takashi Miyamori.


International Solid-State Circuits Conference | 2008

A 9.7mW AAC-Decoding, 620mW H.264 720p 60fps Decoding, 8-Core Media Processor with Embedded Forward-Body-Biasing and Power-Gating Circuit in 65nm CMOS Technology

Shuou Nomura; Fumihiko Tachibana; Tetsuya Fujita; Chen Kong Teh; Hiroyuki Usui; Fumiyuki Yamane; Yukimasa Miyamoto; Chaiyasit Kumtornkittikul; Hiroyuki Hara; Takahiro Yamashita; Jun Tanabe; Masato Uchiyama; Yoshiro Tsuboi; Takashi Miyamori; Takeshi Kitahara; Hironori Sato; Yuya Homma; Shuuji Matsumoto; Keiko Seki; Yoshinori Watanabe; Mototsugu Hamada; Masafumi Takahashi

An AAC-decoding, H.264-decoding media processor with an embedded forward-body-biasing and power-gating circuit in CMOS technology is proposed. Since all the components necessary for the scheme are simple MOS circuits requiring no extra supply voltages, they can be placed and routed by a commercial CAD tool. A data-mapping flip-flop is proposed as a high-performance, low-power flip-flop. It is concluded that the power dissipation of 620mW in H.264 720p 60fps decoding at the process fast corner is the lowest among processor-based solutions.


IEEE Journal of Solid-state Circuits | 2003

A single-chip MPEG-2 codec based on customizable media embedded processor

Shunichi Ishiwata; Tomoo Yamakage; Yoshiro Tsuboi; Takayoshi Shimazawa; Tomoko Kitazawa; Shuji Michinaka; Kunihiko Yahagi; Hideki Takeda; Akihiro Oue; Tomoya Kodama; Nobu Matsumoto; Takayuki Kamei; Mitsuo Saito; Takashi Miyamori; Goichi Ootomo; Masataka Matsui

A single-chip MPEG-2 MP@ML codec, integrating 3.8M gates on a 72-mm² die, is described. The codec employs a heterogeneous multiprocessor architecture in which six microprocessors with the same instruction set but different customization execute specific tasks such as video and audio concurrently. The microprocessor, developed for digital media processing, provides various extensions such as a very-long-instruction-word coprocessor, digital signal processor instructions, and hardware engines. Making full use of the extensions and optimizing the architecture of each microprocessor based upon the nature of specific tasks, the chip can execute not only MPEG-2 MP@ML video/audio/system encoding and decoding concurrently, but also MPEG-2 MP@HL decoding in real time.


IEEE Journal of Solid-state Circuits | 2011

A 40 nm 222 mW H.264 Full-HD Decoding, 25 Power Domains, 14-Core Application Processor With x512b Stacked DRAM

Yu Kikuchi; Makoto Takahashi; Tomohisa Maeda; Masatoshi Fukuda; Yasuhiro Koshio; Hiroyuki Hara; Hideho Arakida; Hideaki Yamamoto; Yousuke Hagiwara; Tetsuya Fujita; Manabu Watanabe; Hirokazu Ezawa; Takayoshi Shimazawa; Yasuo Ohara; Takashi Miyamori; Mototsugu Hamada; Masafumi Takahashi; Yukihito Oowaki

In this paper we introduce a 14-core application processor for multimedia mobile applications, implemented in 40 nm, with a 222 mW H.264 full high-definition (full-HD) video engine, a 124 mW 40 M-polygons/s 3D/2D graphics engine, and a video/audio multiprocessor for various codecs and image processing. The application processor has 25 power domains to achieve coarse-grain power gating for adjusting to the required performance of a wide range of multimedia applications. The simple on-chip power switch circuits switch in less than 1 μs while reducing rush current. Furthermore, the Stacked Chip SoC (SCS) technology enables rewiring to the DRAM chip during the assembly/packaging phase using electroplated wires with a 10 μm minimum pitch on the Re-Distribution Layer (RDL). The peak memory bandwidth is 10.6 GB/s with an x512b SCS-DRAM interface, and the power consumption of this interface is 3.9 mW at a 2.4 GB/s workload.


Symposium on VLSI Circuits | 2012

A low power many-core SoC with two 32-core clusters connected by tree based NoC for multimedia applications

Hui Xu; Jun Tanabe; Hiroyuki Usui; Soichiro Hosoda; Toru Sano; Kazumasa Yamamoto; Takeshi Kodaka; Nobuhiro Nonogaki; Nau Ozaki; Takashi Miyamori

A low-power many-core SoC for multimedia applications is implemented in 40nm CMOS technology. Within a 210mm² die, two 32-core clusters are integrated with dynamically reconfigurable processors, hardware accelerators, 2-channel DDR3 I/Fs, and other peripherals. Processor cores in the cluster share a 2MB L2 cache connected through a tree-based Network-on-Chip (NoC). The high scalability and low power consumption are accomplished by parallelized firmware for multimedia applications, such as the H.264 1080p 30fps decoding under 500mW and the super resolution 4K2K 15fps image processing under 800mW.


International Solid-State Circuits Conference | 2010

A 222mW H.264 Full-HD decoding application processor with x512b stacked DRAM in 40nm

Yu Kikuchi; Makoto Takahashi; Tomohisa Maeda; Hiroyuki Hara; Hideho Arakida; Hideaki Yamamoto; Yousuke Hagiwara; Tetsuya Fujita; Manabu Watanabe; Takayoshi Shimazawa; Yasuo Ohara; Takashi Miyamori; Mototsugu Hamada; Masafumi Takahashi; Yukihito Oowaki

Today's multimedia mobile devices must support a wide range of multimedia applications in addition to full high-definition (Full-HD) video processing. Conventional hardware engine approaches [1-3] cannot handle new applications that may be required once the chips are fabricated. We report an application processor with a hybrid architecture that combines a software solution with a multi-core processor [4] for various applications and a hardware solution with hardware engines for low-power and specific high-performance tasks such as Full-HD video and 3D graphics. Another issue faced in multimedia mobile devices is to achieve high memory bandwidth with low power consumption. DDR memory connections in System-in-Package (SiP) technologies need a large number of I/Os or high interface frequency at the expense of high power consumption. A Chip-on-Chip (CoC) connection using micro-bumps [5] is a power-efficient technology to achieve high memory bandwidth and low power. However, in the case of the conventional CoC technique, customized DRAM chips are necessary, because wiring between a logic chip and a DRAM chip is implemented on the metal layers in the DRAM chip. To use a DRAM chip for multiple logic LSIs, the Stacked-Chip SoC (SCS) technology used for this application processor enables rewiring at the assembly/packaging phase using minimum 5µm-pitch metal wiring on the Re-Distribution Layer (RDL). We also report an on-chip power switch with a simple structure that inhibits rush currents. The application processor has 25 power domains and controls these domains finely to optimize for various ranges of performance requirements.


International Solid-State Circuits Conference | 2015

A 1.9TOPS and 564GOPS/W heterogeneous multicore SoC with color-based object classification accelerator for image-recognition applications

Jun Tanabe; Toru Sano; Yutaka Yamada; Tomoki Watanabe; Mayu Okumura; Manabu Nishiyama; Tadakazu Nomura; Kazushige Oma; Nobuhiro Sato; Moriyasu Banno; Hiroo Hayashi; Takashi Miyamori

Image recognition technologies have gained prominence in a variety of fields, such as automotive and surveillance, with dedicated image-recognition ICs being developed recently [1-2]. Image recognition ICs for an advanced driver assistance system (ADAS) have also been proposed [3]. However, future ADAS applications must support greater numbers of real-time recognition processes simultaneously, with higher detection rates and lower false-positive rates. For instance, adaptive cruise control (ACC), an application of ADAS, comprises many image recognition processes, such as pedestrian detection (PD), vehicle detection (VD), general obstacle detection (GOD), lane detection (LD), traffic light recognition (TLR), and traffic sign recognition (TSR). ACC also requires high detection accuracy to prevent unnecessary braking or acceleration. To satisfy these requirements, we have developed an SoC with two 4-core processor clusters and 14 hard-wired accelerators. It is designed to realize the six recognition processes (PD, VD, GOD, LD, TLR, and TSR) for ACC and automatic high beam (AHB) for headlight control. It achieves 1.9TOPS peak performance in 3.37W. This low power consumption enables the SoC to operate with passive cooling in a high-temperature automotive environment.


IEEE Transactions on Very Large Scale Integration Systems | 2009

A VLIW Vector Media Coprocessor With Cascaded SIMD ALUs

Takahisa Wada; Shunichi Ishiwata; Katsuyuki Kimura; Keiri Nakanishi; Masato Sumiyoshi; Takashi Miyamori; Masaki Nakagawa

High-definition video applications, such as digital TV and digital video cameras, require high processing performance for high-quality visual images in addition to a complex video CODEC. Pre-/postprocessing to improve video quality is becoming much more important because requirements for pre-/postprocessing vary among applications and processing algorithms have not been stabilized. Therefore, a new processor architecture that has a highly parallel datapath is needed. In this paper, we introduce a VLIW vector media coprocessor, "vector coprocessor (VCP)," that includes three asymmetric execution pipelines with cascaded SIMD ALUs. To improve performance efficiency, we reduce the area ratio of the control circuit while increasing the ratio of the arithmetic circuit. The total gate count of VCP is 1268 kgates and its maximum operating frequency is 300 MHz at 90-nm CMOS process. Some of the processing kernels in an adaptive prefilter that is applied to preprocessing for video encoding are evaluated. In the case of the edgeness and the sum of absolute differences, the performance is 183 giga operations per second. VCP offers enough performance for HD video processing and good cost-performance while all processing pipeline units operate effectively.


Custom Integrated Circuits Conference | 2002

A single-chip MPEG-2 codec based on customizable media microprocessor

Shunichi Ishiwata; Tomoo Yamakage; Yoshiro Tsuboi; Takayoshi Shimazawa; Tomoko Kitazawa; Shuji Michinaka; Kunihiko Yahagi; Hideki Takeda; Akihiro Oue; Tomoya Kodama; Nobu Matsumoto; Takayuki Kamei; Takashi Miyamori; Goichi Ootomo; Masataka Matsui

A single-chip MPEG2 MP@ML codec, integrating 3.8M gates on a 72mm² die, is described. It has a heterogeneous multiprocessor architecture in which six microprocessors with the same instruction set but different customization execute specific tasks such as video, audio etc. concurrently. The microprocessor, developed for digital media processing, provides various extensions such as a VLIW one and a DSP one inherent in its architecture. Making full use of the extensions, the chip executes encoding and decoding of video, audio and system concurrently in real time.


IEEE Computer Society International Conference | 1989

The effectiveness of TRONCHIP instructions in the TX1 system

Hidechika Kishigami; Takashi Miyamori; Misao Miyata

A description is given of the architecture of the TX1, which is the first 32-bit microprocessor of the Toshiba TX series. The TX1 supports 92 instructions including high-level instructions for efficient use of compilers and operating systems. The effectiveness of the high-level instructions was evaluated by comparing their execution cycles on the TX1 board computer with their equivalent programs using only basic instructions, and it was found that they could execute about two or four times as fast as the equivalent programs. About 27% performance improvement was achieved by using the high-level instructions in the Dhrystone benchmark program.


IEEE Computer Society International Conference | 1988

Design considerations for 32-bit microprocessor TX3

Kosei Okamoto; Misao Miyata; Hidechika Kishigami; Takashi Miyamori; T. Sato

The architecture of the TX3 implementation of the TRON-CHIP32 specification is discussed. TX3 supports the full instruction set, including the decimal, floating-point, and other complex instructions. Average performance above 10-MIPS is expected. This performance level is obtained by the use of an 8-kB instruction cache, 8-kB data cache, decoded instruction loop buffer, three instruction execution units, and the ability to issue up to two instructions per cycle.
