Toru Awashima
NEC
Publication
Featured research published by Toru Awashima.
Field-Programmable Technology | 2004
Masayasu Suzuki; Yohei Hasegawa; Yutaka Yamada; Naoto Kaneko; Katsuaki Deguchi; Hideharu Amano; Kenichiro Anjo; Masato Motomura; Kazutoshi Wakabayashi; Takao Toi; Toru Awashima
The Dynamically Reconfigurable Processor (DRP) developed by NEC Electronics is a coarse-grained reconfigurable processor that selects a data path from an on-chip repository of sixteen circuit configurations, or contexts, to implement different logic on a single DRP chip. Several stream applications have been implemented on the DRP-1, the first prototype chip, and evaluation results are presented. By pipelining the executions, the DRP-1 outperformed the Pentium III/4, the embedded CPU MIPS64, and the Texas Instruments DSP TMS320C6713 in some stream application examples. We also present programming techniques applicable to dynamically reconfigurable processors and discuss their feasibility in boosting system performance.
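The context selection described in this abstract can be illustrated with a toy software model. This is only a sketch under stated assumptions: the class `ToyContextRepository`, the function `run_schedule`, and the example datapaths are hypothetical names invented for illustration, and the model ignores processing elements, routing, and timing. The only detail taken from the abstract is the count of sixteen contexts.

```python
from typing import Callable, List, Optional

NUM_CONTEXTS = 16  # the abstract states sixteen on-chip contexts

# Assumption: a "datapath" is modelled as a plain function on one integer.
Datapath = Callable[[int], int]


class ToyContextRepository:
    """Hypothetical model of an on-chip repository of datapath contexts."""

    def __init__(self) -> None:
        self._contexts: List[Optional[Datapath]] = [None] * NUM_CONTEXTS

    def load(self, index: int, datapath: Datapath) -> None:
        """Store a datapath configuration in the given context slot."""
        if not 0 <= index < NUM_CONTEXTS:
            raise IndexError(f"context index {index} out of range")
        self._contexts[index] = datapath

    def execute(self, index: int, value: int) -> int:
        """Select one context and apply its datapath to the input value."""
        if not 0 <= index < NUM_CONTEXTS:
            raise IndexError(f"context index {index} out of range")
        datapath = self._contexts[index]
        if datapath is None:
            raise ValueError(f"context {index} has not been loaded")
        return datapath(value)


def run_schedule(repo: ToyContextRepository, schedule: List[int], value: int) -> int:
    """Apply the selected contexts one after another to a single value."""
    for index in schedule:
        value = repo.execute(index, value)
    return value


repo = ToyContextRepository()
repo.load(0, lambda x: x * 2)  # hypothetical context: doubling datapath
repo.load(1, lambda x: x + 3)  # hypothetical context: add-constant datapath
result = run_schedule(repo, [0, 1, 0], 5)  # ((5 * 2) + 3) * 2
```

The point of the sketch is that the same hardware resource (here, one object) implements different logic depending on which stored context is selected at each step.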
Field-Programmable Custom Computing Machines | 2004
Noriaki Suzuki; Shunsuke Kurotaki; Masayasu Suzuki; Naoto Kaneko; Yutaka Yamada; Katsuaki Deguchi; Yohei Hasegawa; Hideharu Amano; Kenichiro Anjo; Masato Motomura; Kazutoshi Wakabayashi; Takao Toi; Toru Awashima
The Dynamically Reconfigurable Processor (DRP) developed by NEC Electronics is a coarse-grained reconfigurable processor that selects a data path from an on-chip repository of sixteen circuit configurations, or contexts, to implement different logic on a single DRP chip. Several stream applications have been implemented on the DRP-1, the first prototype chip, and evaluation results are presented. By computing in parallel using the processing elements (PEs) and distributed memory modules, the DRP-1 outperformed the Pentium III/4 and the embedded CPU MIPS64 in some stream application examples. We also present programming techniques applicable to reconfigurable processors and discuss their feasibility in boosting system performance.
Field-Programmable Technology | 2013
Takao Toi; Noritsugu Nakamura; Taro Fujii; Toshiro Kitaoka; Katsumi Togawa; Koichiro Furuta; Toru Awashima
One of the characteristics of our coarse-grained dynamically reconfigurable processor is that it uses the same operational resource for both control-intensive and data-intensive code segments. We maximize throughput by applying knowledge from high-level synthesis under timing constraints. Because the optimal clock speeds for the two kinds of code segment differ, dynamic frequency control is introduced to shorten the total execution time. A state transition controller (STC) that handles the control step can change the clock speed every cycle. For control-intensive code segments, the STC delay is shortened by a rollback mechanism, which looks ahead to the next control step and rolls back if a different control step is actually selected. For data-intensive code segments, the delay is shortened by fully synchronized synthesis. Experimental results show that throughput increased by 18% to 56% with the combination of these optimizations. A chip was fabricated with our 40-nm low-power process technology.
IPSJ Transactions on System LSI Design Methodology | 2010
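The benefit of dynamic frequency control can be seen with simple arithmetic: with a single fixed clock, every segment must run at the slowest segment's frequency, whereas with per-segment clocks each segment runs at its own maximum. The sketch below assumes each code segment is described by a cycle count and a maximum clock frequency. The function name `execution_time` and all numbers are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

# Each segment is (cycle_count, max_clock_hz). Values are illustrative only.
Segment = Tuple[int, float]


def execution_time(segments: List[Segment], dynamic: bool) -> float:
    """Return total execution time in seconds.

    dynamic=True:  each segment runs at its own maximum frequency.
    dynamic=False: all segments share one clock, limited by the slowest.
    """
    if dynamic:
        return sum(cycles / freq for cycles, freq in segments)
    shared_freq = min(freq for _, freq in segments)
    return sum(cycles for cycles, _ in segments) / shared_freq


# Hypothetical workload: one segment that can be clocked fast,
# followed by one segment with a longer critical path.
workload = [(1000, 100e6), (4000, 50e6)]
fixed = execution_time(workload, dynamic=False)      # 5000 cycles at 50 MHz
variable = execution_time(workload, dynamic=True)    # 1000 at 100 MHz + 4000 at 50 MHz
speedup = fixed / variable
```

In this made-up workload the fixed clock takes 100 microseconds and the per-segment clocks take 90 microseconds. The gain grows with the share of cycles spent in the segment that tolerates the faster clock.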
Takao Toi; Noritsugu Nakamura; Yoshinosuke Kato; Toru Awashima; Kazutoshi Wakabayashi
This paper presents a high-level synthesizer to map a complete program efficiently on a dynamically reconfigurable processor (DRP). Initially, we introduce our DRP architecture, which is suitable for control-intensive programs since it has a stand-alone finite state machine that switches “contexts” consisting of many processing elements (PEs). Then, we propose three new techniques optimized for our DRP. Firstly, we explain how synthesized control steps are mapped onto the contexts. Several control steps are combined into one context to utilize PEs efficiently, since the control steps do not all require the same number of operational units. Secondly, we describe a modulo scheduling algorithm for loop pipelining, considering both the spatial and time dimensions of our DRP. Lastly, we explain a scheduling technique to optimize clock frequency, which can take advantage of multiplexer, wire, and routing switch delays. We demonstrate our methods on a JPEG-based image decoder example. Experimental results show that high area efficiency is achieved by balancing the number of PEs between contexts. Loop pipelining improved performance by a factor of 3.6 over the non-pipelined version, while the number of operational units increased by only a factor of 2.2. The clock frequency is maximized with accurate delay estimation.
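The idea of combining several control steps into one context can be sketched as a packing problem. The code below is a minimal greedy illustration, not the paper's algorithm: it assumes that the PE demands of control steps sharing a context add up, that only consecutive steps are merged, and that each context has a fixed PE budget. The function name `pack_control_steps` and the numbers are hypothetical.

```python
from typing import List


def pack_control_steps(pe_demand: List[int], pe_capacity: int) -> List[List[int]]:
    """Greedily merge consecutive control steps into contexts.

    pe_demand[i] is the assumed number of PEs needed by control step i.
    A new context is opened whenever adding the next step would exceed
    pe_capacity. Returns a list of contexts, each a list of step indices.
    """
    contexts: List[List[int]] = []
    current: List[int] = []
    used = 0
    for step, need in enumerate(pe_demand):
        if need > pe_capacity:
            raise ValueError(f"control step {step} needs more PEs than one context has")
        if used + need > pe_capacity:
            contexts.append(current)  # close the full context
            current, used = [], 0
        current.append(step)
        used += need
    if current:
        contexts.append(current)
    return contexts


# Hypothetical PE demands for six control steps and a budget of 8 PEs.
packed = pack_control_steps([3, 2, 6, 1, 1, 4], pe_capacity=8)
```

For these made-up demands the six control steps fit into three contexts, `[[0, 1], [2, 3, 4], [5]]`, instead of six. A real synthesizer would also weigh routing, timing, and the balance of PE usage between contexts, which this sketch leaves out.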
Archive | 2003
Taro Fujii; Koichiro Furuta; Masato Motomura; Kenichiro Anjo; Yoshikazu Yabe; Toru Awashima; Takao Toi; Noritsugu Nakamura
International Conference on Computer Aided Design | 2006
Takao Toi; Noritsugu Nakamura; Yoshinosuke Kato; Toru Awashima; Kazutoshi Wakabayashi; Li Jing
Field-Programmable Technology | 2005
Yohei Hasegawa; Shohei Abe; Hiroki Matsutani; Hideharu Amano; Kenichiro Anjo; Toru Awashima
Archive | 2006
Toru Awashima; Taro Fujii; Kouichirou Furuta; Yoshiyuki Miyazawa; Masato Motomura; Noritsugu Nakamura; Takao Toi
Archive | 2002
Takao Toi; Toru Awashima; Yoshiyuki Miyazawa; Noritsugu Nakamura; Taro Fujii; Koichiro Furuta; Masato Motomura
Archive | 2002
Takao Toi; Toru Awashima; Yoshiyuki Miyazawa; Noritsugu Nakamura; Taro Fujii; Koichiro Furuta; Masato Motomura