Noritsugu Nakamura | Researchain

Publication

Featured research published by Noritsugu Nakamura.

International Solid-State Circuits Conference | 2000

A 16 Mb 400 MHz loadless CMOS four-transistor SRAM macro

Koichi Takeda; Yoshiharu Aimoto; Noritsugu Nakamura; H. Toyoshima; Takahiro Iwasaki; Kenji Noda; Koujirou Matsui; Shinya Itoh; Sadaaki Masuoka; Tadahiko Horiuchi; Atsushi Nakagawa; Kenju Shimogawa; Hiroyuki Takahashi

0.18 μm logic process technologies have recently been used to develop a loadless CMOS four-transistor SRAM cell (4T-cell) whose size (1.934 μm²) is only 56% that of a conventional six-transistor SRAM cell (6T-cell). Using this 4T-cell technology, the authors present a 16 Mb, 400 MHz SRAM macro which features: (1) an end-point dual-pulse driver (EDD) for stable data hold and minimum cycle time, (2) word-line-voltage-level compensation (WLC) for stable static data hold, and (3) an all-adjoining twist bit-line (ATBL) to reduce bit-line coupling capacitance.

IEEE Transactions on Electron Devices | 2001

A loadless CMOS four-transistor SRAM cell in a 0.18-μm logic technology

Kenji Noda; Koujirou Matsui; Koichi Takeda; Noritsugu Nakamura

This paper presents a loadless CMOS four-transistor (4T) cell for very high density embedded SRAM applications. Using 0.18-μm CMOS technology, the memory cell size is 1.9344 μm² (1.04 μm × 1.86 μm), which is 35% smaller than a six-transistor (6T) cell using the same design rule. The newly developed CMOS 4T-SRAM cell operates with high stability at 1.8 V, even though its designed cell ratio is 1.0 to minimize the area. A pair of pMOS transfer transistors is used to store and retain full-swing signals in the cell without a refresh cycle. The fabrication process is fully compatible with high-performance CMOS logic technologies, because there is no need to integrate a poly-Si resistor or a TFT load.

IEEE Journal of Solid-State Circuits | 2001

An ultrahigh-density high-speed loadless four-transistor SRAM macro with twisted bitline architecture and triple-well shield

Kenji Noda; Koichi Takeda; Koujirou Matsui; Sadaaki Masuoka; H. Kawamoto; N. Ikezawa; Yoshiharu Aimoto; Noritsugu Nakamura; Takahiro Iwasaki; H. Toyoshima; Tadahiko Horiuchi

We have developed two schemes for improving access speed and reliability of a loadless four-transistor (LL4T) SRAM cell: a dual-layered twisted bitline scheme, which reduces coupling capacitance between adjacent bitlines in order to achieve high-speed READ/WRITE operations, and a triple-well shield, which protects the memory cell from substrate noise and alpha particles. We incorporated these schemes in a high-performance 0.18-μm-generation CMOS technology and fabricated a 16-Mb SRAM macro with a 2.18-μm² memory cell. The macro size of the LL4T-SRAM is 56 mm², which is 30% to 40% smaller than a conventional six-transistor SRAM when compared at the same access speed. The developed macro functions at 500 MHz and has an access time of 2.0 ns. The standby current has been reduced to 25 μA/Mb with a low-leakage nMOSFET in the memory cell.

Field-Programmable Technology | 2013

Optimizing time and space multiplexed computation in a dynamically reconfigurable processor

Takao Toi; Noritsugu Nakamura; Taro Fujii; Toshiro Kitaoka; Katsumi Togawa; Koichiro Furuta; Toru Awashima

One of the characteristics of our coarse-grained dynamically reconfigurable processor is that it uses the same operational resource for both control-intensive and data-intensive code segments. We maximize throughput from the knowledge of high-level synthesis under timing constraints. Because the optimal clock speeds for both code segments are different, a dynamic frequency control is introduced to shorten the total execution time. A state transition controller (STC) that handles the control step can change the clock speed for every cycle. For control-intensive code segments, the STC delay is shortened by a rollback mechanism, which looks ahead to the next control step and rolls back if a different control step is actually selected. For the data-intensive code segments, the delay is shortened by fully synchronized synthesis. Experimental results show that throughputs have increased from 18% to 56% with the combination of these optimizations. A chip was fabricated with our 40-nm low-power process technology.

IPSJ Transactions on System LSI Design Methodology | 2010

High-level Synthesis Challenges for Mapping a Complete Program on a Dynamically Reconfigurable Processor

Takao Toi; Noritsugu Nakamura; Yoshinosuke Kato; Toru Awashima; Kazutoshi Wakabayashi

This paper presents a high-level synthesizer to map a complete program efficiently on a dynamically reconfigurable processor (DRP). Initially, we introduce our DRP architecture, which is suitable for control-intensive programs since it has a stand-alone finite state machine that switches “contexts” consisting of many processing elements (PEs). Then, we propose three new techniques optimized for our DRP. Firstly, we explain how synthesized control steps are mapped onto the contexts. Several control steps are combined as a context to utilize PEs efficiently since each control step does not require the same amount of operational units. Secondly, we describe a modulo scheduling algorithm for loop pipelining, considering both spatial and time dimensions of our DRP. Lastly, we explain a scheduling technique to optimize clock frequency, which can take advantage of multiplexer, wire and routing switch delays. We have demonstrated a JPEG-based image decoder example to evaluate our methods. Experimental results show that high area efficiency is achieved by balancing the number of PEs between contexts. Loop pipelining increased performance to 3.6 times that without pipelining, while the number of operational units increased by a factor of 2.2. The clock frequency is maximized with accurate delay estimation.

Archive | 2003

Array-type processor

Taro Fujii; Koichiro Furuta; Masato Motomura; Kenichiro Anjo; Yoshikazu Yabe; Toru Awashima; Takao Toi; Noritsugu Nakamura

International Conference on Computer Aided Design | 2006

High-level synthesis challenges and solutions for a dynamically reconfigurable processor

Takao Toi; Noritsugu Nakamura; Yoshinosuke Kato; Toru Awashima; Kazutoshi Wakabayashi; Li Jing

Archive | 2006

Data processing device and method, computer program, information storage medium, parallel arithmetic unit and data processing system

Toru Awashima; Taro Fujii; Kouichirou Furuta; Yoshiyuki Miyazawa; Masato Motomura; Noritsugu Nakamura; Takao Toi

Archive | 2002