Nobuyuki Oba | Researchain

Publication

Featured research published by Nobuyuki Oba.

international conference on computer design | 1990

QRAM-Quick access memory system

Hideto Niijima; Nobuyuki Oba

A quick access memory (QRAM) was developed that realizes a cost-effective high-performance memory architecture. The QRAM improves the effective data access speed by making maximum use of the page mode of memory, and hence acts like a pseudo-cache memory. For high performance and usability, it has three special features: built-in address latches/comparators, a direct handshake facility, and a multiple active island structure. It can communicate directly with the microprocessor by handshaking of the memory request and the memory ready. This reduces the amount of external logic needed for the memory system. The QRAM can be made with conventional technology.
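The page-mode idea can be illustrated with a toy model. This is only a sketch under stated assumptions, not the QRAM design itself: the class name `PageModeMemory`, the row/column split, and the rule that maps a row to an island are all hypothetical. Each island keeps one latched row address, and a comparator check decides whether an access is a fast page hit or a slow row open.

```python
class PageModeMemory:
    """Toy model: each island latches the row it opened most recently."""

    def __init__(self, num_islands=4, column_bits=10):
        self.num_islands = num_islands
        self.column_bits = column_bits
        self.open_row = [None] * num_islands  # one address latch per island
        self.hits = 0
        self.misses = 0

    def access(self, address):
        """Return True on a page hit (latched row matches), else False."""
        row = address >> self.column_bits
        island = row % self.num_islands  # assumed mapping of rows to islands
        if self.open_row[island] == row:  # comparator: latched row == requested row
            self.hits += 1
            return True
        self.open_row[island] = row  # open the new row (slow, full access)
        self.misses += 1
        return False


mem = PageModeMemory(num_islands=2, column_bits=4)
results = [mem.access(a) for a in (0x00, 0x01, 0x10, 0x02, 0x20, 0x11)]
```

With two islands, the accesses to rows 0 and 1 stay open side by side, so returning to row 0 after touching row 1 is still a hit. That is the kind of effect several simultaneously active islands are meant to provide.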

international phoenix conference on computers and communications | 1990

Top-1: a snoop-cache-based multiprocessor

Nobuyuki Oba; Atsushi Moriwaki; Shigenori Shimizu

A novel cache coherence protocol and the performance analysis of a snoop-cache-based multiprocessor, TOP-1, which is tightly coupled and has pure shared memory, are presented. TOP-1 has two 64-bit buses with interleaved address access to provide a high data-transfer rate, and a large snoop cache to provide a high cache hit ratio. It also has a TOP-1 hybrid coherence protocol, which allows a write-update protocol and a write-invalidate protocol to exist simultaneously. These protocols can be dynamically changed on the fly without any coherence problem. Each processor card has a statistics unit which collects various important statistical data without any hardware overhead. An overview of the TOP-1 architecture and its concepts is presented. The authors also present the TOP-1 hybrid protocol and explain how it works. They discuss the TOP-1 protocol and its performance by comparing the write-update and write-invalidate protocols.
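The abstract does not spell out the hybrid protocol, so the following is only a toy model of the two basic policies living side by side. It assumes, hypothetically, that each cache carries its own mode and reacts to a snooped write either by updating or by invalidating its copy; memory is written through for simplicity. The names `SnoopCache` and `Bus` are illustrative and not taken from the paper.

```python
class SnoopCache:
    """Cache that reacts to snooped writes according to its current mode."""

    def __init__(self, mode):
        assert mode in ("update", "invalidate")
        self.mode = mode
        self.lines = {}  # address -> cached value

    def snoop_write(self, address, value):
        if address not in self.lines:
            return  # no copy held, nothing to do
        if self.mode == "update":
            self.lines[address] = value  # write-update: refresh the copy
        else:
            del self.lines[address]  # write-invalidate: drop the copy


class Bus:
    """Shared bus with write-through memory (a simplifying assumption)."""

    def __init__(self, caches):
        self.caches = caches
        self.memory = {}

    def read(self, cache, address):
        if address not in cache.lines:  # miss: fetch from memory
            cache.lines[address] = self.memory.get(address, 0)
        return cache.lines[address]

    def write(self, writer, address, value):
        writer.lines[address] = value
        self.memory[address] = value
        for cache in self.caches:
            if cache is not writer:
                cache.snoop_write(address, value)


a, b, c = SnoopCache("update"), SnoopCache("invalidate"), SnoopCache("update")
bus = Bus([a, b, c])
for cache in (a, b, c):
    bus.read(cache, 0x40)
bus.write(a, 0x40, 7)  # b drops its copy, c is updated in place
c.mode = "invalidate"  # switch c's policy between writes
bus.write(a, 0x40, 9)  # c now drops its copy as well
```

In this model every later read returns the latest value whichever mode a cache is in, because a stale copy is either refreshed or removed before anyone can read it.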

international conference on computer design | 2001

3DCGiRAM: an intelligent memory architecture for photo-realistic image synthesis

Hiroaki Kobayashi; Ken-Ichi Suzuki; Kentaro Sano; Yoshiyuki Kaeriyama; Yasumasa Saida; Nobuyuki Oba; Tadao Nakamura

This paper proposes an intelligent memory architecture for photo-realistic image synthesis, named 3DCGiRAM. The 3DCGiRAM has a hardware-accelerated 3D line generator which finds objects that are likely to intersect traced rays. It also has functional memory cells, each of which is composed of graphics logic and its local memory to detect intersecting objects and to calculate intensities. A distributed frame buffer is employed to alleviate the access conflicts of functional cells to the frame buffer as well as to compose globally illuminated intensities at screen pixels. As the graphics processing capability is localized to data in the 3DCGiRAM through memory-logic merged LSI technology, a scalability and modularity similar to those of conventional memory modules can be expected. The experimental results show that a single 3DCGiRAM module running at 200 MHz with a memory bandwidth of 6.4 GB will be able to synthesize a ray-traced walk-through animation at a rate of one frame per second.

global communications conference | 1994

StarCore: a high-speed ATM switching system

Nobuyuki Oba; Ken-Ichi Suzuki; Hiroaki Kobayashi; Tadao Nakamura

This paper presents a cell scheduling algorithm and its hardware implementation used in an ATM switching system, StarCore. Output contention is resolved by the hardware arbiters in a weighted round-robin fashion, which takes account of the bandwidths allocated to the virtual circuits as well as the priority classes. The arbiter consists of primitive logic gates, which are beneficial for CMOS VLSI implementation, and therefore it gives high-speed arbitration. The circuit simulations indicate that the time for the arbitration of a 64-input switch is 4.2 ns using 0.7-μm CMOS VLSI technology. The simulations show that StarCore provides lower cell loss probabilities than the conventional round-robin method.
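A software sketch of weighted round-robin arbitration may help make the idea concrete. It is not the StarCore gate-level arbiter: the credit-refill scheme and the class name `WeightedRoundRobinArbiter` are assumptions, priority classes are ignored, and all weights are assumed to be positive integers.

```python
class WeightedRoundRobinArbiter:
    """Grants one requester per cycle, in proportion to per-input weights."""

    def __init__(self, weights):
        self.weights = list(weights)  # assumed positive integers
        self.credits = list(weights)  # grants left in the current round
        self.pointer = 0  # round-robin starting position

    def grant(self, requests):
        """requests: list of bools. Return the granted input index or None."""
        n = len(self.weights)
        if not any(requests):
            return None
        # Start a new round when no requesting input has credit left.
        if not any(requests[i] and self.credits[i] > 0 for i in range(n)):
            self.credits = list(self.weights)
        for offset in range(n):
            i = (self.pointer + offset) % n
            if requests[i] and self.credits[i] > 0:
                self.credits[i] -= 1
                self.pointer = (i + 1) % n  # next search starts after the winner
                return i
        return None


arbiter = WeightedRoundRobinArbiter([2, 1])
sequence = [arbiter.grant([True, True]) for _ in range(6)]
counts = [sequence.count(0), sequence.count(1)]
```

With weights 2 and 1 and both inputs always requesting, input 0 receives twice as many grants as input 1 over each round, which is how allocated bandwidth shows up in the grant pattern.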

international conference on computer communications | 1993

An adaptive network routing method by electrical-circuit modeling

Nobuyuki Oba; Hiroaki Kobayashi; Tadao Nakamura

A routing control method called potential routing is proposed for packet communication in computer networks. Potential routing models a computer network as an electrical circuit, and performs packet routing according to the potential differences between adjacent nodes. The node potentials are first given by Kirchhoff's law and are then dynamically adjusted according to the traffic situation. Potential routing can be applied to arbitrary network topologies; it takes account of the global network topology in determining the route. The routing table is easily and therefore quickly computed by Kirchhoff's law, by solving simple simultaneous equations; no convergence problem arises. Moreover, potential routing does not involve the ping-pong (loop) problem. It is verified by simulation that potential routing shortens transmission delays, especially when the traffic is heavy or unbalanced.
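As a rough illustration, the circuit model can be sketched as follows. The formulation here is an assumption, not the authors' exact method: every link is a resistor with a given conductance, the destination is grounded at potential 0, and a unit current is injected at every other node. Kirchhoff's current law then gives one linear equation per node, and a packet is forwarded to the neighbor with the lowest potential. The function names are hypothetical, and traffic-dependent adjustment is left out.

```python
from fractions import Fraction


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with exact fractions."""
    n = len(b)
    m = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        m[col] = [v / m[col][col] for v in m[col]]
        for r in range(n):
            if r != col and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [row[n] for row in m]


def node_potentials(links, dest):
    """links: {(u, v): conductance}. Returns {node: potential}, dest at 0."""
    nodes = sorted({n for link in links for n in link})
    others = [n for n in nodes if n != dest]
    index = {n: i for i, n in enumerate(others)}
    a = [[0] * len(others) for _ in others]
    b = [1] * len(others)  # assumed: unit current injected at each node
    for (u, v), g in links.items():
        for x, y in ((u, v), (v, u)):
            if x != dest:
                a[index[x]][index[x]] += g  # current leaving x over this link
                if y != dest:
                    a[index[x]][index[y]] -= g
    solution = solve(a, b)
    potentials = {n: solution[index[n]] for n in others}
    potentials[dest] = Fraction(0)
    return potentials


def next_hop(links, potentials, node):
    """Forward to the neighbor with the lowest potential."""
    neighbors = [v if u == node else u for (u, v) in links if node in (u, v)]
    return min(neighbors, key=lambda n: potentials[n])


links = {("A", "B"): 1, ("B", "D"): 1, ("A", "C"): 1, ("C", "E"): 1, ("E", "D"): 1}
pot = node_potentials(links, "D")
route = ["A"]
while route[-1] != "D":
    route.append(next_hop(links, pot, route[-1]))
```

Under these assumptions every node other than the destination has at least one neighbor at a strictly lower potential, so a packet's potential falls at every hop and it cannot loop. Lowering the conductance of a congested link would be one way to model the dynamic adjustment the abstract mentions.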

Systems and Computers in Japan | 1997

Decoupled modified‐bit cache

Masafumi Takahashi; Nobuyuki Oba; Hiroaki Kobayashi; Tadao Nakamura

Cache memory not only allows one to decrease average memory access time, but also relieves bus traffic and decreases bus latency. In this paper, to further relieve bus traffic, a write-back cache memory (DMC, Decoupled Modified-bit Cache) is proposed that tracks data modification at the byte level. DMC supports selective write-back of only modified data to memory, which contributes to further relief of bus traffic. To avoid considerable requirements for additional hardware in implementing DMC, a method is proposed to separate status bits that indicate data modification from the cache, allocating them as necessary. Benchmark tests with a variety of applications were performed to validate DMC. The results show that, with an additional 3% of the cache memory allocated as memory cells for status bits, memory usage intensity and data flow through the bus are reduced by about 35% and 10%, respectively, when compared to a conventional write-back cache.

IBM Journal of Research and Development | 1991

Design choices for the TOP-1 multiprocessor workstation

Shigenori Shimizu; Nobuyuki Oba; Takeo Nakada; Moriyoshi Ohara; Atsushi Moriwaki

A snoopy-cache-based multiprocessor workstation called TOP-1 (TOkyo research Parallel processor-1) was developed to evaluate multiprocessor architecture design choices as well as to conduct research on operating systems, compilers, and applications for multiprocessor workstations. TOP-1 is a ten-way multiprocessor using the Intel 80386™ microprocessor chip and the Weitek WTL 1167™ floating-point coprocessor chip. It is currently running under a multiprocessor version of AIX®, which was also developed at the IBM Tokyo Research Laboratory. Our research interest was focused on the design of an effective snoopy cache (all caches monitor all memory-cache traffic) system and the quantitative evaluation of its performance. One of the unique aspects of the TOP-1 design is that the cache supports four different, original snoopy protocols, which may coexist in the system. To evaluate the performance, we implemented a hardware statistics monitor that gathers statistical data. This paper focuses mainly on the TOP-1 cache design: its protocol and its evaluation by means of the statistics monitor. Besides its cache design, TOP-1 has three other unique architectural features: two independently arbitrated 64-bit buses supported by two snoopy-cache controllers per processor, a communication and interruption mechanism for notifying other processors of asynchronous events, and an efficient arbitration mechanism to allow prioritized quasi-round-robin service with distributed control. These features are also described in detail.

Systems and Computers in Japan | 1987

An adaptive routing method for computer networks by electric‐circuit modeling

Nobuyuki Oba; Tadao Nakamura; Yoshiharu Shigei

Computer networks and multiprocessor systems have been developed and studied. They are based on a network composed of nodes containing processors, aiming at the improvement of performance by distributed processing as well as the improvement of reliability by resource distribution. To realize high system performance, adequate routing and flow controls are required in the communication of information among nodes. This paper proposes a new routing control scheme to be used in packet communication in a computer network or multiprocessor system. The scheme is called potential routing. It models the computer network as an electric circuit, and packet routing from the source node to the destination node is performed according to the potential difference between adjacent nodes. The node potential is determined first by Kirchhoff's law and is modified dynamically according to the traffic situation during the routing procedure, providing an adequate criterion for the routing. A feature of the proposed scheme is that ping-pong and loop phenomena, which cause traffic congestion, do not occur in principle. It was verified by simulation that the transmission delay is reduced when the traffic is high or unbalanced.

asia and south pacific design automation conference | 1998

Automated design of wave pipelined multiport register files

Kouji Takano; Takehito Sasaki; Nobuyuki Oba; Hiroaki Kobayashi; Tadao Nakamura

Recent high-performance microprocessors have two or more functional units (FUs) to exploit instruction-level parallelism. To make full use of this capability, multiport register files are generally used. However, conventional multiport register files need a considerable amount of hardware. This paper proposes a multiport register file scheme, which uses time-division multiplexing with wave pipelining in order to save the needed hardware resources. For adjusting propagation delay timings, we develop a tool which automatically inserts dummy buffers into combinatorial logic.

international phoenix conference on computers and communications | 1996

Decoupled modified-bit cache

Masafumi Takahashi; Nobuyuki Oba; Hiroaki Kobayashi; Tadao Nakamura

Cache memories are extensively used to reduce memory latency and memory bus traffic. This paper presents a cache memory control mechanism, called decoupled modified-bit cache (DMC), which manages the clean/modified state of cached data in units of bytes to further reduce the bus traffic. Unlike conventional cache memories, the DMC has modified-bit arrays that are separated from a cache tag memory, and uses the modified-bits on demand. The DMC allows a non-fetch allocation on a write miss, cache line fills and replacements in units of bytes, and eliminates unnecessary data transfers. Our simulations with uni-processor and multiprocessor applications indicate that, with 3% more hardware, the DMC reduces the bus traffic and the number of transactions to between 10% and 40% of the levels in a conventional write-back cache memory. It also has strong potential for use in bus-interconnected multiprocessor systems, where the bus traffic dominates the system performance.
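The byte-granularity bookkeeping can be sketched in a few lines. This is a simplified illustration, not the DMC hardware: the class name `ByteDirtyLine`, the 16-byte line size, and the dictionary-free flat memory are assumptions, and the on-demand allocation of modified-bit arrays is not modeled. The line is allocated on a write without fetching from memory, and only bytes whose modified bit is set are written back.

```python
class ByteDirtyLine:
    """Cache line with one valid bit and one modified bit per byte."""

    def __init__(self, base_address, size=16):
        self.base = base_address
        self.data = bytearray(size)
        self.valid = [False] * size  # byte holds meaningful data
        self.modified = [False] * size  # byte differs from memory

    def write(self, offset, value):
        """Write one byte without fetching the rest of the line."""
        self.data[offset] = value
        self.valid[offset] = True
        self.modified[offset] = True

    def write_back(self, memory):
        """Copy only modified bytes to memory; return the bytes transferred."""
        transferred = 0
        for i, dirty in enumerate(self.modified):
            if dirty:
                memory[self.base + i] = self.data[i]
                self.modified[i] = False
                transferred += 1
        return transferred


memory = bytearray(64)
line = ByteDirtyLine(base_address=16)
line.write(3, 0xAB)
line.write(4, 0xCD)
first = line.write_back(memory)  # transfers 2 bytes instead of 16
second = line.write_back(memory)  # nothing left to transfer
```

A conventional write-back cache with one modified bit per line would transfer the whole 16-byte line here. Tracking bytes individually is what lets the replacement move only the two bytes that changed.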
