Takanobu Baba
Utsunomiya University
Publication
Featured research published by Takanobu Baba.
conference on high performance computing (supercomputing) | 1990
Takanobu Baba; Tsutomu Yoshinaga; Tohru Iijima; Yoshifumi Iwamoto; Masahiro Hamada; Mitsuru Suzuki
A-NET is a parallel object-oriented total architecture for highly parallel computation. Starting with a computation model, the authors describe parallel constructs of the designed language, called A-NETL; the A-NETL-oriented machine instruction set architecture; the hardware organization of a node processor, which consists of a 40-bit processing element and a router; and a local operating system on each of the node processors. Statistics for the designed language and the machine are derived from experimental results.
conference on high performance computing (supercomputing) | 1990
Takanobu Baba; Yoshifumi Iwamoto; Tsutomu Yoshinaga
For mapping a task graph to a processor graph, the proposed strategy evaluates several functions that represent some intuitively feasible properties of the graphs. Several strategies are defined to guide the mapping process, utilizing the indicated values. An allocation system has been designed and implemented based on this strategy. The experimental results indicate the following: the system can yield an allocation 2.14 times better than an arbitrary allocation; it is difficult to select a single strategy capable of providing the best solutions for a wide range of task-processor combinations; and the computation time of the allocator is reasonable. The effect of task and processor topology combinations on the allocation results is also discussed.
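To make the idea of a "better" allocation concrete, here is a minimal sketch that scores a mapping by total communication cost (message weight times hop distance). The cost function, the ring topology, the exhaustive search, and all names are illustrative assumptions; the abstract does not specify which functions the allocator evaluates.

```python
from itertools import permutations

def ring_distance(a, b, n):
    """Hop count between processors a and b on an n-node ring (assumed topology)."""
    d = abs(a - b)
    return min(d, n - d)

def mapping_cost(task_edges, mapping, n_procs):
    """Sum of (message weight * hop distance) over all task-graph edges."""
    return sum(w * ring_distance(mapping[u], mapping[v], n_procs)
               for u, v, w in task_edges)

def best_mapping(task_edges, n_tasks, n_procs):
    """Exhaustive search over one-to-one mappings; only viable for tiny graphs."""
    best = None
    for perm in permutations(range(n_procs), n_tasks):
        cost = mapping_cost(task_edges, perm, n_procs)
        if best is None or cost < best[0]:
            best = (cost, perm)
    return best

# Hypothetical 4-task chain 0-1-2-3 with unit-weight messages.
edges = [(0, 1, 1), (1, 2, 1), (2, 3, 1)]
arbitrary = (0, 2, 1, 3)  # an arbitrary placement on a 4-node ring
arbitrary_cost = mapping_cost(edges, arbitrary, 4)
optimal_cost, optimal = best_mapping(edges, 4, 4)
```

Comparing `arbitrary_cost` with `optimal_cost` gives a ratio of the same kind as the improvement factor quoted above, although the metric used here is only a stand-in.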
Optics Letters | 2014
Boaz Jessie Jackin; Hiroaki Miyata; Takeshi Ohkawa; Kanemitsu Ootsu; Takashi Yokota; Yoshio Hayasaki; Toyohiko Yatagai; Takanobu Baba
A method is proposed to reduce the communication overhead in computer-generated hologram (CGH) calculations on parallel and distributed computing devices. The method uses the shifting property of the Fourier transform to decompose the calculations, thereby avoiding data dependency and communication. This allows the full potential of parallel and distributed computing devices to be exploited. The proposed method is verified by simulation and optical experiments and achieves a 20-fold speed improvement over conventional methods for large data sizes.
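The shifting property mentioned above can be checked numerically. The sketch below uses a naive 1-D discrete Fourier transform and confirms that circularly shifting a signal by s samples is equivalent to multiplying its spectrum by exp(-2πi·k·s/N). It illustrates only the property itself; how the paper decomposes CGH calculations with it is not reproduced here, and the sample signal is made up.

```python
import cmath

def dft(x):
    """Naive O(N^2) discrete Fourier transform."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def circular_shift(x, s):
    """Return y with y[t] = x[(t - s) mod N]."""
    n = len(x)
    return [x[(t - s) % n] for t in range(n)]

def shifted_spectrum(X, s):
    """Apply the shift theorem: multiply X[k] by exp(-2*pi*i*k*s/N)."""
    n = len(X)
    return [X[k] * cmath.exp(-2j * cmath.pi * k * s / n) for k in range(n)]

signal = [1.0, 2.0, 0.0, -1.0, 3.0, 0.5, 0.0, 2.5]  # arbitrary test data
shift = 3
direct = dft(circular_shift(signal, shift))         # shift first, then transform
via_theorem = shifted_spectrum(dft(signal), shift)  # transform once, then phase-multiply
max_error = max(abs(a - b) for a, b in zip(direct, via_theorem))
```

Because the phase factor for each frequency index depends only on that index, each output element can be computed without exchanging data with its neighbors.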
Journal of the Physical Society of Japan | 2006
Takashi Yokota; Kanemitsu Ootsu; Fumihito Furukawa; Takanobu Baba
A large-scale interconnection network is a complex system that comprises numerous independent switching elements (routers). Its performance is proportional to the offered traffic load if the load i...
automation, robotics and control systems | 2008
Takashi Yokota; Kanemitsu Ootsu; Takanobu Baba
Predictors essentially predict the most recent events based on the record of past events, that is, the history. Prediction performance obviously depends largely on how regular or random the history is. This paper concentrates on extracting effective information from branch history and discusses the expected performance of branch predictors. For this purpose, it introduces entropy-based viewpoints for the quantitative characterization of both program behavior and prediction mechanisms. The paper defines four new entropies from different viewpoints; two of them are independent of prediction methods, and the other two depend on predictor organization. These new entropies are useful tools for analyzing the upper bound of prediction performance. Evaluation results for typical predictors are presented.
Proceedings of the 2007 workshop on Experimental computer science | 2007
Takashi Yokota; Kanemitsu Ootsu; Takanobu Baba
Predictors are inherent components of state-of-the-art microprocessors. Branch predictors are discussed actively from diverse perspectives. Performance of a branch predictor largely depends on the dynamic behavior of the executing program. Nevertheless, we have no effective metrics to represent the nature of program behavior quantitatively. In this paper, we introduce an information-entropy approach to represent program behavior and branch predictor performance. Through simple application of Shannon's information entropy, we introduce a new entropy, Branch History Entropy, which quantitatively represents the regularity level of program behavior. We show that the entropy also represents an index of prediction performance that is independent of prediction mechanisms. We further discuss branch predictor performance from a stereoscopic view of their typical organization. We propose two entropies: Table Reference Entropy and Table Entry Entropy. The former represents how unbalanced the references to table entries are. The latter offers the maximum expectation in prediction performance. We evaluated the three proposed entropies and prediction performance in various situations. Artificially generated branch patterns, as preliminary experiments, show an overview of the entropies and prediction performance. Subsequently, we present a comparison to the 2nd Championship Branch Predictor competition results and show the high potential of the proposed entropy. Finally, we present an actual view of our entropies and prediction performance as application results to SPEC CPU2000 benchmarks.
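As a rough illustration of applying Shannon's entropy to branch behavior, the sketch below measures the entropy of the distribution of fixed-length taken/not-taken windows in an outcome trace. The window-based formulation, the function name, and the sample traces are assumptions made for illustration; the abstract does not give the exact definition of Branch History Entropy.

```python
from collections import Counter
from math import log2

def history_entropy(outcomes, n):
    """Shannon entropy (bits) of the distribution of n-long outcome windows."""
    windows = [tuple(outcomes[i:i + n]) for i in range(len(outcomes) - n + 1)]
    counts = Counter(windows)
    total = len(windows)
    return -sum(c / total * log2(c / total) for c in counts.values())

# Hypothetical traces: 1 = taken, 0 = not taken.
always_taken = [1] * 64
alternating = [0, 1] * 32

h_regular = history_entropy(always_taken, 4)     # a single pattern occurs
h_alternating = history_entropy(alternating, 4)  # two patterns, almost equally often
```

A perfectly regular trace yields zero entropy, while a trace that mixes more patterns yields a larger value, which matches the intuition that regular behavior is easier to predict.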
Systems and Computers in Japan | 1988
Masayuki Inagawa; Takanobu Baba; Katsuhiro Yamazaki; Kenzo Okuda; Ken Ishikawa
Unification is a basic component of Prolog processing. However, its parallel processing has not been well studied because the number of arguments, which corresponds to the degree of unification parallelism, is small, and a consistency check operation is necessary after a parallel unification operation. To address these issues, we have implemented the following ideas: (1) enhancing the degree of parallelism by decomposing a compound term into a functor and its arguments at compile-time; (2) allocating decomposed unification processing to multiple processor units (PUs) at run-time; (3) decreasing the number of consistency checks by compile-time clustering and reducing the overhead by embedding the consistency check operations into the unification processing; and (4) stopping the operations of the other processors if the unification fails. To clarify the effect, we have developed and evaluated a Prolog processor on a multiprocessor system. The results show that statically: (1) the decomposition of compound terms makes the number of arguments 3.2 on average even after clustering; and that dynamically: (1) the unification parallelism yields a 41 percent speedup, and the effect is evident with a small number of processors; (2) the compile-time clustering makes the consistency check unnecessary; (3) stopping processors that are running in parallel attains a 0.5 – 6 percent (and 10 percent for some problems) performance improvement; and (4) the processing of the clause head occupies 60 – 70 percent of dynamic microsteps and is an important target of parallel processing.
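The decomposition and consistency check described above can be sketched in a few lines. This is a deliberately simplified model, not the processor's actual algorithm: terms are Python tuples `(functor, arg1, ...)`, variables are strings starting with an uppercase letter, goal arguments are assumed to be ground atoms, and the "parallel" step is an ordinary loop over independent argument pairs.

```python
def is_var(term):
    """Variables are strings starting with an uppercase letter (assumed convention)."""
    return isinstance(term, str) and term[:1].isupper()

def unify_arg(head_arg, goal_arg):
    """Unify one argument pair in isolation; goal_arg is assumed ground."""
    if is_var(head_arg):
        return {head_arg: goal_arg}
    return {} if head_arg == goal_arg else None

def unify_decomposed(head, goal):
    """Split terms into functor and arguments, unify pairs independently, then check consistency."""
    (h_functor, *h_args), (g_functor, *g_args) = head, goal
    if h_functor != g_functor or len(h_args) != len(g_args):
        return None
    # Each pair is independent, so these calls could run on separate processor units.
    partial = [unify_arg(h, g) for h, g in zip(h_args, g_args)]
    if any(p is None for p in partial):
        return None  # one failure is enough to stop the others
    # Consistency check: a shared variable must receive the same value everywhere.
    bindings = {}
    for p in partial:
        for var, value in p.items():
            if bindings.setdefault(var, value) != value:
                return None
    return bindings

ok = unify_decomposed(("f", "X", "X", "b"), ("f", "a", "a", "b"))
clash = unify_decomposed(("f", "X", "X", "b"), ("f", "a", "c", "b"))
```

In the second call each argument pair unifies on its own, and only the final merge detects that `X` would need two different values, which is why a consistency check is required after parallel unification.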
field-programmable custom computing machines | 2002
Takashi Yokota; Masamichi Nagafuchi; Yoshito Mekada; Tsutomu Yoshinaga; Kanemitsu Ootsu; Takanobu Baba
The concentration index filter is a kind of spatial image filter, and its typical application is diagnosis from medical images. This paper presents a dedicated computing engine for concentration index filtering. The original algorithm is modified to extract full parallelism, and the data width is optimized to maximize clock speed and minimize hardware scale. Evaluation results reveal that the system runs 100 times faster than a current workstation and enables real-time diagnosis.
systems man and cybernetics | 1999
Naoki Kohata; T. Yamaguchi; Makoto Takahide; Takanobu Baba; Hideki Hashimoto
Proposes a chaotic evolutionary computation algorithm, in place of a conventional genetic algorithm, for intelligent agents such as welfare robots that assist humans. This evolutionary computation is realized by applying chaotic retrieval and soft DNA (soft computing oriented data driven functional scheduling architecture) on associative memories. We apply this evolutionary computation to multi-agent robots which move abreast and to intelligent transport systems. The process of this evolutionary computation is essentially parallel. Therefore, we implement its parallel processing algorithm on the A-NET (actors network) parallel object-oriented computer and show the usefulness of parallel processing for the proposed evolutionary computation.
parallel and distributed computing: applications and technologies | 2009
Yuanming Zhang; Kanemitsu Ootsu; Takashi Yokota; Takanobu Baba
Multi-core processors have emerged as the predominant architecture. Parallelizing applications into multithreaded ones executing on multiple cores is the key to achieving performance improvements. Recently proposed pipelined multithreading (PMT) techniques have shown great promise for parallelizing general applications. However, significant inter-core communication overheads limit the potential performance and hinder wide commercial use. While a dedicated inter-core communication mechanism has been proposed, it demands chip redesign effort, is costly, and needs extensions to the ISA. Software queues avoid these problems. In this paper, we propose a clustered software queue technique, which applies a new clustered communication mechanism, to minimize the average communication overhead. Our research shows that very low average communication overheads (ACOs) can be achieved by sacrificing a certain amount of parallelism. The principle of the clustered communication mechanism and how it reduces the ACOs are presented in detail. A concurrent lock-free clustered software queue algorithm is given and then evaluated on commodity multi-core processors. Experimental results show that the communication performance of the clustered software queue is over 10x faster than that of a conventional software queue, and that much higher PMT performance is achieved for real applications.
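The trade-off behind clustered communication can be modeled with a small single-producer, single-consumer ring buffer in which the producer publishes its write position only once per cluster of items. This is a single-threaded model under stated assumptions, not the paper's lock-free algorithm: the class, its fields, and the use of a shared-index update count as a proxy for communication overhead are all hypothetical, and no memory-ordering concerns are addressed.

```python
class ClusteredQueue:
    """Single-producer/single-consumer ring buffer that publishes items in clusters."""

    def __init__(self, capacity, cluster_size):
        self.buf = [None] * capacity
        self.capacity = capacity
        self.cluster_size = cluster_size
        self.head = 0            # shared: next slot the consumer reads
        self.tail = 0            # shared: slots below this index are visible
        self.local_tail = 0      # producer-private write position
        self.shared_updates = 0  # counts writes to the shared tail index

    def push(self, item):
        if self.local_tail - self.head >= self.capacity:
            return False  # queue is full
        self.buf[self.local_tail % self.capacity] = item
        self.local_tail += 1
        if self.local_tail - self.tail >= self.cluster_size:
            self.flush()
        return True

    def flush(self):
        """Publish all pending items with one shared-index update."""
        if self.tail != self.local_tail:
            self.tail = self.local_tail
            self.shared_updates += 1

    def pop(self):
        if self.head == self.tail:
            return None  # nothing published yet
        item = self.buf[self.head % self.capacity]
        self.head += 1
        return item

def run(cluster_size, n_items=64):
    """Push n_items, drain the queue, and report how often the shared index changed."""
    q = ClusteredQueue(capacity=128, cluster_size=cluster_size)
    for i in range(n_items):
        q.push(i)
    q.flush()
    received = []
    item = q.pop()
    while item is not None:
        received.append(item)
        item = q.pop()
    return received, q.shared_updates

items_1, updates_1 = run(cluster_size=1)
items_8, updates_8 = run(cluster_size=8)
```

With a cluster size of 8 the shared index is written one eighth as often, at the price that the consumer cannot see an item until its cluster is published, which mirrors the parallelism sacrificed for lower average overhead.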