
Hirofumi Sakane

National Institute of Advanced Industrial Science and Technology


Publication

Featured research published by Hirofumi Sakane.

field-programmable logic and applications | 2008

Bitstream encryption and authentication with AES-GCM in dynamically reconfigurable systems

Yohei Hori; Akashi Satoh; Hirofumi Sakane; Kenji Toda

A high-speed and secure dynamic partial reconfiguration (DPR) system is realized with AES-GCM that guarantees both confidentiality and authenticity of FPGA bitstreams. In DPR systems, bitstream authentication is essential for avoiding fatal damage caused by unintended bitstreams. An encryption-only system can prevent bitstream cloning and reverse engineering, but cannot prevent erroneous or malicious bitstreams from being configured. Authenticated encryption is a relatively new concept that provides both message encryption and authentication, and AES-GCM is one of the latest authenticated encryption algorithms suitable for hardware implementation. We implemented the AES-GCM-based DPR system targeting the Virtex-5 device on an off-the-shelf board, and evaluated its throughput and hardware resource utilization. For comparison, we also implemented AES-CBC and SHA-256 modules on the same device. The experimental results showed that the AES-GCM-based system achieved higher throughput with less resource utilization than the AES/SHA-based system. The AES-GCM module achieved more than 1 Gbps throughput, and the entire system achieved about 800 Mbps throughput with reasonable resource utilization. This paper clarifies the advantage of using AES-GCM for protecting DPR systems.

international conference on parallel architectures and compilation techniques | 1996

Identifying the capability of overlapping computation with communication

Andrew Sohn; Jui Ku; Yuetsu Kodama; Mitsuhisa Sato; Hirofumi Sakane; Hayato Yamana; Shuichi Sakai; Yoshinori Yamaguchi

Overlapping computation with communication is central to obtaining high performance on distributed-memory multiprocessors. This report explicates the overlapping capability of two distributed-memory multiprocessors: the EM-X and IBM SP-2. The well-known bitonic sorting algorithm is selected for experiments. Various message sizes are used to determine when, where, how much and why overlapping takes place. Experimental results indicate that both multiprocessors would yield up to 30% to 40% overlap of communication time when the message size is approximately 1K integers. EM-X is found to be message-size insensitive yielding high overlap for various message sizes, while SP-2 was effective for the window of message size 512 to 2K integers.
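As an illustration of the benchmark named in this abstract, the following is a minimal single-process sketch of bitonic sorting in Python. It is not the authors' EM-X or SP-2 code; the function names `bitonic_sort` and `_bitonic_merge` are hypothetical, and the input length is assumed to be a power of two. In a distributed-memory version, each compare-exchange between positions `i` and `i + half` would become a message exchange between partner processors, which is the communication that can be overlapped with computation.

```python
def _bitonic_merge(seq, ascending):
    """Merge a bitonic sequence into sorted order (ascending or descending)."""
    n = len(seq)
    if n == 1:
        return seq
    half = n // 2
    seq = list(seq)
    # Compare-exchange step: element i is paired with element i + half.
    # On a distributed machine this pairing is where partners exchange data.
    for i in range(half):
        if (seq[i] > seq[i + half]) == ascending:
            seq[i], seq[i + half] = seq[i + half], seq[i]
    return _bitonic_merge(seq[:half], ascending) + _bitonic_merge(seq[half:], ascending)


def bitonic_sort(values, ascending=True):
    """Sort a list whose length is a power of two using bitonic sorting."""
    n = len(values)
    if n <= 1:
        return list(values)
    if n & (n - 1):
        raise ValueError("length must be a power of two")
    half = n // 2
    # Sort the halves in opposite directions so their concatenation is bitonic.
    first = bitonic_sort(values[:half], True)
    second = bitonic_sort(values[half:], False)
    return _bitonic_merge(first + second, ascending)
```

For example, `bitonic_sort([3, 7, 4, 8, 6, 2, 1, 5])` sorts the eight integers in log-depth stages of independent compare-exchanges, which is why the algorithm is a common choice for studying communication behavior.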

IEICE Transactions on Information and Systems | 2008

A Secure Content Delivery System Based on a Partially Reconfigurable FPGA

Yohei Hori; Hiroyuki Yokoyama; Hirofumi Sakane; Kenji Toda

We developed a content delivery system using a partially reconfigurable FPGA to securely distribute digital content on the Internet. With partial reconfigurability of a Xilinx Virtex-II Pro FPGA, the system provides an innovative single-chip solution for protecting digital content. In the system, a partial circuit must be downloaded from a server to the client terminal to play content. Content will be played only when the downloaded circuit is correctly combined (= interlocked) with the circuit built in the terminal. Since each circuit has a unique I/O configuration, the downloaded circuit interlocks with the corresponding built-in circuit designed for a particular terminal. Thus, the interface of the circuit itself provides a novel authentication mechanism. This paper describes the detailed architecture of the system and clarifies the feasibility and effectiveness of the system. In addition, we discuss a fail-safe mechanism and future work necessary for the practical application of the system.

acm symposium on parallel algorithms and architectures | 1997

Fine-grain multithreading with the EM-X multiprocessor

Andrew Sohn; Yuetsu Kodama; Jui Ku; Mitsuhisa Sato; Hirofumi Sakane; Hayato Yamana; Shuichi Sakai; Yoshinori Yamaguchi

Multithreading aims to tolerate latency by overlapping communication with computation. This report explicates the multithreading capabilities of the EM-X distributed-memory multiprocessor through empirical studies. The EM-X provides hardware support for fine-grain multithreading, including a by-passing mechanism for direct remote reads and writes, hardware FIFO thread scheduling, and dedicated instructions for generating fixed-sized communication packets. Bitonic sorting and Fast Fourier Transform (FFT) are selected for experiments. Parameters that characterize the performance of multithreading are investigated, including the number of threads, the number of thread switches, the run length, and the number of remote reads. Experimental results indicate that the best communication performance occurs when the number of threads is two to four. FFT yielded over 95% overlapping due to a large amount of computation and communication parallelism across threads. Even in the absence of thread computation parallelism, multithreading helps overlap over 35% of the communication time for bitonic sorting.

international conference on supercomputing | 1995

A macrotask-level unlimited speculative execution on multiprocessors

Hayato Yamana; Mitsuhisa Sato; Yuetsu Kodama; Hirofumi Sakane; S. Sakai; Yoshinori Yamaguchi

The purpose of this paper is to propose a new fast execution scheme for FORTRAN programs. The proposed scheme enables the fast initiation of a macrotask when its data dependences are satisfied, even if the control flow has not yet reached it. Previous schemes for parallelizing a program that includes conditional branches have a number of problems: 1) although the theoretical speedup ratio is up to N when N conditional branches are jumped on either a VLIW or a superscalar machine, N is restricted to the number of ALUs on a chip; 2) since conventional control schemes use a few processors to control macrotasks, the overhead to control them is large. The proposed scheme solves these problems: 1) it enables speculative execution between coarse-grain tasks, i.e., macrotasks, on multiprocessors by jumping many conditional branches; 2) a distributed control scheme is proposed and implemented on the EM-4 multiprocessor to decrease the control overhead of macrotasks. Preliminary evaluations show that the control overhead of the proposed scheme is smaller than that of the other control schemes. Moreover, it is confirmed that the distributed control can be implemented in software when the average macrotask execution time is larger than 14.4 μs on the EM-4 multiprocessor.

international symposium on parallel architectures algorithms and networks | 1994

Message-based efficient remote memory access on a highly parallel computer EM-X

Yuetsu Kodama; Hirofumi Sakane; Mitsuhisa Sato; Shuichi Sakai; Yoshinori Yamaguchi

Communication latency is central to multiprocessor design. This report presents the design principles of the EM-X multiprocessor for tolerating communication latency. A multithreading principle is built into the EM-X to overlap communication and computation for latency tolerance. In particular, we present two types of hardware support for remote memory access: (1) priority-based packet scheduling for thread invocation, and (2) a direct remote memory access mechanism. The priority-based scheduling policy extends a FIFO-ordered thread invocation policy to adapt to different computational needs. The direct remote memory access based on non-preemptive thread execution is designed to overlap remote memory operations while executing threads. We give two examples to explain our approach. The 80-processor prototype of EM-X is currently being fabricated and is expected to be operational in the near future. Preliminary evaluation indicates that the EM-X can effectively overlap computation and communication, toward tolerating communication latency for high-performance parallel computing.

international conference on parallel architectures and compilation techniques | 1997

Parallel execution of radix sort program using fine-grain communication

Yuetsu Kodama; Hirofumi Sakane; Koike Hanpei; Mitsuhisa Sato; Shuichi Sakai; Yoshinori Yamaguchi

The report presents empirical results of fine-grain communication on the 80-processor EM-X distributed-memory multiprocessor. EM-X has hardware support for low-latency, high-throughput fine-grain communication. This hardware support includes packet generation integrated into the instruction execution pipeline for single-cycle communication overhead, direct memory access for remote references, and rapid context switching for latency tolerance. The authors study the fine-grain communication performance of integer radix sort, a code with irregular communication, on EM-X, and compare it to the Fujitsu AP1000+ and the Cray Server CS6400. The experimental results indicate that EM-X achieves high throughput and low overhead for fine-grain communication. Whereas EM-X's communication performance scales perfectly as one increases the number of processors, other coarse-grain message-passing machines exhibit fluctuation and performance degradation for larger configurations due to network contention.
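To show where the irregular communication in integer radix sort comes from, here is a minimal sequential sketch in Python of a least-significant-digit radix sort. It is not the code evaluated in the paper; the function name `radix_sort` and the parameter `bits_per_pass` are assumptions made for illustration, and keys are assumed to be non-negative integers. The scatter loop writes each key to a position determined by its digit, so in a distributed version those writes would land on data-dependent remote processors, one small message per key.

```python
def radix_sort(keys, bits_per_pass=8):
    """Least-significant-digit radix sort for non-negative integers."""
    keys = list(keys)
    if any(k < 0 for k in keys):
        raise ValueError("keys must be non-negative")
    if not keys:
        return keys
    radix = 1 << bits_per_pass
    mask = radix - 1
    max_key = max(keys)
    shift = 0
    while (max_key >> shift) > 0:
        # Histogram: count how many keys fall into each digit bucket.
        counts = [0] * radix
        for k in keys:
            counts[(k >> shift) & mask] += 1
        # Exclusive prefix sum gives the first output slot of each bucket.
        offsets = [0] * radix
        total = 0
        for digit in range(radix):
            offsets[digit] = total
            total += counts[digit]
        # Scatter: destinations depend on the data, hence irregular access.
        out = [0] * len(keys)
        for k in keys:
            digit = (k >> shift) & mask
            out[offsets[digit]] = k
            offsets[digit] += 1
        keys = out
        shift += bits_per_pass
    return keys
```

Each pass is stable, so processing digits from least to most significant yields a fully sorted result after the final pass.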

international acm sigir conference on research and development in information retrieval | 1998

Fast speculative search engine on the highly parallel computer EM-X

Hayato Yamana; Hanpei Koike; Yuetsu Kodama; Hirofumi Sakane; Yoshinori Yamaguchi

This paper presents a new World Wide Web search engine called the “Fast Speculative Search Engine” that uses speculative execution on multiprocessor systems to shorten the total time to retrieve information from the WWW. The proposed search engine predicts the user’s next queries and initiates the searches with the predicted queries before receiving them, to accelerate narrowing the search space. This kind of speculation is classified as data value speculation [2], which is mainly studied as a scheme to extract instruction-level parallelism in a processor. However, there have been no systems that adopt such speculation on multiprocessor systems. We have implemented the fast speculative search engine using data speculation on the EM-X [4], which is shown in Fig. 1(1). The EM-X, which consists of 80 processors, is a highly parallel computer that can tolerate communication latency by using low-latency communication and multithreading. The peak performance of the EM-X is 1.6 GIPS / 3.2 GFLOPS and the point-to-point network throughput is 37.2 MB/s. On the EM-X,

field-programmable technology | 2007

A Secure Digital Content Delivery System Based on Partially Reconfigurable Hardware

Yohei Hori; Hiroyuki Yokoyama; Hirofumi Sakane; Kenji Toda

We developed an FPGA-based content delivery system to securely distribute digital content on the Internet. With partial reconfigurability of a Xilinx Virtex-II Pro FPGA, the system provides a flexible single-chip solution for protecting digital content. In the system, a partial circuit must be downloaded from a server to the client terminal to play content. Content will be played if and only if the downloaded circuit is correctly combined (= interlocked) with the circuit built in the terminal. Since each circuit has a unique I/O configuration, the downloaded circuit interlocks with the corresponding built-in circuit designed for a particular terminal. Thus, the interface of the circuit itself provides a novel authentication mechanism. In the present paper, we describe the detailed architecture of the proposed system and clarify the feasibility and effectiveness of this system experimentally using a single-chip partial reconfiguration. In addition, we discuss the fail-safe mechanisms, partially reconfigurable FPGA architecture, and future research necessary for the practical application of the system.

international parallel processing symposium | 1997

Experience with fine-grain communication in EM-X multiprocessor for parallel sparse matrix computation

Mitsuhisa Sato; Yuetsu Kodama; Hirofumi Sakane; Hayato Yamana; Shuichi Sakai; Yoshinori Yamaguchi

Sparse matrix problems require a communication paradigm different from those used in conventional distributed-memory multiprocessors. We present in this paper how fine-grain communication can help obtain high performance on the experimental distributed-memory multiprocessor EM-X, developed at ETL, which can handle fine-grain communication very efficiently. The sparse matrix kernel, Conjugate Gradient (CG), is selected for the experiments. Among the steps in CG, we focus on the sparse matrix-vector multiplication. Several communication methods are developed for performance comparison, including coarse-grain and fine-grain implementations. Fine-grain communication allows exact data access in an unstructured problem to reduce the amount of communication. While CG presents bottlenecks in terms of a large number of fine-grain remote reads, the multithreaded principles of execution are designed to tolerate such latency. Experimental results indicate that the performance of the fine-grain read implementation is comparable to that of the coarse-grain implementation on 64 processors. The results demonstrate that fine-grain communication can be a viable and efficient approach to unstructured sparse matrix problems on large-scale distributed-memory multiprocessors.
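The kernel discussed in this abstract can be sketched sequentially as follows. This is a minimal Python illustration of Conjugate Gradient with a sparse matrix-vector product in compressed sparse row (CSR) form, not the EM-X implementation; the names `csr_matvec`, `dot`, and `conjugate_gradient`, the CSR storage choice, and the tolerance defaults are assumptions. The indirect read `x[col_idx[k]]` is the access that, with `x` distributed across processors, would turn into many small remote reads.

```python
def csr_matvec(values, col_idx, row_ptr, x):
    """Multiply a CSR sparse matrix by a dense vector x."""
    y = []
    for row in range(len(row_ptr) - 1):
        acc = 0.0
        for k in range(row_ptr[row], row_ptr[row + 1]):
            # Indirect access: a fine-grain remote read if x is distributed.
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(p * q for p, q in zip(a, b))


def conjugate_gradient(values, col_idx, row_ptr, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for a symmetric positive-definite CSR matrix A."""
    x = [0.0] * len(b)
    r = list(b)          # residual b - A x, with x = 0 initially
    p = list(r)          # search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break
        ap = csr_matvec(values, col_idx, row_ptr, p)
        alpha = rs_old / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x
```

As a usage example, the 2×2 system with rows `[4, 1]` and `[1, 3]` and right-hand side `[1, 2]` is stored as `values=[4, 1, 1, 3]`, `col_idx=[0, 1, 0, 1]`, `row_ptr=[0, 2, 4]`, and the solver converges to approximately `[1/11, 7/11]`.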
