Keisuke Iwai

National Defense Academy of Japan

Publication

Featured research published by Keisuke Iwai.

international conference on networking and computing | 2010

AES Encryption Implementation on CUDA GPU and Its Analysis

Keisuke Iwai; Takakazu Kurokawa; Naoki Nishikawa

GPUs offer good performance for their price and handle highly parallel applications well despite their low cost. Support for integer and logical instructions in the latest generation of GPUs makes cipher algorithms easier to implement. However, decisions such as the granularity of parallel processing and the memory allocation place a heavy burden on programmers. This paper therefore reports several experiments that study the relation between the memory allocation style of AES parameters and the granularity of the parallelism exploited from the AES encryption process, using CUDA on an NVIDIA GeForce GTX 285. The experiments showed that a granularity of 16 bytes per thread gave the highest performance, achieving approximately 35 Gbps throughput. Moreover, an implementation that overlaps processing with data transfer achieved 22.5 Gbps throughput including data transfer time. The results also show that choosing the granularity and memory allocation carefully is important for efficient AES encryption on GPUs.
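
To make the notion of granularity concrete, the following is a minimal Python sketch of the index arithmetic only, not the paper's CUDA code. The function names and the 4-byte and 1-byte granularities are illustrative assumptions; the abstract itself names only the 16-byte-per-thread case.

```python
AES_BLOCK_BYTES = 16  # AES operates on 128-bit (16-byte) blocks


def threads_needed(n_bytes: int, bytes_per_thread: int) -> int:
    """Number of threads needed to cover n_bytes at a given granularity."""
    if n_bytes % AES_BLOCK_BYTES != 0:
        raise ValueError("input must be a whole number of AES blocks")
    if AES_BLOCK_BYTES % bytes_per_thread != 0:
        raise ValueError("granularity must divide the AES block size")
    return n_bytes // bytes_per_thread


def threads_per_aes_block(bytes_per_thread: int) -> int:
    """How many threads cooperate on one 16-byte AES block."""
    return AES_BLOCK_BYTES // bytes_per_thread


def thread_byte_range(thread_id: int, bytes_per_thread: int) -> range:
    """Byte offsets of the plaintext handled by one (hypothetical) thread."""
    start = thread_id * bytes_per_thread
    return range(start, start + bytes_per_thread)


if __name__ == "__main__":
    one_mib = 1 << 20
    for granularity in (16, 4, 1):  # 4 and 1 are assumed for comparison
        print(granularity,
              threads_needed(one_mib, granularity),
              threads_per_aes_block(granularity))
```

At 16 bytes per thread each thread owns a whole AES block and needs no cooperation with its neighbours; finer granularities spread one block across several threads, which then have to exchange intermediate state.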

field programmable logic and applications | 2000

Dataflow Partitioning and Scheduling Algorithms for WASMII, a Virtual Hardware

Atsushi Takayama; Yuichiro Shibata; Keisuke Iwai; Hideharu Amano

This paper presents a new dataflow graph partitioning algorithm for a data-driven virtual hardware system called WASMII. The algorithm divides a dataflow graph into multiple subgraphs so as not to cause a deadlock. The subgraphs are then translated into an FPGA configuration and executed on WASMII in a time-multiplexed manner. The experimental results show that the proposed algorithms can improve execution performance by 13% to 39% at best compared with existing graph partitioning algorithms.

international symposium on computing and networking | 2013

Throughput and Power Efficiency Evaluations of Block Ciphers on Kepler and GCN GPUs

Naoki Nishikawa; Keisuke Iwai; Hidema Tanaka; Takakazu Kurokawa

Computer systems with GPUs are expected to become a strong methodology for high-speed encryption processing. Moreover, power consumption is a primary concern for data center security on cloud services and for handheld devices such as smartphones and tablet PCs. Meanwhile, GPU vendors have announced their roadmaps for future GPU architectures: Nvidia Corp. emphasizes the Kepler architecture and AMD Corp. the GCN architecture. In this paper we therefore evaluated the throughput and power efficiency of three 128-bit block ciphers on GPUs with the recent Nvidia Kepler and AMD GCN architectures. In our experiments, the throughput and per-watt throughput of AES-128 on a Radeon HD 7970 (2048 cores) with the GCN architecture were extremely high, at 219.9 Gbps and 1310.7 Mbps/W respectively, whereas those on a GeForce GTX 680 (1536 cores) with the Kepler architecture were considerably lower, at 68.6 Gbps and 471.7 Mbps/W. To investigate this unexpected result, we used our micro-benchmark suites. They revealed the reason: the arithmetic and logical instructions required by encryption processing are eliminated from some of the processing cores in the Kepler architecture, unlike in GCN.
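
As a worked check of the figures quoted above, the short Python sketch below relates throughput to per-watt throughput. The implied power draw is derived here from the two quoted numbers; it is not a value reported in the abstract, and the function name is illustrative.

```python
def implied_power_watts(throughput_gbps: float, mbps_per_watt: float) -> float:
    """Power draw implied by a throughput and a per-watt throughput.

    throughput [Mbps] / efficiency [Mbps/W] = power [W]
    """
    return throughput_gbps * 1000.0 / mbps_per_watt


# Figures quoted in the abstract for AES-128.
GCN = {"gbps": 219.9, "mbps_per_watt": 1310.7}     # Radeon HD 7970
KEPLER = {"gbps": 68.6, "mbps_per_watt": 471.7}    # GeForce GTX 680

gcn_watts = implied_power_watts(GCN["gbps"], GCN["mbps_per_watt"])
kepler_watts = implied_power_watts(KEPLER["gbps"], KEPLER["mbps_per_watt"])

throughput_ratio = GCN["gbps"] / KEPLER["gbps"]
efficiency_ratio = GCN["mbps_per_watt"] / KEPLER["mbps_per_watt"]

if __name__ == "__main__":
    print(f"implied power: GCN {gcn_watts:.1f} W, Kepler {kepler_watts:.1f} W")
    print(f"throughput ratio {throughput_ratio:.2f}x, "
          f"per-watt ratio {efficiency_ratio:.2f}x")
```

The per-watt gap is smaller than the raw throughput gap because the implied power draw of the GCN card is somewhat higher.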

icpp workshops on collaboration and mobile computing | 1999

Implementation and evaluation of the compiler for WASMII, a virtual hardware system

A. Takayama; Yuichiro Shibata; Keisuke Iwai; Hidenori Miyazaki; K. Higure; Xiao Ping Ling

WASMII is a reconfigurable system with data driven control which executes programs written in dataflow graphs. In WASMII, a target dataflow graph is divided into some subgraphs and executed on a programmable device called MPLD which is an extended FPGA. By replacing the configuration data on the MPLD, large scale programs which exceed the limit of hardware resources can be efficiently executed. As a software environment of WASMII, a compiler which translates a program written by a user in a high-level language into a corresponding dataflow graph and its HDL description is required. In this paper we show the design and implementation of the compiler for WASMII which generates the VHDL description from an input program. Compilation and execution results of a test program on a reconfigurable testbed called FLEMING are also shown.

international conference on algorithms and architectures for parallel processing | 2012

Power efficiency evaluation of block ciphers on GPU-integrated multicore processor

Naoki Nishikawa; Keisuke Iwai; Takakazu Kurokawa

Computer systems with discrete GPUs are expected to become the standard methodology for high-speed encryption processing, but they consume large amounts of power and are inapplicable to embedded devices. Therefore, we have specifically examined a new heterogeneous multicore processor with a CPU-GPU integration architecture. We first implemented three 128-bit block ciphers (AES, Camellia, and SC2000), chosen from the symmetric block ciphers in the e-government recommended ciphers list by CRYPTREC in Japan, using OpenCL on an AMD E-350 APU with a CPU-GPU integration architecture and on two traditional systems with discrete GPUs. Then we evaluated their respective power efficiencies. The results showed that the performance per watt of AES-128 on the APU, which has 80 cores, was 743.0 Mbps/W, a 44.0% increase over a system equipped with a discrete AMD Radeon HD 6770 with 800 cores. This paper is the first to describe a study evaluating the per-watt performance of block ciphers on GPUs.

network and system security | 2017

Implementation of bitsliced AES encryption on CUDA-Enabled GPU

Naoki Nishikawa; Hideharu Amano; Keisuke Iwai

Table-based implementations, in which tables are stored in the shared memory, have been the main approach reported in research on high-performance AES on GPUs. However, this kind of implementation is subject to timing attacks because of the latency required to access tables in the shared memory. Thanks to the number of registers increasing every year, GPU programming now allows memory-intensive applications such as the bitsliced AES algorithm to be implemented easily. However, research on implementing the bitsliced AES algorithm on GPUs has so far not sufficiently explored several parameters. For this reason, in this paper we present an implementation of bitsliced AES encryption on a CUDA-enabled GPU with several parameters, focusing especially on three kinds of parallel processing granularity. According to our experiments, the throughput of bitsliced AES-ECB encryption with Bs64 granularity reaches 605.9 Gbps on an Nvidia Tesla P100-PCIe, an improvement of 8.0% over the table-based implementation.
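
The idea behind bitslicing can be sketched in a few lines of Python. This is a generic illustration of the data layout, not the paper's implementation: the helper names are hypothetical, and only a constant XOR is shown rather than a full AES round.

```python
def bitslice(values, width):
    """Transpose values into bit planes.

    Bit `lane` of planes[bit] equals bit `bit` of values[lane], so one
    logical operation on a plane acts on every lane at once.
    """
    planes = [0] * width
    for lane, value in enumerate(values):
        for bit in range(width):
            if (value >> bit) & 1:
                planes[bit] |= 1 << lane
    return planes


def unbitslice(planes, n_lanes):
    """Inverse of bitslice: rebuild the per-lane values."""
    values = [0] * n_lanes
    for bit, plane in enumerate(planes):
        for lane in range(n_lanes):
            if (plane >> lane) & 1:
                values[lane] |= 1 << bit
    return values


def bitsliced_xor_const(planes, const, n_lanes):
    """XOR the same constant into every lane using only plane-wide logic."""
    all_lanes = (1 << n_lanes) - 1
    return [plane ^ all_lanes if (const >> bit) & 1 else plane
            for bit, plane in enumerate(planes)]


if __name__ == "__main__":
    data = [0x00, 0x1B, 0xFF, 0x53]
    planes = bitsliced_xor_const(bitslice(data, 8), 0xA5, len(data))
    print([hex(v) for v in unbitslice(planes, len(data))])
```

Because every step is a plain logical operation on registers, with no data-dependent table lookup, this style avoids the table-access latency that the abstract identifies as the source of timing leakage.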

international conference on cyber security and cloud computing | 2015

Computational Security Evaluation of Light-Weight Block Cipher Against Integral Attack by GPGPU

Haruhisa Kosuge; Hidema Tanaka; Keisuke Iwai; Takakazu Kurokawa

The integral distinguisher (ID) is the main factor in an integral attack. The conventional search strategy for IDs has two steps: first, a first-order ID is obtained; second, the first-order ID is extended by increasing its order. We find that applying the conventional strategy is problematic for Feistel ciphers with a large number of sub-blocks N, such as TWINE and LBlock (N = 16). To solve this problem, we propose a new search strategy that has a large search scope and is feasible under realistic computational conditions. It reduces the computational complexity from O((nN)×(2mn)) to O(N×2mn). To accelerate the experiments, we use a GPGPU (general-purpose computing on graphics processing units) platform, which lets us test substantially higher-order IDs than an existing CPU platform. We run computer experiments to discover the precise fifteenth-order IDs of TWINE and LBlock using the proposed strategy. As a result, we find new fifteenth-order IDs that have 8 balanced sub-blocks (32 bits) after 15-round encryption in both TWINE and LBlock. These results are the most precise evaluation of TWINE and LBlock.
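
For readers unfamiliar with the term, a sub-block is "balanced" when the XOR of its values over the whole set of chosen plaintexts is zero. The Python sketch below shows this on a toy 4-bit permutation with a single key addition; the S-box and the one-step "cipher" are illustrative assumptions, not TWINE or LBlock.

```python
from functools import reduce
from operator import xor

# A toy 4-bit permutation, used here only for illustration.
TOY_SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
            0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]


def xor_sum(values):
    """XOR of all values in an iterable (0 for an empty one)."""
    return reduce(xor, values, 0)


def is_balanced(key: int) -> bool:
    """Check the balanced property after key addition and the S-box.

    The input takes all 16 values, so x ^ key also takes all 16 values,
    and a permutation maps them onto all 16 outputs, whose XOR is 0.
    """
    outputs = [TOY_SBOX[x ^ key] for x in range(16)]
    return xor_sum(outputs) == 0


# For contrast: the XOR over only half of the inputs (no key).
half_set_sum = xor_sum(TOY_SBOX[x] for x in range(8))

if __name__ == "__main__":
    print(all(is_balanced(k) for k in range(16)), hex(half_set_sum))
```

The property holds for every key, which is what lets an attacker use it as a distinguisher without knowing the key; the half-set sum shows that it depends on the input set covering all values.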

international conference on information systems security | 2013

Information Theoretical Analysis of Side-Channel Attack

Hiroaki Mizuno; Keisuke Iwai; Hidema Tanaka; Takakazu Kurokawa

This paper proposes a new information-theoretical evaluation method for side-channel resistance. The method provides two benefits: (1) it provides a rationale for evaluation, and (2) it enables numerical mutual evaluation among countermeasures of several kinds. An evaluation of side-channel resistance discusses the feasibility of an attack, such as the number of observations or the experimental time needed to reveal secrets. In conventional methods, these numbers are examined experimentally using actual attacks. Such experimental methods present several problems: (1) the rationale for the numbers used in the evaluation is poor; (2) mutual evaluation is difficult to carry out; and (3) experimental constraints exist, such as time, cost, and equipment specifications. Our proposed method regards a side-channel attack as a communication channel model and estimates its channel capacity as the upper bound on the amount of leaked information. We apply this approach to correlation power analysis against implementations of the stream cipher Enocoro-128 v2 and demonstrate its effectiveness.
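
The channel-capacity view can be illustrated with a small Python sketch. The abstract does not say which channel model the paper uses, so the additive white Gaussian noise model, the signal-to-noise ratio, and the function names below are all assumptions chosen for illustration.

```python
import math


def awgn_capacity_bits(snr: float) -> float:
    """Capacity of a real-valued AWGN channel, in bits per observation.

    C = 0.5 * log2(1 + SNR). Treated here as an upper bound on the
    information one observation can leak (assumed model).
    """
    if snr < 0:
        raise ValueError("SNR must be non-negative")
    return 0.5 * math.log2(1.0 + snr)


def min_observations(secret_bits: int, snr: float) -> int:
    """Lower bound on observations needed to leak secret_bits of information.

    If each observation leaks at most C bits, at least secret_bits / C
    observations are required.
    """
    capacity = awgn_capacity_bits(snr)
    if capacity == 0:
        raise ValueError("a zero-capacity channel leaks nothing")
    return math.ceil(secret_bits / capacity)


if __name__ == "__main__":
    for snr in (3.0, 0.1, 0.001):  # hypothetical signal-to-noise ratios
        print(snr, awgn_capacity_bits(snr), min_observations(128, snr))
```

A bound of this kind depends only on the channel, not on a particular attack run, which is how a capacity estimate can supply a rationale and allow countermeasures to be compared numerically.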

international conference on networking and computing | 2012

A Correlation Power Analysis Countermeasure for Enocoro-128 v2 Using Random Switching Logic

Hiroaki Mizuno; Keisuke Iwai; Hidema Tanaka; Takakazu Kurokawa

This paper presents Correlation Power Analysis (CPA) on Enocoro-128 v2 and the effectiveness of applying a countermeasure. Enocoro is a hardware-oriented stream cipher developed by Hitachi, Ltd. Previous work shows that Enocoro-128 v2 has a weakness against CPA. Another work shows that threshold implementation, a countermeasure using algorithm-level masking, is effective against CPA. This paper proposes applying Random Switching Logic (RSL), a gate-level masking technique, as another countermeasure for Enocoro-128 v2. We implement the Enocoro-128 v2 circuit using RSL on SASEBO-GII (Side-Channel Attack Standard Evaluation Board) and evaluate its resistance to CPA. As a result, we confirmed that the secret key cannot be revealed from measurements using fewer than 100,000 power consumption waveforms.
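
The principle of CPA can be sketched with a toy simulation in Python. Everything here is an illustrative assumption rather than the paper's setup: the leakage is the Hamming weight of plaintext XOR key plus Gaussian noise, there is no Enocoro-128 v2 circuit or RSL masking, and the key, trace count, and noise level are made up.

```python
import random


def hamming_weight(x: int) -> int:
    return bin(x).count("1")


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate_traces(key, n_traces, noise_sigma, rng):
    """Toy power samples: HW(plaintext ^ key) plus Gaussian noise."""
    plaintexts = [rng.randrange(256) for _ in range(n_traces)]
    traces = [hamming_weight(p ^ key) + rng.gauss(0.0, noise_sigma)
              for p in plaintexts]
    return plaintexts, traces


def recover_key(plaintexts, traces):
    """Return the key guess whose predicted leakage correlates best."""
    def score(guess):
        model = [hamming_weight(p ^ guess) for p in plaintexts]
        return pearson(model, traces)
    return max(range(256), key=score)


SECRET_KEY = 0x3C                    # hypothetical one-byte key
rng = random.Random(0)               # fixed seed for repeatability
plaintexts, traces = simulate_traces(SECRET_KEY, 2000, 1.0, rng)
recovered = recover_key(plaintexts, traces)

if __name__ == "__main__":
    print(hex(recovered))
```

A masking countermeasure aims to break the dependence between the predicted leakage and the measured power, so that no key guess stands out in the correlation even with many waveforms.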

international conference on cyber security and cloud computing | 2015

Integral Attack on Reduced-Round Rectangle

Haruhisa Kosuge; Hidema Tanaka; Keisuke Iwai; Takakazu Kurokawa

RECTANGLE is a 64-bit block cipher with 80- and 128-bit key lengths proposed by Zhang et al. (Lightweight Cryptography Workshop 2015). The integral attack is one of the typical evaluation tools for block ciphers. The designers showed a 7-round integral distinguisher. In contrast, we find an 8-round integral distinguisher with balanced columns using our proposed search method for integral distinguishers. In this paper, we present the first integral attack on reduced-round RECTANGLE. Based on the 8-round distinguisher, we can attack 12-round RECTANGLE-128 with computational complexity 2^109.98 using the partial-sum technique. We can also attack 10-round RECTANGLE-80 with computational complexity 2^70.08.
