Hidetsugu Irie | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Hidetsugu Irie is active.

Explore More

Publication

Featured researches published by Hidetsugu Irie.

architectural support for programming languages and operating systems | 2012

FLAT: a GPU programming framework to provide embedded MPI

Takefumi Miyoshi; Hidetsugu Irie; Keigo Shima; Hiroki Honda; Masaaki Kondo; Tsutomu Yoshinaga

For leveraging multiple GPUs in a cluster system, it is necessary to assign application tasks to multiple GPUs and execute those tasks with appropriately using communication primitives to handle data transfer among GPUs. In current GPU programming models, communication primitives such as MPI functions cannot be used within GPU kernels. Instead, such functions should be used in the CPU code. Therefore, programmer must handle both GPU kernel and CPU code for data communications. This makes GPU programming and its optimization very difficult. In this paper, we propose a programming framework named FLAT which enables programmers to use MPI functions within GPU kernels. Our framework automatically transforms MPI functions written in a GPU kernel into runtime routines executed on the CPU. The execution model and the implementation of FLAT are described, and the applicability of FLAT in terms of scalability and programmability is discussed. We also evaluate the performance of FLAT. The result shows that FLAT achieves good scalability for intended applications.

design, automation, and test in europe | 2007

Utilization of SECDED for soft error and variation-induced defect tolerance in caches

Luong Dinh Hung; Hidetsugu Irie; Masahiro Goshima; Shuichi Sakai

Combination of SECDED with a redundancy technique can effectively tolerate a high variation-induced defect rate in future processes. However, while a defective cell in a block can be repaired by SECDED, the block becomes vulnerable to soft errors. This paper proposes a technique to deal with the degraded resilience against soft errors. Only clean data can be stored in defective blocks of a cache. This constraint is enforced through selective write-through mechanism. An error occurring in a defective block can be detected and the correct data can be obtained from the lower level caches.

pacific rim international symposium on dependable computing | 2006

Base Address Recognition with Data Flow Tracking for Injection Attack Detection

Satoshi Katsunuma; Hiroyuki Kurita; Ryota Shioya; Kazuto Shimizu; Hidetsugu Irie; Masahiro Goshima; Shuichi Sakai

Vulnerabilities such as buffer overflows exist in some programs, and such vulnerabilities are susceptible to address injection attacks. The input data tracking method, which was proposed before, prevents I-data, which are the data derived from the input data, being used as addresses. However, the rules to determine address injection attacks are vague, which produces many false-positives and false-negatives in detection results. Generally, the data used as an address consist of a base address and an address offset. We propose an architectural technique to prevent I-data overwriting B-data, which are the data used as base addresses in this paper. It dynamically recognizes the I-data and the B-data. Address injection is detected if I-data that are not B-data are used as addresses. We implemented the proposed technique on a Pentium-based Bochs emulator and investigated its detection capability. We believe that the technique is the most accurate injection detection technique proposed thus far

IEICE Transactions on Electronics | 2008

Ultra Dependable Processor

Shuichi Sakai; Masahiro Goshima; Hidetsugu Irie

This paper presents the processor architecture which provides much higher level dependability than the current ones. The features of it are: (1) fault tolerance and secure processing are integrated into a modern superscalar VLSI processor; (2) light-weight effective soft-error tolerant mechanisms are proposed and evaluated; (3) timing errors on random logic and registers are prevented by low-overhead mechanisms; (4) program behavior is hidden from the outer world by proposed address translation methods; (5) information leakage can be avoided by attaching policy tags for all data and monitoring them for each instruction execution; (6) injection attacks are avoided with much higher accuracy than the current systems, by providing tag trackings; (7) the overall structure of the dependable processor is proposed with a dependability manager which controls the detection of illegal conditions and recovers to the normal mode; and (8) an FPGA-based testbed system is developed where the system clock and the voltage are intentionally varied for experiment. The paper presents the fundamental scheme for the dependability, elemental technologies for dependability and the whole architecture of the ultra dependable processor. After showing them, the paper concludes with future works.

international conference on networking and computing | 2011

CCCPO: Robust Prefetcher Optimization Technique Based on Cache Convection

Hidetsugu Irie; Takefumi Miyoshi; Goki Honjo; Kei Hiraki; Tsutomu Yoshinaga

One of the significant issues of processor architecture is to overcome memory latency. Prefetching can greatly improve cache performance, however, it has the drawback of cache pollution unless its aggressiveness is properly set. Although several techniques for prefetcher throttling have been proposed which use accuracy as a metric, their robustness were not sufficient due to the variations between program working set sizes and cache capacities. In this paper, we revisit cache behavior with the viwepoint of data lifetime in a cache with prefetching. Based on this observation Cache-Convection-Control-based Prefetch Optimization (CCCPO) is proposed, which exploits the characteristics of cache line reuse and controls the prefetcher aggressiveness. Evaluation results showed that this novel approach achieved 4.6% improvement against the most recent prefetcher throttling algorithms in the geometric mean of SPEC CPU 2006 benchmark suite with 256KB LLC.

2014 IEEE 8th International Symposium on Embedded Multicore/Manycore SoCs | 2014

An FPGA-Based Tightly Coupled Accelerator for Data-Intensive Applications

Masato Yoshimi; Ryu Kudo; Yasin Oge; Yuta Terada; Hidetsugu Irie; Tsutomu Yoshinaga

Computation beside a data source plays an important role in achieving a high performance with low energy consumption in Big Data processing. In contrast to that of a conventional workload, the processing of Big Data frequently requires that a massive amount of data in distributed storage be scanned. A key technique for reducing energy-consuming processor loads is to install a reconfigurable accelerator that is tightly coupled to a computational resource with interfaces. The accelerator is capable of configuring application-specific hardware modules to allow some logical and arithmetic operations for data stream transmission between interfaces, as well as the offloading of control protocols for communication with other computing nodes or storage. In this paper, an FPGA-based accelerator, which is directly attached to DRAM, the network, and storage, is proposed in order to realize an energy efficient computing system. A simple application that counts the words appearing in the data is implemented to evaluate a prototype system. As the accelerator outperforms by 80.66 to 429 times similar applications executed on an SSD-based Hadoop framework, we confirm that the accelerators utilization for Big Data processing is beneficial.

international symposium on computing and networking | 2013

An Efficient and Scalable Implementation of Sliding-Window Aggregate Operator on FPGA

Yasin Oge; Masato Yoshimi; Takefumi Miyoshi; Hideyuki Kawashima; Hidetsugu Irie; Tsutomu Yoshinaga

This paper presents an efficient and scalable implementation of an FPGA-based accelerator for sliding-window aggregates over disordered data streams. With an increasing number of overlapping sliding-windows, the window aggregates have a serious scalability issue, especially when it comes to implementing them in parallel processing hardware (e.g., FPGAs). To address the issue, we propose a resource-efficient, scalable, and order-agnostic hardware design and its implementation by examining and integrating two key concepts, called Window-ID and Pane, which are originally proposed for software implementation, respectively. Evaluation results show that the proposed implementation scales well compared to the previous FPGA implementation in terms of both resource consumption and performance. The proposed design is fully pipelined and our implementation can process out-of-order data items, or tuples, at wire speed up to 200 million tuples per second.

international conference on networking and computing | 2010

CODIE: Continuation-Based Overlapping Data-Transfers with Instruction Execution

Takefumi Miyoshi; Kenji Kise; Hidetsugu Irie; Tsutomu Yoshinaga

In this paper, a runtime system termed CODIE is proposed to execute sequential part of programs efficiently in a many-core architecture. All independent processing elements in a many-core architecture use a shared network and off-chip memory. Therefore, contentions on such resources substantially degrade the system performance. On the CODIE system, when a cache miss occurs, the system first initiates a data transfer operation. Next, the system creates a continuation of executing instructions related to the missing data. The continuation is stored into the buffer, and the instructions not related to the missing data are executed subsequently. In other words, data transfer and instruction executions can be performed simultaneously. In this way, the effect of the overhead of the updating cache entry (increased by memory access contention) is tolerated. The results of evaluation show that the proposed CODIE system realizes a 1.86x speed up of the execution of the sequential write/read program on the M-Core architecture at 36 cores and a 1.97x speed up of the execution of the blacks holes(from PARSEC benchmark suite) on the Cell/BE processor with 6 SPEs.

international symposium on computing and networking | 2014

Accelerating OLAP Workload on Interconnected FPGAs with Flash Storage

Masato Yoshimi; Ryu Kudo; Yasin Oge; Yuta Terada; Hidetsugu Irie; Tsutomu Yoshinaga

The data volume used in online analytical processing (OLAP) applications is rapidly increasing because of the increasing popularity of various Web services and emerging sensor technologies. Since the amount of accumulated data is frequently too large to store in an in-memory database, it is necessary to have a secondary storage to store such big data. On the basis of this premise, the most important factor to determine the performance of data-intensive applications is to reduce the number and the size of the data transfers between the secondary storage and the main memory. To achieve an energy-efficient computing environment, offloading a user-defined function (UDF) onto interconnected FPGA-boards that equip high-speed storage is effective due to FPGAs performance ratio of operations per I/O. In this paper, we focus on the aggregate operations that are popularly used UDF in OLAP, and propose an acceleration scheme utilizing interconnected FPGAs with flash storage. The scheme is by introducing an accelerator modules which apply operations to data-stream passing through the FPGA, in addition to appropriate data distribution and partitioning. We implemented an accelerator module that aggregates the data transferred from the flash storage to the DRAM in order to show availability. Through preliminary evaluations of the accelerator, we confirmed that aggregate operations supported by the active-disk mechanism outperforms a software-based database management system by more than 30 times.

augmented human international conference | 2013

A real-time gait improvement tool using a smartphone

Hirotaka Kashihara; Hiroki Shimizu; Hiroyoshi Houchi; Masato Yoshimi; Tsutomu Yoshinaga; Hidetsugu Irie

Recent handy devices are provided with various sensors and have realized a lot of functions as downsizing and speeding up of computers. Currently smartphones occupy significant positions as the multifunctional handy devices. One of the most observable feature is that the users carry the smartphone whenever leaving home. Analyzing the motion measured by such device can be useful to improve lifestyle habits. Gaits should be focused as the representative behavior of daily living, which is shown by the fact that there are a lot of exercises intended to improve gaits.

Explore More