Masato Inagi
Hiroshima City University
Publication
Featured research published by Masato Inagi.
Field-Programmable Logic and Applications | 2009
Masato Inagi; Yasuhiro Takashima; Yuichi Nakamura
Multi-FPGA systems are widely used for rapid prototyping and logic verification of VLSIs. To implement a huge logic circuit in a multi-FPGA system, the circuit needs to be partitioned across multiple FPGAs. Because of the limited interconnection resources between FPGAs, time-multiplexed I/Os are used for inter-FPGA connections. Due to the large delay of time-multiplexed I/Os, inter-FPGA connections strongly affect the system performance. In this paper, we extend an ILP-based method for optimizing inter-FPGA connections to improve the system performance. Our method uses both normal I/Os and time-multiplexed I/Os, and decides whether each inter-FPGA signal is transferred by a time-multiplexed I/O or not. Our extended method improves the system performance by taking into account how the amount of interconnection resources and the number of inter-FPGA signals vary from one FPGA pair to another. Experiments showed that our method improved the circuit performance on a 4-FPGA system by 26.4% on average compared with a conventional method.
IPSJ Transactions on System LSI Design Methodology | 2010
Masato Inagi; Yasuhiro Takashima; Yuichi Nakamura
Multi-FPGA prototyping systems are widely used to verify logic circuit designs. To implement a large circuit using such a system, the circuit is partitioned across multiple FPGAs. Subsequently, the sub-circuits assigned to the FPGAs are connected using interconnection resources among the FPGAs. Because these resources are limited, time-multiplexed I/Os are used to accommodate all signals at the cost of system speed. In this study, we propose a method for optimizing inter-FPGA connections in multi-FPGA systems with time-multiplexed I/Os, which shortens the verification time by accelerating the systems. Our method decides whether each inter-FPGA signal is transferred by a normal I/O or by a time-multiplexed I/O, which is slower than a normal I/O but can transfer multiple signals. Our method optimizes inter-FPGA connections not only between a single FPGA pair, but among all the FPGAs. Experiments showed that for four-way partitioned circuits, our method obtains an average system clock period 16.0% shorter than that of a conventional method.
IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences | 2008
Masato Inagi; Yasuhiro Takashima; Yuichi Nakamura; Atsushi Takahashi
In multi-FPGA prototyping systems for circuit verification, a serialized time-multiplexed I/O technique is used because of the limited number of I/O pins of an FPGA. The verification time depends on the selection of inter-FPGA signals to be time-multiplexed. In this paper, we propose a method that minimizes the verification time of multi-FPGA systems by finding an optimal selection of inter-FPGA signals to be time-multiplexed. Experiments show that the estimated verification time is improved by 38.2% on average compared with conventional methods.
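The selection problem described in this abstract can be illustrated with a toy model. The sketch below is not the authors' method: it assumes a single FPGA pair, a made-up delay model (each signal's period contribution is its logic delay plus a fixed I/O delay), and exhaustive search in place of an optimization solver. The names `best_assignment`, `logic_delays`, `wires`, `mux_ratio`, `normal_delay`, and `mux_delay` are all hypothetical.

```python
from itertools import combinations
from math import ceil


def best_assignment(logic_delays, wires, mux_ratio, normal_delay, mux_delay):
    """Pick which signals use normal I/Os so the clock period is minimal.

    Toy assumptions (not from the paper):
      - a normal I/O carries one signal per physical wire,
      - a time-multiplexed I/O carries up to `mux_ratio` signals per wire,
      - the clock period is the largest (logic delay + I/O delay) of any signal,
      - there is at least one signal.
    Returns (period, set of signal indices on normal I/Os), or None if the
    signals cannot fit on the available wires.
    """
    n = len(logic_delays)
    best = None
    for k in range(n + 1):  # k signals on normal I/Os, the rest multiplexed
        if k + ceil((n - k) / mux_ratio) > wires:
            continue  # not enough physical wires for this split
        for normal in combinations(range(n), k):
            normal_set = set(normal)
            period = max(
                d + (normal_delay if i in normal_set else mux_delay)
                for i, d in enumerate(logic_delays)
            )
            if best is None or period < best[0]:
                best = (period, normal_set)
    return best


# Four signals, three wires, up to four signals per multiplexed wire.
result = best_assignment([5, 9, 3, 8], wires=3, mux_ratio=4,
                         normal_delay=1, mux_delay=6)
print(result)
```

In this toy instance the two signals with the longest logic delays end up on normal I/Os, which matches the intuition that timing-critical signals should avoid the slower time-multiplexed path.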
Digital Systems Design | 2013
Yasuhiro Shintani; Masato Inagi; Shinobu Nagayama; Shin'ichi Wakabayashi
Routing, which connects previously placed terminals, is one of the time-consuming processes in LSI design. In this study, we propose a multithreaded parallel routing algorithm for LSI design. In the proposed method, threads are first created and the nets of the target netlist are distributed equally among them. Sharing the routing regions, each thread searches for a candidate path of a net in parallel without synchronization. Then, each thread exclusively writes the candidate path to the routing regions as a determined path. Although exclusive control is necessary when updating the routing regions, this asynchronous parallel routing reduces the wait time of the threads. If a candidate path of a net does not satisfy the constraints because of the asynchronous parallel routing, the net is re-routed. We experimentally confirmed that the proposed method, running on a PC with eight cores, was 7.1 times faster than sequential execution. We also confirmed that the routing quality was not degraded compared with sequential execution.
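The search-then-commit scheme in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a small 2D grid as the routing region, breadth-first search as the path search, and "no two nets share a cell" as the only constraint. The names `find_path`, `route_nets`, and `parallel_route` are hypothetical.

```python
import threading
from collections import deque


def find_path(grid, net_id, src, dst):
    """Breadth-first search on a 4-connected grid.

    A cell is usable by this net if it is free (None) or already owned by it.
    The grid is read without locking, so the result is only a candidate.
    """
    rows, cols = len(grid), len(grid[0])
    prev = {src: None}
    queue = deque([src])
    while queue:
        cell = queue.popleft()
        if cell == dst:
            path = []
            while cell is not None:
                path.append(cell)
                cell = prev[cell]
            return path[::-1]
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if (0 <= nr < rows and 0 <= nc < cols
                    and (nr, nc) not in prev
                    and grid[nr][nc] in (None, net_id)):
                prev[(nr, nc)] = cell
                queue.append((nr, nc))
    return None


def route_nets(grid, nets, lock, routed):
    """Worker: route each assigned net, re-routing when a commit fails."""
    for net_id, (src, dst) in nets:
        while True:
            path = find_path(grid, net_id, src, dst)  # unsynchronized search
            with lock:  # exclusive check-and-write
                if path is None:
                    routed[net_id] = None
                    break
                if all(grid[r][c] in (None, net_id) for r, c in path):
                    for r, c in path:
                        grid[r][c] = net_id
                    routed[net_id] = path
                    break
            # Another thread took a cell on the candidate path: search again.


def parallel_route(rows, cols, nets, num_threads=2):
    """Distribute nets round-robin over threads and route them concurrently."""
    grid = [[None] * cols for _ in range(rows)]
    for net_id, (src, dst) in enumerate(nets):
        grid[src[0]][src[1]] = net_id  # reserve terminals up front
        grid[dst[0]][dst[1]] = net_id
    lock = threading.Lock()
    routed = {}
    indexed = list(enumerate(nets))
    chunks = [indexed[i::num_threads] for i in range(num_threads)]
    threads = [threading.Thread(target=route_nets,
                                args=(grid, chunk, lock, routed))
               for chunk in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return grid, routed


nets = [((0, 0), (0, 3)), ((3, 0), (3, 3)), ((1, 0), (1, 3))]
grid, routed = parallel_route(4, 4, nets, num_threads=2)
for net_id, path in sorted(routed.items()):
    print(net_id, path)
```

Because committed cells are never released in this sketch, a failed commit always means another net made progress, so the retry loop terminates. Note that CPython threads do not run Python bytecode in parallel, so the sketch shows the control flow only and will not reproduce the reported speedup.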
Field-Programmable Logic and Applications | 2011
Yoichi Wakaba; Masato Inagi; Shin'ichi Wakabayashi; Shinobu Nagayama
In this paper, we propose a systolic pattern-independent hardware regular expression matching (REM) engine that handles the nested Kleene operators used in virus patterns. Pattern-independent systolic REM engines are suitable for network intrusion detection systems because they allow virus patterns to be updated quickly. In the proposed engine, we introduce a compact pattern-independent NFA circuit, which can handle any small regular expression pattern, into a systolic REM engine to handle nested Kleene operators. Experimental results show that the extended engine implemented on an FPGA handles nested Kleene operators with efficient circuit size and high performance (2.17 Gbps).
International Symposium on Circuits and Systems | 2008
Masato Inagi; Yasuhiro Takashima; Yuichi Nakamura; Atsushi Takahashi
Due to the limited device capacity of an FPGA, multi-FPGA systems are used to verify huge state-of-the-art circuits. In this case, the number of I/O signals of each sub-circuit implemented in an FPGA tends to exceed the number of I/O pins of the FPGA. To resolve this problem, time-multiplexed I/Os are used. Each time-multiplexed I/O is shared by multiple I/O signals of a sub-circuit through time division. Since time-multiplexed I/Os introduce a large delay, we propose algorithms that obtain the optimal number of required I/O pins under a given timing constraint by choosing the signals to be time-multiplexed.
Field-Programmable Technology | 2006
Masato Inagi; Yasuhiro Takashima; Yuichi Nakamura; Yoji Kajitani
For multi-FPGA systems, the limited number of FPGA I/O pins is one of the most critical issues. Using time-multiplexed I/Os eases this limitation. However, a signal path through n time-multiplexed I/Os makes the system clock period at most n + 1 times longer. To capture this feature, we introduce a new cost, total cut-hopcount. Under the total cut-hopcount, we propose a performance-driven bipartitioning method, VIOP. VIOP combines three algorithms: i) min-cut partitioning, ii) coarse performance-driven partitioning, and iii) fine performance-driven partitioning. For min-cut and coarse performance-driven partitioning, we employ the well-known bipartitioning algorithms CLIP-FM and DUBA, respectively. For fine performance-driven partitioning, we propose a partitioning algorithm, CAVP. With VIOP, the average cost was improved by 11.5% compared with the state-of-the-art algorithms.
IPSJ Transactions on System LSI Design Methodology | 2014
Yoichi Wakaba; Shin'ichi Wakabayashi; Shinobu Nagayama; Masato Inagi
This paper proposes a method that uses partial reconfiguration to realize a compact regular expression matching engine whose pattern can be updated quickly. In the proposed method, a set of partial circuits, each of which handles a different class of regular expressions, is provided in advance. When a regular expression pattern is given, a compact matching engine dedicated to the pattern is implemented on an FPGA by combining the partial circuits according to the given pattern using partial reconfiguration. The method can update a pattern quickly, since it does not require redesigning the circuit. Experimental results show that the proposed method reduces circuit size by 60% compared with the previous method without significantly increasing the pattern updating time.
Database and Expert Systems Applications | 2018
Yuri Itotani; Shin'ichi Wakabayashi; Shinobu Nagayama; Masato Inagi
This paper proposes an approximate nearest neighbor search algorithm for high-dimensional data. The proposed algorithm is based on a distance-based hashing scheme called adaptive flexible distance-based hashing (AFDH). For a given query, AFDH returns a small candidate set of nearest neighbors, and the candidate closest to the query is selected as the final result. The main advantage of the proposed algorithm is that good search results can be obtained without fine-tuning its parameter values. Experimental results show that the proposed algorithm produces satisfactory results in terms of both result quality and execution time.
Field-Programmable Technology | 2016
Yuto Arai; Shin'ichi Wakabayashi; Shinobu Nagayama; Masato Inagi
With the recent explosive growth of data in the real world, data mining techniques for extracting characteristics and knowledge from big data are attracting increasing attention. This paper focuses on a method to detect outliers in streaming data, and proposes a fast FPGA implementation of outlier detection based on the Mahalanobis distance. The proposed circuit is fully pipelined, and in every clock cycle, one data sample can be judged to be an outlier or not. Experimental evaluation shows that the proposed circuit is 37 times faster than a software implementation of Mahalanobis distance-based outlier detection.
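For readers unfamiliar with the technique, the per-sample test behind Mahalanobis distance-based outlier detection can be sketched in software as below. This is a generic illustration, not the proposed circuit: it assumes the mean vector and the inverse covariance matrix are already known, and the names `mahalanobis_sq`, `is_outlier`, `MEAN`, `COV_INV`, and `THRESHOLD_SQ` as well as the numeric values are hypothetical.

```python
def mahalanobis_sq(x, mean, cov_inv):
    """Squared Mahalanobis distance (x - mean)^T * cov_inv * (x - mean)."""
    d = [xi - mi for xi, mi in zip(x, mean)]
    # Matrix-vector product cov_inv * d.
    tmp = [sum(row[j] * d[j] for j in range(len(d))) for row in cov_inv]
    return sum(di * ti for di, ti in zip(d, tmp))


def is_outlier(x, mean, cov_inv, threshold_sq):
    """Flag a sample whose squared distance exceeds the squared threshold."""
    return mahalanobis_sq(x, mean, cov_inv) > threshold_sq


# Assumed statistics: zero mean, covariance diag(1, 4), so the inverse
# covariance is diag(1, 0.25). The threshold is a distance of 3 (squared: 9).
MEAN = [0.0, 0.0]
COV_INV = [[1.0, 0.0], [0.0, 0.25]]
THRESHOLD_SQ = 9.0

stream = [[0.5, 1.0], [4.0, 0.0], [0.0, 5.0], [2.0, 4.0]]
flags = [is_outlier(s, MEAN, COV_INV, THRESHOLD_SQ) for s in stream]
print(flags)  # [False, True, False, False]
```

Comparing squared distances avoids a square root per sample. The sample `[0.0, 5.0]` is not flagged even though it lies farther from the mean in Euclidean terms than `[4.0, 0.0]`, because the assumed variance along the second axis is larger.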