Yuichiro Shibata | Researchain

Publication

Featured research published by Yuichiro Shibata.

field-programmable technology | 2011

Deep pipelined one-chip FPGA implementation of a real-time image-based human detection algorithm

Kazuhiro Negi; Keisuke Dohi; Yuichiro Shibata; Kiyoshi Oguri

This paper presents a deep-pipelined FPGA implementation of a real-time image-based human detection algorithm. By using binary-patterned HOG features, AdaBoost classifiers generated by offline training, and some approximate arithmetic strategies, our architecture fits efficiently on a low-end FPGA without any external memory modules. Empirical evaluation shows that our system achieves a detection throughput of 62.5 fps, with a detection rate of 96.6% and a false positive rate of 20.7%. Moreover, if a high-speed camera device is available, a maximum throughput of 112 fps is expected, which is 7.5 times faster than a software implementation.

field-programmable logic and applications | 2007

FPGA Implementation of a Data-Driven Stochastic Biochemical Simulator with the Next Reaction Method

M. Yoshimi; Yow Iwaoka; Yuri Nishikawa; Toshinori Kojima; Yasunori Osana; Yuichiro Shibata; Naoki Iwanaga; Hideki Yamada; Hiroaki Kitano; Akira Funahashi; Noriko Hiroi; Hideharu Amano

This paper introduces a scalable FPGA implementation of a stochastic simulation algorithm (SSA) called the next reaction method. Some earlier hardware approaches to SSAs achieved high throughput on reconfigurable devices such as FPGAs, but they lacked scalability. The design in this work can accommodate the increasing size of target biochemical models and can make use of the increasing capacity of FPGAs. An interconnection network between arithmetic circuits and multiple simulation circuits is used to perform data-driven multi-threaded simulation. A speedup of approximately 8 times was obtained compared with execution on a 2.80 GHz Xeon.

field-programmable logic and applications | 2011

Pattern Compression of FAST Corner Detection for Efficient Hardware Implementation

Keisuke Dohi; Yuji Yorita; Yuichiro Shibata; Kiyoshi Oguri

This paper presents a stream-oriented FPGA implementation of machine-learned Features from Accelerated Segment Test (FAST) corner detection, which is used in parallel tracking and mapping (PTAM) for augmented reality (AR). One difficulty in implementing FAST corner detection in compact hardware is the matching process against a large number of corner patterns. We propose corner pattern compression methods that focus on discriminant division and on pattern symmetry under rotation and inversion. This pattern compression allows the corner pattern matching to be implemented as a combinational circuit. Our prototype implementation achieves real-time performance using 7-9% of the available slices of a Virtex-5 FPGA.
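
As an illustration of the matching problem this abstract describes, the following is a minimal Python sketch of the plain segment test that FAST is built on. It is not the authors' compressed pattern matcher, nor the machine-learned decision tree. The function name `is_fast_corner`, the 16-pixel ring, and the arc length of 9 are assumptions made for illustration; the abstract does not state these parameters.

```python
def is_fast_corner(center, ring, threshold, arc_len=9):
    """Segment test: True if `arc_len` contiguous ring pixels are all
    brighter than center + threshold or all darker than center - threshold.

    `ring` holds the 16 intensities on the circle around the candidate
    pixel, listed in circular order (assumed layout).
    """
    if len(ring) != 16:
        raise ValueError("ring must contain exactly 16 intensities")
    if not 1 <= arc_len <= 16:
        raise ValueError("arc_len must be between 1 and 16")

    for sign in (1, -1):  # +1 checks 'brighter', -1 checks 'darker'
        flags = [sign * (value - center) > threshold for value in ring]
        run = 0
        # Append the first arc_len - 1 flags so arcs that wrap around
        # the end of the list are also detected.
        for flag in flags + flags[:arc_len - 1]:
            run = run + 1 if flag else 0
            if run >= arc_len:
                return True
    return False


# Nine contiguous bright pixels that wrap around the end of the ring.
wrapped = [200] * 5 + [100] * 7 + [200] * 4
print(is_fast_corner(100, wrapped, threshold=20))
```

Because the test only asks for a contiguous arc, rotating the ring never changes the answer, which is the kind of symmetry a pattern compression scheme can exploit.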

field programmable custom computing machines | 2000

A virtual hardware system on a dynamically reconfigurable logic device

Yuichiro Shibata; Masaki Uno; Hideharu Amano; Koichiro Furuta; Taro Fujii; Masato Motomura

WASMII is a virtual hardware system that uses a multi-context reconfigurable device with data-driven control. Because no such device was available, WASMII could not be implemented and has so far been evaluated only with an emulator. However, NEC has now developed DRL, the first reconfigurable multi-context device. Making use of its flexible reconfigurability, we have implemented a WASMII mechanism on DRL.

application specific systems architectures and processors | 2010

Highly efficient mapping of the Smith-Waterman algorithm on CUDA-compatible GPUs

Keisuke Dohi; Khaled Benkrid; Cheng Ling; Tsuyoshi Hamada; Yuichiro Shibata

This paper describes a multi-threaded parallel design and implementation of the Smith-Waterman (SW) algorithm on graphics processing units (GPUs) with NVIDIA Corporation's Compute Unified Device Architecture (CUDA). Central to this is a divide-and-conquer approach that divides the computation of a whole pairwise sequence alignment matrix into multiple sub-matrices (or parallelograms), each running efficiently on the available hardware resources of the GPU in hand, with temporary intermediate data stored in global memory. Moreover, we use thread warps and padding techniques to decrease the cost of thread synchronization, as well as loop unrolling to reduce the cost of conditional branches. While intermediate data is stored in global memory for large queries, the innermost loop in our implementation accesses only shared memory and registers. As a result of these optimizations, our implementation of the SW algorithm achieves a throughput between 9.09 GCUPS (giga cell updates per second) and 12.71 GCUPS on a single GPU, and between 29.46 GCUPS and 43.05 GCUPS on a quad-GPU platform. Compared with the best GPU implementation of the SW algorithm reported to date, our implementation achieves up to a 46% improvement in speed. The source code of our implementation is available in the public domain for bioinformaticians to benefit from its performance.
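
For readers unfamiliar with the metric, GCUPS counts how many cells of the alignment matrix are updated per second. The sketch below is a plain sequential Python version of the SW recurrence, written only to show what one cell update is. It is not the authors' CUDA implementation: the linear gap penalty, the scoring values, and the function names `smith_waterman_score` and `gcups` are all assumptions made for illustration.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score of strings a and b.

    Uses a linear gap penalty (assumed) and keeps only two matrix rows.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            # One "cell update": the maximum over restart (0),
            # diagonal (match/mismatch), and the two gap moves.
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def gcups(cells, seconds):
    """Throughput in giga cell updates per second."""
    return cells / seconds / 1e9


print(smith_waterman_score("GGACGTGG", "TTACGTTT"))
```

Each cell depends on its left, upper, and upper-left neighbours, so only cells on the same anti-diagonal are independent of one another; this dependency is what any parallel mapping of the algorithm has to work around.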

field-programmable technology | 2005

The design of scalable stochastic biochemical simulator on FPGA

Masato Yoshimi; Yasunori Osana; Yow Iwaoka; Akira Funahashi; Noriko Hiroi; Yuichiro Shibata; Naoki Iwanaga; Hiroaki Kitano; Hideharu Amano

Biochemical simulations, including whole-cell models, require high-performance computing systems. Reconfigurable systems are expected to be an alternative to conventional approaches based on PC clusters or vector computers. This paper presents an implementation of a stochastic biochemical simulation algorithm called the Next Reaction Method on a Virtex-II Pro. In a benchmark with a small reaction system, the FPGA-based simulator outperforms a software implementation on a 2.40 GHz Xeon by a factor of 17.1.

international conference industrial engineering other applications applied intelligent systems | 2009

Accelerating Collapsed Variational Bayesian Inference for Latent Dirichlet Allocation with Nvidia CUDA Compatible Devices

Tomonari Masada; Tsuyoshi Hamada; Yuichiro Shibata; Kiyoshi Oguri

Next-Generation Applied Intelligence: 22nd International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems, IEA/AIE 2009, Tainan, Taiwan, June 24-27, 2009.

field-programmable logic and applications | 2005

Efficient scheduling of rate law functions for ODE-based multimodel biochemical simulation on an FPGA

Naoki Iwanaga; Yuichiro Shibata; Masato Yoshimi; Yasunori Osana; Yow Iwaoka; Tomonori Fukushima; Hideharu Amano; Akira Funahashi; Noriko Hiroi; Hiroaki Kitano; Kiyoshi Oguri

Reconfigurable biochemical simulators that solve ordinary differential equations have received attention as a personal high-speed environment for biochemical researchers. For efficient use of the reconfigurable hardware, static scheduling of high-throughput arithmetic pipeline structures is essential. This paper presents and compares several scheduling alternatives and analyzes the tradeoffs between performance and hardware cost. The evaluation shows that sharing-first scheduling reduces the hardware cost by 33.8% on average, at the cost of up to 11.5% throughput degradation. The effects of sharing rate law functions are also analyzed.

field-programmable logic and applications | 2012

Deep-pipelined FPGA implementation of ellipse estimation for eye tracking

Keisuke Dohi; Yuma Hatanaka; Kazuhiro Negi; Yuichiro Shibata; Kiyoshi Oguri

This paper presents a deep-pipelined FPGA implementation of real-time ellipse estimation for eye tracking. The system is built from the Starburst algorithm on a stream-oriented architecture and the RANSAC algorithm, and uses no external memory. In particular, the paper presents comparative results for three different hypothesis generators for the RANSAC algorithm, based on Cramer's rule, Gauss-Jordan elimination, and LU decomposition. Comparison criteria include resource usage, throughput, and energy consumption. The results show that the three implementations have different characteristics and that the optimal algorithm needs to be chosen depending on the amount of FPGA resources and the required performance.
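
To make the hypothesis-generation step concrete, here is a minimal Python sketch that fits a conic through five points by Cramer's rule, one of the three solvers named above. It is an illustration under stated assumptions, not the authors' hardware design: it assumes each hypothesis is a conic through five sampled points, normalizes the conic as ax² + bxy + cy² + dx + ey = 1 (so it cannot pass through the origin), uses exact `Fraction` arithmetic rather than fixed-point, and all function names are hypothetical.

```python
from fractions import Fraction


def det(m):
    """Determinant by Laplace expansion along the first row."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for col, value in enumerate(m[0]):
        if value == 0:
            continue  # zero entries contribute nothing
        minor = [row[:col] + row[col + 1:] for row in m[1:]]
        total += (-1) ** col * value * det(minor)
    return total


def cramer_solve(a, b):
    """Solve a x = b with Cramer's rule: x_k = det(a_k) / det(a)."""
    d = det(a)
    if d == 0:
        raise ValueError("singular system: points are degenerate")
    solution = []
    for k in range(len(a)):
        # a_k is a with its k-th column replaced by b.
        a_k = [row[:k] + [b[i]] + row[k + 1:] for i, row in enumerate(a)]
        solution.append(Fraction(det(a_k)) / d)
    return solution


def conic_through_points(points):
    """Coefficients [a, b, c, d, e] of ax^2 + bxy + cy^2 + dx + ey = 1."""
    rows = []
    for x, y in points:
        x, y = Fraction(x), Fraction(y)
        rows.append([x * x, x * y, y * y, x, y])
    return cramer_solve(rows, [Fraction(1)] * len(rows))


# Five points on the ellipse x^2/4 + y^2 = 1.
pts = [(2, 0), (-2, 0), (0, 1), (0, -1), (Fraction(6, 5), Fraction(4, 5))]
print(conic_through_points(pts))
```

Cramer's rule needs one determinant per unknown plus one for the system matrix, so it trades extra arithmetic for a regular, division-light structure, whereas elimination-based solvers do less arithmetic but have more data-dependent steps.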

conference on information and knowledge management | 2009

Dynamic hyperparameter optimization for Bayesian topical trend analysis

Tomonari Masada; Daiji Fukagawa; Atsuhiro Takasu; Tsuyoshi Hamada; Yuichiro Shibata; Kiyoshi Oguri

This paper presents a new Bayesian topical trend analysis. We regard the parameters of the topic Dirichlet priors in latent Dirichlet allocation as a function of document timestamps and optimize the parameters with a gradient-based algorithm. Because our method assigns similar hyperparameters to documents with similar timestamps, topic assignment in collapsed Gibbs sampling is affected by timestamp similarity. We compute TFIDF-based document similarities from the results of collapsed Gibbs sampling and evaluate our proposal on the link detection task of Topic Detection and Tracking.
