Katsuhiro Yamazaki | Researchain

Publication

Featured researches published by Katsuhiro Yamazaki.

PDSE '98 Proceedings of the International Symposium on Software Engineering for Parallel and Distributed Systems | 1998

A Case-Based Parallel Programming System

Katsuhiro Yamazaki; Shoichi Ando

This paper describes how to reduce the burden of parallel programming by utilizing relevant parallel programs. Parallel algorithms are divided into four classes, and a case base for parallel programming is developed by retrieving parallel programs in each class. Cases consist of indices, a skeleton, a program, parallelization effects and a history. Skeletons include the most important issues such as task division, synchronization, mutual exclusion, parallelization methods and threads. Parallel programs for image data storage, three-dimensional spline, edge detection, thinning, knapsack problem and package wrapping algorithm are developed by retrieving the most relevant case and adapting it to the given problem. The experiment demonstrates that threads and synchronization can be reused from skeletons, whereas task division should be adapted by programmers.

field-programmable logic and applications | 2008

Three-stage pipeline implementation for SHA2 using data forwarding

Hoang Anh Tuan; Katsuhiro Yamazaki; Shigeru Oyanagi

The Secure Hash Algorithm 512 (SHA-512), which is used to verify the integrity of a message, involves iterative computation on data. The large computation delay in each iteration limits the throughput of the entire system and makes it difficult to pipeline the computation. To shorten the computation time of an iteration of the main loop, we used the data forwarding method. Here we introduce an architecture that simultaneously performs the data computation of one iteration and the data movement of the next. The computations are then broken into two stages for one operand and three stages for another operand. The implementation occupies 1,520 hardware slices on a Xilinx Virtex-4 family FPGA chip and achieves nearly 2.2 Gbps. Thus, the implementation achieved a better area performance rate (throughput/area) than the related work.

euro-mediterranean conference | 2018

Ancient Asian Character Recognition for Literature Preservation and Understanding

Lin Meng; C. V. Aravinda; K. R. Uday Kumar Reddy; Tomonori Izumi; Katsuhiro Yamazaki

This paper introduces a project for automatically recognizing ancient Asian characters by image processing and deep learning with the aim of preserving Asian culture. The ancient characters examined include Chinese and Indian characters, which are the most mysterious, widely used, and historic in the ancient world, and also feature multiple types. The automatic recognition method consists of preprocessing and recognition processing. The preprocessing includes character segmentation and noise reduction, and the recognition processing comprises conventional recognition and deep learning. The conventional recognition method consists of feature extraction and similarity calculation or classification, and data augmentation is a key part of the deep learning. Experimental results show that deep learning achieves a better recognition accuracy than conventional image processing. Our aim is to preserve ancient literature by digitizing it and clarifying the characters and how they change throughout history by means of accurate character recognition. We also hope to help people discover new knowledge from ancient literature.

euro-mediterranean conference | 2018

Unlocking Potential Knowledge Hidden in Rubbing

Lin Meng; Masahiro Kishi; Kana Nogami; Michiko Nabeya; Katsuhiro Yamazaki

Rubbings are among the oldest ancient literatures and potentially contain a lot of knowledge waiting to be unlocked. Constructing a rubbing database has therefore become an important research topic in terms of discovering and clarifying the potential knowledge. However, current rubbing databases are very simple, and there is no process in place for discovering the potential knowledge. Moreover, the rubbing characters need to be recognized manually because there are so many different character styles and because the rubbings are in various stages of damage due to the aging process, and this takes an enormous amount of time and effort. In this work, our aim is to construct a spatiotemporal rubbing database, based on multi-style Chinese character recognition using deep learning, that visualizes the spatiotemporal information in the form of a keyword of rubbing images on a map. The idea is that the potential knowledge unlocked by the keyword will help with research on historical information organization, climatic variation, disaster prediction and response, and more.

field programmable gate arrays | 2015

FPGA-based BLOB Detection Using Dual-pipelining (Abstract Only)

Naoto Nojiri; Lin Meng; Katsuhiro Yamazaki

Binary Large OBject (BLOB) detection is utilized in various fields such as car cameras, traffic sign recognition and surveillance systems. Although labeling is an important component in BLOB detection, it is difficult to parallelize when it uses a look-up table (LUT), because of data dependency. Since BLOB detection takes a long time, recognition speed and accuracy need to be improved. This research aims to detect BLOBs as fast as possible by using dual-pipelining image processing on the FPGA. Dual-pipelining divides an original image into upper and lower portions and performs pipeline processing on both portions in parallel. We have to consider the timing of each module around the borderline because of the data dependency in label generation. The image processing consists of Gaussian filtering, binarization, labeling, and BLOB analysis. Generally, labeling uses a LUT to combine multiple numbers for one object into the smallest number of temporary labels. In order to simplify the labeling, the connected components of each BLOB are stored and revised only in the LUT. In our approach, a BLOB can be detected when multiple temporary labels are stored in the same entry of the LUT, thus enabling us to detect BLOBs by dual-pipelining. Although our labeling method does not revise temporary labels into a unified label, BLOBs can be detected and their numbers, areas, and centroids are correctly computed. We compared our approach with a related work, which consists of three steps: identifying the connected pixels in each row, labeling the counted pixels in different rows, and computing the area and centroid. Experimental results show that the dual-pipelining system using FPGA can detect BLOBs in 0.06 ms, which is 3.92 times faster than the related work and 1.83 times faster than a single-pipelining system. The dual-pipelining system utilized 1.5% of Registers, 8.4% of LUT, 24.3% of LUT-FF pairs, and 91.9% of BRAM in Virtex V. The dual-pipelining system is about twice as large as the single-pipelining system. Our approach can be applied to other areas such as traffic sign recognition and vehicle detection.
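
The labeling idea in the abstract above (temporary labels whose equivalences live only in a look-up table, with areas and centroids computed without rewriting pixels to a unified label) can be illustrated in software. The following is a minimal single-scan sketch, not the authors' FPGA design: the function name `detect_blobs`, the use of a union-find array as the LUT, and the choice of 8-connectivity are all assumptions made for illustration, and the dual-pipelining of upper and lower image halves is not modeled.

```python
def detect_blobs(image):
    """Single raster scan over a binary image (list of rows of 0/1).

    Assumption: a union-find parent array plays the role of the LUT.
    Pixels keep their temporary labels; only the LUT is revised.
    """
    parent = [0]      # parent[t] = representative of temporary label t (0 = background)
    stats = [None]    # stats[t] = [area, sum_x, sum_y], kept at the representative

    def find(t):
        # Follow LUT entries to the representative, shortening the chain.
        while parent[t] != t:
            parent[t] = parent[parent[t]]
            t = parent[t]
        return t

    height = len(image)
    width = len(image[0]) if height else 0
    prev = [0] * width                      # temporary labels of the previous row
    for y in range(height):
        cur = [0] * width
        for x in range(width):
            if not image[y][x]:
                continue
            # Four already-scanned neighbours: left, upper-left, upper, upper-right.
            neighbours = []
            if x > 0 and cur[x - 1]:
                neighbours.append(cur[x - 1])
            for dx in (-1, 0, 1):
                nx = x + dx
                if 0 <= nx < width and prev[nx]:
                    neighbours.append(prev[nx])
            if not neighbours:
                parent.append(len(parent))  # new temporary label
                stats.append([0, 0, 0])
                label = len(parent) - 1
            else:
                roots = {find(n) for n in neighbours}
                label = min(roots)
                for r in roots:
                    if r != label:          # record the equivalence in the LUT only
                        parent[r] = label
                        for i in range(3):
                            stats[label][i] += stats[r][i]
                        stats[r] = [0, 0, 0]
            cur[x] = label
            s = stats[label]
            s[0] += 1
            s[1] += x
            s[2] += y
        prev = cur

    blobs = []
    for t in range(1, len(parent)):
        if parent[t] == t:                  # one representative per BLOB
            area, sx, sy = stats[t]
            blobs.append({"area": area, "centroid": (sx / area, sy / area)})
    return blobs
```

Calling `detect_blobs` on a small binary image returns one entry per connected component, in the order in which each component's surviving label was first created.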

field programmable gate arrays | 2014

Pipelining FPGA-based defect detection in FPDs (abstract only)

Lin Meng; Keisuke Matsuyama; Naoto Nojiri; Tomonori Izumi; Katsuhiro Yamazaki

The real-time detection of defects in Flat-Panel Displays (FPDs) is very important during the production stages. This paper describes the manner in which defects induced by bubbles are detected as fast as possible by using 4-stage image processing pipelines with 3-line buffers on a Field-Programmable Gate Array (FPGA). The image processing consists of reading a Time Delay Integration (TDI) image, Laplacian filtering, binarization, and labeling. TDI is applied to the initial image of the FPD to reduce the noise introduced when capturing the FPD images. Laplacian filtering and binarization are used to detect the edges in the image, and labeling is used to number the objects in the image for defect detection. In the 4-stage pipelining, the first stage reads the TDI image from the Block Random Access Memory (BRAM), the second stage implements Laplacian filtering and binarization, the third stage implements labeling, and the final stage revises the labels and writes them into the BRAM. The target pixel and its eight surrounding neighbors are required during Laplacian filtering, and four neighbors are necessary during labeling. Thus, three line registers (3-line buffer) are used as a general pipeline register between two neighboring stages in our system. The pipelining system accesses these 3-line buffers and runs four image processing steps in parallel. Therefore, the system uses four different addresses to access the BRAM and the 3-line buffers. Further, to facilitate performance comparison, we implemented sequential image processing systems with 3-line buffers on FPGA and CPU software. The experiments reveal that Laplacian filtering, binarization, and labeling for FPD defect detection can be executed in less than 1 ms by using four-stage pipelining on an FPGA, which is 3.62 times faster than the sequential system and 158.7 times faster than the CPU software. The pipelining system is 28% larger than the sequential system in terms of the size of the LUTs.
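
As a software illustration of the 3-line buffer described above, the sketch below streams image rows through a three-row window and applies an 8-neighbor Laplacian followed by binarization. It is only an analogue of the second pipeline stage, not the authors' hardware: the function name `laplacian_binarize`, the kernel weights (8 at the center, -1 at each neighbor), the use of the absolute response, and the `threshold` parameter are assumptions.

```python
from collections import deque


def laplacian_binarize(rows, threshold):
    """Yield binarized Laplacian rows while holding only three input rows.

    rows:      iterable of equal-length lists of pixel intensities
    threshold: hypothetical cut-off applied to the absolute filter response
    """
    buf = deque(maxlen=3)            # the 3-line buffer: oldest row drops out automatically
    for row in rows:
        buf.append(row)
        if len(buf) < 3:
            continue                 # wait until a full 3x3 neighbourhood is available
        top, mid, bot = buf
        out = [0] * len(mid)         # border columns are left at 0
        for x in range(1, len(mid) - 1):
            neighbours = (top[x - 1] + top[x] + top[x + 1]
                          + mid[x - 1] + mid[x + 1]
                          + bot[x - 1] + bot[x] + bot[x + 1])
            response = 8 * mid[x] - neighbours
            out[x] = 1 if abs(response) >= threshold else 0
        yield out
```

Because the generator emits one output row per input row once the buffer is full, a downstream labeling step could consume rows as they arrive, which is the property the line buffer provides between pipeline stages.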

ieee international conference on high performance computing data and analytics | 1999

Parallel Radiosity: Evaluation of Parallel Form Factor Calculations and a Static Load Balancing Algorithm

Akira Uejima; Katsuhiro Yamazaki

Although the radiosity algorithm can generate photo-realistic images due to global illumination effects, a large number of form factor calculations is required. This paper describes how to parallelize the radiosity algorithm by subdividing hemispheres into multiple elements and allocating them statically to multiple processors. An enhanced communication procedure is proposed, in which the partial hemisphere data at each processor is communicated and the complete hemisphere data is prepared on all processors. In this procedure, the size of the communication data is independent of the number of elements. In addition, the load balancing efficiency of our static load balancing algorithm is evaluated. On the distributed memory parallel computer AP1000+, which has 64 processors, the speedup is 20.4¨1.8 for benchmark scenes, and 35.00.2 for classroom scenes. The load balancing efficiency is 0.93 to 0.96.
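
The abstract above does not define its load balancing efficiency or state the allocation rule, so the following is only a sketch under stated assumptions: elements are assigned to processors cyclically (a common static scheme), and efficiency is taken as mean processor load divided by maximum processor load. The function names and the per-element cost values are hypothetical.

```python
def allocate_cyclic(costs, num_procs):
    """Statically assign element i to processor i % num_procs (assumed scheme)."""
    loads = [0.0] * num_procs
    for i, cost in enumerate(costs):
        loads[i % num_procs] += cost
    return loads


def load_balancing_efficiency(loads):
    """Assumed definition: mean load / max load, which equals 1.0 when perfectly balanced."""
    return (sum(loads) / len(loads)) / max(loads)


# Hypothetical per-element form factor costs spread over 3 processors.
loads = allocate_cyclic([4, 2, 3, 1, 5, 3], 3)
efficiency = load_balancing_efficiency(loads)
```

Under this definition, an efficiency close to 1 means the most heavily loaded processor does little more work than the average one.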

Ipsj Online Transactions | 2009