Kazuki Joe | Researchain

Publication

Featured research published by Kazuki Joe.

conference on high performance computing (supercomputing) | 1993

A distributed shared memory multiprocessor: ASURA - Memory and cache architectures

Shin-ichiro Mori; Hideki Saito; Masahiro Goshima; Mamoru Yanagihara; Takashi Tanaka; David Fraser; Kazuki Joe; Hiroyuki Nitta; Shinji Tomita

ASURA is a large-scale, cluster-based, distributed shared memory multiprocessor being developed at Kyoto University and Kubota Corporation. Up to 128 clusters are interconnected to form an ASURA system of up to 1024 processors. The basic concept of the ASURA design is to take advantage of the hierarchical structure of the system. To implement this concept, a large shared cache is placed between each cluster and the inter-cluster network. The shared cache and the shared memories distributed among the clusters form part of ASURA's hierarchical memory architecture and give ASURA several unique features. In this paper, the hierarchical memory architecture of ASURA and its unique cache coherence scheme, including a proposal for a new hierarchical directory scheme, are described with some simulation results.

international conference on parallel architectures and compilation techniques | 1999

The modulo interval: a simple and practical representation for program analysis

Tsuneo Nakanishi; Akira Fukuda; Kazuki Joe; Constantine D. Polychronopoulos

In this paper, the modulo interval, an extension of the traditional interval on real numbers, and its useful mathematical properties are presented as a representation for program analysis. With only two parameters added to the interval on real numbers, namely the modulus and the residue, the modulo interval can represent information about programs with cyclic behavior, such as loop indices and array subscripts, at reasonable complexity and with greater accuracy. Well-defined arithmetic and set operations on the modulo interval make the implementation of compilers simple and reliable. Moreover, the application of the modulo interval to program analysis for parallelizing compilers is discussed in this paper.
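
The idea in this abstract can be illustrated with a minimal Python sketch. It assumes that a modulo interval denotes the integers in a closed range [lo, hi] that are congruent to a residue modulo a modulus. The class name `ModuloInterval` and its methods are hypothetical and are not taken from the paper, which defines a fuller set of arithmetic and set operations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModuloInterval:
    """Hypothetical model: integers x with lo <= x <= hi and x % modulus == residue."""
    lo: int
    hi: int
    modulus: int
    residue: int

    def members(self):
        # Smallest value >= lo that is congruent to residue modulo modulus.
        start = self.lo + (self.residue - self.lo) % self.modulus
        return list(range(start, self.hi + 1, self.modulus))

    def contains(self, x):
        return self.lo <= x <= self.hi and (x - self.residue) % self.modulus == 0

    def shift(self, c):
        # Adding a constant moves both bounds and the residue; the modulus is unchanged.
        return ModuloInterval(self.lo + c, self.hi + c,
                              self.modulus, (self.residue + c) % self.modulus)


# Index values of a loop such as `for i in range(1, 21, 3)`.
loop_index = ModuloInterval(lo=1, hi=20, modulus=3, residue=1)
# Subscript expression `i + 2` applied to that loop index.
subscript = loop_index.shift(2)
```

A plain interval [1, 20] would admit all twenty integers, while the modulo interval admits only the seven values the loop actually takes, which is the kind of added accuracy the abstract refers to.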

ieee international conference on high performance computing data and analytics | 1999

NaraView: An Interactive 3D Visualization System for Parallelization of Programs

Mariko Sasakura; Kazuki Joe; Yoshitoshi Kunieda; Keijiro Araki

For effective use of parallelizing compilers, an interactive environment that allows users to find more parallelism is needed. As the first step towards building such an environment, we have developed a program visualization system named NaraView. In this paper, we describe two visualization methods in NaraView. One is the Program Structure View, which illustrates the hierarchical loop structure of a given program and suggests which parts of the program can be parallelized. The other is the Data Dependence View, which visualizes each data dependence on every variable or array element accessed in a specific loop. By using these views, users can easily understand which parts of the program can be parallelized further. We also show several examples to demonstrate the efficiency of these methods.

IEICE Transactions on Information and Systems | 2008

Efficient Query-by-Content Audio Retrieval by Locality Sensitive Hashing and Partial Sequence Comparison

Yi Yu; Kazuki Joe; J. Stephen Downie

This paper investigates suitable indexing techniques to enable efficient content-based audio retrieval in large acoustic databases. To make an index-based retrieval mechanism applicable to audio content, we investigate the design of Locality Sensitive Hashing (LSH) and partial sequence comparison. We propose a fast and efficient audio retrieval framework for query-by-content and develop an audio retrieval system. Based on this framework, four different audio retrieval schemes, LSH-Dynamic Programming (DP), LSH-Sparse DP (SDP), Exact Euclidean LSH (E2LSH)-DP, and E2LSH-SDP, are introduced and evaluated in order to better understand the performance of audio retrieval algorithms. The experimental results indicate that, compared with traditional DP and the other three competitive schemes, E2LSH-SDP exhibits the best tradeoff in terms of response time, retrieval accuracy, and computation cost.
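
The hashing step behind such an index can be sketched as follows. This is a minimal illustration of the standard Euclidean LSH hash family, h(v) = floor((a·v + b) / w) with Gaussian `a` and uniform `b`, and is not the authors' system. The function names, the bucket width `w`, and the toy feature vectors are all assumptions made for the example, and the DP and SDP sequence comparison stages are not shown.

```python
import math
import random


def make_hash(dim, w, rng):
    """Return one Euclidean LSH function h(v) = floor((a . v + b) / w)."""
    a = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # random projection direction
    b = rng.uniform(0.0, w)                        # random offset in [0, w)

    def h(v):
        return math.floor((sum(x * y for x, y in zip(a, v)) + b) / w)

    return h


def bucket_key(hashes, v):
    # Concatenate several hash values so that only close vectors tend to collide.
    return tuple(h(v) for h in hashes)


def build_index(vectors, hashes):
    index = {}
    for i, v in enumerate(vectors):
        index.setdefault(bucket_key(hashes, v), []).append(i)
    return index


def query(index, hashes, q):
    # Candidates are the stored vectors that share the query's bucket.
    return index.get(bucket_key(hashes, q), [])


rng = random.Random(0)
hashes = [make_hash(dim=3, w=4.0, rng=rng) for _ in range(2)]
features = [[0.0, 0.0, 0.0], [1.0, 2.0, 3.0], [50.0, -40.0, 30.0]]  # toy frames
index = build_index(features, hashes)
candidates = query(index, hashes, [1.0, 2.0, 3.0])
```

Only the candidates returned by `query` would then need the more expensive sequence comparison, which is where an index of this kind saves time over an exhaustive scan.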

ieee international conference on high performance computing data and analytics | 2005

Development of an interactive visual data mining system for atmospheric science

Chiemi Watanabe; Eriko Touma; Kazuko Yamauchi; Katsuyuki Noguchi; Sachiko Hayashida; Kazuki Joe

In atmospheric science, 3D visualization techniques have mainly been used in recent decades to create impressive presentations. From the viewpoint of visual data mining, however, 3D visualization has had difficulty becoming widespread, because the conventional and established practice is to make 2D diagrams from two dimensions of a time-varying 3D grid. Based on these observations, we have been developing a quick-look tool for atmospheric science data that supports 3D visual data mining. We expect that scientists can use this tool to derive 2D diagrams from the data with various 2D or 3D visualization methods, and thereby become accustomed to 3D visualization methods.

international symposium on multimedia | 2008

Using Exact Locality Sensitive Mapping to Group and Detect Audio-Based Cover Songs

Yi Yu; J.S. Downie; Fabian Moerchen; Lei Chen; Kazuki Joe

Cover song detection is becoming an active research topic as large numbers of personal music recordings and performances are released on the Internet. A good cover song recognizer helps group and detect cover songs to improve the search experience. Traditional detection matches two musical audio sequences by exhaustive pairwise comparison. In contrast to existing work, our aim is to generate a group of concatenated feature sets based on regression modeling and to arrange them with indexing-based approximate techniques, avoiding complicated audio sequence comparisons. We mainly focus on using exact locality sensitive mapping (ELSM) to join the concatenated feature sets and soft hash values. Similarity-invariance among audio sequence comparisons is applied to define an optimal combination of several audio features. Soft hash values are pre-calculated to help locate the search range more accurately. Furthermore, we apply our algorithms to real audio cover songs, grouping and detecting batches of relevant cover songs embedded in large audio datasets.

languages and compilers for parallel computing | 1994

The Data Partitioning Graph: Extending Data and Control Dependencies for Data Partitioning

Tsuneo Nakanishi; Kazuki Joe; Akira Fukuda; Keijiro Araki; Hideki Saito; Constantine D. Polychronopoulos

Scalability and cost considerations suggest that distributed and distributed shared memory parallel computers will dominate future parallel architectures. These machines cannot be used effectively unless efficient automatic and static solutions to the data partitioning and placement problem become available. Significant progress toward this end has been made in the last few years, but we are still far from having general solutions which are efficient for all classes of applications. In this paper we propose the data partitioning graph (DPG) as an intermediate representation for parallelizing compilers, which augments previous intermediate representations and provides a framework for carrying out partitioning and placement of not only regular data structures (such as arrays), but also irregular structures and scalar variables. Although recent approaches to task-graph-based intermediate representations focus on representing data and control dependencies between tasks, they largely ignore the use of program variables by the different tasks. Traditional data partitioning methods usually employ algorithm-dependent techniques, and are considered independently of processor assignments (which ought to be handled simultaneously with data partitioning). Moreover, approaches to data partitioning concentrate exclusively on array structures. By explicitly encapsulating the use of program variables by the task nodes, the DPG provides a framework for handling data partitioning as well as processor assignment in the same context. We also discuss the hierarchical data partitioning graph (HDPG), which encapsulates the hierarchy of the compiled programs and is used to map the hierarchy of computations to massively parallel computers with distributed memory systems.

parallel, distributed and network-based processing | 2011

Scaleable Sparse Matrix-Vector Multiplication with Functional Memory and GPUs

Noboru Tanabe; Yuuka Ogawa; Masami Takata; Kazuki Joe

Sparse matrix-vector multiplication on GPUs faces a serious problem when the vector is too large to be stored in the GPU's device memory. To solve this problem, we propose a novel software-hardware hybrid method for a heterogeneous system with GPUs and functional memory modules connected by PCI Express. The functional memory offers a very large memory capacity and provides scatter/gather operations. We perform a preliminary evaluation of the proposed method using a sparse matrix benchmark collection. We observe that the proposed method, which converts indirect references to direct references without exhausting the GPU's cache memory, achieves a 4.1-fold speedup on a single GPU compared with conventional methods. The proposed method is intrinsically highly scalable with the number of GPUs because communication among GPUs is completely eliminated. Therefore, we estimate that the performance of the proposed method can be expressed as the single-GPU execution performance, which may be limited by the burst-transfer bandwidth of PCI Express, multiplied by the number of GPUs.
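
The conversion from indirect to direct references mentioned in this abstract can be illustrated in plain Python. This sketch assumes a CSR (compressed sparse row) layout and emulates the gather step in software. The function names are hypothetical, and neither the functional memory hardware nor the GPU kernels are modeled.

```python
def spmv_csr(row_ptr, col_idx, values, x):
    """Conventional CSR SpMV: each multiply reads x through col_idx (indirect)."""
    y = []
    for i in range(len(row_ptr) - 1):
        acc = 0.0
        for j in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[j] * x[col_idx[j]]  # indirect reference into x
        y.append(acc)
    return y


def gather(x, col_idx):
    """Software stand-in for a gather: lay out x[col_idx[j]] contiguously."""
    return [x[c] for c in col_idx]


def spmv_gathered(row_ptr, values, gathered):
    """SpMV over pre-gathered operands: values[j] pairs with gathered[j] directly."""
    y = []
    for i in range(len(row_ptr) - 1):
        acc = 0.0
        for j in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[j] * gathered[j]  # direct, sequential reference
        y.append(acc)
    return y


# The 3x3 matrix [[1, 0, 2], [0, 3, 0], [4, 0, 5]] in CSR form.
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
values = [1.0, 2.0, 3.0, 4.0, 5.0]
x = [1.0, 2.0, 3.0]

y_indirect = spmv_csr(row_ptr, col_idx, values, x)
y_direct = spmv_gathered(row_ptr, values, gather(x, col_idx))
```

Both versions compute the same product. In the second, the multiply loop never indexes `x` itself, so the full vector does not have to reside in the memory that runs that loop.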

conference on multimedia modeling | 2007

Similarity searching techniques in content-based audio retrieval via hashing

Yi Yu; Masami Takata; Kazuki Joe

In this work we study suitable indexing techniques to support efficient content-based music retrieval in large acoustic databases. To make an index-based retrieval mechanism applicable to audio content, we focus on the design of Locality Sensitive Hashing (LSH) and partial sequence comparison, and propose a fast and efficient audio retrieval framework for query-by-content. On the basis of this indexable framework, four different retrieval schemes, LSH-Dynamic Programming (DP), LSH-Sparse DP (SDP), Exact Euclidean LSH (E2LSH)-DP, and E2LSH-SDP, are presented and evaluated in order to gain a thorough understanding of the performance of retrieval algorithms. The experimental results indicate that, compared to the other three schemes, E2LSH-SDP exhibits the best tradeoff in terms of response time, retrieval ratio, and computation cost.

computer software and applications conference | 1997

A parallelizing compiler by object oriented design

Yoichi Omori; Kazuki Joe; Akira Fukuda

When conventional compiler design methodology is applied to a parallelizing compiler, its internal data structures quickly become too complicated. Thus, we introduce object oriented design from the problem analysis stage and achieve the following improvements: 1) consistent modeling from the theory to the implementation; 2) reduced program size through improved reusability based on better class design methodology; and 3) flexible coding through a stub class for parallelization. We extract objects based on stream and thus clarify the similarities to and differences from conventional design schemes. Then we show the framework of the internal classes used in our parallelizing compiler, which enhances design-level reusability in the C++ implementation of the compiler. Furthermore, we provide a virtual class to be used as a unit of MIMD-style parallel execution and make it a common representation among different parallelization algorithms. Finally, we compare our design against SUIF, which also uses C++, and show the improvements in the design classes.
