Makoto Onizuka | Researchain

Publication

Featured research published by Makoto Onizuka.

very large data bases | 2015

SCAN++: efficient algorithm for finding clusters, hubs and outliers on large-scale graphs

Hiroaki Shiokawa; Yasuhiro Fujiwara; Makoto Onizuka

Graph clustering is one of the key techniques for understanding the structures present in graphs. Besides cluster detection, identifying hubs and outliers is also a key task, since they play important roles in graph data mining. The structural clustering algorithm SCAN, proposed by Xu et al., is successfully used in many applications because it not only detects densely connected nodes as clusters but also identifies sparsely connected nodes as hubs or outliers. However, it is difficult to apply SCAN to large-scale graphs due to its high time complexity, because it evaluates the density for all adjacent nodes included in the given graph. In this paper, we propose a novel graph clustering algorithm named SCAN++. To reduce time complexity, we introduce a new data structure, the directly two-hop-away reachable node set (DTAR). A DTAR is the set of two-hop-away nodes from a given node that are likely to be in the same cluster as the given node. SCAN++ employs two approaches for efficient clustering by using DTARs without sacrificing clustering quality. First, it reduces the number of density evaluations by computing the density only for the adjacent nodes indicated by DTARs. Second, by sharing a part of the density evaluations for DTARs, it offers efficient density evaluations of adjacent nodes. As a result, SCAN++ detects exactly the same clusters, hubs, and outliers from large-scale graphs as SCAN with much shorter computation time. Extensive experiments on both real-world and synthetic graphs demonstrate the performance superiority of SCAN++ over existing approaches.
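The per-edge density evaluation that makes SCAN expensive can be sketched as follows. This is a minimal illustration of the structural similarity and core test as commonly defined for the original SCAN, not an implementation of SCAN++ or of DTARs. The function names, the adjacency-dict representation, and the parameter values are assumptions made for illustration.

```python
import math

def structural_similarity(adj, u, v):
    """Structural similarity of u and v over their closed neighborhoods."""
    gu = adj[u] | {u}  # neighbors of u plus u itself
    gv = adj[v] | {v}
    return len(gu & gv) / math.sqrt(len(gu) * len(gv))

def core_nodes(adj, eps, mu):
    """Return nodes with at least mu eps-similar nodes in their closed neighborhood."""
    cores = set()
    for u in adj:
        eps_neighbors = {
            v for v in adj[u] | {u}
            if structural_similarity(adj, u, v) >= eps
        }
        if len(eps_neighbors) >= mu:
            cores.add(u)
    return cores

# Two triangles (0,1,2) and (4,5,6) joined through node 3.
adj = {
    0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
    3: {2, 4},
    4: {3, 5, 6}, 5: {4, 6}, 6: {4, 5},
}
cores = core_nodes(adj, eps=0.7, mu=3)
```

In this toy graph the triangle nodes qualify as cores while the bridging node 3 does not. Note that `core_nodes` evaluates the similarity for every adjacent pair, which is the cost the abstract says SCAN++ reduces.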

international world wide web conferences | 2005

Incremental maintenance for materialized XPath/XSLT views

Makoto Onizuka; Fong Yee Chan; Ryusuke Michigami; Takashi Honishi

This paper proposes an incremental maintenance algorithm that efficiently updates the materialized XPath/XSLT views defined using XPath expressions in XP([],*,//,vars). The algorithm consists of two processes. 1) The dynamic execution flow of an XSLT program is stored as an XT (XML Transformation) tree during the full transformation. 2) In response to a source XML data update, the impacted portions of the XT-tree are identified and maintained by partially re-evaluating the XSLT program. This paper discusses the XPath/XSLT features of incremental view maintenance for subtree insertion/deletion and applies them to the maintenance algorithm. Experiments show that the incremental maintenance algorithm outperforms full XML transformation algorithms by factors of up to 500.

very large data bases | 2013

Optimization for iterative queries on MapReduce

Makoto Onizuka; Hiroyuki Kato; Soichiro Hidaka; Keisuke Nakano; Zhenjiang Hu

We propose OptIQ, a query optimization approach for iterative queries in distributed environments. OptIQ removes redundant computations among different iterations by extending the traditional techniques of view materialization and incremental view evaluation. First, OptIQ decomposes iterative queries into invariant and variant views, and materializes the former. Redundant computations are removed by reusing the materialized view among iterations. Second, OptIQ incrementally evaluates the variant view, so that redundant computations are removed by skipping the evaluation of converged tuples in the variant view. We verify the effectiveness of OptIQ through PageRank and k-means clustering queries on real datasets. The results show that OptIQ achieves high efficiency, running up to five times faster than execution that does not remove the redundant computations among iterations.
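The idea of materializing an invariant view can be illustrated with a single-machine PageRank sketch. This is not OptIQ itself and does not run on MapReduce: the function name, the damping factor, and the fixed iteration count are assumptions, the input is assumed to have no dangling nodes, and the incremental evaluation of converged tuples is not shown.

```python
def pagerank(edges, damping=0.85, iterations=20):
    """PageRank that computes the loop-invariant edge weights only once."""
    nodes = sorted({n for edge in edges for n in edge})
    out_degree = {}
    for src, _ in edges:
        out_degree[src] = out_degree.get(src, 0) + 1

    # Invariant view: the join of edges with out-degrees never changes
    # across iterations, so it is materialized before the loop.
    invariant = [(src, dst, 1.0 / out_degree[src]) for src, dst in edges]

    # Variant view: the rank table, recomputed in every iteration.
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        incoming = {n: 0.0 for n in nodes}
        for src, dst, weight in invariant:
            incoming[dst] += rank[src] * weight
        rank = {
            n: (1 - damping) / len(nodes) + damping * incoming[n]
            for n in nodes
        }
    return rank

ranks = pagerank([("a", "b"), ("b", "c"), ("c", "a")])
```

Without the materialized `invariant` list, each iteration would repeat the edge and out-degree join, which is the kind of redundant computation the abstract describes.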

international parallel and distributed processing symposium | 2016

Rabbit Order: Just-in-Time Parallel Reordering for Fast Graph Analysis

Junya Arai; Hiroaki Shiokawa; Takeshi Yamamuro; Makoto Onizuka; Sotetsu Iwamura

Ahead-of-time data layout optimization by vertex reordering is a widely used technique to improve memory access locality in graph analysis. While reordered graphs yield better analysis performance, the existing reordering algorithms use significant amounts of computation time to provide efficient vertex ordering; hence, they fail to reduce end-to-end processing time. This paper presents the first algorithm for just-in-time parallel reordering, named Rabbit Order. It reduces end-to-end runtime by achieving high locality and fast reordering at the same time through two approaches. The first approach is hierarchical community-based ordering, which exploits the locality derived from hierarchical community structures in real-world graphs. Our ordering fully leverages low-latency cache levels by mapping hierarchical communities into hierarchical caches. The second approach is parallel incremental aggregation, which improves the runtime efficiency of reordering by decreasing the number of vertices to be processed. In addition, this approach utilizes lightweight atomic operations for concurrency control to avoid locking overheads and achieve high scalability. Our experiments show that Rabbit Order significantly outperforms state-of-the-art reordering algorithms.

international world wide web conferences | 2008

Application of bitmap index to information retrieval

Kengo Fujioka; Yukio Uematsu; Makoto Onizuka

We developed the HS-bitmap index for efficient information retrieval. The HS-bitmap index is a hierarchical document-term matrix: the original document-term matrix is called the leaf matrix, and each upper matrix is a summary of the matrix below it. Our experimental results show that the HS-bitmap index performs better than the inverted index with a minor space overhead.
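A two-level version of such a hierarchy can be sketched as follows. The abstract does not say how the summary is computed, so this sketch assumes that each summary bit is the OR of a fixed-size block of leaf bits. The function names, `block_size`, and the whitespace tokenization are all hypothetical, and this is not the HS-bitmap implementation.

```python
def build_index(docs, block_size=4):
    """Build a leaf term-by-document bitmap and a one-level summary."""
    leaf = {}
    for doc_id, text in enumerate(docs):
        for term in set(text.split()):
            leaf.setdefault(term, [False] * len(docs))[doc_id] = True
    # Assumed summary rule: a block bit is set if any leaf bit in it is set.
    summary = {
        term: [any(bits[i:i + block_size])
               for i in range(0, len(bits), block_size)]
        for term, bits in leaf.items()
    }
    return leaf, summary

def search(leaf, summary, terms, block_size=4):
    """Return IDs of documents containing all terms, pruning via the summary."""
    if not terms or any(t not in leaf for t in terms):
        return []
    n_docs = len(leaf[terms[0]])
    hits = []
    for block in range(len(summary[terms[0]])):
        # Skip the whole block unless every term occurs somewhere in it.
        if not all(summary[t][block] for t in terms):
            continue
        start = block * block_size
        for doc_id in range(start, min(start + block_size, n_docs)):
            if all(leaf[t][doc_id] for t in terms):
                hits.append(doc_id)
    return hits

docs = ["graph index", "bitmap index", "graph bitmap",
        "xml view", "graph bitmap index", "xml"]
leaf, summary = build_index(docs, block_size=2)
result = search(leaf, summary, ["graph", "bitmap"], block_size=2)
```

The summary level lets a conjunctive query skip blocks of documents without reading their leaf bits, at the cost of storing the extra summary rows.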

acm multimedia | 2001

Content-based retrieval applications on a common database management system

Naoko Kosugi; Go Nishimura; Junji Teramoto; Kazuyoshi Mii; Makoto Onizuka; Seiichi Konya; Akira Kojima; Ryoji Kataoka; Takashi Honishi; Kazuhiko Kushima

The use of multimedia data is widespread among general PC users. Thus, it is important to construct large multimedia databases and to develop easy-to-use and fast retrieval systems for such databases. General retrieval systems accept only key words as queries. However, formulating a query in words is often difficult in the case of multimedia data retrieval. Thus, content-based retrieval is attracting increasing interest as a solution to this problem [10]. To address this need, we have been developing a media-independent database management system (DBMS) as well as media-dependent data processing techniques. This enables us to provide various types of practical multimedia retrieval applications in a timely fashion by developing content-based retrieval systems for various types of media and their extensions based on the common DBMS framework. This paper briefly describes the common DBMS framework and the latest content-based retrieval applications based on the framework.

database systems for advanced applications | 2010

Lazy view maintenance for social networking applications

Keita Mikami; Shinji Morishita; Makoto Onizuka

We introduce CAMEL, a lazy view maintenance system for social networking applications on a database server with a distributed memory cache. System administrators can control the throughput of the system by tuning the level of freshness of materialized views. CAMEL employs the existing view maintenance techniques of incremental maintenance, lazy maintenance, and control table. In addition, CAMEL optimizes view maintenance performance by pushing the top-k operation down to before join operations and by constructing a reverse index. We evaluate CAMEL using real data from a mini-blog service. The results show that CAMEL is 6.13 and 11.2 times faster than the method of eager view maintenance while keeping the freshness of materialized views at 66.2% and 38.0%, respectively.
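The effect of pushing a top-k operation below a join can be sketched as follows. This is not CAMEL's code: the post and user record layouts, the function names, and the assumption that every post has exactly one matching user (so the join neither drops nor duplicates rows) are all hypothetical.

```python
import heapq

def timeline_join_then_topk(posts, users, k):
    """Baseline plan: join every post with its author, then keep the newest k."""
    joined = [(p["ts"], p["text"], users[p["user_id"]]) for p in posts]
    return heapq.nlargest(k, joined, key=lambda row: row[0])

def timeline_topk_then_join(posts, users, k):
    """Pushed-down plan: keep the newest k posts first, then join only those."""
    newest = heapq.nlargest(k, posts, key=lambda p: p["ts"])
    return [(p["ts"], p["text"], users[p["user_id"]]) for p in newest]

users = {1: "alice", 2: "bob"}
posts = [
    {"ts": 10, "text": "first", "user_id": 1},
    {"ts": 30, "text": "third", "user_id": 2},
    {"ts": 20, "text": "second", "user_id": 1},
    {"ts": 40, "text": "fourth", "user_id": 2},
]
top2 = timeline_topk_then_join(posts, users, k=2)
```

Both plans return the same rows under the stated assumption, but the pushed-down plan performs only k join lookups instead of one per post.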

Data Science and Engineering | 2017

Graph Partitioning for Distributed Graph Processing

Makoto Onizuka; Toshimasa Fujimori; Hiroaki Shiokawa

There is a large demand for distributed engines that efficiently process large-scale graph data, such as social graphs and web graphs. Distributed graph engines partition the input graph data and assign the partitions to distributed computers before executing the analysis process, so the quality of graph partitioning largely affects the communication cost and load balance among computers during the analysis process. We propose an effective graph partitioning technique that achieves low communication cost and good load balance among computers at the same time. We first generate more clusters than the number of computers by extending modularity-based clustering, and then merge those clusters into balanced-size clusters, using techniques designed for the graph packing problem, until the number of clusters equals the number of computers. We implemented our technique on top of a distributed graph engine, PowerGraph, and conducted extensive experiments. The results show that our partitioning technique reduces the communication cost and thereby improves the response time of graph analysis patterns. In particular, PageRank computation is up to 3.2 times faster than with HDRF, the state-of-the-art streaming-based partitioning approach.
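The merge step can be illustrated with a simple greedy packing of cluster sizes. This stands in for the packing techniques mentioned in the abstract and is not the authors' method: it balances only the sizes and ignores the communication cost between clusters. The function name and the input format are assumptions.

```python
import heapq

def merge_clusters(cluster_sizes, k):
    """Assign clusters to k partitions, largest first, to the lightest partition."""
    heap = [(0, part) for part in range(k)]  # (current load, partition id)
    heapq.heapify(heap)
    assignment = {}
    by_size = sorted(cluster_sizes.items(), key=lambda item: -item[1])
    for cluster_id, size in by_size:
        load, part = heapq.heappop(heap)
        assignment[cluster_id] = part
        heapq.heappush(heap, (load + size, part))
    return assignment

sizes = {"a": 5, "b": 4, "c": 3, "d": 3, "e": 1}
assignment = merge_clusters(sizes, k=2)
```

Generating more clusters than computers, as the abstract describes, gives a packing step like this one smaller pieces to work with, which makes balanced partitions easier to reach.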

information integration and web-based applications & services | 2016

Grouping method of dementia care text information to share dementia care information in a website named Ninchisho Chienowa-net

Hisae Nakajima; Naoko Kosugi; Makoto Onizuka; Hiroaki Kazui; Manabu Ikeda

This paper proposes a technique for grouping sentences that contain dementia care information and describes analytical results on the statistical information that influences the grouping. This paper focuses on grouping the incidents described in the sentences. One of the problems in grouping the incidents is that the sentences input by caregivers are redundant. In other words, the sentences contain care information that is related to the incidents but unnecessary for the grouping. To extract the incident information from the sentences, we divide them into sub-sentences at every verb and auxiliary verb. Similarities among sub-sentences are used for the grouping, and the accuracy is evaluated by precision. As a result, we found that sub-sentences containing seven words work well for the grouping.

information integration and web-based applications & services | 2015

Ninchisho Chienowa-net: a website to share good dementia care techniques

Naoko Kosugi; Makoto Onizuka; Hiroaki Kazui; Manabu Ikeda

This paper presents our web system for collecting and publishing good dementia care techniques for caregivers. The system collects a large amount of dementia care information from caregivers and, in the future, will extract good dementia care methods from that information by using text mining technology. The web system was released to about 40 users in July 2015 to validate the system functions and to obtain feedback on the system. The system was well received by the users. We also found that caregivers spend about 20 minutes on average inputting care information in a single session. In addition, the number of inputs made by selecting a choice was almost double the number made through blank input fields. We will improve the web system based on these results and the feedback from the users.
