Publication


Featured research published by Hideaki Kimura.


Very Large Data Bases | 2008

H-store: a high-performance, distributed main memory transaction processing system

Robert Kallman; Hideaki Kimura; Jonathan Natkins; Andrew Pavlo; Alexander Rasin; Stanley B. Zdonik; Evan Philip Charles Jones; Samuel Madden; Michael Stonebraker; Yang Zhang; John Hugg; Daniel J. Abadi

Our previous work has shown that architectural and application shifts have resulted in modern OLTP databases increasingly falling short of optimal performance [10]. In particular, the availability of multiple cores, the abundance of main memory, the lack of user stalls, and the dominant use of stored procedures are factors that portend a clean-slate redesign of RDBMSs. This previous work showed that such a redesign has the potential to outperform legacy OLTP databases by a significant factor. These results, however, were obtained using a bare-bones prototype that was developed just to demonstrate the potential of such a system. We have since set out to design a more complete execution platform, and to implement some of the ideas presented in the original paper. Our demonstration presented here provides insight on the development of a distributed main memory OLTP database and allows for the further study of the challenges inherent in this operating environment.
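To make the execution model concrete, here is a minimal sketch (illustrative only, not H-Store's actual code or API; all names and numbers are invented) of the ingredients the abstract lists: data held entirely in main memory, partitioned across single-threaded engines, with transactions pre-declared as stored procedures that run to completion without user stalls or locking.

```python
# A minimal sketch of an H-Store-like execution model (not H-Store's API).

class Partition:
    """One single-threaded execution engine owning a slice of the data."""
    def __init__(self):
        self.accounts = {}          # in-memory table: id -> balance

    def run(self, procedure, *args):
        # No locks or latches are needed: only this thread touches the data.
        return procedure(self, *args)

# Stored procedures are registered up front; there is no ad-hoc SQL.
def deposit(part, account_id, amount):
    part.accounts[account_id] = part.accounts.get(account_id, 0) + amount
    return part.accounts[account_id]

partitions = [Partition() for _ in range(4)]

def route(account_id):
    # Single-partition transactions are routed by the partitioning key.
    return partitions[hash(account_id) % len(partitions)]

print(route("alice").run(deposit, "alice", 100))   # -> 100
```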


Very Large Data Bases | 2009

A demonstration of SciDB: a science-oriented DBMS

Philippe Cudré-Mauroux; Hideaki Kimura; Kian-Tat Lim; Jennie Rogers; Roman Simakov; Emad Soroush; Pavel Velikhov; Daniel L. Wang; Magdalena Balazinska; Jacek Becla; David J. DeWitt; Bobbi Heath; David Maier; Samuel Madden; Jignesh M. Patel; Michael Stonebraker; Stanley B. Zdonik

In CIDR 2009, we presented a collection of requirements for SciDB, a DBMS that would meet the needs of scientific users. These included a nested-array data model, science-specific operations such as regrid, and support for uncertainty, lineage, and named versions. In this paper, we present an overview of SciDB's key features and outline a demonstration of the first version of SciDB on data and operations from one of our lighthouse users, the Large Synoptic Survey Telescope (LSST).
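As a concrete example of the science-specific operations mentioned, the toy function below (not SciDB code; the names are invented) shows a regrid over a dense 2-D array: coarsening it by averaging fixed-size, non-overlapping blocks.

```python
# A toy illustration of "regrid": average each block into one output cell.
import numpy as np

def regrid(array, block_rows, block_cols):
    """Coarsen a 2-D array by averaging non-overlapping blocks."""
    r, c = array.shape
    assert r % block_rows == 0 and c % block_cols == 0
    return (array
            .reshape(r // block_rows, block_rows, c // block_cols, block_cols)
            .mean(axis=(1, 3)))

pixels = np.arange(16, dtype=float).reshape(4, 4)
print(regrid(pixels, 2, 2))   # -> [[ 2.5  4.5] [10.5 12.5]]
```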


International Conference on Management of Data | 2015

FOEDUS: OLTP Engine for a Thousand Cores and NVRAM

Hideaki Kimura

Server hardware is about to drastically change. As typified by emerging hardware such as UC Berkeley's Firebox project and by Intel's Rack-Scale Architecture (RSA), next generation servers will have thousands of cores, large DRAM, and huge NVRAM. We analyze the characteristics of these machines and find that no existing database is appropriate. Hence, we are developing FOEDUS, an open-source, from-scratch database engine whose architecture is drastically different from traditional databases. It extends in-memory database technologies to further scale up and also allows transactions to efficiently manipulate data pages in both DRAM and NVRAM. We evaluate the performance of FOEDUS in a large NUMA machine (16 sockets and 240 physical cores) and find that FOEDUS achieves multiple orders of magnitude higher TPC-C throughput compared to H-Store with anti-caching.
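A simplified sketch of the DRAM-plus-NVRAM idea (hypothetical structures, not FOEDUS's implementation): each logical page is reached through a pointer pair holding an immutable snapshot page, standing in for NVRAM, and an optional volatile DRAM copy created on first write; reads prefer the volatile copy when it exists.

```python
# A simplified sketch of dual page pointers (not FOEDUS's real structures).

class DualPagePointer:
    def __init__(self, snapshot_page):
        self.snapshot = snapshot_page   # immutable page, stands in for NVRAM
        self.volatile = None            # mutable DRAM copy, created on demand

    def read(self, key):
        page = self.volatile if self.volatile is not None else self.snapshot
        return page.get(key)

    def write(self, key, value):
        if self.volatile is None:
            # Copy-on-write: install a DRAM copy before the first update.
            self.volatile = dict(self.snapshot)
        self.volatile[key] = value

ptr = DualPagePointer({"a": 1})
ptr.write("a", 2)
print(ptr.read("a"), ptr.snapshot["a"])   # -> 2 1: snapshot stays immutable
```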


ACM Transactions on Database Systems | 2012

Foster b-trees

Goetz Graefe; Hideaki Kimura; Harumi A. Kuno

Foster B-trees are a new variant of B-trees that combines advantages of prior B-tree variants optimized for many-core processors and modern memory hierarchies with flash storage and nonvolatile memory. Specific goals include: (i) minimal concurrency control requirements for the data structure, (ii) efficient migration of nodes to new storage locations, and (iii) support for continuous and comprehensive self-testing. Like B-link trees, Foster B-trees optimize latching without imposing restrictions or specific designs on transactional locking, for example, key range locking. Like write-optimized B-trees, and unlike B-link trees, Foster B-trees enable large writes on RAID and flash devices as well as wear leveling and efficient defragmentation. Finally, they support continuous and inexpensive yet comprehensive verification of all invariants, including all cross-node invariants of the B-tree structure. An implementation and a performance evaluation show that the Foster B-tree supports high concurrency and high update rates without compromising consistency, correctness, or read performance.
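An illustrative sketch of the foster relationship (simplified; not the paper's code): when a node splits, the new sibling is at first linked only from the old node as its "foster child", so every node keeps exactly one incoming pointer until the real parent adopts the sibling, which keeps the cross-node invariants easy to verify locally.

```python
# An illustrative sketch of foster-child node splits (not the paper's code).

class Node:
    def __init__(self, keys):
        self.keys = keys
        self.foster_child = None     # temporary edge to a freshly split sibling

    def split(self):
        mid = len(self.keys) // 2
        sibling = Node(self.keys[mid:])
        self.keys = self.keys[:mid]
        self.foster_child = sibling  # the parent has not seen the sibling yet
        return sibling

def adopt(parent_children, node):
    """The parent takes over the foster child, ending the foster relationship."""
    if node.foster_child is not None:
        parent_children.insert(parent_children.index(node) + 1,
                               node.foster_child)
        node.foster_child = None

leaf = Node([1, 2, 3, 4])
children = [leaf]
leaf.split()
adopt(children, leaf)
print([c.keys for c in children])   # -> [[1, 2], [3, 4]]
```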


Very Large Data Bases | 2010

CORADD: correlation aware database designer for materialized views and indexes

Hideaki Kimura; George Huo; Alexander Rasin; Samuel Madden; Stanley B. Zdonik

We describe an automatic database design tool that exploits correlations between attributes when recommending materialized views (MVs) and indexes. Although there is a substantial body of related work exploring how to select an appropriate set of MVs and indexes for a given workload, none of this work has explored the effect of correlated attributes (e.g., attributes encoding related geographic information) on designs. Our tool identifies a set of MVs and secondary indexes such that correlations between the clustered attributes of the MVs and the secondary indexes are enhanced, which can dramatically improve query performance. It uses a form of Integer Linear Programming (ILP) called ILP Feedback to pick the best set of MVs and indexes for given database size constraints. We compare our tool with a state-of-the-art commercial database designer on two workloads, APB-1 and SSB (Star Schema Benchmark---similar to TPC-H). Our results show that a correlation-aware database designer can improve query performance up to 6 times within the same space budget when compared to a commercial database designer.
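A toy rendering of the selection problem (all sizes and benefits invented): choose a 0/1 subset of candidate MVs and indexes maximizing estimated benefit under a space budget. CORADD hands such a formulation to an ILP solver with its ILP Feedback refinement; brute force is enough here to show the objective and the constraint.

```python
# A toy version of ILP-style design selection (made-up candidates/numbers).
from itertools import combinations

candidates = {            # name: (size in MB, estimated workload benefit)
    "mv_sales_by_region": (400, 90),
    "idx_orders_date":    (120, 35),
    "mv_top_customers":   (300, 55),
    "idx_lineitem_part":  (200, 40),
}
budget_mb = 600

# Maximize total benefit subject to total size <= budget (0/1 decisions).
best = max(
    (subset for r in range(len(candidates) + 1)
            for subset in combinations(candidates, r)
            if sum(candidates[c][0] for c in subset) <= budget_mb),
    key=lambda subset: sum(candidates[c][1] for c in subset),
)
print(best)   # -> ('mv_sales_by_region', 'idx_lineitem_part')
```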


Very Large Data Bases | 2011

Compression aware physical database design

Hideaki Kimura; Vivek R. Narasayya; Manoj Syamala

Modern RDBMSs support the ability to compress data using methods such as null suppression and dictionary encoding. Data compression offers the promise of significantly reducing storage requirements and improving I/O performance for decision support queries. However, compression can also slow down update and query performance due to the CPU costs of compression and decompression. In this paper, we study how data compression affects the choice of physical database design, such as indexes, for a given workload. We observe that approaches that decouple the decision of whether or not to choose an index from whether or not to compress the index can result in poor solutions. Thus, we focus on the novel problem of integrating compression into physical database design in a scalable manner. We have implemented our techniques by modifying Microsoft SQL Server and the Database Engine Tuning Advisor (DTA) physical design tool. Our techniques are general and are potentially applicable to DBMSs that support other compression methods. Our experimental results on real world as well as TPC-H benchmark workloads demonstrate the effectiveness of our techniques.
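The following schematic cost model (numbers entirely made up; not SQL Server's or DTA's model) illustrates why the two decisions must be coupled: compression shrinks I/O for scans but adds CPU cost to updates, so the best (index, compression) pair depends on the workload mix rather than on either decision alone.

```python
# A schematic, made-up cost model for joint index/compression decisions.

def config_cost(io_pages, compressed, n_scans, n_updates):
    ratio = 0.4 if compressed else 1.0        # assumed compression ratio
    cpu_per_update = 3.0 if compressed else 1.0  # assumed (de)compression CPU
    return n_scans * io_pages * ratio + n_updates * cpu_per_update

for compressed in (False, True):
    for scans, updates in ((1000, 10), (10, 1000)):
        cost = config_cost(io_pages=50, compressed=compressed,
                           n_scans=scans, n_updates=updates)
        print(f"compressed={compressed} scans={scans} updates={updates} "
              f"cost={cost:.0f}")
# Scan-heavy workloads favor the compressed index; update-heavy ones may not.
```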


Very Large Data Bases | 2009

Correlation maps: a compressed access method for exploiting soft functional dependencies

Hideaki Kimura; George Huo; Alexander Rasin; Samuel Madden; Stanley B. Zdonik

In relational query processing, there are generally two choices for access paths when performing a predicate lookup for which no clustered index is available. One option is to use an unclustered index. Another is to perform a complete sequential scan of the table. Many analytical workloads do not benefit from the availability of unclustered indexes; the cost of random disk I/O becomes prohibitive for all but the most selective queries. It has been observed that a secondary index on an unclustered attribute can perform well under certain conditions if the unclustered attribute is correlated with a clustered index attribute [4]. The clustered index will co-locate values and the correlation will localize access through the unclustered attribute to a subset of the pages. In this paper, we show that in a real application (SDSS) and widely used benchmark (TPC-H), there exist many cases of attribute correlation that can be exploited to accelerate queries. We also discuss a tool that can automatically suggest useful pairs of correlated attributes. It does so using an analytical cost model that we developed, which is novel in its awareness of the effects of clustering and correlation. Furthermore, we propose a data structure called a Correlation Map (CM) that expresses the mapping between the correlated attributes, acting much like a secondary index. The paper also discusses how bucketing on the domains of both attributes in the correlated attribute pair can dramatically reduce the size of the CM to be potentially orders of magnitude smaller than that of a secondary B+Tree index. This reduction in size allows us to create a large number of CMs that improve performance for a wide range of queries. The small size also reduces maintenance costs as we demonstrate experimentally.
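A small sketch of the Correlation Map structure (bucketing simplified, data synthetic): for each bucket of the unclustered attribute, record only the set of clustered-attribute buckets containing matching tuples, then answer a lookup by scanning just those clustered ranges. Because the synthetic ship date tracks the order date, each CM entry stays tiny, which is the effect that makes CMs orders of magnitude smaller than a secondary B+Tree.

```python
# A small sketch of a Correlation Map (CM) with bucketed domains.
from collections import defaultdict

def bucket(value, width):
    return value // width

def build_cm(rows, unclustered_width=10, clustered_width=100):
    """Map each unclustered-attribute bucket to the clustered buckets
    that contain matching tuples."""
    cm = defaultdict(set)
    for clustered_key, unclustered_val in rows:
        cm[bucket(unclustered_val, unclustered_width)].add(
            bucket(clustered_key, clustered_width))
    return cm

# Synthetic correlation: ship day closely tracks the clustered order day.
rows = [(order_day, order_day + order_day % 5) for order_day in range(1000)]
cm = build_cm(rows)
print(sorted(cm[bucket(250, 10)]))   # -> [2]: scan one clustered bucket only
```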


Very Large Data Bases | 2010

UPI: a primary index for uncertain databases

Hideaki Kimura; Samuel Madden; Stanley B. Zdonik

Uncertain data management has received growing attention from industry and academia. Many efforts have been made to optimize uncertain databases, including the development of special index data structures. However, none of these efforts have explored primary (clustered) indexes for uncertain databases, despite the fact that clustering has the potential to offer substantial speedups for non-selective analytic queries on large uncertain databases. In this paper, we propose a new index called a UPI (Uncertain Primary Index) that clusters heap files according to uncertain attributes with both discrete and continuous uncertainty distributions. Because uncertain attributes may have several possible values, a UPI on an uncertain attribute duplicates tuple data once for each possible value. To prevent the size of the UPI from becoming unmanageable, its size is kept small by placing low-probability tuples in a special Cutoff Index that is consulted only when queries for low-probability values are run. We also propose several other optimizations, including techniques to improve secondary index performance and techniques to reduce maintenance costs and fragmentation by buffering changes to the table and writing updates in sequential batches. Finally, we develop cost models for UPIs to estimate query performance in various settings to help automatically select tuning parameters of a UPI. We have implemented a prototype UPI and experimented on two real datasets. Our results show that UPIs can significantly (up to two orders of magnitude) improve the performance of uncertain queries both over clustered and unclustered attributes. We also show that our buffering techniques mitigate table fragmentation and keep the maintenance cost as low as or even lower than using an unclustered heap file.
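A condensed sketch of the layout (simplified, with a hypothetical probability cutoff): tuples are duplicated once per possible value of the uncertain attribute, and alternatives below the cutoff are diverted to the Cutoff Index, which is consulted only when a query explicitly asks for low-probability matches.

```python
# A condensed sketch of a UPI with a cutoff index (hypothetical cutoff).
from collections import defaultdict

CUTOFF = 0.1   # assumed probability threshold, a tunable parameter

upi = defaultdict(list)           # value -> tuples clustered by that value
cutoff_index = defaultdict(list)  # low-probability alternatives live apart

def insert(tuple_id, value_distribution):
    # Duplicate the tuple once per possible value of the uncertain attribute.
    for value, prob in value_distribution.items():
        target = upi if prob >= CUTOFF else cutoff_index
        target[value].append((tuple_id, prob))

def lookup(value, include_low_probability=False):
    results = list(upi[value])
    if include_low_probability:        # the rare, slower path
        results += cutoff_index[value]
    return results

insert("t1", {"red": 0.85, "blue": 0.12, "green": 0.03})
print(lookup("red"))                                   # -> [('t1', 0.85)]
print(lookup("green", include_low_probability=True))   # -> [('t1', 0.03)]
```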


Very Large Data Bases | 2016

Mostly-optimistic concurrency control for highly contended dynamic workloads on a thousand cores

Tianzheng Wang; Hideaki Kimura

Future servers will be equipped with thousands of CPU cores and deep memory hierarchies. Traditional concurrency control (CC) schemes---both optimistic and pessimistic---slow down by orders of magnitude in such environments for highly contended workloads. Optimistic CC (OCC) scales the best for workloads with few conflicts, but suffers from clobbered reads for high conflict workloads. Although pessimistic locking can protect reads, it floods cache-coherence backbones in deep memory hierarchies and can also cause numerous deadlock aborts. This paper proposes a new CC scheme, mostly-optimistic concurrency control (MOCC), to address these problems. MOCC achieves orders of magnitude higher performance for dynamic workloads on modern servers. The key objective of MOCC is to avoid clobbered reads for high conflict workloads, without any centralized mechanisms or heavyweight interthread communication. To satisfy such needs, we devise a native, cancellable reader-writer spinlock and a serializable protocol that can acquire, release and re-acquire locks in any order without expensive interthread communication. For low conflict workloads, MOCC maintains OCC's high performance without taking read locks. Our experiments with high conflict YCSB workloads on a 288-core server reveal that MOCC performs 8× and 23× faster than OCC and pessimistic locking, respectively. It achieves 17 million TPS for TPC-C and more than 110 million TPS for YCSB without conflicts, 170× faster than pessimistic methods.
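A drastically simplified, single-threaded illustration of the core policy (not MOCC's actual protocol, spinlock, or temperature mechanism): reads normally proceed optimistically and are verified at commit, while records whose abort count crosses a threshold are flagged for pessimistic read locking instead.

```python
# A drastically simplified, single-threaded illustration of MOCC's policy.

HOT_THRESHOLD = 3   # hypothetical "temperature" cutoff

class Record:
    def __init__(self, value):
        self.value = value
        self.version = 0        # bumped by every committed write
        self.temperature = 0    # rises when this record clobbers a reader

def run_transaction(reads):
    observed = {}
    for rec in reads:
        if rec.temperature >= HOT_THRESHOLD:
            continue            # hot: a read lock would protect this read
        observed[rec] = rec.version   # cold: optimistic, verify at commit
    # ... the transaction body would compute on the values read here ...
    clobbered = [rec for rec, ver in observed.items() if rec.version != ver]
    for rec in clobbered:
        rec.temperature += 1    # likelier to be read-locked next time
    return not clobbered        # True = commit, False = abort and retry

r = Record("x")
print(run_transaction([r]))     # -> True: nothing changed under us
```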


Extending Database Technology | 2012

Optimizing index deployment order for evolving OLAP

Hideaki Kimura; Carleton Coffrin; Alexander Rasin; Stanley B. Zdonik

Many database applications deploy hundreds or thousands of indexes to speed up query execution. Despite a plethora of prior work on index selection, designing and deploying indexes remains a difficult task for database administrators. First, real-world businesses often require online index deployment, and the traditional offline approach to index selection ignores intermediate workload performance during index deployment. Second, recent work on online index selection does not address effects of complex interactions that manifest during index deployment. In this paper, we propose a new approach that incorporates transitional design performance into the overall problem of physical database design. We call our approach Incremental Database Design. As the first step in this direction, we study the problem of ordering index deployment. The benefits of a good index deployment order are twofold: (1) a prompt query runtime improvement and (2) a reduced total time to deploy the design. Finding an effective deployment order is difficult due to complex index interaction and a factorial number of possible solutions. We formulate a mathematical model to represent the index ordering problem and demonstrate that Constraint Programming (CP) is a more efficient solution compared to other methods such as mixed integer programming and A* search. In addition to exact search techniques, we also study local search algorithms that make significant improvements over a greedy solution with minimal computational overhead. Our empirical analysis using the TPC-H dataset shows that our pruning techniques can reduce the size of the search space by many orders of magnitude. Using the TPC-DS dataset, we verify that our local search algorithm is a highly scalable and stable method for quickly finding the best known solutions.
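As a baseline for intuition (all numbers invented), a greedy order by benefit-to-build-time ratio delivers a prompt query runtime improvement; it ignores index interactions, which is precisely what makes the real problem hard and motivates the paper's constraint-programming formulation.

```python
# A toy greedy baseline for deployment ordering (made-up numbers); it
# ignores index interactions, unlike the paper's CP model.

indexes = {                  # name: (build_time_hours, query_speedup)
    "idx_a": (5.0, 40.0),
    "idx_b": (1.0, 15.0),
    "idx_c": (8.0, 30.0),
}

# Deploy high benefit-per-build-hour indexes first.
order = sorted(indexes, key=lambda n: indexes[n][1] / indexes[n][0],
               reverse=True)

elapsed = 0.0
remaining_pain = sum(speedup for _, speedup in indexes.values())
for name in order:
    build, speedup = indexes[name]
    elapsed += build
    remaining_pain -= speedup
    print(f"after {elapsed:4.1f}h deploy {name}: "
          f"{remaining_pain:.0f} units of query slowdown remain")
# -> idx_b first: the cheapest build buys improvement soonest.
```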

Collaboration


Hideaki Kimura's most frequent co-authors:

Samuel Madden, Massachusetts Institute of Technology
Michael Stonebraker, Massachusetts Institute of Technology
Carleton Coffrin, Los Alamos National Laboratory