Harumi A. Kuno | Researchain

Publication

Featured research published by Harumi A. Kuno.

international conference on data engineering | 2009

Predicting Multiple Metrics for Queries: Better Decisions Enabled by Machine Learning

Archana Ganapathi; Harumi A. Kuno; Umeshwar Dayal; Janet L. Wiener; Armando Fox; Michael I. Jordan; David A. Patterson

One of the most challenging aspects of managing a very large data warehouse is identifying how queries will behave before they start executing. Yet knowing their performance characteristics (their runtimes and resource usage) can solve two important problems. First, every database vendor struggles with managing unexpectedly long-running queries. When these long-running queries can be identified before they start, they can be rejected or scheduled when they will not cause extreme resource contention for the other queries in the system. Second, deciding whether a system can complete a given workload in a given time period (or a bigger system is necessary) depends on knowing the resource requirements of the queries in that workload. We have developed a system that uses machine learning to accurately predict the performance metrics of database queries whose execution times range from milliseconds to hours. For training and testing our system, we used both real customer queries and queries generated from an extended set of TPC-DS templates. The extensions mimic queries that caused customer problems. We used these queries to compare how accurately different techniques predict metrics such as elapsed time, records used, disk I/Os, and message bytes. The most promising technique was not only the most accurate, but also predicted these metrics simultaneously and using only information available prior to query execution. We validated the accuracy of this machine learning technique on a number of HP Neoview configurations. We were able to predict individual query elapsed time within 20% of its actual time for 85% of the test queries. Most importantly, we were able to correctly identify both the short and long-running (up to two-hour) queries to inform workload management and capacity planning.
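
The accuracy figure quoted above ("within 20% of its actual time for 85% of the test queries") is a share of predictions under a relative-error bound. A minimal sketch of how such a share could be computed follows. The function name `fraction_within` and the sample timings are made up for illustration and do not come from the paper.

```python
def fraction_within(predicted, actual, tolerance=0.20):
    """Return the share of predictions whose relative error is at most `tolerance`.

    `predicted` and `actual` are equal-length sequences of positive numbers
    (for example, elapsed times in seconds). Hypothetical helper, not the
    paper's evaluation code.
    """
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("need at least one query")
    hits = sum(1 for p, a in zip(predicted, actual) if abs(p - a) <= tolerance * a)
    return hits / len(actual)


# Made-up elapsed times in seconds, spanning sub-second to one hour.
actual = [10.0, 100.0, 3600.0, 0.5]
predicted = [11.0, 130.0, 3500.0, 0.45]
share = fraction_within(predicted, actual)
print(f"{share:.0%} of queries predicted within 20% of actual elapsed time")
```

Only the second query (130 s predicted against 100 s actual) falls outside the 20% band here.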

web information and data management | 2001

Automating the transformation of XML documents

Hong Su; Harumi A. Kuno; Elke A. Rundensteiner

The advent of web services that use XML-based message exchanges has spurred many efforts that address issues related to inter-enterprise service electronic commerce interactions. Currently emerging standards and technologies enable enterprises to describe and advertise their own Web Services and to discover and determine how to interact with services fronted by other businesses. However, these technologies do not address the problem of how to reconcile structural differences between similar types of documents supported by different enterprises. Transformations between such documents must thus be created manually on a case-by-case basis. In this paper, we explore the problem of how to automate the transformation of XML E-business documents. We develop an integrated solution that automates as much as possible all steps of the document transformation process. One, we propose a set of schema transformation operations that establish semantic relationships between two XML document schemas. Two, we define a model that allows us to compare the cost of performing these operations. Three, we introduce an algorithm that discovers an efficient sequence of operations for transforming a source document schema into a target document schema based on our cost model. The operation sequence then is used to generate an equivalent XSLT transformation script. Experimental results indicate that our algorithm can satisfactorily discover acceptable transformations.

international workshop on testing database systems | 2011

The mixed workload CH-benCHmark

Richard L. Cole; Florian Funke; Leo Giakoumakis; Wey Guy; Alfons Kemper; Stefan Krompass; Harumi A. Kuno; Raghunath Nambiar; Thomas Neumann; Meikel Poess; Kai-Uwe Sattler; Michael Seibold; Eric Simon; Florian Waas

While standardized and widely used benchmarks address either operational or real-time Business Intelligence (BI) workloads, the lack of a hybrid benchmark led us to the definition of a new, complex, mixed workload benchmark, called mixed workload CH-benCHmark. This benchmark bridges the gap between the established single-workload suites of TPC-C for OLTP and TPC-H for OLAP, and executes a complex mixed workload: a transactional workload based on the order entry processing of TPC-C and a corresponding TPC-H-equivalent OLAP query suite run in parallel on the same tables in a single database system. As it is derived from these two most widely used TPC benchmarks, the CH-benCHmark produces results highly relevant to both hybrid and classic single-workload systems.

ACM Transactions on Database Systems | 2012

Foster B-trees

Goetz Graefe; Hideaki Kimura; Harumi A. Kuno

Foster B-trees are a new variant of B-trees that combines advantages of prior B-tree variants optimized for many-core processors and modern memory hierarchies with flash storage and nonvolatile memory. Specific goals include: (i) minimal concurrency control requirements for the data structure, (ii) efficient migration of nodes to new storage locations, and (iii) support for continuous and comprehensive self-testing. Like B-link trees, Foster B-trees optimize latching without imposing restrictions or specific designs on transactional locking, for example, key range locking. Like write-optimized B-trees, and unlike B-link trees, Foster B-trees enable large writes on RAID and flash devices as well as wear leveling and efficient defragmentation. Finally, they support continuous and inexpensive yet comprehensive verification of all invariants, including all cross-node invariants of the B-tree structure. An implementation and a performance evaluation show that the Foster B-tree supports high concurrency and high update rates without compromising consistency, correctness, or read performance.

International Journal of Parallel Programming | 2013

Analytical Performance Models for MapReduce Workloads

Emanuel Vianna; Giovanni Comarela; Tatiana Pontes; Jussara M. Almeida; Virgílio A. F. Almeida; Kevin Wilkinson; Harumi A. Kuno; Umeshwar Dayal

MapReduce is a currently popular programming model to support parallel computations on large datasets. Among the several existing MapReduce implementations, Hadoop has attracted a lot of attention from both industry and research. In a Hadoop job, map and reduce tasks coordinate to produce a solution to the input problem, exhibiting precedence constraints and synchronization delays that are characteristic of a pipeline communication between maps (producers) and reduces (consumers). Here we address the challenge of designing analytical models to estimate the performance of MapReduce workloads, notably Hadoop workloads, focusing particularly on the intra-job pipeline parallelism between map and reduce tasks belonging to the same job. We propose a hierarchical model that combines a precedence graph model and a queuing network model to capture the intra-job synchronization constraints. We first show how to build a precedence graph that represents the dependencies among multiple tasks of the same job. We then apply it jointly with an approximate Mean Value Analysis (aMVA) solution to predict mean job response time, throughput and resource utilization. We validate our solution against a queuing network simulator and a real setup in various scenarios, finding very close agreement in both cases. In particular, our model produces estimates of average job response time that deviate from measurements of a real setup by less than 15%.
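
For readers unfamiliar with approximate Mean Value Analysis, the sketch below shows the textbook Bard-Schweitzer approximation for a single-class closed queueing network. It is an assumption-laden illustration only: it omits the precedence graph and the hierarchical structure described in the abstract, and the function name, parameters, and service demands are hypothetical.

```python
def schweitzer_amva(demands, n_jobs, think_time=0.0, tol=1e-9, max_iter=10_000):
    """Bard-Schweitzer approximate MVA for a single-class closed network.

    demands    -- service demand (seconds per job) at each queueing station
    n_jobs     -- number of jobs circulating in the network (>= 1)
    think_time -- delay spent outside the queueing stations
    Returns (throughput, residence_times, queue_lengths).
    """
    if n_jobs < 1 or not demands:
        raise ValueError("need at least one job and one station")
    # Start from an even spread of jobs over the stations.
    queue = [n_jobs / len(demands)] * len(demands)
    for _ in range(max_iter):
        # An arriving job is assumed to see (N-1)/N of the mean queue length.
        residence = [d * (1.0 + (n_jobs - 1) / n_jobs * q)
                     for d, q in zip(demands, queue)]
        throughput = n_jobs / (think_time + sum(residence))
        new_queue = [throughput * r for r in residence]  # Little's law per station
        converged = max(abs(a - b) for a, b in zip(new_queue, queue)) < tol
        queue = new_queue
        if converged:
            break
    return throughput, residence, queue


# Made-up demands: a CPU-like station (0.5 s) and a disk-like station (0.2 s).
x, r, q = schweitzer_amva([0.5, 0.2], n_jobs=10)
print(f"throughput={x:.3f} jobs/s, mean response time={sum(r):.3f} s")
```

The fixed-point iteration avoids the job-by-job recursion of exact MVA, which is what makes the approximation cheap for large populations.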

international conference on data engineering | 2010

Adaptive indexing for relational keys

Goetz Graefe; Harumi A. Kuno

Adaptive indexing schemes such as database cracking and adaptive merging have been investigated to date only in the context of range queries. These are typical for non-key columns in relational databases. For complete self-managing indexing, adaptive indexing must also apply to key columns. The present paper proposes a design and offers a first performance evaluation in the context of keys. Adaptive merging for keys also enables further improvements in B-tree indexes. First, partitions can be matched to levels in the memory hierarchy such as a CPU cache and an in-memory buffer pool. Second, adaptive merging in merged B-trees enables automatic master-detail clustering.

international workshop on research issues in data engineering | 1996

Augmented inherited multi-index structure for maintenance of materialized path query views

Harumi A. Kuno; Elke A. Rundensteiner

Materialized complex object-oriented views are a promising technique for the integration of heterogeneous databases and the development of powerful data warehousing systems. Path query views are virtual classes formed from selection queries that specify a predicate upon the value of an aggregation hierarchy path. The primary difference between previous work regarding OODB indexing and the efficient implementation of materialized path query views addressed in this paper lies in the nature of their usage. For OODB indexing, query usage is the primary purpose of index structures. Because the materialized view data itself can be used to answer queries, the primary use of index structures with regard to materialized path query views is for the incremental maintenance of views in the face of updates. We have developed an augmented inherited multi-index (AIM) strategy that is specifically tailored for the maintenance of materialized path query views. We find that we can improve update performance by augmenting traditional inherited multi-indices with structured representations of the path queries that use them. This enables us to use class hierarchy relationships to prune the number of aggregation paths that must be re-instantiated during update propagation and also to support complex path queries that include cycles.

international workshop on research issues in data engineering | 1995

Materialized object-oriented views in MultiView

Harumi A. Kuno; Elke A. Rundensteiner

Object-oriented view mechanisms have received much attention in the literature in recent years, since they provide powerful mechanisms for addressing tasks such as customized tool interfacing to object-oriented databases (OODBs) and interoperability of heterogeneous databases. However, little progress has been made thus far on addressing the topic of view materialization in object-oriented databases. In the context of the MultiView project, we have developed an object model and an accompanying set of algorithms for the support of updatable materialized views in OODBs. We take advantage of unique features of the MultiView model, including its support for object-preserving queries, the integration of base and virtual classes into a unified and consistent global class hierarchy, and an object-slicing approach. In this paper, we present the MultiView model of materialized views, supporting updates on both base and virtual classes. We also describe a set of efficient algorithms for incremental view maintenance.

very large data bases | 2012

Concurrency control for adaptive indexing

Goetz Graefe; Felix Halim; Stratos Idreos; Harumi A. Kuno; Stefan Manegold

Adaptive indexing initializes and optimizes indexes incrementally, as a side effect of query processing. The goal is to achieve the benefits of indexes while hiding or minimizing the costs of index creation. However, index-optimizing side effects seem to turn read-only queries into update transactions that might, for example, create lock contention. This paper studies concurrency control in the context of adaptive indexing. We show that the design and implementation of adaptive indexing rigorously separates index structures from index contents; this relaxes the constraints and requirements during adaptive indexing compared to those of traditional index updates. Our design adapts to the fact that an adaptive index is refined continuously, and exploits any concurrency opportunities in a dynamic way. A detailed experimental analysis demonstrates that (a) adaptive indexing maintains its adaptive properties even when running concurrent queries, (b) adaptive indexing can exploit the opportunity for parallelism due to concurrent queries, (c) the number of concurrency conflicts and any concurrency administration overheads follow an adaptive behavior, decreasing as the workload evolves and adapting to the workload needs.
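
To make the "index-optimizing side effect" concrete, here is a minimal single-threaded sketch of database cracking, one form of adaptive indexing: each range query physically reorganizes the column around its bounds, so a logically read-only query writes to the data structure. The class and method names are hypothetical, and the sketch does not implement the concurrency control design the paper proposes.

```python
import bisect


class CrackedColumn:
    """Toy cracker column: range queries partition the data as a side effect."""

    def __init__(self, values):
        self.data = list(values)
        self.pivots = []     # sorted pivot values seen so far
        self.positions = []  # positions[i] = first index holding a value >= pivots[i]

    def _crack(self, pivot):
        """Partition the piece containing `pivot`; return the boundary index."""
        i = bisect.bisect_left(self.pivots, pivot)
        if i < len(self.pivots) and self.pivots[i] == pivot:
            return self.positions[i]  # already cracked on this value
        start = self.positions[i - 1] if i > 0 else 0
        end = self.positions[i] if i < len(self.positions) else len(self.data)
        piece = self.data[start:end]
        smaller = [v for v in piece if v < pivot]
        larger = [v for v in piece if v >= pivot]
        self.data[start:end] = smaller + larger  # the write hidden inside a read
        boundary = start + len(smaller)
        self.pivots.insert(i, pivot)
        self.positions.insert(i, boundary)
        return boundary

    def range_query(self, low, high):
        """Return all values v with low <= v < high (in no particular order)."""
        lo_pos = self._crack(low)
        hi_pos = self._crack(high)
        return self.data[lo_pos:hi_pos]


col = CrackedColumn([13, 4, 55, 9, 27, 1, 42])
result = col.range_query(5, 30)
print(sorted(result), col.pivots)
```

Because `_crack` only touches the one piece that contains the pivot, later queries reorganize ever smaller regions, which is the intuition behind conflicts decreasing as the workload evolves.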

Lecture Notes in Computer Science | 2001

Conversations + Interfaces = Business Logic

Harumi A. Kuno; Mike Lemon; Alan H. Karp; Dorothea Beringer

In the traditional application model, services are tightly coupled with the processes they support. For example, whenever a server's process changes, existing clients using that process must also be updated. However, electronic commerce is moving toward e-service based interactions, where corporate enterprises use e-services to interact with each other dynamically, and a service in one enterprise could spontaneously decide to engage a service fronted by another enterprise. We clarify here the relationship between currently developing standards such as UDDI, WSDL, and WSCL, and propose a conversation controller mechanism that leverages such standards to direct services in their conversations. We can thus treat services as pools of methods, independent of the conversations they support. Even method names can be decided on independently of the conversations. Services can spontaneously discover each other and then engage in complicated interactions without the services themselves having to explicitly support conversational logic. The dynamism and flexibility enabled by this decoupling are the essential difference between applications offered over the web and e-services.
