Calisto Zuzarte | Researchain

Publication

Featured research published by Calisto Zuzarte.

international conference on management of data | 2008

Generating targeted queries for database testing

Chaitanya Mishra; Nick Koudas; Calisto Zuzarte

Tools for generating test queries for databases do not explicitly take into account the actual data in the database. As a consequence, such tools cannot guarantee suitable coverage of test cases commonly required for database testing. In this paper, we investigate the problem of generating queries that satisfy cardinality constraints on intermediate subexpressions when executed on a given test database. Such queries are required to test the performance of a database system under different operating conditions. We formally analyze this problem, quantify its difficulty and follow up this analysis with a description of a practical algorithm which utilizes sampling and space pruning techniques to quickly generate test queries that have desired properties. We present the results of an experimental evaluation of our approach as implemented in an open source data manager, demonstrating the utility of our proposal.

international conference on management of data | 2001

Exploiting constraint-like data characterizations in query optimization

Parke Godfrey; Jarek Gryz; Calisto Zuzarte

Query optimizers nowadays draw upon many sources of information about the database to optimize queries. They employ runtime statistics in cost-based estimation of query plans. They employ integrity constraints in the query rewrite process. Primary and foreign key constraints have long played a role in the optimizer, both for rewrite opportunities and for providing more accurate cost predictions. More recently, other types of integrity constraints are being exploited by optimizers in commercial systems, for which certain semantic query optimization techniques have now been implemented. These new optimization strategies that exploit constraints hold the promise for good improvement. Their weakness, however, is that often the “constraints” that would be useful for optimization for a given database and workload are not explicitly available for the optimizer. Data mining tools can find such “constraints” that are true of the database, but then there is the question of how this information can be kept by the database system, and how to make this information available to, and effectively usable by, the optimizer. We present our work on soft constraints in DB2. A soft constraint is a syntactic statement equivalent to an integrity constraint declaration. A soft constraint is not really a constraint, per se, since future updates may undermine it. While a soft constraint is valid, however, it can be used by the optimizer in the same way integrity constraints are. We present two forms of soft constraint: absolute and statistical. An absolute soft constraint is consistent with respect to the current state of the database, just in the same way an integrity constraint must be. They can be used in rewrite, as well as in cost estimation. A statistical soft constraint differs in that it may have some degree of violation with respect to the state of the database. Thus, statistical soft constraints cannot be used in rewrite, but they can still be used in cost estimation. 
We are working long-term on absolute soft constraints. We discuss the issues involved in implementing a facility for absolute soft constraints in a database system (and in DB2), and the strategies that we are researching. The current DB2 optimizer is more amenable to adding facilities for statistical soft constraints. In the short-term, we have been implementing pathways in the optimizer for statistical soft constraints. We discuss this implementation.
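As a rough illustration of the absolute versus statistical distinction drawn in the abstract above, the following Python sketch classifies a candidate soft constraint by the fraction of current rows that satisfy it. The function name, the row representation, and the example columns are hypothetical and are not taken from DB2 or from the paper.

```python
from typing import Callable, Iterable, Mapping, Tuple

Row = Mapping[str, object]


def classify_soft_constraint(
    rows: Iterable[Row], predicate: Callable[[Row], bool]
) -> Tuple[str, float]:
    """Return ("absolute" | "statistical", fraction of rows satisfying predicate).

    Hypothetical helper: "absolute" means no current row violates the
    predicate; "statistical" means some rows do.
    """
    materialized = list(rows)
    if not materialized:
        # An empty table satisfies any predicate vacuously.
        return "absolute", 1.0
    satisfied = sum(1 for row in materialized if predicate(row))
    kind = "absolute" if satisfied == len(materialized) else "statistical"
    return kind, satisfied / len(materialized)


# Example data (invented): days are plain integers for simplicity.
orders = [
    {"order_day": 1, "ship_day": 3},
    {"order_day": 2, "ship_day": 2},
    {"order_day": 5, "ship_day": 9},
    {"order_day": 7, "ship_day": 6},  # violates ship_day >= order_day
]
kind, confidence = classify_soft_constraint(
    orders, lambda r: r["ship_day"] >= r["order_day"]
)
```

Under the abstract's description, a constraint classified as statistical here could inform cost estimation but not query rewrite, because some rows violate it.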

international conference on data engineering | 2001

Discovery and application of check constraints in DB2

Jarek Gryz; K. Bernhard Schiefer; Jian Zheng; Calisto Zuzarte

The traditional role of integrity constraints is to protect the integrity of data, but integrity constraints can and do play other roles in databases; for example, they can be used for query optimization. In this role, they do not need to model the domain; it is sufficient that they describe regularities that are true about the data currently stored in a database. In this paper, we describe two algorithms for finding such regularities (in the syntactic form of check constraints) and discuss some of their applications in DB2. In particular, we show their use in query optimization.
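The abstract above does not spell out the two discovery algorithms, so the following Python sketch only illustrates the general idea of a regularity that is true of the data currently stored: it derives a range predicate from the observed minimum and maximum of a column and renders it in check-constraint syntax. The function name and the example table are assumptions made for illustration.

```python
from typing import Iterable, Mapping, Optional


def mine_range_constraint(
    rows: Iterable[Mapping[str, object]], column: str
) -> Optional[str]:
    """Return a CHECK-style range predicate that holds for every current row.

    Hypothetical helper, not the paper's algorithm. NULLs (None) are ignored.
    """
    values = [row[column] for row in rows if row[column] is not None]
    if not values:
        return None  # nothing to describe
    low, high = min(values), max(values)
    return f"CHECK ({column} BETWEEN {low} AND {high})"


# Example data (invented).
employees = [{"age": 35}, {"age": 21}, {"age": None}, {"age": 64}]
constraint = mine_range_constraint(employees, "age")
```

A predicate mined this way describes only the current state of the table; a later insert outside the range would invalidate it.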

international conference on data engineering | 2006

Load Balancing for Multi-tiered Database Systems through Autonomic Placement of Materialized Views

Wen-Syan Li; Daniel C. Zilio; Vishal S. Batra; Mahadevan Subramanian; Calisto Zuzarte; Inderpal Narang

A materialized view or Materialized Query Table (MQT) is an auxiliary table with precomputed data that can be used to significantly improve the performance of a database query. A Materialized Query Table Advisor (MQTA) is often used to recommend and create MQTs. The state-of-the-art MQTA works in a standalone database server where MQTs are placed on the same server as that in which the base tables are located. The MQTA does not apply to a federated or scale-out scenario in which MQTs need to be placed on other servers close to applications (i.e. a frontend database server) for offloading the workload on the backend database server. In this paper, we propose a Data Placement Advisor (DPA) and load balancing strategies for multi-tiered database systems. Built on top of the MQTA, DPA recommends MQTs and advises placement strategies for minimizing the response time for a query workload. To demonstrate the benefit of the data placement advising, we implemented a prototype of DPA that works with the MQTA in the IBM® DB2® Universal Database™ (DB2 UDB) and the IBM WebSphere® Information Integrator (WebSphere II). The evaluation results showed substantial improvements of workload response times when MQTs are intelligently recommended and placed at a frontend database server subject to space and load characteristics for TPC-H and OLAP type workloads.

international conference on data engineering | 2007

Collecting and Maintaining Just-in-Time Statistics

Amr El-Helw; Ihab F. Ilyas; Wing Lau; Volker Markl; Calisto Zuzarte

Traditional DBMSs decouple statistics collection and query optimization both in space and time. Decoupling in time may lead to outdated statistics. Decoupling in space may cause statistics not to be available at the desired granularity needed to optimize a particular query, or some important statistics may not be available at all. Overall, this decoupling often leads to large cardinality estimation errors and, in consequence, to the selection of suboptimal plans for query execution. In this paper, we present JITS, a system for proactively collecting query-specific statistics during query compilation. The system employs a lightweight sensitivity analysis to choose which statistics to collect by making use of previously collected statistics and database activity patterns. The collected statistics are materialized and incrementally updated for later reuse. We present the basic concepts, architecture, and key features of JITS. We demonstrate its benefits through an extensive experimental study on a prototype inside the IBM DB2 engine.

very large data bases | 2013

Expressiveness and complexity of order dependencies

Jaroslaw Szlichta; Parke Godfrey; Jarek Gryz; Calisto Zuzarte

Dependencies play an important role in databases. We study order dependencies (ODs), together with unidirectional order dependencies (UODs), a proper sub-class of ODs. These dependencies describe the relationships among lexicographical orderings of sets of tuples. We consider lexicographical ordering, as by the order-by operator in SQL, because this is the notion of order used in SQL and within query optimization. Our main goal is to investigate the inference problem for ODs, both in theory and in practice. We show the usefulness of ODs in query optimization. We establish the following theoretical results: (i) a hierarchy of order dependency classes; (ii) a proof of co-NP-completeness of the inference problem for the subclass of UODs (and ODs); (iii) a proof of co-NP-completeness of the inference problem of functional dependencies (FDs) from ODs in general, together with a demonstration of linear time complexity for the inference of FDs from UODs; (iv) a sound and complete elimination procedure for inference over ODs; and (v) a sound and complete polynomial inference algorithm for sets of UODs over restricted domains.
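To make the notion of an order dependency concrete, the sketch below tests whether one holds on a small table instance, under the usual reading that a list of attributes X orders a list Y when every pair of tuples that is in lexicographic order on X is also in lexicographic order on Y. This is a brute-force check on an instance with ascending order on every attribute (so it does not cover mixed directions), not the inference procedure studied in the paper; the function name and sample data are hypothetical.

```python
from typing import Mapping, Sequence


def holds_order_dependency(
    rows: Sequence[Mapping[str, object]],
    lhs: Sequence[str],
    rhs: Sequence[str],
) -> bool:
    """Check by pairwise comparison whether `lhs` orders `rhs` on `rows`.

    Python tuples compare lexicographically, which matches the ordering
    produced by an ascending ORDER BY over the same attribute list.
    """

    def project(row: Mapping[str, object], columns: Sequence[str]) -> tuple:
        return tuple(row[c] for c in columns)

    for s in rows:
        for t in rows:
            ordered_on_lhs = project(s, lhs) <= project(t, lhs)
            ordered_on_rhs = project(s, rhs) <= project(t, rhs)
            if ordered_on_lhs and not ordered_on_rhs:
                return False
    return True


# Example data (invented): month determines and orders quarter.
calendar = [
    {"month": 1, "quarter": 1},
    {"month": 3, "quarter": 1},
    {"month": 4, "quarter": 2},
]
month_orders_quarter = holds_order_dependency(calendar, ["month"], ["quarter"])
quarter_orders_month = holds_order_dependency(calendar, ["quarter"], ["month"])
```

In this sample, sorting by month also sorts by quarter, but the reverse fails because two rows share a quarter while differing in month.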

extending database technology | 2011

Queries on dates: fast yet not blind

Jaroslaw Szlichta; Parke Godfrey; Jarek Gryz; Wenbin Ma; Przemyslaw Pawluk; Calisto Zuzarte

Data warehouses are repositories of electronically stored data which are designed to support reporting and analysis. The analysis of historical data often involves aggregation over time. Thus, time is critical in the design of a data warehouse. We describe novel techniques for storing date information and optimization of queries that reference the date dimension. We show how to embed intelligence into the date key and how to exploit monotonic dependencies. We present the value of these techniques for the improvement of performance when combined with partitioning and indexes. We evaluate these techniques on our prototype implemented in IBM® DB2® V9.7 over the current draft version of the TPC-DS benchmark.
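The abstract above does not state which encoding the authors use for the date key. As an assumption made purely for illustration, the sketch below uses the common yyyymmdd integer layout, which is monotonic in the calendar date, so a predicate on a month can be turned into a range predicate on the key without joining to the date dimension. All names are hypothetical.

```python
from datetime import date
from typing import Tuple


def date_to_key(d: date) -> int:
    """Encode a date as a yyyymmdd integer (assumed layout, not the paper's)."""
    return d.year * 10000 + d.month * 100 + d.day


def month_key_range(year: int, month: int) -> Tuple[int, int]:
    """Inclusive key range covering every day of the given month.

    The upper bound uses day 31 for all months; keys for nonexistent days
    never occur in the data, so the range is still exact.
    """
    base = year * 10000 + month * 100
    return base + 1, base + 31


key = date_to_key(date(2011, 3, 21))
low, high = month_key_range(2011, 3)
in_march = low <= key <= high
```

Because the encoding preserves order, an earlier date always maps to a smaller key, which is what lets range partitioning and indexes on the key serve date-range queries.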

conference on information and knowledge management | 2005

Towards estimating the number of distinct value combinations for a set of attributes

Xiaohui Yu; Calisto Zuzarte; Kenneth C. Sevcik

Accurately and efficiently estimating the number of distinct values for some attribute(s) or sets of attributes in a data set is of critical importance to many database operations, such as query optimization and approximation query answering. Previous work has focused on the estimation of the number of distinct values for a single attribute and most existing work adopts a data sampling approach. This paper addresses the equally important issue of estimating the number of distinct value combinations for multiple attributes which we call COLSCARD (for COLumn Set CARDinality). It also takes a different approach that uses existing statistical information (e.g., histograms) available on the individual attributes to assist estimation. We start with cases where exact frequency information on individual attributes is available, and present a pair of lower and upper bounds on COLSCARD that are consistent with the available information, as well as an estimator of COLSCARD based on probability. We then proceed to study the case where only partial information (in the form of histograms) is available on individual attributes, and show how the proposed estimator can be adapted to this case. We consider two types of widely used histograms and show how they can be constructed in order to obtain optimal approximation. An experimental evaluation of the proposed estimation method on synthetic as well as two real data sets is provided.
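For intuition about the quantity being estimated, the sketch below computes the simplest possible bounds on the number of distinct value combinations from per-column distinct counts and the table size, plus a textbook estimate that assumes independent, uniformly distributed columns. These are not the bounds or the estimator proposed in the paper, which use per-attribute frequency information; the function names are hypothetical.

```python
from math import prod
from typing import Sequence, Tuple


def colscard_bounds(distinct_counts: Sequence[int], row_count: int) -> Tuple[int, int]:
    """Trivial bounds on the number of distinct combinations.

    Lower: at least as many combinations as the most diverse single column.
    Upper: no more than the number of rows or the size of the cross product.
    """
    lower = max(distinct_counts)
    upper = min(row_count, prod(distinct_counts))
    return lower, upper


def colscard_estimate(distinct_counts: Sequence[int], row_count: int) -> float:
    """Expected distinct combinations if each row picks a combination
    uniformly and independently (an assumption, rarely true in practice)."""
    domain = prod(distinct_counts)
    expected = domain * (1.0 - (1.0 - 1.0 / domain) ** row_count)
    lower, upper = colscard_bounds(distinct_counts, row_count)
    # Clamp so the estimate never contradicts the trivial bounds.
    return min(max(expected, lower), upper)


bounds_small = colscard_bounds([3, 4], 10)
estimate_small = colscard_estimate([3, 4], 10)
```

With many more rows than possible combinations, the estimate approaches the cross-product size; with few rows it approaches the row count.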

Knowledge and Information Systems | 2005

Optimizing complex queries based on similarities of subqueries

Qiang Zhu; Yingying Tao; Calisto Zuzarte

As database technology is applied to more and more application domains, user queries are becoming increasingly complex (e.g. involving a large number of joins and a complex query structure). Query optimizers in existing database management systems (DBMS) were not developed for efficiently processing such queries and often suffer from problems such as intolerably long optimization time and poor optimization results. To tackle this challenge, we present a new similarity-based approach to optimizing complex queries in this paper. The key idea is to identify similar subqueries that often appear in a complex query and share the optimization result among similar subqueries in the query. Different levels of similarity for subqueries are introduced. Efficient algorithms to identify similar queries in a given query and optimize the query based on similarity are presented. Related issues, such as choosing good starting nodes in a query graph, evaluating identified similar subqueries and analyzing algorithm complexities, are discussed. Our experimental results demonstrate that the proposed similarity-based approach is quite promising in optimizing complex queries with similar subqueries in a DBMS.

data warehousing and olap | 2006

Computing closest common subexpressions for view selection problems

Wugang Xu; Dimitri Theodoratos; Calisto Zuzarte

Selecting a set of views for materialization is a required task in many current database and data warehousing applications including the design of a data warehouse, and the maintenance of multiple materialized views. The selected views can be materialized permanently or transiently depending on the specific view selection problem. The view selection algorithms are expensive due to the size of the search space of the problem. In this paper we propose an approach for generating candidate views for materialization for view selection problems based on the definition of the input queries. We also provide rewritings of the input queries using the generated candidate views. In generating candidate views, we do not apply cost-based techniques but we try to maximize the operations in the views. Subsequently, view selection algorithms can exploit problem dependent cost functions to choose among the generated candidate views. Our approach is not restricted to a specific view selection problem. Compared to a previous one, it generates views that involve more relation occurrences (or operations) and can reduce the size of the search space which can be very large. We implement our approach and we report some experimental evaluation with comparison to previous works.
