Cesar A. Galindo-Legaria

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Cesar A. Galindo-Legaria is active.

Explore More

Publication

Featured researches published by Cesar A. Galindo-Legaria.

very large data bases | 2004

PIVOT and UNPIVOT: optimization and execution strategies in an RDBMS

Conor Cunningham; Cesar A. Galindo-Legaria; Goetz Graefe

PIVOT and UNPIVOT, two operators on tabular data that exchange rows and columns, enable data transformations useful in data modeling, data analysis, and data presentation. They can quite easily be implemented inside a query processor, much like select, project, and join. Such a design provides opportunities for better performance, both during query optimization and query execution. We discuss query optimization and execution implications of this integrated design and evaluate the performance of this approach using a prototype implementation in Microsoft SQL Server.

international conference on management of data | 1994

Outerjoins as disjunctions

Cesar A. Galindo-Legaria

The outerjoin operator is currently available in the query language of several major DBMSs, and it is included in the proposed SQL2 standard draft. However, “associativity problems” of the operator have been pointed out since its introduction. In this paper we propose a shift in the intuition behind outerjoin: Instead of computing the join while also preserving its arguments, outerjoin delivers tuples that come either from the join or from the arguments. Queries with joins and outerjoins deliver tuples that come from one out of several joins, where a single relation is a trivial join. An advantage of this view is that, in contrast to preservation, disjunction is commutative and associative, which is a significant property for intuition, formalisms, and generation of execution plans. Based on a disjunctive normal form, we show that some data merging queries cannot be evaluated by means of binary outerjoins, and give alternative procedures to evaluate those queries. We also explore several evaluation strategies for outerjoin queries, including the use of semijoin programs to reduce base relations.

international conference on management of data | 2001

Orthogonal optimization of subqueries and aggregation

Cesar A. Galindo-Legaria; Milind M. Joshi

There is considerable overlap between strategies proposed for subquery evaluation, and those for grouping and aggregation. In this paper we show how a number of small, independent primitives generate a rich set of efficient execution strategies —covering standard proposals for subquery evaluation suggested in earlier literature. These small primitives fall into two main, orthogonal areas: Correlation removal, and efficient processing of outerjoins and GroupBy. An optimization approach based on these pieces provides syntax-independence of query processing with respect to subqueries, i. e. equivalent queries written with or without subquery produce the same efficient plan. We describe techniques implemented in Microsoft SQL Server (releases 7.0 and 8.0) for queries containing sub-queries and/or aggregations, based on a number of orthogonal optimizations. We concentrate separately on removing correlated subqueries, also called “query flattening,” and on efficient execution of queries with aggregations. The end result is a modular, flexible implementation, which produces very efficient execution plans. To demonstrate the validity of our approach, we present results for some queries from the TPC-H benchmark. From all published TPC-H results in the 300GB scale, at the time of writing (November 2000), SQL Server has the fastest results on those queries, even on a fraction of the processors used by other systems.

international conference on management of data | 2007

Execution strategies for SQL subqueries

Mostafa Elhemali; Cesar A. Galindo-Legaria; Torsten Grabs; Milind M. Joshi

Optimizing SQL subqueries has been an active area in database research and the database industry throughout the last decades. Previous work has already identified some approaches to efficiently execute relational subqueries. For satisfactory performance, proper choice of subquery execution strategies becomes even more essential today with the increase in decision support systems and automatically generated SQL, e.g., with ad-hoc reporting tools. This goes hand in hand with increasing query complexity and growing data volumes, which all pose challenges for an industrial-strength query optimizer. This current paper explores the basic building blocks that Microsoft SQL Server utilizes to optimize and execute relational subqueries. We start with indispensable prerequisites such as detection and removal of correlations for subqueries. We identify a full spectrum of fundamental subquery execution strategies such as forward and reverse lookup as well as set-based approaches, explain the different execution strategies for subqueries implemented in SQL Server, and relate them to the current state of the art. To the best of our knowledge, several strategies discussed in this paper have not been published before. An experimental evaluation complements the paper. It quantifies the performance characteristics of the different approaches and shows that indeed alternative execution strategies are needed in different circumstances, which make a cost-based query optimizer indispensable for adequate query performance.

international conference on management of data | 2000

Counting, enumerating, and sampling of execution plans in a cost-based query optimizer

Florian Waas; Cesar A. Galindo-Legaria

Testing an SQL database system by running large sets of deterministic or stochastic SQL statements is common practice in commercial database development. However, code defects often remain undetected as the query optimizers choice of an execution plan is not only depending on the query but strongly influenced by a large number of parameters describing the database and the hardware environment. Modifying these parameters in order to steer the optimizer to select other plans is difficult since this means anticipating often complex search strategies implemented in the optimizer. In this paper we devise algorithms for counting, exhaustive generation, and uniform sampling of plans from the complete search space. Our techniques allow extensive validation of both generation of alternatives, and execution algorithms with plans other than the optimized one—if two candidate plans fail to produce the same results, then either the optimizer considered an invalid plan, or the execution code is faulty. When the space of alternatives becomes too large for exhaustive testing, which can occur even with a handful of joins, uniform random sampling provides a mechanism for unbiased testing. The technique is implemented in Microsofts SQL Server, where it is an integral part of the validation and testing process.

very large data bases | 2003

Statistics on views

Cesar A. Galindo-Legaria; Milind M. Joshi; Florian Waas; Ming-Chuan Wu

The quality of execution plans generated by a query optimizer is tied to the accuracy of its cardinality estimation. Errors in estimation lead to poor performance, erratic behavior, and user frustration. Traditionally, the optimizer is restricted to use only statistics on base table columns and derive estimates bottom-up. This approach has shortcomings with dealing with complex queries, and with rich languages such as SQL: Errors grow as estimation is done on top of estimation, and some constructs are simply not handled. In this paper we describe the creation and utilization of statistics on views in SQL Server, which provides the optimizer with statistical information on the result of scalar or relational expressions. It opens a new dimension on the data available for cardinality estimation and enables arbitrary correction. We describe the implementation of this feature in the optimizer architecture, and show its impact on the quality of plans generated through a number of examples.

international conference on management of data | 2012

Query optimization in microsoft SQL server PDW

Srinath Shankar; Rimma V. Nehme; Josep Aguilar-Saborit; Andrew Chung; Mostafa Elhemali; Alan Halverson; Eric R. Robinson; Mahadevan Sankara Subramanian; David J. DeWitt; Cesar A. Galindo-Legaria

In recent years, Massively Parallel Processors have increasingly been used to manage and query vast amounts of data. Dramatic performance improvements are achieved through distributed execution of queries across many nodes. Query optimization for such system is a challenging and important problem. In this paper we describe the Query Optimizer inside the SQL Server Parallel Data Warehouse product (PDW QO). We leverage existing QO technology in Microsoft SQL Server to implement a cost-based optimizer for distributed query execution. By properly abstracting metadata we can readily reuse existing logic for query simplification, space exploration and cardinality estimation. Unlike earlier approaches that simply parallelize the best serial plan, our optimizer considers a rich space of execution alternatives, and picks one based on a cost-model for the distributed execution environment. The result is a high-quality, effective query optimizer for distributed query processing in an MPP.

international conference on data engineering | 2010

Polynomial heuristics for query optimization

Nicolas Bruno; Cesar A. Galindo-Legaria; Milind M. Joshi

Research on query optimization has traditionally focused on exhaustive enumeration of an exponential number of candidate plans. Alternatively, heuristics for query optimization are restricted in several ways, such as by either focusing on join predicates only, ignoring the availability of indexes, or in general having high-degree polynomial complexity. In this paper we propose a heuristic approach to very efficiently obtain execution plans for complex queries, which takes into account the presence of indexes and goes beyond simple join reordering. We also introduce a realistic workload generator and validate our approach using both synthetic and real data.

very large data bases | 2008

Relational support for flexible schema scenarios

Srini Acharya; Peter Carlin; Cesar A. Galindo-Legaria; Krzysztof Kozielczyk; Pawel Terlecki; Peter Zabback

Efficient support for applications that deal with data heterogeneity, hierarchies and schema evolution is an important challenge for relational engines. In this paper we show how this flexibility can be handled in Microsoft SQL Server. For this purpose, the engine has been equipped in an integrated package of relational extensions. The package includes sparse storage, column set operations, filtered indices, filtered statistics and hierarchy querying with OrdPath labeling. In addition, economical loading of metadata allow us to answer queries independently of the number of columns in a table and drastically improve scaling capabilities. The design of a prototypical content and collaboration application based on a wide table is described, along with experiments validating its performance.

international conference on data engineering | 2008

Optimizing Star Join Queries for Data Warehousing in Microsoft SQL Server

Cesar A. Galindo-Legaria; Torsten Grabs; Sreenivas Gukal; Steve Herbert; Aleksandras Surna; Shirley Wang; Wei Yu; Peter Zabback; Shin Zhang

As mainstream data warehouses are growing into the multi-terabyte range, adequate performance for decision support queries remains challenging for database query processors. Proper choice of query plan is essential in data warehouses where fact tables often store billions of rows. This paper discusses query optimization and execution strategies that Microsoft SQL Server employs for decision support queries in dimensionally modeled relational data warehouses. Our approach is based on pattern matching to detect typical star query patterns. When matching the pattern, the optimizer generates additional query plan alternatives specifically optimized for data warehouse performance. For high selectivity queries, the plans use nested loops joins and seeks. Medium selectivity queries in turn rely on right-deep hash joins with bitmap filters. Bitmap filters perform semi-join reductions to efficiently prune out non-qualifying rows early. Final plan choice is left for cost-based optimization which also compares the data warehouse specific plans against conventional query plans. We conducted an extensive experimental investigation using both synthetic workloads and several customer workloads. As our results show, the new plan shapes and execution strategies yield significant performance improvements across the targeted workloads as compared to earlier versions of Microsoft SQL Server.

Explore More