Detecting Optimization Bugs in Database Engines via Non-Optimizing Reference Engine Construction
Manuel Rigger
rigger@inf.ethz.ch
Department of Computer Science, ETH Zurich
Zurich, Switzerland

Zhendong Su
su@inf.ethz.ch
Department of Computer Science, ETH Zurich
Zurich, Switzerland
ABSTRACT
Database Management Systems (DBMS) are used ubiquitously. To efficiently access data, they apply sophisticated optimizations. Incorrect optimizations can result in logic bugs, which cause a query to compute an incorrect result set. We propose Non-Optimizing Reference Engine Construction (NoREC), a fully-automatic approach to detect optimization bugs in DBMS. Conceptually, this approach aims to evaluate a query by an optimizing and a non-optimizing version of a DBMS, to then detect differences in their returned result set, which would indicate a bug in the DBMS. Obtaining a non-optimizing version of a DBMS is challenging, because DBMS typically provide limited control over optimizations. Our core insight is that a given, potentially randomly-generated optimized query can be rewritten to one that the DBMS cannot optimize. Evaluating this unoptimized query effectively corresponds to a non-optimizing reference engine executing the original query. We evaluated NoREC in an extensive testing campaign on four widely-used DBMS, namely PostgreSQL, MariaDB, SQLite, and CockroachDB. We found 159 previously unknown bugs in the latest versions of these systems, 141 of which have been fixed by the developers. Of these, 51 were optimization bugs, while the remaining ones were error and crash bugs. Our results suggest that NoREC is effective, general, and requires little implementation effort, which makes the technique widely applicable in practice.
CCS CONCEPTS
• Information systems → Database query processing; • Software and its engineering → Software testing and debugging.

KEYWORDS
database testing, DBMS testing, query optimizer bugs, test oracle
ACM Reference Format:
Manuel Rigger and Zhendong Su. 2020. Detecting Optimization Bugs in Database Engines via Non-Optimizing Reference Engine Construction. In Proceedings of the 28th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE '20), November 8–13, 2020, Virtual Event, USA. ACM, New York, NY, USA, 13 pages. https://doi.org/10.1145/3368089.3409710
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

ESEC/FSE '20, November 8–13, 2020, Virtual Event, USA
© 2020 Copyright held by the owner/author(s). Publication rights licensed to ACM.
ACM ISBN 978-1-4503-7043-1/20/11...$15.00
https://doi.org/10.1145/3368089.3409710
1 INTRODUCTION

Database Management Systems (DBMS) are an important component in many systems. To meet the growing performance demands, increasingly sophisticated optimizations for query evaluation are applied [18, 35, 39, 60]. Unsurprisingly, the query optimizer is typically considered to be a DBMS' most complex component, posing a major correctness challenge [20, 21]. Implementation errors in the optimizer can result in logic bugs, which are bugs that cause a DBMS to return an incorrect result set for a given query. Specifically, we refer to logic bugs in the query optimizer as optimization bugs. Pivoted Query Synthesis (PQS) was recently proposed as a way of tackling logic bugs in DBMS [46]. Its core idea is to verify the DBMS based on a single pivot row, for which a query is generated that is expected to fetch this row. While PQS has been effective in detecting many bugs in widely-used DBMS, a significant drawback is the high implementation effort that is required to realize this technique; specifically, the technique requires the re-implementation of the DBMS' provided operators and functions to determine whether a randomly-generated expression evaluates to TRUE. Since PQS considers only a single row, it also fails to detect bugs such as when a duplicate row is mistakenly fetched or omitted. Another successful technique for detecting logic bugs in DBMS was realized in a system called RAGS [50]. It is based on differential testing [36]. A query is generated that is sent to multiple DBMS; if the DBMS disagree on the output, at least one of the DBMS is expected to be affected by a bug. As noted by the authors, a significant drawback of this technique is that it applies only to the common core of SQL, which is small, because DBMS differ in what operators and types they support and because even common operators have subtly different semantics between different DBMS [50].

In this paper, we propose Non-Optimizing Reference Engine Construction (NoREC), a novel, general, and cost-effective technique for finding optimization bugs in DBMS. The high-level idea of our approach is to compare the results of an optimizing version of a DBMS against a version of the same DBMS that does not perform any optimizations. Obtaining such a non-optimizing version of a DBMS is challenging. While many DBMS provide some options to control optimizations, these are limited and specific to a DBMS. Although adding such options would be a possibility, doing so retrospectively would be error prone and impractical because of the high implementation effort and domain knowledge required. Rather, we propose the idea that a given query can be rewritten so that the DBMS is not expected to optimize it. Finding a translation mechanism that guarantees the same result as the original query, while making optimizations inapplicable, is not obvious. Our key insight is that this can be achieved by transforming a query with a
WHERE clause, which is subject to extensive optimization by the DBMS and the basis for creating an efficient query plan, to a query that evaluates the WHERE clause's predicate on every record of the table, which cannot be meaningfully optimized; the number of records fetched by the first query must be equal to the number of times the WHERE predicate evaluates to TRUE for the second query. A different result indicates a bug in the DBMS.

Listing 1: Illustrative example where a bug in SQLite's LIKE optimization caused a record to mistakenly be omitted.

CREATE TABLE t0(c0 UNIQUE);
INSERT INTO t0 VALUES (-1);
SELECT * FROM t0 WHERE t0.c0 GLOB '-*'; -- query 1: {}
SELECT t0.c0 GLOB '-*' FROM t0;         -- query 2: {TRUE}

Listing 1 illustrates the idea of our approach based on a bug that we found in SQLite where an optimization caused a row to be erroneously omitted from the result set. Starting from an initial database that contains a single record, we generate query 1 with a random
WHERE condition t0.c0 GLOB '-*'. GLOB is a regular expression operator, and '-*' a regular expression that should match a '-', followed by any number of characters. Since WHERE (and JOIN) clauses are performance-critical, they are subject to optimization by the DBMS. In this example, SQLite applies the LIKE optimization [54] by using an index (an auxiliary data structure used for efficient lookups, implicitly created based on the UNIQUE constraint) to do a range search, allowing the execution engine to skip irrelevant records. Unexpectedly, the optimization causes the DBMS to omit fetching the single record, even though it matches the specified regular expression. Next, we translate the first query to query 2 so that the DBMS is unlikely to optimize it, namely by moving the WHERE clause's predicate directly next to the SELECT keyword, which causes the query to evaluate the predicate on each record of the table. We expect that the number of times the expression evaluates to TRUE corresponds to the actual number of records fetched by the first query. However, in this example, the expression evaluates to TRUE for the single record in the database. The DBMS could only meaningfully apply the incorrect optimization to the first query, but not to the second. We reported this bug to the SQLite developers, who quickly fixed it. We implemented NoREC in a tool called
SQLancer, which is available at https://github.com/sqlancer. To demonstrate the generality of our approach, we evaluated NoREC on four widely-used, production-level DBMS: SQLite, MariaDB, PostgreSQL, and CockroachDB. As part of an extensive 5-month testing campaign, in which we sought to demonstrate the effectiveness of the approach and maximize its real-world impact, we found 159 previously-unknown bugs, many of which were serious, of which 141 were subsequently fixed and 14 confirmed. These comprised 51 optimization bugs, 23 crash bugs, 27 assertion failures, and 58 error bugs. Although SQLite has been extensively tested by PQS, NoREC found more than 100 additional bugs in it, demonstrating NoREC's effectiveness. The DBMS developers greatly appreciated our efforts. For example, the SQLite website describes our successful testing campaign [52] and mentions the following: “Rigger's work is currently unpublished. When it is released, it could be as influential as Zalewski's invention of AFL and profile-guided fuzzing.” An artifact prepared for long-term archival is also available [45]. We believe that the simplicity, effectiveness, and low implementation effort of NoREC will result in its broad adoption. In summary, this paper contributes the following:
• a new, effective testing technique for DBMS based on a novel test oracle for detecting optimization bugs called NoREC;
• an implementation of NoREC in a tool called SQLancer;
• an extensive evaluation of NoREC, which uncovered more than 150 new bugs in widely-used DBMS.

2 BACKGROUND

Database management systems and SQL.
DBMS are based on a data model, which abstractly describes how data is organized. Most widely-used DBMS are based on the relational data model proposed by Codd [14]; according to the DB-Engines Ranking [17], seven of the ten most popular DBMS are based on it. In our work, we primarily aim to test such relational DBMS. Structured Query Language (SQL) [10], which is based on relational algebra [15], is the most commonly used language in relational DBMS to create databases and tables, insert rows, and manipulate and retrieve data. Our approach is not directly applicable to NoSQL DBMS, as they often provide their own query languages or support only a SQL subset; however, it is applicable to the newer generation of NewSQL DBMS, which attempt to achieve the same scalability as NoSQL DBMS, but provide SQL as a query language [41].
Automatic testing.
In this work, we focus on automatic testing, which is an effective and practical way of finding bugs, although it cannot guarantee their absence [27]. Two components are crucial for an automatic testing approach. First, an effective test case must stress significant portions of the system under test, to find bugs in them. Second, a test oracle is required that detects whether a certain test case executes as expected. While various database generators [4, 7, 23, 26, 32, 38] and query generators [2, 9, 30, 37, 42, 49, 58] have been proposed to generate effective test cases, test oracles have received less attention. As part of this work, we propose an effective, low-cost test oracle that allows detecting logic bugs in DBMS.
Optimizations in DBMS.
Decades of work have been devoted to query optimization [19, 22]. Each DBMS typically provides a query optimizer that inspects a query, potentially simplifies it, and maps it efficiently to physical accesses (i.e., by selecting one of potentially multiple available access paths [48]). Consider the two queries in Listing 1. It is well understood that the primary performance gains of query optimizations stem from determining how the database records can be efficiently fetched. Consequently, the query optimizer would focus its optimization effort on simplifying and creating an efficient query plan based on the WHERE clause in query 1. In query 2, the predicate is evaluated once for every row in the result set, and thus provides limited room for meaningful optimization. As detailed below, we utilize this observation to translate an optimized query to one that is less optimized.
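This asymmetry in optimization potential can be observed directly with SQLite's EXPLAIN QUERY PLAN. The following sketch uses Python's built-in sqlite3 module; the table and predicate are illustrative examples, not taken from the paper's evaluation:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t0(c0 UNIQUE)")
conn.execute("INSERT INTO t0 VALUES (-1)")

# The WHERE form gives the optimizer something to work with: it can use
# the index implied by the UNIQUE constraint.
opt_plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM t0 WHERE t0.c0 = -1").fetchall()

# The translated form must evaluate the predicate on every row, so the
# planner can only do a full table scan.
unopt_plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT (t0.c0 = -1) FROM t0").fetchall()

print(any("SEARCH" in row[-1].upper() for row in opt_plan))   # index search
print(any("SCAN" in row[-1].upper() for row in unopt_plan))   # full scan
```

The exact plan text varies across SQLite versions, but the SEARCH-versus-SCAN distinction is stable.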
Differential testing.
Figure 1: The core of the approach is the translation of an optimized query (step 1) to an unoptimized one (step 2), which allows the automatic detection of optimization bugs (step 3). t0 is a table contained in the database, and rs1, rs2, as well as rs3 are result sets returned by the DBMS. Predicate ϕ is random, but fixed.

Differential testing [36] refers to a testing technique where a single input is passed to multiple systems that are expected to produce the same output; if the systems disagree on the output, a bug in at least one of the systems has been detected. Slutz applied this technique for testing DBMS in a system called RAGS by generating SQL queries that are sent to multiple DBMS and then observing differences in the output sets [50]. While the approach was effective, the author stated that the small common core and the differences between different DBMS were a challenge. Indeed, DBMS typically differ in the SQL dialect that they support, by deviating from the standard and providing DBMS-specific extensions [46]. For example, the CockroachDB developers argued that they cannot use differential testing using PostgreSQL [29], which is the DBMS that is closest to it:
“Correctness is difficult because we don't have any oracle of truth, which would require a known working SQL engine, which is exactly the thing we're trying to break. We are unable to use Postgres as an oracle because CockroachDB has slightly different semantics and SQL support, and generating queries that execute identically on both is tricky and doesn't allow us to use the full CockroachDB grammar.”
In this paper, we propose an approach that allows building such a “known working SQL engine”, namely one that is expected to be free of optimization bugs. As argued next, it is unclear how differential testing could be used to achieve this.
Controlling optimizations in DBMS.
One obvious, but infeasible approach to finding optimization bugs is to realize differential testing by executing a SQL query once with optimizations disabled and once with them enabled, to detect bug-induced deviations in the result set. This technique has been applied to compilers [33, 61], where programs were compiled without and with optimization flags. For DBMS, the majority of optimizations cannot be disabled. DBMS typically provide some run-time and compile-time options to control the behavior of operators and optimizations, but these are typically very limited. For example, the LIKE optimization applied to the query in Listing 1 cannot be disabled; SQLite provides only an option to control whether the operator should ignore the casing of the string. Similarly, some DBMS allow the specification of hints to the query optimizer for a given query [8], which also does not apply to many optimizations. Although modifying the DBMS to provide configuration options for all optimizations would be a possibility, doing so would require DBMS-specific knowledge and would involve a high implementation effort.
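SQLite's casing option mentioned above can be demonstrated in a few lines of Python using the built-in sqlite3 module (a sketch; note that the pragma changes the semantics of LIKE, not whether the LIKE optimization is applied):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# By default, SQLite's LIKE is case-insensitive for ASCII characters.
before = conn.execute("SELECT 'a' LIKE 'A'").fetchone()[0]

# The only LIKE-related knob controls casing; the LIKE optimization
# itself cannot be switched off.
conn.execute("PRAGMA case_sensitive_like = ON")
after = conn.execute("SELECT 'a' LIKE 'A'").fetchone()[0]

print(before, after)  # 1 0
```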
To find optimization bugs in DBMS, we propose NoREC. Our core insight is that a given query that is optimized by the DBMS can be transformed to another query that cannot be effectively optimized. For brevity, we refer to the query that is potentially optimized by the DBMS as the optimized query, and to the query that is not or less optimized as the unoptimized query. While our translation step cannot guarantee the absence of optimizations, we found that, in practice, it widely succeeds in disabling them.
Figure 1 illustrates our approach. In step 1, we randomly generate an optimized query of the form SELECT * FROM t0 WHERE ϕ. Since most optimizations apply to data filtering, such as expressed in the query's WHERE clause, we expect that the randomly-generated query might be optimized by the DBMS. In the figure, the database contains a single table t0 holding the records r1, r2, and r3. Assuming that the DBMS functions correctly, the result set should correspond to rs1, which comprises the two records r1 and r3. Due to an incorrect optimization, however, it might occur that a record is omitted, or a record is mistakenly fetched. In the example, rs2 mistakenly additionally contains r2.

In step 2, we translate the optimized query to an unoptimized query of the form SELECT (ϕ IS TRUE) FROM t0. This query lacks a WHERE condition. Thus, the DBMS must fetch every record in the selected table, which effectively disables most of the optimizations that the DBMS could apply. Furthermore, this query evaluates ϕ as a boolean predicate on every record in the table. This predicate should evaluate to TRUE for every record that is contained in the result set of the optimized query (i.e., for which the WHERE clause evaluates to TRUE), because the predicate must consistently yield the same value, regardless of where it is used. The result set thus must contain two records with TRUE, and one record with
FALSE.

In step 3, we pass both queries to the DBMS and compare the two result sets (i.e., rs1/rs2 with rs3). For the optimized query, we count the number of records, that is, |rs1| = 2 for a correct execution and |rs2| = 3 for an incorrect one. For the unoptimized query, we count the number of TRUE values in the result set, that is, |σ_{column=TRUE}(rs3)| = 2, which should correspond to the number of records that are fetched for the optimized query. Since 2 ≠ 3, the incorrect optimization is detected; it suffices to compare the number of records for the optimized query with the number of TRUE values for the unoptimized query to validate the DBMS. Our empirical evidence demonstrates that this suffices to find all optimization bugs. For completeness, Section 3.3 describes how NoREC can be extended to also validate the records' contents.

Listing 2: Join clauses can be copied during translation.

SELECT * FROM t0 LEFT JOIN t1 ON t0.c0 = t1.c0 JOIN t2 ON t2.c0 > t0.c1 WHERE t2.c0 = 5; -- query 1
SELECT ((t2.c0 = 5) IS TRUE) FROM t0 LEFT JOIN t1 ON t0.c0 = t1.c0 JOIN t2 ON t2.c0 > t0.c1; -- query 2
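The complete cardinality check can be sketched against SQLite with Python's built-in sqlite3 module. The helper name, table, data, and predicate below are hypothetical examples; on a correctly functioning DBMS, both counts must agree:

```python
import sqlite3

def norec_oracle(conn, table, predicate):
    """Compare the cardinality of the optimized query against the number
    of TRUE evaluations of the same predicate in the unoptimized query."""
    optimized = f"SELECT * FROM {table} WHERE {predicate}"
    unoptimized = f"SELECT ({predicate}) FROM {table}"
    fetched = len(conn.execute(optimized).fetchall())
    # SQLite represents TRUE as 1; FALSE (0) and NULL rows are not counted.
    true_rows = sum(1 for (v,) in conn.execute(unoptimized) if v == 1)
    return fetched, true_rows

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t0(c0 INT)")
conn.executemany("INSERT INTO t0 VALUES (?)", [(-1,), (2,), (3,)])
opt, unopt = norec_oracle(conn, "t0", "c0 > 0")
assert opt == unopt == 2  # a mismatch would indicate an optimization bug
```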
Translating an optimized query to an unoptimized one is a straightforward, automatic procedure. As illustrated in Figure 1, step 2, it requires moving the condition in a WHERE clause to directly after the SELECT keyword, so that it is executed on every row in the table. As detailed next, the basic approach can be extended to cover additional features of the DBMS.
Multiple tables.
In a FROM clause, multiple tables can be specified from which records are fetched; these are typically joined by a predicate in the WHERE clause. Although the previous examples only referred to a single table, our approach directly applies to multiple tables without any modifications.
Join clauses.
Besides WHERE clauses, JOIN clauses can also be used to join two tables. For example, consider query 1 in Listing 2, which shows an example with one (inner) JOIN and one LEFT JOIN. The ON clause for inner JOINs specifies that only those records should be fetched for which the condition evaluates to TRUE for records in both tables (i.e., as if the predicate had been specified in a WHERE clause). A LEFT JOIN fetches all records that an inner JOIN fetches; in addition, it fetches all records from the left table that do not have a matching record in the right table, by assuming selected columns from the right table to be NULL. These, and the other types of joins (e.g., NATURAL JOINs, RIGHT JOINs, and FULL JOINs) can be left unmodified during translation. Query 2 shows that only the WHERE condition t2.c0 = 5 was moved after the SELECT keyword, and that the JOINs were copied. An alternative strategy that could find additional bugs in joins would be to move their ON clauses as well, which would require translating them to multiple unoptimized queries (see Section 5).

ORDER BY.

ORDER BY clauses do not influence the cardinality of the result set. Thus, the unoptimized query can either omit the clause or replace it during translation, to test for bugs related to this feature.
GROUP BY.
GROUP BY clauses group records with the same values and are often used in combination with aggregate and window functions. These clauses, if present in the optimized query, can be copied to the unoptimized query. If so, an additional query is required to sum up the intermediate counts of the individual groups, assuming that an aggregate function is used to sum up the records for which the expression evaluates to TRUE (cf. Section 3.3).

Listing 3: We alternate between two strategies for determining the count for the optimized query.

SELECT * FROM t0 WHERE ϕ;        -- query 1: while (rs.next()) count++;
SELECT COUNT(*) FROM t0 WHERE ϕ; -- query 2: count = rs.getInt(1);

Listing 4: We use an aggregate function to determine the count for the unoptimized query.

SELECT SUM(count) FROM (SELECT (ϕ IS TRUE) AS count FROM t0); -- count = rs.getInt(1)
Figure 1, step 3, does not illustrate how to compute the counts for the optimized and unoptimized queries. We apply different strategies for this. The naive approach is to iterate through the result set to determine the count, which is applicable for both queries. The second, more efficient strategy (the performance gain depends on various parameters, such as the number of rows in the database) relies on aggregate functions provided by the DBMS to retrieve the count, but might result in bugs being overlooked, since the increased complexity of the query might make optimizations inapplicable. To balance performance and bug-finding capabilities, we alternate between both strategies.
Optimized query.
Listing 3 demonstrates the two ways in which we compute the count for the optimized query from Figure 1. Query 1 represents the naive approach. For this query, the DBMS returns a result set rs, through which SQLancer iterates to determine the count. Query 2 uses COUNT(*) to count the number of records by relying on the DBMS for this. This is more efficient, because the DBMS might optimize the query, and also because the overhead of crossing the boundary between the DBMS and SQLancer is avoided [5]. SQLancer only needs to retrieve the count from the single record in the result set rs returned by the DBMS.

Unoptimized query.
For the unoptimized query, we assume that, since the DBMS is unable to optimize the query, it is also unable to optimize an aggregate function applied to it. Since using an aggregate function is more efficient, we use only this strategy (see Listing 4). The SUM() function adds up the predicate values by interpreting TRUE as one, and FALSE as well as NULL as zero. DBMS such as PostgreSQL and CockroachDB do not provide implicit conversions from booleans to integers and require an additional cast.
Records content.
Our basic idea can be extended to check the records' contents. To this end, the query generated by step 2 must list each column in addition to the predicate. The records for which the predicate evaluates to TRUE for the unoptimized query can then be compared with those fetched for the optimized query in step 3. However, retrieving and comparing the result sets makes it necessary to use the slower naive strategy presented above. Checking the records' contents allowed us to find an additional bug in an SQLite extension, albeit not in its optimizer. We speculate that doing so was not more effective because we are unaware of any optimizations that transform the fetched content. Furthermore, while it is possible that a DBMS returns an incorrect result set with the correct cardinality, our empirical evidence suggests that such bugs are unlikely to occur.
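A sketch of this extension (using Python's built-in sqlite3 module; the helper name and data are illustrative) compares the two result sets as multisets, which also catches mistakenly duplicated or omitted rows:

```python
import sqlite3
from collections import Counter

def norec_content_check(conn, table, cols, predicate):
    """Naive strategy: retrieve full rows and compare them as multisets."""
    optimized = conn.execute(
        f"SELECT {cols} FROM {table} WHERE {predicate}").fetchall()
    unoptimized = conn.execute(
        f"SELECT {cols}, ({predicate}) FROM {table}").fetchall()
    # Keep only the rows for which the predicate evaluated to TRUE (1),
    # dropping the appended predicate column.
    kept = [row[:-1] for row in unoptimized if row[-1] == 1]
    return Counter(optimized) == Counter(kept)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t0(c0 INT)")
conn.executemany("INSERT INTO t0 VALUES (?)", [(-1,), (2,), (2,)])
print(norec_content_check(conn, "t0", "c0", "c0 > 0"))  # True
```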
We tested a large subset of each DBMS' functionality and, in the process, identified general limitations as well as three SQLite corner cases that need to be treated specially by our approach. We do not consider these limitations to be essential, as they did not hinder the approach in finding bugs.
Ambiguous queries.
SQL queries can be ambiguous, and thus it might be possible that a DBMS returns a different result for the optimized query than for the unoptimized one, which was also a challenge for previous work [30]. In practice, we found subqueries to be problematic, especially when comparing the result of a subquery that might return more than one record with a value. Thus, we decided to disable the generation of subqueries and will consider generating unambiguous subqueries as part of future work.
Nondeterministic functions.
A query might be unambiguous, but yield a different result between the optimized and unoptimized queries due to nondeterministic functions. Such functions include random number generators and those that return the current time. To prevent false positives, we disabled their generation.
Short circuit evaluation.
Our approach is not applicable to detecting bugs where an optimization results in an exception or error being “optimized away”. This is because SQL does not specify whether the AND and OR operators must short-circuit. We found that DBMS can handle this inconsistently between the optimized and unoptimized query. Consider a predicate ϕ_ok AND ϕ_err, where ϕ_err results in an error when executed. If ϕ_ok is executed first and yields FALSE, the DBMS might avoid evaluating ϕ_err, causing the statement to execute without errors. Otherwise, the execution of ϕ_err results in an error. Consequently, our approach cannot detect incorrect optimizations that prevent expected errors from occurring.

Other features.
Our approach applies directly neither to DISTINCT clauses nor to queries that compute results over multiple records, such as aggregate and window functions, which is a limitation that also affects PQS. These features are also optimized, meaning that their implementation might be affected by optimization bugs as well. We believe that our high-level idea of translating an optimized to an unoptimized query could be extended to be applicable in this context.
Number comparisons in SQLite.
One of the three SQLite corner cases that caused problems was that SQLite3 considered floating-point numbers and integers that represent the same value to be equal, even when using the DISTINCT keyword, which caused inconsistent results. In Listing 5, the DISTINCT keyword in the view v0 resulted in only one of the records being fetched (which one is unspecified and differed between the optimized and unoptimized query). For query 1, 0 was fetched; thus, the string concatenation yielded the value '00.1', which evaluated to TRUE. For query 2, 0.0 was fetched from the view, which resulted in the concatenated string '0.00.1', which evaluated to FALSE. Since such false positives were uncommon in SQLite3, and not present in the other DBMS, we initially filtered them out manually, but then introduced an option to avoid generating DISTINCT keywords in views.

Listing 5: The DISTINCT keyword in views can cause inconsistent results in SQLite.

CREATE TABLE t0(c0);
INSERT INTO t0(c0) VALUES (0.0), (0);
CREATE VIEW v0(c0) AS SELECT DISTINCT c0 FROM t0;
SELECT COUNT(*) FROM v0 WHERE v0.c0 || 0.1; -- query 1: 1
SELECT (v0.c0 || 0.1) IS TRUE FROM v0;      -- query 2: 0
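The underlying comparison behavior can be reproduced with Python's built-in sqlite3 module (a sketch; which of the two representatives SQLite returns is unspecified):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t0(c0)")
conn.executemany("INSERT INTO t0(c0) VALUES (?)", [(0.0,), (0,)])

# SQLite compares the REAL 0.0 and the INTEGER 0 as equal, so DISTINCT
# collapses them into a single record.
rows = conn.execute("SELECT DISTINCT c0 FROM t0").fetchall()
print(len(rows))  # 1
```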
Input columns in SQLite.
The second SQLite corner case concerned the dbstat extension in SQLite (see Listing 6). The WHERE clause stat.aggregate = 1 sets a configuration option to TRUE, which changed the behavior of the query and caused a record to be fetched. When we used this predicate directly after the SELECT keyword, however, the column was not used as an input, and no record was fetched. We addressed this by avoiding the generation of clauses that set the configuration option for this specific column and extension.

Listing 6: Input columns in SQLite can change the behavior of queries.

CREATE VIRTUAL TABLE stat USING dbstat;
SELECT * FROM stat WHERE stat.aggregate = 1; -- fetches one record
SELECT stat.aggregate = 1 FROM stat; -- FALSE
Ambiguous GROUP BYs in SQLite.

The third SQLite corner case was given by ambiguous GROUP BYs in a view, which caused problems in combination with other features (such as optimizer hints, see Listing 7). All other DBMS that we tested prohibited such ambiguous GROUP BYs and returned an error for the view creation. In our complete testing period, we seldom encountered such cases, which is why we did not address this in SQLancer.

Listing 7: Ambiguous GROUP BYs in SQLite can cause inconsistent results.

CREATE TABLE t0(c0, c1, c2, PRIMARY KEY (c2)) WITHOUT ROWID;
CREATE INDEX i0 ON t0(CAST(c1 AS INT));
CREATE VIEW v0 AS SELECT 0, c0 FROM t0 GROUP BY HAVING c2;
INSERT INTO t0(c2) VALUES ('');
INSERT INTO t0(c1, c2) VALUES (1, 1);
SELECT * FROM v0 WHERE UNLIKELY(1); -- {}
SELECT UNLIKELY(1) FROM v0; -- TRUE
The random and targeted generation of databases [4, 7, 23, 26, 32, 38] and queries [2, 9, 30, 37, 42, 49, 58] for different workloads and purposes has been widely explored, and is not a contribution of this paper. Our approach can be applied based on any randomly-generated or existing database. It can also be applied with any random query generator that prevents the generation of, or ignores errors in, the corner cases described in Section 3.4. Thus, we explain our database and query generator only for completeness. In our work, we base the generation of databases and queries on SQLancer [46], which we extended to cover additional DBMS (i.e., CockroachDB and MariaDB) as well as SQL features (e.g., additional data types, operators, and functions). SQLancer generates a database by randomly creating tables and indexes, inserting data, as well as by updating and deleting data to stress the DBMS in an attempt to increase the chances of finding bugs. The core part of SQLancer's random query generation is the generation of random expressions, which we use in WHERE and JOIN clauses. SQLancer generates these expressions heuristically, by selecting one of the applicable options. The applicable options are specific to a given DBMS, since DBMS vary in which operators they support and which implicit conversions they perform. The generation of the expressions is based on the grammar of the respective DBMS and the schema of the current database (to generate valid references to columns and tables).
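The heuristic expression generation can be sketched as follows. This is a minimal, illustrative version, not SQLancer's actual implementation; the option names, constants, and operator set are assumptions:

```python
import random

def random_expression(columns, depth=0):
    """Heuristically pick one applicable option, recursing with a depth bound."""
    options = ["column", "constant"]
    if depth < 3:  # bound the nesting depth so generation terminates
        options += ["unary", "binary"]
    choice = random.choice(options)
    if choice == "column":
        return random.choice(columns)
    if choice == "constant":
        return str(random.choice([-1, 0, 1, "NULL", "'-*'"]))
    if choice == "unary":
        return f"(NOT {random_expression(columns, depth + 1)})"
    left = random_expression(columns, depth + 1)
    right = random_expression(columns, depth + 1)
    op = random.choice(["=", "<", ">", "AND", "OR", "GLOB"])
    return f"({left} {op} {right})"

print(random_expression(["t0.c0", "t0.c1"]))
```

In SQLancer, the set of applicable options additionally depends on the grammar of the DBMS under test and on the current database schema.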
The goal of our evaluation was to demonstrate the effectiveness and generality of our approach. To this end, we tested NoREC on four widely-used DBMS: SQLite, MariaDB, PostgreSQL, and CockroachDB. As part of this, we extended SQLancer by a database and query generator for MariaDB as well as CockroachDB, and enhanced these components for SQLite and PostgreSQL (see Section 3.5). To maximize our real-world impact, we invested significant time and effort over a five-month period, which allowed us to find 159 true, previously unknown bugs. Furthermore, we analyzed the bug reports in order to obtain a better understanding of which kinds of bugs NoREC can find. Since PQS is the closest related work, we compared PQS with NoREC.
Tested DBMS.
We focused on testing four important, widely-used DBMS: SQLite, PostgreSQL, MariaDB, and CockroachDB (see Table 1). According to the DB-Engines Ranking [17], Stack Overflow's annual Developer Survey [40], and GitHub, these DBMS are among the most popular and widely-used DBMS. SQLite is the most widely deployed DBMS overall, used in most major web browsers, mobile phones, and embedded systems. The authors of SQLite speculate that over one trillion SQLite databases are in active use [53]. MySQL ranks at the top of most popularity rankings. However, MySQL's binaries and source code are provided only for release versions, which are typically published every 2–3 months, which makes it tedious to filter out test cases that trigger the same underlying bug, as also noted previously [46]. Furthermore, only some of the bugs found by PQS have been fixed, providing fewer incentives to test this DBMS. Thus, we decided to test MariaDB, which is a fork of MySQL and uses an open-source development process. Since MariaDB shares much code with MySQL, we believe that the results are similar to those that would be obtained when testing MySQL. PostgreSQL is also a popular DBMS; it seems to be more robust than most other DBMS, and the PQS work could find only a single logic bug in it [46]. CockroachDB [56] is a recent commercial NewSQL DBMS [41]. It has received much attention and is highly popular on GitHub, although it has a low rank in the other popularity rankings. We tested only CockroachDB's free community edition, and not the commercial enterprise edition.
Testing focus.
The developers' reaction times were a significant factor that determined on which DBMS we concentrated our testing efforts. The SQLite developers were most responsive in fixing bugs;
Table 1: The DBMS we tested are popular, complex, and most have been developed for a long time.
DBMS         DB-Engines rank  Stack Overflow rank  GitHub stars  LOC   First release
SQLite       11               4                    1.5k          0.3M  2000
MariaDB      13               7                    3.2k          3.6M  2009
PostgreSQL   4                2                    6.3k          1.4M  1996
CockroachDB  75               -                    17.7k         1.1M  2015

typically, they would fix a bug within hours of us reporting it. Thus, we invested significantly more time into testing SQLite than into the other DBMS. Besides testing SQLite's core, we tested three important extensions that are included as part of SQLite's source code, but need to be enabled at build time. One extension enables Full Text Search (FTS) for SQLite, which was subject to extensive investigations by security researchers [57], as it is, for example, enabled in Google Chrome. R-Tree is an important index structure for spatial objects that is designed to efficiently support range queries [25].
DBSTAT is a virtual table that allows querying information about the content of a SQLite database. We also invested significant effort into testing PostgreSQL; however, we were unable to find any interesting bugs, which is why the developers' reaction time was insignificant. The CockroachDB developers quickly confirmed our bugs, and fixed many of them within days, especially those in the query optimizer. The MariaDB developers quickly confirmed our bugs; however, only one was fixed. We stopped testing MariaDB after reporting the initial bugs, due to the difficulty of filtering out duplicates.
Existing testing efforts.
All DBMS are extensively tested, which we want to illustrate based on SQLite and CockroachDB, which both documented their testing efforts. SQLite likely has the most impressive testing effort, which is documented on the SQLite homepage [52]. The SQLite developers follow a design process inspired by the DO-178B guidelines [59], which are concerned with the safety of safety-critical software used in certain airborne systems. They achieve 100% modified condition/decision coverage [59], which implies that every branch has been taken and falls through at least once. Besides, they employ out-of-memory testing, I/O error testing, crash testing, compound failure testing, fuzz testing, and dynamic analysis [52]. We believe that CockroachDB is an interesting target, because the developers have put significant effort into developing and using automatic testing techniques, which they run as part of their continuous integration. For example, they have been running a grammar-based fuzzer on CockroachDB that has found 70 bugs such as crashes and hangs within 3 years [28, 29]. In addition, they ported SQLSmith to Go to use it as an alternative query generator, which found over 40 bugs [29]. However, as noted by them [29], "[t]his kind of fuzz testing is not able to deduce correctness", which is the gap that we are filling with NoREC. When we reported bugs, they actively worked on enhancing their testing infrastructure to find similar correctness bugs based on domain knowledge and random testing techniques.

Detecting Optimization Bugs in Database Engines via NoREC. ESEC/FSE '20, November 8-13, 2020, Virtual Event, USA
Listing 8: A bug in the IN optimization affected expressions with affinity.

CREATE TABLE t0(c0 INT UNIQUE);
INSERT INTO t0(c0) VALUES (1);
SELECT * FROM t0 WHERE '1' IN (t0.c0); -- actual: {1}, expected: {}

Testing methodology.
We implemented our approach iteratively and applied it to the DBMS under test after each iteration. Typically, we added a new feature to the random query generator (e.g., a new operator) or database generator (e.g., a new data type), after which we continued testing the DBMS. While some bugs were found seconds after implementing a feature, others were found only after weeks. After finding a bug, we reduced the test case. Although special query reducers have been proposed [30, 50], we found that C-Reduce [43], a tool that was originally developed for reducing C/C++ programs, was sufficient for our use case. We also manually reduced and canonicalized test cases, to reduce the developers' debugging effort. After excluding potential duplicates, we reported issues on the bug tracker, mailing list, or via a private report (when we considered the bug to potentially be a security issue). We did not analyze any potential security issues, since the focus of this work is optimization bugs. Until a bug was fixed, we tried to avoid generating patterns that triggered the bug. We invested significant time and effort in testing, as well as in triaging and reporting bugs, and opened a total of 168 bug reports. Due to the iterative implementation and deployment of our tool, we cannot provide any detailed statistics on the total run time or efficiency.
Next, we show a selection of bugs found by NoREC to give an intuition of what kind of interesting bugs it can find. This selection is necessarily biased: we found many less interesting bugs, but also other interesting ones that we had to omit. The full list of bugs can be found as part of the supplementary material supplied with the manuscript. For brevity, we omit the unoptimized query where it can be directly derived from the optimized one. In the listings, we annotate each query with the actual, incorrect result, followed by the expected result.
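To make this derivation concrete, the following sketch shows how a NoREC-style check can compare an optimized query against its unoptimized translation on SQLite. This is our illustration, not SQLancer's actual implementation; we use a benign predicate on which both queries must agree, while on an affected SQLite version, the same check with the predicate from Listing 8 would flag a mismatch.

```python
import sqlite3

# Sketch of a NoREC-style check (our illustration; SQLancer's actual
# implementation differs). The optimized query evaluates the predicate
# in the WHERE clause; the translated query moves it into the SELECT
# list and counts the rows for which it evaluates to TRUE.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE t0(c0 INT UNIQUE);
INSERT INTO t0(c0) VALUES (1);
""")

predicate = "t0.c0 > 0"  # a benign predicate; both queries must agree
optimized = f"SELECT * FROM t0 WHERE {predicate}"
unoptimized = (
    "SELECT SUM(cnt) FROM "
    f"(SELECT CASE WHEN {predicate} THEN 1 ELSE 0 END AS cnt FROM t0)"
)

opt_rows = len(con.execute(optimized).fetchall())
unopt_rows = con.execute(unoptimized).fetchone()[0] or 0
# A mismatch between the two counts would indicate an optimization bug.
assert opt_rows == unopt_rows
```

The `CASE WHEN ... THEN 1 ELSE 0 END` form counts only rows whose predicate evaluates to TRUE, so rows where the predicate is FALSE or NULL contribute nothing to the sum, matching the semantics of the WHERE clause.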
Incorrect IN optimization. The SQL IN operator allows checking whether a value on the left side is contained in a set of values on the right side. Previously, SQLite implemented an optimization that transformed an expression of the form X IN (Y) to X=Y (note that Y is a single value). For the = operator, SQLite performs implicit type conversions based on the affinity of an operand (e.g., the column type), while it is not supposed to perform them for the IN operator. We thus found that this optimization is incorrect in the presence of affinity conversions (see Listing 8), as it caused SQLite to mistakenly convert the string '1' to an integer in the query, thus unexpectedly fetching a record. We found other, similar bugs related to affinity conversions (e.g., a bug in the constant propagation). We believe that affinity conversions are difficult to reason about, and our findings seem to demonstrate that this mechanism is error prone.

Operator commuting disregards COLLATE. We found a bug where commuting an operator mistakenly resulted in matching an inapplicable partial index (see Listing 9). The COLLATE NOCASE clause specifies that when this column is used in string comparisons, the casing of strings should be disregarded; however, since the c1 column is used on the left hand side and lacks a COLLATE, the casing is assumed to be relevant. Thus, the lowercase character 'a' is assumed to be greater than the uppercase 'B', making the predicate yield TRUE. However, the record was not fetched. The cause for the bug was that SQLite commuted the operator, while updating the expressions' COLLATEs, which subsequently caused it to match the partial index, since insufficient information was preserved to verify whether the expression correctly qualifies the index. The bug was fixed by adding logic to maintain this information.

Listing 9: Commuting the operator resulted in the partial index being mistakenly used.

CREATE TABLE t0(c0 COLLATE NOCASE, c1);
CREATE INDEX i0 ON t0(0) WHERE c0 >= c1;
INSERT INTO t0 VALUES ('a', 'B');
SELECT * FROM t0 WHERE t0.c1 <= t0.c0; -- actual: {}, expected: {a|B}

Listing 10: A bug in the vectorization engine caused a record to be omitted.

SET SESSION VECTORIZE=experimental_on;
CREATE TABLE t1(c0 INT);
CREATE TABLE t0(c0 INT UNIQUE);
INSERT INTO t1(c0) VALUES (0);
INSERT INTO t0(c0) VALUES (NULL), (NULL);
SELECT * FROM t0, t1 WHERE t0.c0 IS NULL; -- actual: {NULL|0}, expected: {NULL|0, NULL|0}

Listing 11: A bug in the handling of filters unexpectedly caused a record to be fetched.

CREATE TABLE t0(c0 BOOL UNIQUE, c1 BOOL CHECK(true));
INSERT INTO t0(c0) VALUES (true);
SELECT * FROM t0 WHERE t0.c0 AND (false NOT BETWEEN SYMMETRIC t0.c0 AND NULL AND true); -- actual: {TRUE}, expected: {}
Vectorization engine bug.
We found 11 bugs related to CockroachDB's vectorized execution engine, one of which is illustrated by Listing 10. The query is expected to fetch two records, because the two tables are joined without filtering out records, effectively computing the tables' cross product (|t0|*|t1| = 2*1 = 2). However, only one record was fetched. The core reason for this bug was that the empty set of equality columns for the join between the two tables was handled incorrectly for hash joins by the vectorized execution engine.
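The expected result of Listing 10 can also be checked against a different engine. As a sketch (our illustration, not part of the paper's methodology), the same tables and query evaluated in SQLite yield the two expected records:

```python
import sqlite3

# Re-create the tables and query from Listing 10 in SQLite (which, unlike
# the affected CockroachDB version, computed the expected result).
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE t1(c0 INT);
CREATE TABLE t0(c0 INT UNIQUE);
INSERT INTO t1(c0) VALUES (0);
INSERT INTO t0(c0) VALUES (NULL), (NULL);
""")
# There is no join condition between t0 and t1, so the result is the
# cross product: |t0| * |t1| = 2 * 1 = 2 records.
rows = con.execute("SELECT * FROM t0, t1 WHERE t0.c0 IS NULL").fetchall()
assert rows == [(None, 0), (None, 0)]
```

Note that the UNIQUE constraint on t0.c0 permits multiple NULL values, which is why both rows can be inserted.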
Incorrect handling of filters.
We found a bug that exposed that CockroachDB, in rare cases, incorrectly handled CHECK constraints, which are used to refine the ranges in filters, causing the query in Listing 11 to incorrectly fetch a record, even though the predicate should evaluate to NULL. While this specific bug was fixed by introducing a missing normalization rule, the underlying cause for the bug was fixed in a subsequent commit that extended and refactored CockroachDB's index constraints library.
Listing 12: A bug in the range scan resulted in a row being omitted.

CREATE TABLE t0(c0 INT UNIQUE);
INSERT INTO t0 VALUES (NULL), (NULL), (NULL), (NULL), (1), (0);
SELECT * FROM t0 WHERE c0 < '\n2'; -- actual: {0}, expected: {0, 1}
Listing 13: A predicate comparing a floating-point number with an integer unexpectedly evaluated to TRUE.

CREATE TABLE t0(c0 INT);
INSERT INTO t0 VALUES (1);
CREATE INDEX i0 ON t0(c0);
SELECT * FROM t0 WHERE …;
Incorrect string range scan.
We found a bug in MariaDB where a range scan on an index was applied incorrectly for a string comparison (see Listing 12). As explained by the MariaDB developers, the optimizer incorrectly constructed the range NULL < x <= 0 for the range scan, even though the upper limit was too low, leading to only one row being fetched. The reason for that was that space characters like \n were handled inconsistently. This bug also affected MySQL, and was the only bug that we reported that was fixed by the MariaDB developers.

Incorrect number comparison.
We found a bug where a comparison of a floating-point number with an integer column, on which an index is created, yielded an incorrect result (see Listing 13). The comparison should evaluate to FALSE, because c0 should, after an implicit type conversion, evaluate to an integer value that is not equal to the floating-point constant. However, the query unexpectedly fetched the single record stored in the table. While this bug was quickly confirmed and reproduced for both MariaDB and MySQL, it has not been addressed yet.

General bug statistics.
Table 2 shows the number of bugs that we reported and their status. Out of the 168 bug reports, 159 turned out to be previously-unknown true bugs. 141 of these bugs were addressed by code changes, demonstrating that the developers took our bug reports seriously. 14 bugs were verified but have not been addressed yet, and 3 bug reports were addressed by documentation changes. 9 turned out to be false bugs. Out of these, 7 were not considered to be bugs, either because an internal error that we considered to be unexpected was actually expected by the developers, or because we were not yet aware of the limitations of the approach (see Section 3.4). As we tested the latest version of each DBMS, only 2 bugs turned out to be already known.
Oracles.
Table 3 shows the oracles that we used to find the bugs. We found 51 bugs with the NoREC oracle, which was the primary focus of our work. Besides, we identified 58 bugs through unexpected internal errors, either while creating the database, or when sending queries to it. We found these bugs by annotating each SQL statement with a list of expected errors [46]. SQLancer also identified many crash bugs, since it implicitly acts as a grammar-based fuzzer. We built the debug versions of SQLite and PostgreSQL, which allowed us to find 27 debug assertion failures. Furthermore, we found 23 crash bugs (which also included hang bugs) in release builds (but not necessarily in released versions of the DBMS).
Table 2: We found 159 bugs in SQLite, MariaDB, PostgreSQL, and CockroachDB, 141 of which have been fixed.
DBMS         Fixed  Verified  Intended (closed)  Duplicate (closed)
SQLite       110    0         6                  0
MariaDB      1      5         0                  1
PostgreSQL   5      2         1                  0
CockroachDB  28     7         0                  1
Table 3: We found 51 bugs with the NoREC oracle, 58 through unexpected errors, 27 by debug assertion failures, and 23 by crashes that occur in release versions.
DBMS         Logic  Error  Crash (release)  Crash (debug)
SQLite       39     30     15               26
MariaDB      5      0      1                0
PostgreSQL   0      4      3                1
CockroachDB  7      24     4                0
Additional clauses.
Section 3.2 mentions that, optionally, ORDER BY and GROUP BY clauses can be generated, to further stress the query optimizer. We found one logic bug and one crash bug with an ORDER BY clause. We found only one error bug using a GROUP BY clause. Overall, these two clauses did not contribute much to NoREC's bug-finding capabilities; however, they require little implementation effort, which might still justify supporting them.
SQLite.
We found most bugs in SQLite, which is expected, since we invested most effort and energy into testing it. Out of the 110 SQLite bugs, 71 affected the SQLite core. A smaller portion affected extensions; we found 13 bugs in RTREE, 24 in FTS, and 2 in DBSTAT. Note that some bugs that we found in these extensions affected virtual tables in general, on which these and other extensions are based. While we were testing SQLite, the developers added support for generated columns [51], which are columns that are computed based on other columns. After this feature was merged to trunk, but before it was released, we found 22 bugs in it, contributing significantly to its correctness. Besides logic bugs, 26 bugs manifested themselves as debug assertion failures. This high number can be explained by previous work such as PQS having omitted testing debug builds. A number of these assertions did not indicate real bugs in the SQLite core; rather, they indicated the omission of corner cases in the testing logic that SQLite uses [55]. The high number of crashes in release builds is surprising, considering that PQS found only 2 crash bugs in SQLite [46]. One main reason is the aforementioned generated column feature, in which we found 9 crash bugs. We also found one hang bug in FTS, one bug that involved a trigger, two bugs in RTREE, and two bugs in window functions.
PostgreSQL.
Although we invested significant effort into testing PostgreSQL, we found only 8 bugs in it. None of these bugs was an optimization bug. This is consistent with previous findings; for example, PQS could find only a single logic bug in this DBMS [46]. We believe that one significant reason for this is that PostgreSQL is very restrictive in what input it accepts compared to the other DBMS. Richard Hipp, the main SQLite developer, also observed that PostgreSQL in particular is a high-quality DBMS that has had few bugs, and noted that one possible reason could be its very elaborate peer review process [59].
CockroachDB.
We found 35 bugs in CockroachDB. In 15 cases, our bug reports relied on experimental features. Of these, 11 bugs affected the vectorized execution engine (see Listing 10). Out of the 24 error bugs, 17 were due to internal errors, for which execution resulted in displaying a stack trace along with information on where to report the bug, while the server stayed responsive. Based on our bug reports, the CockroachDB developers actively added testing infrastructure and reviewed code to detect similar bugs, demonstrating that they took our bug reports seriously. For example, one of the duplicate bug reports was due to an open issue acknowledging that the issue was found based on one of our bug reports.
MariaDB.
We found 6 bugs in MariaDB. All bugs were quickly confirmed by the MariaDB developers, and three of the bugs were also reproduced for MySQL. However, only one bug was fixed within a duration of three months, which is why we stopped testing MariaDB. Since we invested only little time in testing MariaDB, we believe that our approach could find additional bugs in it.
Comparison with PQS.
PQS is the state of the art in finding logic bugs in DBMS, which is why we want to compare NoREC's effectiveness with it. We expected PQS to find a broader class of bugs, because our aim for NoREC was to find optimization bugs. To the best of our knowledge, there is no other publicly available tool that could detect logic bugs in DBMS to which we could compare.
Evaluation challenges.
Fairly comparing the effectiveness of PQS and NoREC is difficult for various reasons. First, the implementation effort of PQS is significant, which limited its authors to a core subset of the respective DBMS's supported SQL dialect [46]. For example, in PQS, a single comparison operator alone already covers more than 200 LOC, since it needs to support comparing arbitrary data types, which also involves implicit conversions for DBMS like SQLite or MySQL. In contrast, the implementation of the NoREC oracle consists of less than 200 LOC, and also allows finding bugs in the optimization of complex operators and functions. Nevertheless, we believe that a fair evaluation should disregard the amount of time that was invested into implementing the respective approach. Another challenge is that a different set of DBMS was tested. While both PQS and NoREC were evaluated on SQLite and PostgreSQL, PQS was also evaluated on MySQL, while we evaluated NoREC on MariaDB and CockroachDB. Overall, this inhibited us from doing an automatic comparison.
Methodology.
To provide a fair comparison, we performed a manual quantitative and qualitative comparison. We noticed that a test case for NoREC can typically be converted directly to an equivalent PQS test case that triggers the bug, if the bug can be reproduced by PQS. In fact, we can take the unmodified query with the predicate in the WHERE clause and check whether the bug can be reproduced by selecting an applicable pivot row. Likewise, a PQS test case can often be converted to an equivalent NoREC test case, by performing the translation step to an unoptimized query. Based on this observation, we manually tried to create equivalent test cases where possible, which we then evaluated on the version of the DBMS in which the bug was found. While we cannot completely rule out misclassifications due to overlooking that a bug could be reproduced by another query, we believe that the majority of cases were clear. Note that we considered only bugs found by the NoREC oracle; we expect that the errors and crashes can be triggered with either of the approaches. Overall, we investigated (1) the 51 bugs found by the NoREC oracle to check if they could have been found by PQS as well and (2) the 61 bugs found by PQS to check whether they could have been found by NoREC.
Bugs found only by NoREC.
PQS could detect 56.9% of the bugs that were found by NoREC. Specifically, NoREC detected 4 bugs for which the result set unexpectedly contained or missed duplicate records. PQS conceptually cannot detect such bugs, because it validates only whether a randomly-selected record, which is indistinguishable from other duplicate rows, is part of the result set. NoREC triggered 5 bugs with aggregate functions that are used for counting the rows (i.e., SUM and COUNT). PQS' main oracle relies on checking only a single row at a time, which conceptually hinders PQS in detecting bugs related to these aggregate functions. We found 13 cases where records were mistakenly fetched. PQS did not detect this class of bugs, because it only checks for bugs where the pivot row is mistakenly omitted from the result set. We believe that this is not a fundamental limitation of PQS, because it could be extended to also generate queries that are guaranteed to omit the pivot row, enabling PQS to also detect such bugs. Taking this into account, PQS could detect 82.4% of the bugs that NoREC found. A caveat that was already mentioned is that some other bugs can be detected only by a close-to-complete implementation of PQS; as noted, we did not consider this limitation in this analysis, since it is arguable which cases would be deemed to involve too much implementation effort.
Bugs found only by PQS.
In total, NoREC could have found 52.7% of the bugs that PQS found. By far the most common kind of bugs that were detected by PQS, but not by NoREC, stem from the incorrect implementation of operators, functions, and other features (especially affinity conversions in SQLite), both in the optimized and unoptimized case. NoREC failed to detect 18 such bugs. 3 bugs could not be triggered by NoREC, since they relied on a DISTINCT query, which was disregarded by NoREC, but could be supported by translating it to a GROUP BY clause in the unoptimized case. While PQS, like NoREC, is in general unable to detect bugs in aggregate functions, since it operates on a single pivot row, it can do so when a table contains only a single row, which allowed it to detect 3 bugs. PQS could also find 1 bug that was exposed when using INTERSECT, which PQS uses to efficiently check containment, similarly to how NoREC uses the aggregate functions for counting. 1 bug was triggered based on a predicate in an ON clause of a LEFT JOIN. NoREC failed to discover that bug, because it copies left joins and predicates when translating the query. As outlined in Section 5,
we believe that our approach might be enhanced by implementing additional translation schemes for joins.
Discussion.
The comparison has demonstrated that PQS can find a number of bugs that NoREC cannot find. Although NoREC could also find classes of bugs that PQS is unable to find, such as those involving duplicate records, it is mostly restricted to finding optimization bugs. However, PQS incurs significant implementation overhead, since every operator and function to be tested needs to be implemented, for every DBMS that should be tested. In contrast, NoREC relies on a straightforward translation process, and is applicable to any database and query generator that can address the limitations mentioned in Section 3.4. Due to the low implementation effort, NoREC has successfully found a wide range of optimization bugs, even in SQLite, which had recently been comprehensively tested by PQS. Thus, we envision that DBMS could be tested by a combination of both approaches. PQS could be used to test many basic operators and functions, and, being an exact oracle, it can help in establishing a ground truth. Then, NoREC could be used to find the lingering optimization bugs in parts not comprehensively tested by PQS.
Reception by the DBMS developers.
The SQLite developers acknowledged our testing efforts as follows: “One fuzzing researcher of particular note is Manuel Rigger, [...]. Most fuzzers only look for assertion faults, crashes, undefined behavior (UB), or other easily detected anomalies. Dr. Rigger’s fuzzers, on the other hand, are able to find cases where SQLite computes an incorrect answer. Rigger has found many such cases. Most of these finds are fairly obscure corner cases involving type conversions and affinity transformations, and a good number of the finds are against unreleased features. Nevertheless, his finds are still important as they are real bugs, and the SQLite developers are grateful to be able to identify and fix the underlying problems. Rigger’s work is currently unpublished. When it is released, it could be as influential as Zalewski’s invention of AFL and profile-guided fuzzing.”

Bug importance.
We believe that we found many critical bugs that are likely to affect real users. However, we also acknowledge that many of the other bugs that we found can be triggered only by an unlikely combination of operators or features. Even those seemingly unimportant bugs can affect users due to the widespread use of these DBMS, for which we found evidence on the SQLite mailing list, where a user reported an incorrect result for a query with a complex WHERE predicate. When confronted by another SQLite user, the reporter of the bug defended themselves with the following [13]: “I might not spell it like that myself, but a code generator would do it (and much worse!). This example was simplified from a query generated by a Django ORM queryset using .exclude(nullable_joined_table__column=1), for instance.” The bug had already been fixed in the latest development version, because we had previously reported the same underlying bug [44]. We found it with a query on a view and a predicate NOTNULL NOTNULL, which would be unlikely to be written by a programmer as well. We speculate that other users might also be affected by such seemingly-obscure bugs; however, finding the root cause in such cases is difficult, especially when the query is generated by middleware.
Handling of joins.
Our translation process leaves JOIN clauses unmodified (see Section 3.2). Although this allowed us to find some bugs in their handling, translating them to a condition that is placed after the SELECT as well could uncover additional bugs. For inner joins, this is straightforward, because an ON predicate ϕ1 and a WHERE predicate ϕ2 can be translated to a predicate ϕ1 AND ϕ2. For other joins, this would be more involved. For example, for LEFT JOINs, a simple SELECT statement would no longer be sufficient, but would require combining the results of multiple queries. We consider additional strategies for translating JOIN clauses potentially useful and part of future work.
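The inner-join case can be sketched as follows (our illustration, not SQLancer's actual code): the ON and WHERE predicates are merged into one predicate over the cross product, which is then moved into the SELECT list for the NoREC-style count.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE t0(c0 INT);
CREATE TABLE t1(c0 INT);
INSERT INTO t0(c0) VALUES (0), (1), (2);
INSERT INTO t1(c0) VALUES (1), (2), (3);
""")

# Optimized query: inner join with an ON predicate and a WHERE predicate.
optimized = "SELECT * FROM t0 JOIN t1 ON t0.c0 = t1.c0 WHERE t0.c0 > 1"
# Unoptimized counterpart: cross product with the combined predicate
# moved into the SELECT list; SUM counts the rows for which it holds.
unoptimized = """
SELECT SUM(CASE WHEN t0.c0 = t1.c0 AND t0.c0 > 1 THEN 1 ELSE 0 END)
FROM t0, t1
"""

opt_rows = len(con.execute(optimized).fetchall())
unopt_rows = con.execute(unoptimized).fetchone()[0] or 0
# A mismatch between the two counts would indicate an optimization bug.
assert opt_rows == unopt_rows
```

Here, the inner join matches the rows with c0 = 1 and c0 = 2, of which only c0 = 2 passes the WHERE predicate, so both queries agree on a count of one.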
Code coverage and performance.
At first sight, it might seem interesting to evaluate the code coverage and run-time performance that NoREC achieves. However, neither helps to explain the approach's effectiveness. With respect to coverage, it was found that code coverage is not particularly useful for fuzzing DBMS, since high coverage of the core components (e.g., the query optimizer) can be achieved quickly [30]. Furthermore, although the SQLite developers achieve 100% MC/DC coverage in their test suite, we could still find bugs in SQLite. The run-time overhead of the query generation and translation is negligible, which is why we did not measure it in detail. The run time is dominated by the DBMS, which needs to process the generated queries, as well as by the communication with the DBMS. The communication overhead was lower for SQLite than for the other DBMS, because SQLite is an embedded DBMS, that is, it runs within the application process of SQLancer.
Kind of bugs.
51 of the 159 true bugs that we found were optimization bugs. We believe that such bugs are the most severe, since they are likely to go unnoticed by developers. Bugs caused by crashes, assertion failures, and errors represent a large fraction of the overall bugs. However, they can be found by existing approaches, such as grammar-based fuzzers. Furthermore, such bugs typically cause the DBMS to terminate, thus signalling to the user that the DBMS malfunctions for the given query. A possible conclusion might be that our testing approach also detects errors typically found by fuzzers, and that such bugs might be more common or easier to find than logic bugs (given that we do not have the ground truth on the total number of bugs).
Fully automatic approach.
We claim that NoREC is fully automatic, as it finds bugs by repeatedly generating test cases and validating their result sets, without requiring any human interaction. As discussed in Section 3.4, corner cases exist that have to be treated specially to ensure that only true bugs are reported. When NoREC detects a bug, it is helpful to reduce the generated test case so that it is minimal. For our evaluation, we first automatically reduced the test case, and then manually tried to reduce it further; we consider manual reduction optional, but convenient for the DBMS developers when debugging. Before reporting bugs, we manually checked the bug tracker to lower the chances of reporting duplicate bugs. In practice, it would be useful to suspend running NoREC after it reports a bug, and continue its execution after the bug is fixed.
RELATED WORK
Differential testing of DBMS.
Differential testing discovers bugs by executing a given input using multiple versions of a system to detect differing outputs, which indicate a bug in one of these systems. Similarly to metamorphic testing, differential testing has proven to be effective in many domains [6, 16, 31, 33, 36, 61]. For DBMS, it was first applied by Slutz in a tool called RAGS to find bugs by executing a query on multiple different DBMS and comparing their result sets [50]. While the approach was highly effective, it can only be applied to a small subset of common SQL. Gu et al. used options and hints to force the generation of different query plans, to then rank the accuracy of the optimizer based on the estimated cost for each plan [24]. Jung et al. used differential testing in a system called APOLLO to find performance regression bugs in DBMS, by executing a SQL query on an old and a newer version of a DBMS [30]. They found 10 previously-unknown performance regression bugs in SQLite and PostgreSQL. While our work uses metamorphic testing to find optimization bugs, conceptually, it can be interpreted as realizing differential testing by comparing the results of an optimizing and a non-optimizing version of a DBMS.
Other correctness oracles for testing DBMS.
Pivoted Query Synthesis (PQS) is the state of the art in testing DBMS for logic bugs and is the most closely-related work [46]. It is both an automatic testing approach as well as an oracle, and is based on the idea of testing the DBMS' correctness on a randomly-selected pivot row. PQS has found close to 100 bugs in widely-used DBMS, demonstrating that it is highly effective. However, due to its high implementation effort, it might be infeasible to test all of a DBMS' supported operators and functions. NoREC is mainly applicable to finding optimization bugs, a subcategory of the logic bugs towards which PQS is geared. Due to its low implementation effort, it can find bugs in components for which the implementation of PQS would be too costly. ADUSA [32] is a query-aware database generator and generates input data as well as the expected result for a query, thus also tackling the correctness oracle problem for DBMS. It translates the schema and query to an Alloy specification, which is subsequently solved. The approach could reproduce various known and injected bugs in MySQL and HSQLDB, and also find a new bug in Oracle Database. We believe that the high overhead that solver-based approaches incur might inhibit such approaches from finding more DBMS bugs.
Random and targeted queries.
Many query generators have been proposed for purposes such as bug-finding and benchmarking. SQLsmith is a widely-used, open-source random query generator, which has found over 100 bugs in widely-used DBMS [49]. Bati et al. proposed an approach based on genetic algorithms to incorporate execution feedback for generating queries [2]. SQLFUZZ [30] also utilizes execution feedback and randomly generates queries using only features that are supported by all the DBMS it considers. Khalek et al. proposed generating both syntactically and semantically valid queries based on a solver-backed approach [1]. The problem of generating queries that satisfy cardinality constraints on their sub-expressions was shown to be computationally hard [9, 37]. Consequently, a number of heuristic and approximate approaches to generating targeted queries were proposed [9, 37, 42, 58]. All these random-query generators can be used to find bugs such as crashes and hangs in DBMS. When paired with the test oracle proposed in this paper, they could also be used to find logic bugs.
Random and targeted databases.
Many approaches have been proposed to automatically generate databases. Given a query and a set of constraints, QAGen [4, 34] generates a database that matches the desired query results by combining traditional query processing and symbolic execution. Reverse Query Processing takes a query and a desired result set as an input to then generate a database that could have produced the result set [3]. As discussed above, ADUSA is a query-aware database generator [32]. Gray et al. discussed a set of techniques utilizing parallel algorithms to quickly generate billion-record databases [23]. DGL was proposed as a domain-specific language to generate input data following various distributions and inter-table correlations based on iterators that can be composed [7]. Neufeld et al. proposed to generate test data for tables with constraints by deriving a generator formula that is then translated to operators that generate the test data [38]. Improved database generation might enable NoREC to find additional bugs.
Metamorphic testing.
Our approach is based on metamorphic testing [11]. Metamorphic testing addresses the test oracle problem by deriving, from an input and output of a system, a new input for which the expected result is known. This technique has been applied successfully in various domains [12, 47]. Central to this approach is the metamorphic relation, which is used to infer the expected result. In this work, we combine a translation and counting mechanism to establish a metamorphic relation specifically geared towards detecting optimization bugs. A limitation of metamorphic testing is that it cannot establish a ground truth. For NoREC, this implies that both the unoptimized and the optimized query can yield an incorrect result (see Section 4.4). Similarly, the optimized query could compute the correct result, rather than the unoptimized one.
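The translation and counting mechanism can be sketched in a few lines of Python against SQLite (one of the tested DBMS). The helper name `norec_check`, the example schema, and the `CASE WHEN` counting form are illustrative; the actual implementation targets several DBMS and randomly generates both databases and predicates:

```python
import sqlite3

def norec_check(conn, table, predicate):
    """Compare an optimizable filtering query against its translated,
    hard-to-optimize counterpart; a mismatch signals an optimization bug."""
    cur = conn.cursor()
    # Optimized form: the DBMS may, e.g., push the predicate into an index scan.
    optimized = cur.execute(
        f"SELECT COUNT(*) FROM {table} WHERE {predicate}").fetchone()[0]
    # Unoptimized form: the predicate is evaluated as a plain expression for
    # every row, so predicate-specific optimizations cannot be applied.
    unoptimized = cur.execute(
        f"SELECT SUM(CASE WHEN {predicate} THEN 1 ELSE 0 END) FROM {table}"
    ).fetchone()[0] or 0
    return optimized, unoptimized

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t0 (c0 INT)")
conn.executemany("INSERT INTO t0 VALUES (?)", [(0,), (1,), (2,), (None,)])
conn.execute("CREATE INDEX i0 ON t0(c0)")  # gives the optimizer something to use
opt, unopt = norec_check(conn, "t0", "c0 > 0")
assert opt == unopt  # equal counts: no optimization bug observed
```

Note that the oracle compares the two engines' answers against each other, not against a ground truth: if both forms miscompute the predicate in the same way, the bug goes unnoticed.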
CONCLUSIONS
This paper has proposed a general, highly effective approach for detecting bugs in DBMS, which we termed Non-Optimizing Reference Engine Construction (NoREC). The core insight of NoREC is that a given optimized query can be translated to an unoptimized query, which enables the construction of a test oracle that detects optimization bugs by comparing the result sets of the two queries. While we believe that this work provides a solid foundation for correctness testing of DBMS, the basic idea of NoREC could be extended by using additional or alternative translation strategies for queries, for example, by translating predicates of JOIN clauses. As another example, queries could also be transformed in other ways while still being expected to produce the same results (e.g., by switching the operands of commutative operators). Furthermore, the efficiency and effectiveness of NoREC could be enhanced by pairing it with better database and query generators.
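A commutativity-based transformation of this kind could be checked as follows (a minimal sketch; the table, data, and predicate are invented for illustration):

```python
import sqlite3

# Metamorphic check: swapping the operands of a commutative operator (+)
# must not change the result set of an otherwise identical query.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t0 (c0 INT, c1 INT)")
conn.executemany("INSERT INTO t0 VALUES (?, ?)",
                 [(1, 2), (3, 4), (None, 5)])
rows_a = conn.execute("SELECT * FROM t0 WHERE c0 + c1 = 3").fetchall()
rows_b = conn.execute("SELECT * FROM t0 WHERE c1 + c0 = 3").fetchall()
# The two queries are semantically equivalent; differing result sets
# would indicate a bug in how the DBMS rewrites or evaluates the predicate.
assert sorted(rows_a) == sorted(rows_b)
```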
ACKNOWLEDGMENTS
We want to thank the DBMS developers for verifying and addressing our bug reports, as well as for their feedback on our work. Furthermore, we are grateful for the feedback received from the anonymous reviewers, Martin Kersten, and the members of the AST Lab at ETH Zurich.
ESEC/FSE ’20, November 8–13, 2020, Virtual Event, USA · Manuel Rigger and Zhendong Su
REFERENCES
[1] Shadi Abdul Khalek and Sarfraz Khurshid. 2010. Automated SQL Query Generation for Systematic Testing of Database Engines. In Proceedings of the IEEE/ACM International Conference on Automated Software Engineering (Antwerp, Belgium) (ASE ’10). ACM, New York, NY, USA, 329–332. https://doi.org/10.1145/1858996.1859063
[2] Hardik Bati, Leo Giakoumakis, Steve Herbert, and Aleksandras Surna. 2007. A Genetic Approach for Random Testing of Database Systems. In Proceedings of the 33rd International Conference on Very Large Data Bases (Vienna, Austria) (VLDB ’07). VLDB Endowment, 1243–1251.
[3] Carsten Binnig, Donald Kossmann, and Eric Lo. 2007. Reverse Query Processing. In Proceedings of the International Conference on Data Engineering (ICDE ’07). 506–515. https://doi.org/10.1109/ICDE.2007.367896
[4] Carsten Binnig, Donald Kossmann, Eric Lo, and M. Tamer Özsu. 2007. QAGen: Generating Query-Aware Test Databases. In Proceedings of the 2007 ACM SIGMOD International Conference on Management of Data (Beijing, China) (SIGMOD ’07). Association for Computing Machinery, New York, NY, USA, 341–352. https://doi.org/10.1145/1247480.1247520
[5] Carl Friedrich Bolz, Darya Kurilova, and Laurence Tratt. 2016. Making an Embedded DBMS JIT-friendly. In 30th European Conference on Object-Oriented Programming (ECOOP 2016). 4:1–4:24. https://doi.org/10.4230/LIPIcs.ECOOP.2016.4
[6] Robert Brummayer and Armin Biere. 2009. Fuzzing and Delta-Debugging SMT Solvers. In Proceedings of the 7th International Workshop on Satisfiability Modulo Theories (Montreal, Canada) (SMT ’09). Association for Computing Machinery, New York, NY, USA, 1–5. https://doi.org/10.1145/1670412.1670413
[7] Nicolas Bruno and Surajit Chaudhuri. 2005. Flexible Database Generators. In Proceedings of the 31st International Conference on Very Large Data Bases (Trondheim, Norway) (VLDB ’05). VLDB Endowment, 1097–1107.
[8] Nicolas Bruno, Surajit Chaudhuri, and Ravi Ramamurthy. 2009. Power Hints for Query Optimization. In Proceedings of the 2009 IEEE International Conference on Data Engineering (ICDE ’09). IEEE Computer Society, USA, 469–480. https://doi.org/10.1109/ICDE.2009.68
[9] Nicolas Bruno, Surajit Chaudhuri, and Dilys Thomas. 2006. Generating Queries with Cardinality Constraints for DBMS Testing. IEEE Trans. on Knowl. and Data Eng. 18, 12 (Dec. 2006), 1721–1725. https://doi.org/10.1109/TKDE.2006.190
[10] Donald D. Chamberlin and Raymond F. Boyce. 1974. SEQUEL: A Structured English Query Language. In Proceedings of the 1974 ACM SIGFIDET (Now SIGMOD) Workshop on Data Description, Access and Control (Ann Arbor, Michigan) (SIGFIDET ’74). ACM, New York, NY, USA, 249–264. https://doi.org/10.1145/800296.811515
[11] Tsong Y Chen, Shing C Cheung, and Shiu Ming Yiu. 1998. Metamorphic Testing: A New Approach for Generating Next Test Cases. Technical Report HKUST-CS98-01. Department of Computer Science, Hong Kong University of Science and Technology.
[12] Tsong Yueh Chen, Fei-Ching Kuo, Huai Liu, Pak-Lok Poon, Dave Towey, T. H. Tse, and Zhi Quan Zhou. 2018. Metamorphic Testing: A Review of Challenges and Opportunities. ACM Comput. Surv.
[14] E. F. Codd. 1970. A Relational Model of Data for Large Shared Data Banks. Commun. ACM 13, 6 (June 1970), 377–387. https://doi.org/10.1145/362384.362685
[15] E. F. Codd. 1972. Relational Completeness of Data Base Sublanguages. IBM Corporation.
[16] Pascal Cuoq, Benjamin Monate, Anne Pacalet, Virgile Prevosto, John Regehr, Boris Yakobowski, and Xuejun Yang. 2012. Testing Static Analyzers with Randomly Generated Programs. In Proceedings of the 4th International Conference on NASA Formal Methods (Norfolk, VA) (NFM ’12). Springer-Verlag, Berlin, Heidelberg, 120–125. https://doi.org/10.1007/978-3-642-28891-3_12
[17] DB-Engines. 2019. DB-Engines Ranking (July 2019). https://db-engines.com/en/ranking
[18] Bailu Ding, Sudipto Das, Wentao Wu, Surajit Chaudhuri, and Vivek Narasayya. 2018. Plan Stitch: Harnessing the Best of Many Plans. Proc. VLDB Endow. 11, 10 (June 2018), 1123–1136. https://doi.org/10.14778/3231751.3231761
[19] Ramez Elmasri and Sham Navathe. 2017. Fundamentals of Database Systems (7th ed.). Pearson.
[20] Leo Giakoumakis and César A Galindo-Legaria. 2008. Testing SQL Server’s Query Optimizer: Challenges, Techniques and Experiences. IEEE Data Eng. Bull. 31, 1 (2008), 36–43.
[21] Torsten Grabs, Steve Herbert, and Xin (Shin) Zhang. 2008. Testing Challenges for Extending SQL Server’s Query Processor: A Case Study. In Proceedings of the 1st International Workshop on Testing Database Systems (Vancouver, British Columbia, Canada) (DBTest ’08). Association for Computing Machinery, New York, NY, USA, Article 2, 6 pages. https://doi.org/10.1145/1385269.1385272
[22] Goetz Graefe. 1993. Query Evaluation Techniques for Large Databases. ACM Computing Surveys (CSUR) 25, 2 (1993), 73–169.
[23] Jim Gray, Prakash Sundaresan, Susanne Englert, Ken Baclawski, and Peter J. Weinberger. 1994. Quickly Generating Billion-Record Synthetic Databases. SIGMOD Rec. 23, 2 (May 1994), 243–252. https://doi.org/10.1145/191843.191886
[24] Zhongxian Gu, Mohamed A. Soliman, and Florian M. Waas. 2012. Testing the Accuracy of Query Optimizers. In Proceedings of the Fifth International Workshop on Testing Database Systems (Scottsdale, Arizona) (DBTest ’12). ACM, New York, NY, USA, Article 11, 6 pages. https://doi.org/10.1145/2304510.2304525
[25] Antonin Guttman. 1984. R-Trees: A Dynamic Index Structure for Spatial Searching. In Proceedings of the 1984 ACM SIGMOD International Conference on Management of Data (Boston, Massachusetts) (SIGMOD ’84). Association for Computing Machinery, New York, NY, USA, 47–57. https://doi.org/10.1145/602259.602266
[26] Kenneth Houkjær, Kristian Torp, and Rico Wind. 2006. Simple and Realistic Data Generation. In Proceedings of the 32nd International Conference on Very Large Data Bases (Seoul, Korea) (VLDB ’06). VLDB Endowment, 1243–1246.
[27] William E. Howden. 1978. Theoretical and Empirical Studies of Program Testing. In Proceedings of the 3rd International Conference on Software Engineering (Atlanta, Georgia, USA) (ICSE ’78).
[30] Proc. VLDB Endow. 13, 1 (Sept. 2019), 57–70. https://doi.org/10.14778/3357377.3357382
[31] Timotej Kapus and Cristian Cadar. 2017. Automatic Testing of Symbolic Execution Engines via Program Generation and Differential Testing. In Proceedings of the 32nd IEEE/ACM International Conference on Automated Software Engineering (Urbana-Champaign, IL, USA) (ASE 2017). IEEE Press, Piscataway, NJ, USA, 590–600.
[32] S. A. Khalek, B. Elkarablieh, Y. O. Laleye, and S. Khurshid. 2008. Query-Aware Test Generation Using a Relational Constraint Solver. In Proceedings of the 2008 23rd IEEE/ACM International Conference on Automated Software Engineering (ASE ’08). IEEE Computer Society, Washington, DC, USA, 238–247. https://doi.org/10.1109/ASE.2008.34
[33] Vu Le, Mehrdad Afshari, and Zhendong Su. 2014. Compiler Validation via Equivalence Modulo Inputs. In Proceedings of the 35th ACM SIGPLAN Conference on Programming Language Design and Implementation (Edinburgh, United Kingdom) (PLDI ’14). ACM, New York, NY, USA, 216–226. https://doi.org/10.1145/2594291.2594334
[34] Eric Lo, Carsten Binnig, Donald Kossmann, M. Tamer Özsu, and Wing-Kai Hon. 2010. A Framework for Testing DBMS Features. The VLDB Journal 19, 2 (April 2010), 203–230. https://doi.org/10.1007/s00778-009-0157-y
[35] Ryan Marcus, Parimarjan Negi, Hongzi Mao, Chi Zhang, Mohammad Alizadeh, Tim Kraska, Olga Papaemmanouil, and Nesime Tatbul. 2019. Neo: A Learned Query Optimizer. Proc. VLDB Endow. 12, 11 (July 2019), 1705–1718. https://doi.org/10.14778/3342263.3342644
[36] William M McKeeman. 1998. Differential Testing for Software. Digital Technical Journal 10, 1 (1998), 100–107.
[37] Chaitanya Mishra, Nick Koudas, and Calisto Zuzarte. 2008. Generating Targeted Queries for Database Testing. In Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data (Vancouver, Canada) (SIGMOD ’08). ACM, New York, NY, USA, 499–510. https://doi.org/10.1145/1376616.1376668
[38] Andrea Neufeld, Guido Moerkotte, and Peter C. Lockemann. 1993. Generating Consistent Test Data: Restricting the Search Space by a Generator Formula. The VLDB Journal 2, 2 (April 1993), 173–214.
[39] Thomas Neumann and Bernhard Radke. 2018. Adaptive Optimization of Very Large Join Queries. In Proceedings of the 2018 International Conference on Management of Data (Houston, TX, USA) (SIGMOD ’18). Association for Computing Machinery, New York, NY, USA, 677–692. https://doi.org/10.1145/3183713.3183733
[40] Stack Overflow. 2019. Developer Survey Results 2019. https://insights.stackoverflow.com/survey/2019
[41] Andrew Pavlo and Matthew Aslett. 2016. What’s Really New with NewSQL? SIGMOD Rec. 45, 2 (Sept. 2016), 45–55. https://doi.org/10.1145/3003665.3003674
[42] Meikel Poess and John M. Stephens. 2004. Generating Thousand Benchmark Queries in Seconds. In Proceedings of the Thirtieth International Conference on Very Large Data Bases - Volume 30 (Toronto, Canada) (VLDB ’04). VLDB Endowment, 1045–1053.
[43] John Regehr, Yang Chen, Pascal Cuoq, Eric Eide, Chucky Ellison, and Xuejun Yang. 2012. Test-Case Reduction for C Compiler Bugs. In Proceedings of the 33rd ACM SIGPLAN Conference on Programming Language Design and Implementation (Beijing, China) (PLDI ’12).
[45] Manuel Rigger and Zhendong Su. 2020. ESEC/FSE 20 Artifact for "Detecting Optimization Bugs in Database Engines via Non-Optimizing Reference Engine Construction". https://doi.org/10.5281/zenodo.3947858
[46] Manuel Rigger and Zhendong Su. 2020. Testing Database Engines via Pivoted Query Synthesis.
[47] Sergio Segura and Zhi Quan Zhou. 2018. Metamorphic Testing 20 Years Later: A Hands-on Introduction. In Proceedings of the 40th International Conference on Software Engineering: Companion Proceedings (Gothenburg, Sweden) (ICSE ’18). Association for Computing Machinery, New York, NY, USA, 538–539. https://doi.org/10.1145/3183440.3183468
[48] P. Griffiths Selinger, M. M. Astrahan, D. D. Chamberlin, R. A. Lorie, and T. G. Price. 1979. Access Path Selection in a Relational Database Management System. In Proceedings of the 1979 ACM SIGMOD International Conference on Management of Data (Boston, Massachusetts) (SIGMOD ’79). Association for Computing Machinery, New York, NY, USA, 23–34. https://doi.org/10.1145/582095.582099
[49] Andreas Seltenreich. 2019. SQLSmith. https://github.com/anse1/sqlsmith
[50] Donald R Slutz. 1998. Massive Stochastic Testing of SQL. In VLDB.
[56] In Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data (Portland, OR, USA) (SIGMOD ’20). Association for Computing Machinery, New York, NY, USA, 1493–1509. https://doi.org/10.1145/3318464.3386134
[57] Tencent Blade Team. 2019. Magellan 2.0. https://blade.tencent.com/magellan2/index_en.html
[58] Manasi Vartak, Venkatesh Raghavan, and Elke A. Rundensteiner. 2010. QRelX: Generating Meaningful Queries That Provide Cardinality Assurance. In Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data (Indianapolis, Indiana, USA) (SIGMOD ’10). Association for Computing Machinery, New York, NY, USA, 1215–1218. https://doi.org/10.1145/1807167.1807323
[59] Marianne Winslett and Vanessa Braganholo. 2019. Richard Hipp Speaks Out on SQLite. SIGMOD Rec. 48, 2 (Dec. 2019), 39–46. https://doi.org/10.1145/3377330.3377338
[60] Chenggang Wu, Alekh Jindal, Saeed Amizadeh, Hiren Patel, Wangchao Le, Shi Qiao, and Sriram Rao. 2018. Towards a Learning Optimizer for Shared Clouds. Proc. VLDB Endow. 12, 3 (Nov. 2018), 210–222. https://doi.org/10.14778/3291264.3291267
[61] Xuejun Yang, Yang Chen, Eric Eide, and John Regehr. 2011. Finding and Understanding Bugs in C Compilers. In Proceedings of the 32nd ACM SIGPLAN Conference on Programming Language Design and Implementation (San Jose, California, USA) (PLDI ’11).