Elle: Inferring Isolation Anomalies from Experimental Observations
Kyle Kingsbury
Jepsen
[email protected]

Peter Alvaro
UC Santa Cruz
[email protected]
ABSTRACT
Users who care about their data store it in databases, which (at least in principle) guarantee some form of transactional isolation. However, experience shows [26, 24] that many databases do not provide the isolation guarantees they claim. With the recent proliferation of new distributed databases, demand has grown for checkers that can, by generating client workloads and injecting faults, produce anomalies that witness a violation of a stated guarantee. An ideal checker would be sound (no false positives), efficient (polynomial in history length and concurrency), effective (finding violations in real databases), general (analyzing many patterns of transactions), and informative (justifying the presence of an anomaly with understandable counterexamples). Sadly, we are aware of no checkers that satisfy these goals.

We present Elle: a novel checker which infers an Adya-style dependency graph between client-observed transactions. It does so by carefully selecting database objects and operations when generating histories, so as to ensure that the results of database reads reveal information about their version history. Elle can detect every anomaly in Adya et al.'s formalism [2] (except for predicates), discriminate between them, and provide concise explanations of each.

This paper makes the following contributions: we present Elle, demonstrate its soundness, measure its efficiency against the current state of the art, and give evidence of its effectiveness via a case study of four real databases.
1. INTRODUCTION
Database systems often offer multi-object transactions at varying isolation levels, such as serializable or read committed. However, design flaws or bugs in those databases may result in weaker isolation levels than claimed. In order to verify whether a given database actually provides claimed safety properties, we can execute transactions against the database, record a concurrent history of how those transactions completed, and analyze that history to identify invariant violations. This property-based approach to verification is especially powerful when combined with fault-injection techniques [29].

Many checkers use a particular pattern of transactions, and check that under the expected isolation level, some hand-proved invariant(s) hold. For instance, one could check for long fork (an anomaly present in parallel snapshot isolation) by inserting two records x and y in separate transactions, and in two more transactions, reading both records. If one read observes x but not y, and the other observes y but not x, then we have an example of a long fork, and can conclude that the system does not provide snapshot isolation—or any stronger isolation level.

These checkers are generally efficient (i.e. completing in polynomial time), and do identify bugs in real systems, but they have several drawbacks. They find a small number of anomalies in a specific pattern of transactions, and tell us nothing about the behavior of other patterns. They require hand-proven invariants: one must show that for chosen transactions under a given consistency model, those invariants hold. They also do not compose: transactions we execute for one checker are, in general, incompatible with another checker. Each property may require a separate test.

Some researchers have designed more general checkers which can analyze broader sets of possible transactions. For instance, Knossos [22] and Porcupine [3] can verify whether an arbitrary history of operations over a user-defined datatype is linearizable, using techniques from Wing & Gong [37] and Lowe [28]. Since strict-1SR is equivalent to linearizability (where operations are transactions, and the linearizable object is a map), these checkers can be applied to strict serializable databases as well. While this approach does find anomalies in real databases, its use is limited by the NP-complete nature of linearizability checking, and the combinatorial explosion of states in a concurrent multi-register system.

Serializability checking is also (in general) NP-complete [30]—and unlike linearizability, one cannot use real-time constraints to reduce the search space. Following the abstract execution formalism of Cerone, Bernardi, and Gotsman [12], Kingsbury attempted to verify serializability by identifying write-read dependencies between transactions, translating those dependencies to an integer constraint problem on transaction orders [23], and applying off-the-shelf constraint solvers like Gecode [35] to solve for a particular order. This approach works, but, like Knossos, is limited by the NP-complete nature of constraint solving. Histories of more than a hundred-odd transactions quickly become intractable. Moreover, constraint solvers give us limited insight into why a particular transaction order was unsolvable. They can only tell us whether a history is serializable or not, without insight into specific transactions that may have gone wrong. Finally, this approach cannot distinguish between weaker isolation levels, such as snapshot isolation or read committed.

What would an ideal checker for transaction isolation look like? Such a checker would accept many patterns of transactions, rather than specific, hand-coded examples. It would distinguish between different types of anomalies, allowing us to verify stronger (e.g. strict-1SR) and weaker (e.g. read uncommitted) isolation levels. It ought to localize the faults it finds to specific subsets of transactions. Of course, it should do all of this efficiently.

In this paper, we present
Elle: an isolation checker for black-box databases. Instead of solving for a transaction order, Elle uses its knowledge of the transactions issued by the client, the objects written, and the values returned by reads to reason about the possible dependency graphs of the opaque database in the language of Adya's formalism [2]. While Elle can make limited inferences from read-write registers, it shines with richer datatypes, like append-only lists.

All history checkers depend on the system which generated their transactions. Elle's most powerful analysis requires that we generate histories in which reads of an object return its entire version history, and in which a unique mapping exists between versions and transactions. However, we show that generating histories which allow these inferences is straightforward, that the required datatypes are broadly supported, and that these choices do not prevent Elle from identifying bugs in real-world database systems.
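As an illustration of the hand-coded, per-pattern checkers discussed above, the long-fork check can be sketched in a few lines. This is our own minimal sketch, not Elle's implementation; the function name and the record encoding are our own:

```python
# Hypothetical sketch of the hand-coded long-fork check described in the
# introduction (not Elle's implementation). Two transactions insert
# records x and y; two more read both. Under snapshot isolation, the two
# readers may not observe the inserts in contradictory orders.

def long_fork(read1, read2):
    """Each read is a dict {record: seen?}. A long fork occurs when one
    reader saw x but not y, and the other saw y but not x."""
    fork_a = read1["x"] and not read1["y"] and read2["y"] and not read2["x"]
    fork_b = read1["y"] and not read1["x"] and read2["x"] and not read2["y"]
    return fork_a or fork_b

# One reader observed only x; the other observed only y: a long fork.
assert long_fork({"x": True, "y": False}, {"x": False, "y": True})
# Both readers agree that y was inserted first: no anomaly.
assert not long_fork({"x": False, "y": True}, {"x": True, "y": True})
```

Note how specific this check is: it says nothing about any other pattern of transactions, which is exactly the drawback described above.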
2. THE ADYA FORMALISM
In their 2000 paper "Generalized Isolation Level Definitions", Adya, Liskov, and O'Neil formalized a variety of transactional isolation levels in terms of proscribed behaviors in an abstract history H. Adya et al.'s histories (hereafter: "Adya histories") comprise a set of transactions, an event order which encodes the order of operations in those transactions, and a version order ≪: a total order over installed versions of each object. Two anomalies, G1a (aborted reads) and G1b (intermediate reads), are defined as transactions which read versions written by aborted transactions, or which read versions from the middle of some other transaction, respectively. The remainder are defined in terms of cycles over a Direct Serialization Graph (DSG), which captures the dependencies between transactions. Setting aside predicates, Adya et al.'s dependencies are:

• Directly write-depends. Ti installs xi, and Tj installs x's next version.
• Directly read-depends. Ti installs xi, and Tj reads xi.
• Directly anti-depends. Ti reads xi, and Tj installs x's next version.

The direct serialization graph DSG(H) is a graph over transactions in some history H, whose edges are given by these dependencies. A G0 anomaly is a cycle in the DSG comprised entirely of write dependencies. G1c anomalies include read dependencies. Instances of G2 involve at least one anti-dependency (those with exactly one are G-single).

This is a tantalizing model for several reasons. Its definitions are relatively straightforward. Its anomalies are local, in the sense that they involve a specific set of transactions. We can point to those transactions and say, "Things went wrong here!"—which aids debugging.
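To make the cycle taxonomy concrete, here is a minimal sketch (our own illustration, not Elle's code) that classifies a DSG cycle by the kinds of edges it contains, following the G0/G1c/G2/G-single definitions above:

```python
# Illustrative sketch (not Elle's implementation): classify a DSG cycle
# by its edge types, per Adya's definitions. A cycle is given as a list
# of edge labels: "ww" (write-write), "wr" (write-read), "rw" (read-write).

def classify_cycle(edges):
    if all(e == "ww" for e in edges):
        return "G0"                      # write dependencies only
    n_rw = sum(1 for e in edges if e == "rw")
    if n_rw == 0:
        return "G1c"                     # ww/wr edges, no anti-dependency
    return "G-single" if n_rw == 1 else "G2"

assert classify_cycle(["ww", "ww"]) == "G0"
assert classify_cycle(["ww", "wr"]) == "G1c"
assert classify_cycle(["wr", "rw"]) == "G-single"
assert classify_cycle(["rw", "rw"]) == "G2"
```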
Moreover, we can check these properties in linear time: intermediate and aborted reads are straightforward to detect, and once we've constructed the dependency graph, cycle detection is solvable in O(vertices + edges) time, thanks to Tarjan's algorithm for strongly connected components [34].

However, there is one significant obstacle to working with an Adya history: we don't have it. In fact, one may not even exist. The database system may not have any concept of a version order, or it might not expose that ordering information to clients.

As an example, consider Adya et al.'s history H_serial, as it would be observed by clients. Is this history serializable?

T1: w(z1), w(x1), w(y1), c
T2: r(x1), w(y2), c
T3: w(x2), r(y2), w(z2), c

T2 read x1, so it must read-depend on T1, and likewise, T3 must read-depend on T2. What about T1 and T2's writes to y? Which overwrote the other? As Crooks et al. observe [17], we cannot tell, because we lack a key component in Adya's formalism: the version order. H_serial includes additional information (x1 ≪ x2, y1 ≪ y2, z1 ≪ z2) which is invisible to clients. We might consider deducing the version order from the real-time order in which writes or commits take place, but Adya et al. explicitly rule this out, since optimistic and multi-version implementations might require the freedom to commit earlier versions later in time. Moreover, network latency may make it impossible to precisely determine concurrency windows.

We would like to be able to infer an Adya history based purely on the information available to clients of the system, which we call an observation. When a client submits an operation to a database, it generally knows what kind of operation it performed (e.g. a read, a write, etc.), the object that operation applies to, and the arguments it provided. For instance, a client might write the value 5 to object x. If the database returns a response for an operation, we may also know a return value; for instance, that a read of x returned the number 5.

Clients know the order of operations within every transaction. They may also know whether a transaction was definitely aborted, definitely committed, or could be either, based on whether a commit request was sent, and how (or if) the database acknowledged that request. Clients can also record their own per-client and real-time orders of events.

This is not, in general, enough information to go on. Consider a pair of transactions which set x to 1 and 2, respectively. In the version order, does 1 or 2 come first? We can't say! Or consider an indeterminate transaction whose effects are not observed. Did it commit? We have no way to tell. This implies there might be many possible histories which are compatible with a given observation. Are there conditions under which only one history is possible? Or, if more than one is possible, can we infer something about the structure of all of them which allows us to identify anomalies?

We argue that yes: one can infer properties of every history compatible with a given observation, by taking advantage of datatypes which allow us to trace the sequence of versions which gave rise to a particular version of an object, and which let us recover the particular writes which gave rise to those versions. Next, we provide an intuition for how this can be accomplished.
3. DEDUCING DEPENDENCIES
Consider a set of observed transactions interacting with some read-write register x. One transaction Tj read x and observed some version xi. Another transaction Ti wrote xi to x. In general, we cannot tell whether Ti was the transaction which produced xi, because some other transaction might have written xi as well. However, if we know that no other transaction wrote xi, then we can recover the particular transaction which wrote xi: Ti. This allows us to infer a direct write-read dependency: Ti <wr Tj.

If every value written to a given register is unique, then we can recover the transaction which gave rise to any observed version. We call this property recoverability: every version we observe can be mapped to a specific write in some observed transaction.

Recoverability allows us to infer read dependencies. However, inferring write- and anti-dependencies takes more than recoverability: we need the version order ≪. Read-write registers make inferring ≪ impossible in general: if two transactions set x to xi and xj respectively, we can't tell which came first.

In a sense, blind writes to a register "destroy history". If we used a compare-and-set operation, we could tell something about the preceding version, but a blind write can succeed regardless of what value was present before. Moreover, the version resulting from a blind write carries no information about the previous version. Let us therefore consider richer datatypes, whose writes do preserve some information about previous versions.

For instance, we could take increment operations on counters, starting at 0. If every write is an increment, then the version history of a register should be something like (0, 1, 2, ...). Given two transactions Ti and Tj, both of which read object x, we can construct a read-read dependency Ti <rr Tj if Ti's observed value of x is smaller than Tj's.
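As a toy illustration (our own, not Elle's code), read-read edges over a counter can be inferred simply by comparing observed values:

```python
# Illustrative sketch: infer read-read dependency edges over a counter
# whose versions only grow (0, 1, 2, ...). Each observation pairs a
# transaction id with the counter value it read.

def rr_edges(reads):
    """reads: list of (txn_id, observed_value). Returns the set of
    (Ti, Tj) edges where Ti read a strictly smaller version than Tj."""
    return {(ti, tj)
            for ti, vi in reads
            for tj, vj in reads
            if ti != tj and vi < vj}

edges = rr_edges([("T1", 0), ("T2", 2), ("T3", 2)])
assert ("T1", "T2") in edges and ("T1", "T3") in edges
assert ("T2", "T3") not in edges  # equal reads: no order inferred
```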
However, any non-trivial increment history is non-recoverable, because we can't tell which increment produced a particular version. This keeps us from inferring write-write, write-read, and read-write dependencies. We could return the resulting version from our writes, but this works only when the client receives acknowledgement of its request(s).

What if we let our objects be sets of elements, and had each write add a unique element to a given set? (This approach is used by Crooks et al., and has a long history in the literature.) Like counters, this lets us recover read-read dependencies whenever one version is a proper superset of another. Moreover, we can recover some (but not all) write-write, write-read, and read-write dependencies. Consider these observed transactions:

T1: read(x, {0})
T2: add(x, 1)
T3: add(x, 2)
T4: read(x, {0, 1, 2})

Since {0, 1, 2} is a proper superset of {0}, we know that T4 read a higher version of x than T1, and can infer T1 <rr T4. In addition, we can infer that T2 <wr T4 and T3 <wr T4, since their respective elements 1 and 2 were both visible to T4. Conversely, since T1's read of {0} did not include 1 or 2, we can infer T1 <rw T2 and T1 <rw T3: anti-dependency relations! We cannot, however, identify the write-write dependency between T2 and T3: the database could have executed T2 and T3 in either order with equivalent effects, because sets are order-free. Only a read of {0, 1} or {0, 2} could resolve this ambiguity.

So, let's add order to our values. Let each version be an ordered list, and a write to x append a unique value to x. Then any read of xi tells us the order of all versions written prior: xi = [1, 2, 3] implies that x took on the versions [], [1], [1, 2], [1, 2, 3] in exactly that order. We call this property traceability. Since appends add a unique element to the end of x, we can also infer the exact write which gave rise to any observed version: these versions are recoverable as well.

As with counters and sets, we can use traceability to reconstruct read-read, write-read, and read-write dependencies—but because we have the full version history, we can precisely identify read-write and write-write dependencies for every transaction whose writes were observed by some read. We know, for instance, that the transaction which added 2 to x must write-depend on the transaction which added 1 to x, because we have a read of x wherein 1 immediately precedes 2. There may be some writes near the end of a history which are never observed, but so long as histories are long and include reads every so often, the unknown fraction of a version order can be made relatively small.

Recoverability and traceability are the key to inferring dependencies between observed transactions, but we have glossed over the mapping between observed dependencies and Adya histories, as well as the challenges arising from aborted and indeterminate transactions. In the following section, we discuss these issues more rigorously.
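The list-append inference described in this section can be sketched concretely. This is our own minimal illustration of the idea, not Elle's implementation: given a map from each appended element to the transaction that appended it (recoverability), a single read of [1, 2, 3] yields write-write, write-read, and read-write edges.

```python
# Illustrative sketch of list-append inference (not Elle's code). Each
# appended element is unique, so a read like [1, 2, 3] reveals the full
# version order [], [1], [1,2], [1,2,3]; `writer` maps each element back
# to the transaction that appended it.

def infer_edges(read_txn, observed_list, writer, earlier_reads=()):
    edges = set()
    for a, b in zip(observed_list, observed_list[1:]):
        edges.add((writer[a], writer[b], "ww"))   # b's append overwrote a's version
    for elem in observed_list:
        edges.add((writer[elem], read_txn, "wr")) # read observed elem's write
    for rtxn, rlist in earlier_reads:
        for elem in observed_list:
            if elem not in rlist:
                edges.add((rtxn, writer[elem], "rw"))  # rtxn missed elem's write
    return edges

writer = {1: "T1", 2: "T2", 3: "T3"}
edges = infer_edges("T4", [1, 2, 3], writer, earlier_reads=[("T0", [1])])
assert ("T1", "T2", "ww") in edges and ("T2", "T3", "ww") in edges
assert ("T3", "T4", "wr") in edges
assert ("T0", "T2", "rw") in edges  # T0 read [1], missing 2: anti-dependency
```

The resulting labeled edges are exactly what a cycle search needs to look for G0, G1c, and G2 anomalies.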
4. FORMAL MODEL
In this section we present our abstract model of databases and transactional isolation. We establish the notions of traceability and recoverability, which we prove to be sufficient to reason directly from external observations to internal histories. We show that this approach is sound: that is, any anomalies identified in an observation must be present in any Adya history consistent with that observation. Due to space considerations, we do not present the formal definitions of traceability and recoverability or their accompanying proofs here; instead, we summarize these results.
We begin our formalism by defining a model of a database, transactions, and histories, closely adapted from Adya et al. We omit predicates for simplicity, and generalize Adya's read-write registers to objects supporting arbitrary transformations as writes, resulting in a version graph. We constrain Adya's version order ≪ to be compatible with this version graph. This generalization introduces a new class of anomaly, dirty updates, which we define in Section 4.3.1.

Object      | Versions | x_init | Writes
Register    | Any      | nil    | w(xi, a) → (a, nil)
Counter     | Integers | 0      | w(xi, a) → (xi + a, nil)
Set Add     | Sets     | {}     | w(xi, a) → (xi ∪ {a}, nil)
List Append | Lists    | []     | w([e1, ..., en], a) → ([e1, ..., en, a], nil)

Figure 1: Example objects.

An object x is a mutable datatype, consisting of a set of versions, written xi, xj, etc., an initial version, labeled x_init, and a set of object operations. An object operation represents a state transition between two versions xi and xj of some object x. Object operations take an argument a and produce a return value r. We write this as f(x, xi, a) → (xj, r). Where the object, argument, return value, or return tuple can be inferred from context or are unimportant, we may omit them: f(xi, a) → (xj), f(xi) → (xj), f(xi), etc.

Like Adya, we consider two types of operations: reads (r) and writes (w). A read takes no argument, leaves the version of the object unchanged, and returns that version: r(xi, nil) → (xi, xi).

As we show in Figure 1, a write operation w changes a version somehow. Write semantics are object-dependent. Adya's model (like much work on transactional databases) assumes objects are registers, and that writes blindly replace the current value. Our other three objects incorporate increasingly specific dependencies on previous versions.
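The write semantics in Figure 1 can be rendered directly as code. A minimal sketch (our own rendering of the figure's four objects; the function names are ours):

```python
# Sketch of the four write semantics from Figure 1 (our own rendering).
# Each function takes the current version x_i and an argument a, and
# returns (next version, return value); writes always return nil (None).

def register_write(xi, a):
    return (a, None)            # blind write: replaces the version outright

def counter_write(xi, a):
    return (xi + a, None)       # increment: new version depends on the old

def set_add(xi, a):
    return (xi | {a}, None)     # adds a; unordered, so history is partial

def list_append(xi, a):
    return (xi + [a], None)     # appends a; preserves the full version order

assert register_write("anything", 5) == (5, None)
assert counter_write(0, 3) == (3, None)
assert set_add({0}, 1) == ({0, 1}, None)
assert list_append([1, 2], 3) == ([1, 2, 3], None)
```

Reading the four bodies top to bottom shows the progression the text describes: each successive object's next version retains more information about the version it replaced.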
In this section we are primarily concerned with objects like list append, but we provide definitions for the first three objects as illustrative examples.

The versions and write operations on an object x together yield a version graph v_x: a directed graph whose vertices are versions, and whose edges are write operations.

A database is a set of objects x, y, etc.

A transaction is a list (a totally ordered set) of object operations, followed by at most one commit or abort operation. Transactions also include a unique identifier for disambiguation.

We say a transaction is committed if it ends in a commit, and aborted if it ends in an abort. We call the version resulting from a transaction's last write to object x its final version of x; any other writes of x result in intermediate versions. If a transaction commits, we say it installs all final versions; we call these installed versions. Unlike Adya, we use committed versions to refer to versions written by committed transactions, including intermediate versions. Versions from transactions which did not commit are called uncommitted versions. The initial version of every object is considered committed.

Per Adya et al., a history H comprises a set of transactions T on objects in a database D, a partial order E over operations in T, and a version order ≪ over versions of the objects in D. (For simplicity, we assume versions are values, and that versions do not repeat in a history.)

The event order E has the following constraints:

• It preserves the order of all operations within a transaction, including commit or abort events.
• One cannot read from the future: if a read r(xi) is in E, then there must exist a write w → (xi) which precedes it.
• Transactions observe their own writes: if a transaction T contains w(xi) followed by r(xj), and there exists no w(xk) between the write and read of xi in T, then xi = xj.
• The history must be complete: if E contains a read or write operation for a transaction Ti, E must contain a commit or abort event for Ti.

The version order ≪ has two constraints, per Adya et al.:

• It is a total order over installed versions of each individual object; there is no ordering of uncommitted, aborted, or intermediate versions.
• The version order for each object x (which we write ≪_x) begins with the initial version x_init.

Driven by the intuition that the version order should be consistent with the order of operations on an object, that xi ≪ xj means xj overwrote xi, and that cycles in ≪ are meant to capture anomalies like circular information flow, we introduce an additional constraint that was not necessary in Adya et al.'s formalism: the version order ≪ should be consistent with the version graphs v_x, v_y, . . . . Specifically, if xi ≪ xj, there exists a path from xi to xj in v_x. It would be odd if one appended 3 to the list [1, 2], producing [1, 2, 3], and yet the version order claimed [1, 2, 3] ≪ [1, 2].

We define write-write, read-write, and write-read dependencies between transactions, adapted directly from Adya's formalism. Finally, we (re-)define the
Direct Serialization Graph using those dependencies. The anomalies we wish to find are expressed in terms of that serialization graph.

• Direct write-write dependency. A transaction Tj directly ww-depends on Ti if Ti installs a version xi of x, and Tj installs x's next version xj, by ≪.
• Direct write-read dependency. A transaction Tj directly wr-depends on Ti if Ti installs some version xi and Tj reads xi.
• Direct read-write dependency. A transaction Tj directly rw-depends on Ti if Ti reads version xi of x, and Tj installs x's next version in ≪.

A Direct Serialization Graph, or DSG, is a graph of dependencies between transactions. The DSG for a history H is denoted DSG(H). If Tj ww-depends on Ti, there exists an edge labeled ww from Ti to Tj in DSG(H), and similarly for wr- and rw-dependencies.

Adya defines
G1a: aborted read as a committed transaction T2 containing a read of some value xi which was written by an aborted transaction T1. However, our abstract datatypes allow a transaction to commit a write which incorporates aborted state. We therefore define a complementary phenomenon to G1a, intended to cover scenarios in which information "leaks" from uncommitted versions to committed ones via writes. A history exhibits dirty update if it contains an uncommitted transaction T1 which writes xi, and a committed transaction T2 which contains a write acting on xi.

We now define a class of traceable objects, which permit recovery of the version graph and object operations resulting in any particular version. Recall that for an object x, the version graph v_x is comprised of versions connected by object operations. We call a path in v_x from x_init to some version xi a trace of xi. Intuitively, a trace captures the version history of an object. We say an object x is traceable if every version xi has exactly one trace; i.e. v_x is a tree.

Given a history with version order ≪, we call the largest version of x (by ≪_x) x_max. Because ≪ is a total order over installed versions, and because a path in the version graph exists between any two elements ordered by ≪, it follows that every committed version of x is in the trace of x_max. Moreover, for any installed version xi of x, we can recover the prefix of ≪_x up to and including xi simply by removing intermediate and aborted versions from the trace of xi. Restricting our histories to traceable objects (e.g., lists) will allow us to directly reason about the version order ≪ using the results of individual read operations.

When we interact with a database system, the history may not be accessible from outside the database—or perhaps no "real" history exists at all.
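The prefix-recovery step just described is mechanical. A minimal sketch (our own encoding, not Elle's): given the unique trace of an installed version and the set of installed versions, the prefix of ≪_x falls out by filtering.

```python
# Illustrative sketch (our own): recover the prefix of the version order
# <<_x from the trace of an installed version x_i, by dropping
# intermediate and aborted versions, as described above. A trace is the
# unique path of versions from x_init to x_i in the version graph.

def version_order_prefix(trace, installed):
    """trace: list of versions from x_init to x_i, in write order.
    installed: set of versions installed by committed transactions.
    Returns the prefix of <<_x up to and including x_i."""
    return [v for v in trace if v in installed]

trace = ["x_init", "x1", "x2", "x3"]      # x2 was an intermediate version
installed = {"x_init", "x1", "x3"}
assert version_order_prefix(trace, installed) == ["x_init", "x1", "x3"]
```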
We construct a formal "theory of mind" which allows us to reason about potential Adya histories purely from client observations. We define an observation of a system as a set of experimentally-accessible transactions where versions, return values, and committed states may be unknown, and consider the interpretations of that observation—the set of all histories which could have resulted in that particular observation.

To be certain that an external observation constitutes an irrefutable proof of an internal isolation anomaly requires that observations have a unique mapping between versions and observed transactions, a notion we call recoverability. (It appears that Adya et al.'s read dependencies don't rule out a transaction depending on itself. We preserve their definitions here, but assume that in serialization graphs, Ti ≠ Tj.) We provide practical, sufficient conditions to produce recoverable histories.

Imagine a set of single-threaded logical processes which interact with a database system as clients. Each process submits transactions for execution, and may receive information about the return values of the operations in those transactions. What can we tell from the perspective of those client processes?

Recall that an object operation has five components: f(x, xi, a) → (xj, r) denotes an operation on object x which takes version xi and combines it with argument a to yield version xj, returning r. When a client makes a write, it knows the object x and argument a, but (likely) not the versions xi or xj. If the transaction commits, the client may know the return value r, but might not if, for example, a response message is lost by the network.

We define an observed object operation as an operation whose versions and return value may be unknown. We write observed operations with a hat: ŵ(x, _, 3) → (_, nil) denotes an observed write of 3 to object x, returning nil; the versions involved (written _ here) are unknown.
An observed operation is either an observed object operation, a commit, or an abort.

An observed transaction, written T̂i, is a transaction composed of observed operations. If a client attempts to abort, or does not attempt to commit, a transaction, the observed transaction ends in an abort. If a transaction is known to have committed, it ends in a commit operation. However, when a client attempts to commit a transaction, but the result is unknown, e.g. due to a timeout or database crash, we leave the transaction with neither a commit nor abort operation.

An observation O represents the experimentally-accessible information about a database system's behavior. Observations have a set of observed transactions T̂. We assume observations include every transaction executed by a database system. We say that O is determinate if every transaction in T̂ is either committed or aborted; i.e. there are no indeterminate transactions. Otherwise, O is indeterminate.

Consider the set X_c of versions of a traceable object x read by committed transactions in some observation O. We denote a single version with a trace longer than any other x_longest—if there are multiple longest traces, any will do. We say O is consistent if for all x in O, every xi ∈ X_c appears in the trace of x_longest. Otherwise, O is inconsistent. We will find x_longest helpful in inferring as much of ≪ as possible.

Intuitively, an observed operation ô could be a witness to an "abstract" operation o if the two execute the same type of operation on the same key with the same argument, and their return values and versions don't conflict. We capture this correspondence in the notion of compatibility.

Consider an operation o = f(xi, a) → (xj, r) and an observed operation ô = f̂(x̂i, â) → (x̂j, r̂). We say that o is compatible with ô iff:

• f̂ = f
• â = a
• r̂ is either unknown or equal to r
• x̂i is either unknown or equal to xi
• x̂j is either unknown or equal to xj

We may now define a notion of compatibility among transactions that builds upon object compatibility. Consider an abstract transaction Ti and an observed transaction T̂i. We say that Ti is compatible with T̂i iff:

• They have the same number of object operations.
• Every object operation in Ti is compatible with its corresponding object operation in T̂i.
• If Ti committed, T̂i is not aborted, and if T̂i committed, Ti committed too.
• If Ti aborted, T̂i is not committed, and if T̂i aborted, Ti aborted too.

Finally, we generalize the notion of compatibility to entire histories and observations. Consider a history H and an observation O, with transaction sets T and T̂ respectively. We say that H is compatible with O iff there exists a one-to-one mapping R from T̂ to T such that ∀(T̂i, Ti) ∈ R, Ti is compatible with T̂i. We call any (H, R) which satisfies this constraint an interpretation of O. Given an interpretation, we say that Ti = R(T̂i) is the corresponding transaction to T̂i, and vice versa.

There may be many histories compatible with a given observation. For instance, an indeterminate observed transaction may either commit or abort in a compatible history. Given two increment transactions T1 and T2 with identical operations, there are two choices of R for any history H, corresponding to the two possible orders of T1 and T2.
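The operation-compatibility predicate above translates almost literally into code. A minimal sketch (our own encoding, not Elle's; we represent "unknown" as None):

```python
# Illustrative sketch (our own, not Elle's code) of operation
# compatibility: an abstract operation is compatible with an observed
# one when function and argument match, and each observed field is
# either unknown (None here) or equal to the abstract value.

UNKNOWN = None

def compatible(op, observed):
    """Each operation is a dict with keys f, a, pre, post, ret.
    In `observed`, the pre/post versions and ret may be UNKNOWN."""
    if op["f"] != observed["f"] or op["a"] != observed["a"]:
        return False
    return all(observed[k] is UNKNOWN or observed[k] == op[k]
               for k in ("ret", "pre", "post"))

# An abstract append of 3 taking [1, 2] to [1, 2, 3]...
op = {"f": "w", "a": 3, "pre": [1, 2], "post": [1, 2, 3], "ret": None}
# ...is compatible with an observed write of 3 whose versions are unknown.
obs = {"f": "w", "a": 3, "pre": UNKNOWN, "post": UNKNOWN, "ret": None}
assert compatible(op, obs)
# But not with an observed write of a different argument.
assert not compatible(op, {**obs, "a": 4})
```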
Theremay also be many observations compatible with a given his-tory: for instance, we could observe transaction T ’s com-mit, or fail to observe it and label T indeterminate.In each interpretation of an observation, every observedtransaction corresponds to a distinct abstract transactionin that interpretation’s history, taking into account that wemay not know exactly what versions or return values wereinvolved, or whether or not observed transactions actuallycommitted. These definitions of compatible formalize an in-tuitive ”theory of mind” for a database: what we think couldbe going on behind the scenes. Traceability allows us to derive version dependencies, butin order to infer transaction dependencies, we need a way tomap between versions and observed transactions. We alsoneed a way to identify aborted and intermediate versions,which means proving which particular write in a transactionyielded some version(s). To do this, we exploit the definitionof reads, and a property relating versions to observed writes,which we call recoverability .The definition of a read requires that the pre-version, post-version, and return value are all equal. This means for anobserved committed read, we know exactly what version itobserved—and conversely, given a version, we know whichreads definitely observed it. We say an observed transac-tion ˆ T i read x i when x i is returned in one of ˆ T i ’s reads. Bycompatibility, every corresponding transaction T i must also have read x i .Writes are more difficult, because in general multiple writescould have resulted in a given version. For example, consider Indeterminate reads, of course, may have read different val-ues in different interpretations. two observed increment operations ˆ w = w ( x, , −→ ( , nil )and ˆ w = w ( x, , −→ ( , nil ). Which of these writes re-sulted in, say, the version 2? It could be either ˆ w or ˆ w .We cannot construct a single transaction dependency graphfor this observation. 
We could construct a (potentially exponentially large) set of dependency graphs, and check each one for cycles, but this seems expensive. To keep our analysis computationally tractable, we restrict ourselves to those cases where we can infer a single write, as follows.

Given an observation O and an object x with some version x_i, we say that x_i is recoverable iff there is exactly one write ŵ_i in O which is compatible with any write leading to x_i in the version graph v_x. We call ŵ_i recoverable as well, and say that x_i must have been written by ŵ_i. Since there is only one ŵ_i, there is exactly one transaction T̂_i in O which performed ŵ_i. Thanks to compatibility, any interpretation of O has exactly one w_i compatible with ŵ_i, again performed by a unique transaction T_i. When a version is recoverable, we know which single transaction performed it in every interpretation.

We say a version x_i is known-aborted if it is recoverable to an aborted transaction, known-committed if it is recoverable to a committed transaction, and known-intermediate if it is recoverable to a non-final write. By compatibility, these properties apply not just to an observation O, but to every interpretation of O.

We say an observation O is completely recoverable if every write in O is recoverable. O is intermediate-recoverable if every intermediate write in O is recoverable. O is trace-recoverable if, for every x in O, x is traceable, and every non-initial version in the trace of every committed read of x is recoverable. (We can define a weaker notion of recoverability which identifies all writes in the causal past of some version, but we lack space to discuss it here.)

We can obtain complete recoverability for a register by choosing unique arguments for writes. Counters and sets are difficult to recover in general: a set like {1, 2} could have resulted either from a write of 1 or 2. However, restricting observations to a single write per object makes recovery trivial.

For traceable objects, we can guarantee an observation O is trace-recoverable when O satisfies three criteria:

1. Every argument in the observed writes to some object is distinct.
2. Given a committed read of x_i, every argument to every write in the trace of x_i is distinct.
3. Given a committed read of x_i, every write in the trace of x_i has a compatible write in O.

We can ensure the first criterion by picking unique values when we write to the database. We can easily detect violations of the remaining two criteria, and each points to pathological database behavior: if arguments in traces are not distinct, it implies some write was somehow applied twice; and if a trace's write has no compatible write in O, then it must have manifested from nowhere. Similar conditions suffice for intermediate-recoverability.

With a model for client observations, interpretations of those observations, and ways to map between versions and observed operations, we are ready to infer the presence of anomalies.
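For list-append objects, the three trace-recoverability criteria have a direct operational reading: the value [1, 5, 3] has the trace [] < [1] < [1, 5] < [1, 5, 3], and each non-initial version was produced by the append of its last element. A sketch, under that assumed representation:

```python
# A sketch (assumed representation, not Elle's actual code) of the three
# trace-recoverability criteria for list-append objects: for a read of
# [1, 5, 3], the write producing each non-initial version in its trace
# is the append of that version's last element.

from collections import Counter

def trace_recoverable(observed_appends, committed_reads):
    """observed_appends: {key: [arg, ...]} for every append in O.
       committed_reads:  {key: [list_value, ...]} returned by committed reads.
       Returns a list of human-readable problems; empty = trace-recoverable."""
    problems = []
    for key, args in observed_appends.items():
        # Criterion 1: distinct arguments per object.
        dupes = [a for a, n in Counter(args).items() if n > 1]
        if dupes:
            problems.append(f"duplicate append args {dupes} to {key}")
    for key, values in committed_reads.items():
        for value in values:
            # Criterion 2: every element of a read value's trace is distinct;
            # a repeat implies some write was applied twice.
            dupes = [a for a, n in Counter(value).items() if n > 1]
            if dupes:
                problems.append(f"write of {dupes} applied twice to {key}")
            # Criterion 3: every write in the trace was actually observed;
            # otherwise a value manifested from nowhere.
            ghosts = [a for a in value
                      if a not in observed_appends.get(key, [])]
            if ghosts:
                problems.append(f"{key} contains values {ghosts} from nowhere")
    return problems
```

With unique append arguments, criterion 1 holds by construction, and any violation of criteria 2 or 3 is itself evidence of pathological database behavior.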
We would like our checker to be sound: if it reports an anomaly in an observation, that anomaly should exist in every interpretation of that observation. We would also like it to be complete: if an anomaly occurred in a history, we should be able to find it in any observation of that history. In this section, we establish the soundness of Elle formally, and show how our approach comes close to guaranteeing completeness.

The anomalies identified by Adya et al. can be broadly split into two classes. Some anomalies involve transactions operating on versions that they should not have observed, either because an aborted transaction wrote them or because they were not the final write of some committed transaction. Our soundness proof must show that if one of these anomalies is detected in an observation, it surely occurred in every interpretation of that observation. Others involve a cycle in the dependency graph between transactions; we show that given an observation, we can construct a dependency graph which is a subgraph of the dependency graph of every possible history compatible with that observation. If we witness a cycle in the subgraph, it surely exists in any compatible history.

We begin with the first class: non-cycle anomalies.
We can use the definition of compatibility, along with properties of traceable objects and recoverability, to infer whether or not an observation implies that every interpretation of that observation contains aborted reads, intermediate reads, or dirty updates.

Direct Observation
Consider an observation O with a known-aborted version x_i. If x_i is read by an observed committed transaction T̂_i, that read must correspond to a committed read of an aborted version in every interpretation of O: an aborted read. A similar argument holds for intermediate reads.

Inconsistent Observations
For traceable objects, we can go further. If an observation O is inconsistent, it contains a committed read of some version x_i which does not appear in the trace of x_longest. As previously discussed, all committed versions of x must be in the trace of x_max. At most one of x_longest or x_i may be in this trace, so at least one of them must be aborted.

Via Traces
Consider a committed read of some value x_c whose trace contains a known-aborted version x_a. Either x_c is aborted (an aborted read), or a dirty update exists between x_a and x_c. A similar argument allows us to identify dirty updates when x_c is the product of a known-committed write. The closer x_a and x_c are in the version graph, the better we can localize the anomaly.

Completeness
The more recoverable a history is, and the fewer indeterminate transactions it holds, the more non-cycle anomalies we can catch. If an observation is determinate and trace-recoverable, we know exactly which reads committed in every interpretation, and exactly which writes aborted, allowing us to identify every case of aborted read. Finding every dirty update requires complete recoverability.

For an intermediate-recoverable observation O, we can identify every intermediate read. We can do the same if O is trace-recoverable. Let x_i be a version read by a committed read in O. Trace-recoverability ensures x_i is recoverable to a particular write, and we know from that write's position in its observed transaction whether it was intermediate or not. Compatibility ensures all interpretations agree.

In practice, observations are rarely complete, but as we show in section 7, we typically observe enough of a history to detect the presence of non-cycle anomalies.

The remainder of the anomalies identified by Adya et al. are defined in terms of cycles in the dependency graph between transactions. Given an observation O, we begin by inferring constraints on the version order ≪, then use properties of reads and recoverability to map dependencies on versions into dependencies on transactions.

Inferred Version Orders
Consider an intermediate-recoverable observation O of a database composed of traceable objects, and an interpretation (H, R) of O. We wish to show that we can derive part of H's version order ≪ from O alone, with minimal constraints on H and R. Traceability allows us to recover a prefix of ≪_x from any installed x_i in H, assuming we know which transactions in the trace of x_i committed, and which aborted. Let us assume H does not contain aborted reads, intermediate reads, or dirty updates. We call such a history clean.

Given O, which version of x should we use to recover ≪_x? Ideally, we would have x_max. However, there could be multiple interpretations of O with distinct x_max. Instead, we take a version x_f read by a transaction T̂_f such that:

• T̂_f is committed.
• T̂_f read x_f before performing any writes to x.
• No other version of x satisfying the above properties has a longer trace than x_f.

We use x_f to obtain an inferred version order <_x that is consistent with ≪_x, as follows. First, we know that x_f corresponds to an installed version of x in H because H contains no intermediate or aborted reads. By a similar argument, we also know that every version of x in the trace of x_f was written by a committed transaction. Therefore, if we remove the intermediate versions in the trace of x_f (which we know, thanks to intermediate-recoverability), we are left with a total order over committed versions that corresponds directly to the prefix of ≪_x up to and including x_f. We define < as the union of <_x for all objects x.

Inferred Serialization Graphs
Given an intermediate-recoverable observation O of a database of traceable objects, we can infer a chain of versions <_x which is a prefix of ≪_x, for every object x in O. If O is trace-recoverable, we can map every version in < to a particular write in O which produced it, such that the corresponding write in every interpretation of O produced that same version. Using these relationships, we define inferred dependencies between pairs of transactions T̂_i and T̂_j in O as follows:

• Direct inferred write-write dependency. A transaction T̂_j directly inferred-ww-depends on T̂_i if T̂_i performs a final write of a version x_i of x, and T̂_j performs a final write resulting in x's next version x_j, by <.
• Direct inferred write-read dependency. A transaction T̂_j directly inferred-wr-depends on T̂_i if T̂_i performs a final write of a version x_i in <, and T̂_j reads x_i.
• Direct inferred read-write dependency. A transaction T̂_j directly inferred-rw-depends on T̂_i if T̂_i reads version x_i of x, and T̂_j performs a final write of x's next version in <.

(We can also derive weaker constraints on the version order from non-traceable objects, which we leave as an exercise for the reader.)

Unlike Adya et al's definitions, we don't require that a transaction install some x_i, because an indeterminate transaction in O might be committed in interpretations of O, and have corresponding dependency edges there. Instead, we rely on the fact that < only relates installed versions (in clean interpretations).

An Inferred Direct Serialization Graph, or IDSG, is a graph of dependencies between observed transactions. The IDSG for an observation O is denoted IDSG(O). If T̂_j inferred-ww-depends on T̂_i, there exists an edge labeled ww from T̂_i to T̂_j in IDSG(O), and similarly for inferred-wr and inferred-rw-dependencies.

All that remains is to show that for every clean interpretation (H, R) of an observation, IDSG(O) is (in some sense) a subgraph of DSG(H). However, the IDSG and DSG are graphs over different types of transactions; we need the bijection R to translate between them. Given a relation R and a graph G, we write R ⋄ G to denote G with each vertex v replaced by Rv.

The soundness proof for Elle first establishes that for every clean interpretation (H, R) of a trace-recoverable observation O, R ⋄ IDSG(O) is a subgraph of DSG(H). The proof proceeds by cases, showing that for each class of dependency, if a given edge exists in the IDSG, it surely exists in every compatible DSG. We omit these details, which are straightforward, in this paper.

For every anomaly defined in terms of cycles on a DSG (e.g. G0, G1c, G-Single, G2, ...), we can now define a corresponding anomaly on an IDSG. If we detect that anomaly in IDSG(O), its corresponding anomaly must be present in every clean interpretation of O as well! We present a soundness theorem for Elle below:

Theorem. Given a trace-recoverable observation O, if Elle infers aborted reads, dirty updates, or intermediate reads, then every interpretation of O exhibits corresponding phenomena. If Elle infers a cycle anomaly, then every clean interpretation of O exhibits corresponding phenomena.

Unclean Interpretations
What of unclean interpretations, like those with aborted reads or dirty updates? If those occurred, the trace of a version read by a committed transaction could cause us to infer a version order <_x which includes uncommitted versions, and is not a prefix of ≪_x. A clean interpretation could have cycles absent from an unclean interpretation, and vice versa.

Phenomena like aborted reads and dirty updates are, in an informal sense, "worse" than dependency cycles like G1c and G2. If every interpretation of an observation must exhibit aborted reads, the question of whether it also exhibits anti-dependency cycles is not as pressing! And if some interpretations exist which don't contain aborted reads, but all of those exhibit anti-dependency cycles, we can choose to give the system the benefit of the doubt, and say that it definitely exhibits G2, but may not exhibit aborted reads.

Completeness
The more determinate transactions an observation contains, the more likely we are to definitively detect anomalies. In special cases (e.g. when O is determinate, completely-recoverable, etc.), we can prove completeness. In practice, we typically fail to observe the results of some transactions, and must fall back on probabilistic arguments. In section 7 we offer experimental evidence that Elle is complete enough to detect anomalies in real databases.
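The inferred ww, wr, and rw dependencies above can be sketched for list-append objects. This is a simplified illustration under assumed inputs (one append per transaction per key, intermediate versions already removed), not Elle's internal representation:

```python
# A simplified sketch of inferred-dependency construction for list-append
# objects. Transactions-as-dicts and versions-as-tuples are assumptions
# for illustration; this is not Elle's internal format.

def infer_edges(txns, version_order):
    """txns: {tid: {"appends": {key: [args]}, "reads": {key: list_value}}}
       version_order: {key: [v0, v1, ...]}, each v a tuple (list state),
         e.g. {"x": [(), (1,), (1, 2)]}, derived from the longest
         committed read as described above.
       Returns a set of (from_tid, to_tid, kind) edges."""
    # Map each non-initial version to the transaction whose append
    # produced its last element (recoverability: exactly one such write).
    writer = {}
    for key, versions in version_order.items():
        for v in versions[1:]:
            for tid, t in txns.items():
                if v[-1] in t["appends"].get(key, []):
                    writer[(key, v)] = tid
    edges = set()
    for key, versions in version_order.items():
        for prev, nxt in zip(versions, versions[1:]):
            w_prev, w_next = writer.get((key, prev)), writer[(key, nxt)]
            if w_prev is not None and w_prev != w_next:
                edges.add((w_prev, w_next, "ww"))   # writer of prev -> writer of next
            for tid, t in txns.items():
                if t["reads"].get(key) == list(prev) and tid != w_next:
                    edges.add((tid, w_next, "rw"))  # reader of prev -> overwriter
        for v in versions[1:]:
            w = writer[(key, v)]
            for tid, t in txns.items():
                if t["reads"].get(key) == list(v) and tid != w:
                    edges.add((w, tid, "wr"))       # writer of v -> reader of v
    return edges
```

A real checker would also handle multiple appends per transaction (intermediate versions) and indeterminate transactions; this sketch only shows how a recovered version order turns into labeled graph edges.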
5. INFERRING ADDITIONAL DEPENDENCIES
We have argued that Elle can infer transaction dependencies based on traceability and recoverability. In this section, we suggest additional techniques for inferring the relationships between transactions and versions.
In addition to dependencies on values, we can infer additional dependencies purely from the concurrency structure of a history. For instance, if process A performs T_1 then T_2, we can infer that T_1 <_p T_2. These dependencies encode a constraint akin to sequential consistency: each process should (independently) observe a logically monotonic view of the database. We can use these dependencies to strengthen any consistency model testable via cycle detection. For instance, Berenson et al's definition of snapshot isolation [4] does not require that transaction start timestamps proceed in any particular order, which means that a single process could observe, then un-observe, a write. If we augment the dependency graph with per-process orders, we can identify these anomalies, distinguishing between SI and strong session SI [16].

Similarly, serializability makes no reference to real-time constraints: it is legal, under Adya's formalism, for every read-only transaction to return an initial, empty state of the database, or to discard every write-only transaction by ordering it after all reads. Strict serializability [19] enforces a real-time order: if transaction T_1 completes before T_2 begins, T_2 must appear to take effect after T_1. We can compute a transitive reduction of the realtime precedence order in O(n · p) time, where n is the number of operations in the history, and p is the number of concurrent processes, and use it to detect additional cycles.

Some snapshot-isolated databases expose transaction start and commit timestamps to clients. Where this information is available, we can use it to infer the time-precedes order used in Adya's formalization of snapshot isolation [1], and construct a start-ordered serialization graph.

Traceability on x allows us to infer a prefix of the version order <_x, but this does not mean that non-traceable objects are useless! If we relax <_x to be a partial order, rather than total, and make some small, independent assumptions about the behavior of individual objects, we can recover enough version ordering information to detect cyclic anomalies on less-informative datatypes, such as registers or sets.

For instance, if we assume that the initial version x_init is never reachable via any write, we can infer x_init <_x x_i for every x_i other than x_init. With registers, for example, we know that 1, 2, 3, etc. must all follow nil. When the number of versions per object is small (or when databases have a habit of incorrectly returning nil), this trivial inference can be sufficient to find real-world anomalies.

If we assume that writes follow reads within a single transaction, we can link versions together whenever a transaction reads, then writes, the same key, and that write is recoverable. For instance, T_1 = r(x_i), w(x_j) allows us to infer x_i <_x x_j.

Many databases claim that each record is independently linearizable, or sequentially consistent. After computing the process or real-time precedence orders, we can use those transaction relationships to infer version dependencies. If a transaction finishes writing or reading a linearizable object x at x_i, then another transaction proceeds to write or read x_j, we can infer (on the basis of per-key linearizability) that x_i <_x x_j.

Where databases expose version metadata to clients, we can use that metadata to construct version dependency graphs directly, rather than inferring the version order from values. Since we can use transaction dependencies to infer version dependencies, and version dependencies to infer transaction dependencies, we can iterate these procedures to infer increasingly complete dependency graphs, up to some fixed point. We can then use the resulting transaction graph to identify anomalies.
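The real-time precedence order described above can be sketched directly. We assume, for illustration only, that each completed transaction carries a begin and end timestamp observed by the client:

```python
# A sketch of real-time precedence inference: T1 precedes T2 in real time
# iff T1's completion was observed before T2's invocation. The interval
# representation {tid: (begin, end)} is an assumption for illustration.

def realtime_edges(txns):
    """txns: {tid: (begin, end)} client-observed intervals.
       Returns the set of (t1, t2) pairs where t1 ended strictly
       before t2 began."""
    edges = set()
    for t1, (_, end1) in txns.items():
        for t2, (begin2, _) in txns.items():
            if t1 != t2 and end1 < begin2:
                edges.add((t1, t2))
    return edges
```

This all-pairs sketch is quadratic; a real checker would instead compute the transitive reduction of this order in O(n · p) time as noted above, since most of these edges are redundant for cycle detection.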
6. FINDING COUNTEREXAMPLES
These techniques allow us to identify several types of dependencies between transactions: write-read, write-write, and read-write relationships on successive versions of a single object, process and real-time orders derived from the concurrency structure of the history, and version and snapshot metadata orders where databases offer them. We take the union of these dependency graphs, with each edge labeled with its dependency relationship(s), and search for cycles with particular properties.

• G0: A cycle comprised entirely of write-write edges.
• G1c: A cycle comprised of write-write or write-read edges.
• G-single: A cycle with exactly one read-write edge.
• G2: A cycle with one or more read-write edges.

Optionally, we may expand these definitions to allow process, realtime, version, and/or timestamp dependencies to count towards a cycle.

To find a cycle, we apply Tarjan's algorithm to identify strongly connected components [34]. Within each graph component, we apply breadth-first search to identify a short cycle. To find G0 anomalies, we restrict the graph to only write-write edges, which ensures that any cycle we find is purely comprised of write dependencies. For G1c, we select only write-write and write-read edges. G-single is trickier, because it requires exactly one read-write edge. We partition the dependency graph into two subgraphs: one with, and one without, read-write edges. We find strongly connected components in the full graph, but to find a cycle, we begin with a node in the read-write subgraph, follow exactly one read-write edge, then attempt to complete the cycle using only write-write and write-read edges. This allows us to identify cycles with exactly one read-write edge, should one exist.

These cycles can be presented to the user as a witness of an anomaly. We examine the graph edges between each pair of transactions, and use those relationships to construct a human-readable explanation for the cycle, and why it implies a contradiction.
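The G-single search above can be sketched as follows. This illustration brute-forces every rw edge rather than first restricting to strongly connected components as a real implementation (and Elle) would; the edge-list representation is an assumption:

```python
# A sketch of the G-single search described above: follow exactly one
# read-write edge, then try to close the cycle using only ww/wr edges.
# A real implementation would first restrict attention to strongly
# connected components (Tarjan) for efficiency.

from collections import deque

def find_g_single(edges):
    """edges: iterable of (src, dst, kind), kind in {"ww", "wr", "rw"}.
       Returns a cycle as a node list [a, b, ..., a] containing exactly
       one rw edge (the first hop), or None if no such cycle exists."""
    safe = {}  # adjacency over ww/wr edges only
    for s, d, k in edges:
        if k in ("ww", "wr"):
            safe.setdefault(s, set()).add(d)
    for s, d, k in edges:
        if k != "rw":
            continue
        # Follow the single rw edge s -> d, then BFS back to s via ww/wr.
        parent, queue = {d: None}, deque([d])
        while queue:
            n = queue.popleft()
            if n == s:
                chain = []  # reconstruct the ww/wr path d ... s
                while n is not None:
                    chain.append(n)
                    n = parent[n]
                return [s] + chain[::-1]  # s -rw-> d -> ... -> s
            for m in safe.get(n, ()):
                if m not in parent:
                    parent[m] = n
                    queue.append(m)
    return None
```

Because BFS explores shortest paths first, the returned cycle is a short witness, which keeps the human-readable explanations compact.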
As described in section 4.3.1, we can exploit recoverability and traceability to directly detect aborted read, intermediate read, and dirty update. In addition, there are phenomena which Adya et al.'s formalism does not admit, but which we believe (having observed them in real databases) warrant special verification:

• Garbage reads: A read observes a value which was never written.
• Duplicate writes: The trace of a committed read version contains a write of the same argument multiple times.
• Internal inconsistency: A transaction reads some value of an object which is incompatible with its own prior reads and writes.

Garbage reads may arise due to client, network, or database corruption, errors in serialization or deserialization, etc. Duplicate writes can occur when a client or database retries an append operation; with registers, duplicate writes can manifest as G1c or G2 anomalies. Internal inconsistencies can be caused by improper isolation, or by optimistic concurrency control which fails to apply a transaction's writes to its local snapshot.
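Internal consistency checking has a simple reading for list-append: within one transaction, each read of a key must reflect that transaction's own prior reads and appends to it. A simplified sketch, with an assumed op format:

```python
# A simplified sketch of internal-consistency checking for one list-append
# transaction. The op format [("append", key, value) | ("r", key, list)]
# is an assumption for illustration; real checkers track more state.

def internally_consistent(ops):
    """True iff every read in this transaction is consistent with the
    transaction's own prior reads and appends to the same key."""
    expected = {}  # key -> list state the txn itself has established
    for op in ops:
        if op[0] == "append":
            _, k, v = op
            expected.setdefault(k, []).append(v)
        else:
            _, k, value = op
            if k in expected:
                # A later read must end with everything this transaction
                # already knew: its prior read state plus its own appends.
                if value[len(value) - len(expected[k]):] != expected[k]:
                    return False
            expected[k] = list(value)  # subsequent ops build on this read
    return True
```

Under this check, FaunaDB's observed transaction append(0, 6), r(0, nil) is flagged immediately: the read fails to end with the transaction's own append of 6.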
7. IMPLEMENTATION AND CASE STUDY
We have implemented Elle as a checker in the open-source distributed systems testing framework Jepsen [25] and applied it to four distributed systems, including SQL, document, and graph databases. Elle revealed anomalies in every system we tested, including G2, G-single, G1a, lost updates, cyclic version dependencies, and internal inconsistency. Almost all of these anomalies were previously unknown. We have also demonstrated, as a part of Elle's test suite, that Elle can identify G0, G1b, and G1c anomalies, as well as anomalies involving real-time and process orders.
Elle is straightforward to run against real-world databases. Most transactional databases offer some kind of list with append. The SQL standard's CONCAT function and the TEXT datatype are a natural choice for encoding lists, e.g. as comma-separated strings. Some SQL databases, like Postgres, offer JSON collection types. Document stores typically offer native support for ordered collections. Even systems which only offer registers can emulate lists by performing a read followed by a write.

While list-append gives us the most precise inference of anomalies, we can use the inference rules discussed in section 5 to analyze systems without support for lists. Wide rows in Cassandra and predicates in SQL are a natural fit for sets. Many systems have a notion of an object version number or counter datatype: we can detect cycles in both using Elle. Even systems which offer only read-write registers allow us to infer write-read dependencies directly, and version orders can be (partially) inferred by write-follows-read, process, and real-time orders.

Let:

T1 = {:value [[:append 250 10] [:r 253 [1 3 4]] [:r 255 [2 3 4 5]] [:append 256 3]], ...}
T2 = {:value [[:append 255 8] [:r 253 [1 3 4]]], ...}
T3 = {:value [[:append 256 4] [:r 255 [2 3 4 5 8]] [:r 256 [1 2 4]] [:r 253 [1 3 4]]], ...}

Then:

- T1 < T2, because T1 did not observe T2's append of 8 to 255.
- T2 < T3, because T3 observed T2's append of 8 to key 255.
- However, T3 < T1, because T1 appended 3 after T3 appended 4 to 256: a contradiction!

Figure 2: A textual explanation of an experimentally obtained real-time G-single cycle, as presented by our checker.

Figure 3: The same cycle, as plotted by our checker. Arrows show dependencies between transactions: wr denotes a read dependency, rw denotes an anti-dependency, and rt denotes a real-time ordering.

In all our tests, we generated transactions of varying length (typically 1-10 operations) comprised of random reads and writes over a handful of objects. We performed anywhere from one to 1024 writes per object; fewer writes per object stresses codepaths involved in the creation of fresh database objects, and more writes per object allows the detection of anomalies over longer time periods.

We ran 10-30 client threads across 5 to 9 nodes, depending on the particular database under test. When a client thread times out while committing a transaction (as is typical for fault-injection tests), Jepsen spawns a new logical process for that thread to execute. This causes the logical concurrency of tests to rise over time. Tens of thousands of logically concurrent transactions are not uncommon.

Our implementation takes an expected consistency model (e.g. strict-serializable) and automatically detects and reports anomalies as data structures, visualizations, and human-verifiable explanations of each cycle. For example, consider the G-single anomaly in Figures 2 and 3.
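The comma-separated TEXT encoding of list-append can be sketched with an embedded SQL engine. SQLite uses the || operator rather than the SQL-standard CONCAT, and the table and column names here are hypothetical, for illustration only:

```python
# A sketch of list-append over a TEXT column, using SQLite's ||
# concatenation operator in place of the SQL-standard CONCAT.
# Table/column names are hypothetical, for illustration only.

import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE lists (k INTEGER PRIMARY KEY, v TEXT NOT NULL)")

def append(k, element):
    # Insert "e" for a fresh key, or concatenate ",e" onto the existing list.
    db.execute(
        "INSERT INTO lists (k, v) VALUES (?, ?) "
        "ON CONFLICT (k) DO UPDATE SET v = v || ',' || excluded.v",
        (k, str(element)))

def read(k):
    row = db.execute("SELECT v FROM lists WHERE k = ?", (k,)).fetchone()
    return [int(x) for x in row[0].split(",")] if row else None
```

Against a real database under test, each append and read would of course run inside the transactions whose history Elle analyzes; the point of the encoding is only that a single read recovers the full version trace.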
TiDB [32] is an SQL database which claims to provide snapshot isolation, based on Google's Percolator [31]. We tested list append with SQL CONCAT over TEXT fields, and found that versions 2.1.7 through 3.0.0-beta.1 exhibited frequent anomalies, even in the absence of faults. For example, we observed the following trio of transactions:

T1: r(34, [2, 1]), append(36, 5), append(34, 4)
T2: append(34, 5)
T3: r(34, [2, 1, 5, 4])

T1 did not observe T2's append of 5 to key 34, so T2 must rw-depend on T1. However, T3's read implies that T1's append of 4 to key 34 followed T2's append of 5, so T1 ww-depends on T2. This cycle contains exactly one anti-dependency edge, so it is a case of G-single: read skew. We also found numerous cases of inconsistent observations (implying aborted reads) as well as lost updates.

These cases stemmed from an automated transaction retry mechanism: when one transaction conflicted with another, TiDB simply re-applied the transaction's writes again, ignoring the conflict. This feature was enabled by default. Turning it off revealed the existence of a second, undocumented retry mechanism, also enabled by default. Version 3.0.0-rc2 resolved these issues by disabling both retry mechanisms by default.

Furthermore, TiDB's engineers claimed that select ... for update prevented write skew. Elle demonstrated that G2 anomalies including write skew were still possible, even when all reads used select ... for update. TiDB's locking mechanism could not express a lock on an object which hadn't been created yet, which meant that freshly inserted rows were not subject to concurrency control. TiDB has documented this limitation.
YugaByte DB [21] is a serializable SQL database based on Google's Spanner [14]. We evaluated version 1.3.1 using CONCAT over TEXT fields, identified either by primary or secondary keys, both with and without indices. We found that when master nodes crashed, paused, or otherwise became unavailable to tablet servers, those tablet servers could exhibit a handful of G2-item anomalies. For instance, this cycle (condensed for clarity) shows two transactions which fail to observe each other's appends:

T1: ... append(3, 837) ... r(4, [ ...
T2: ... append(4, 885) ... r(3, [ ...

Every cycle we found involved multiple anti-dependencies; we observed no cases of G-single, G1, or G0. YugaByte DB's engineers traced this behavior to a race condition: after a leader election, a fresh master server briefly advertised an empty capabilities set to tablet servers. When a tablet server observed that empty capabilities set, it caused every subsequent RPC call to include a read timestamp. YugaByte DB should have ignored those read timestamps for serializable transactions, but did not, allowing transactions to read from inappropriate logical times. This issue was fixed in 1.3.1.2-b1.
FaunaDB [20] is a deterministic, strict-serializable document database based on Calvin [36]. It offers native list datatypes, but the client we used had no list-append function; we used strings with concat instead. While FaunaDB claimed to provide (up to) strict serializability, we detected internal inconsistencies in version 2.6.0, where a single transaction failed to observe its own prior writes:

T1: append(0, 6), r(0, nil)

These internal inconsistencies also caused Elle to infer G2 anomalies. Internal anomalies occurred frequently, under low contention, in clusters without any faults. However, they were limited to index reads. Fauna believes this could be a bug in which coordinators fail to apply tentative writes to a transaction's view of an index.
Dgraph [27] is a graph database with a homegrown transaction protocol influenced by Shacham, Ohad et al. [33]. Dgraph's data model is a set of entity-attribute-value triples, and it has no native list datatype. However, it does lend itself naturally to registers, which we analyzed with Elle. We evaluated Dgraph version 1.1.1, which claimed to offer snapshot isolation, plus per-key linearizability.

Like FaunaDB, Dgraph transactions failed to provide internal consistency under normal operation: reads would fail to observe previously read (or written!) values. This transaction, for instance, set key 10 to 2, then read an earlier value of 1:

T1: w(10, 2), r(10, 1)

To find cycles over registers, we allowed Elle to infer partial version orders from the initial state, from writes-follow-reads within individual transactions, and (since Dgraph claims linearizable keys) from the real-time order of operations. These inferred dependencies were often cyclic; here, transaction T1 finished writing key 540 a full three seconds before T2 began, but T2 failed to observe that write:

T1: r(541, nil), w(540, 2)
T2: r(540, nil), w(544, 1)

Elle automatically reports and discards these inconsistent version orders, to avoid generating trivial cycles, but it went on to identify numerous instances of read skew, both with and without real-time edges:

T1: r(2432, 10), r(2434, nil)
T2: w(2434, 10)
T3: w(2432, 10), r(2434, 10)

These cycles stemmed from a family of bugs in Dgraph related to shard migration: transactions could read from freshly-migrated shards without any data in them, returning nil. Dgraph Labs is investigating these issues.
Elle's performance on real-world workloads was excellent; where Knossos (Jepsen's main linearizability checker) often timed out or ran out of memory after a few hundred transactions, Elle was able to check histories of hundreds of thousands of transactions in tens of seconds. To confirm this behavior experimentally, we designed a history generator which simulates clients interacting with an in-memory serializable-snapshot-isolated database, and analyzed those histories with both Elle and Knossos.

Our histories were composed of randomly generated transactions performing one to five operations each, interacting with any of 100 possible objects at any point in time. We performed 100 appends per object. We generated histories of different lengths, and with varying numbers of concurrent processes, and measured both Elle and Knossos' runtime. Since many Knossos runs involved search spaces with an astronomically large number of orderings, we capped runtimes at 100 seconds. All tests were performed on a 24-core Xeon with 128 GB of RAM.

As figure 4 shows, Knossos' runtime rises dramatically with concurrency: given c concurrent transactions, the number of permutations to evaluate is c!. Symmetries and pruning reduce the state space somewhat, but the problem remains fundamentally NP-complete. With 40+ concurrent processes, even histories of 5000 transactions were (generally) uncheckable in reasonable time frames. Of course, runtime rises with history length as well.

Elle does not exhibit Knossos' exponential runtimes: it is primarily linear in the length of a history. Building indices, checking for consistent orders, looking for internal and aborted reads, constructing the inferred serialization graph, and detecting cycles are all linear-time operations. Unlike Knossos, concurrency does not have a strong impact on Elle. With only one process, every transaction commits. As concurrency rises, some transactions abort due to conflicts, which mildly reduces the number of transactions we have to analyze. At high concurrency, more transactions interact with the same versions, and we infer more dependencies.
8. RELATED WORK
As we discuss in Section 1, there has been a significant amount of work on history checkers in the concurrent programming community. As early as 1993, Wing & Gong [37] simulated executions of linearizable objects to record concurrent histories, and described a checker algorithm which could search for bugs in those histories. Line-Up [9], Knossos [22], and Lowe's linearizability checker [28] follow similar strategies. Gibbons & Korach showed [18] that sequential consistency checking was NP-complete via reduction to SAT.

Generating random operations, applying them to some implementation of a datatype, and checking that the resulting history obeys certain invariants is a key concept in generative, or property-based, testing. Perhaps the most well-known implementation of this technique is QuickCheck [13], and Jepsen applies a similar approach to distributed systems [24]. Majumdar & Niksic argued probabilistically for the effectiveness of this randomized testing approach [29], which helps explain why our technique finds bugs.

Brutschy et al. propose both a static [8] and a trace-based dynamic [7] analysis to find serializability violations in programs run atop weakly-consistent stores. Quite recently, Biswas & Enea provided polynomial-time checkers for read committed, read atomic, and causal consistency, as well as exponential-time checkers for prefix consistency, snapshot isolation, and serializability [6].

Figure 4: Runtime vs history length, for various concurrencies. Performance of Elle vs Knossos.

Using graphs to model dependencies among transactions has a long history in the database literature. The dependency graph model was first proposed by Bernstein [5, 4] and later refined by Adya [2, 1]. Dependency graphs have been applied to the problem of safely running transactions at a mix of isolation levels [17] and to the problem of runtime concurrency control [11, 38], in addition to reasoning formally about isolation levels and anomalous histories.

As attractive as dependency graphs may be as a foundation for database testing, they model orderings among object versions and operations that are not necessarily visible to external clients. Instead of defining isolation levels in terms of internal operations, some declarative definitions of isolation levels [12, 10] are based upon a pair of compatible dependency relations: a visibility relation capturing the order in which writes are visible to transactions, and an arbitration relation capturing the order in which writes are committed.

The client-centric formalism of Crooks et al. [15] goes a step further, redefining consistency levels strictly in terms of client-observable states. While both approaches, like ours, enable reasoning about existing isolation levels from the outside of the database implementation, our goal is somewhat different. We wish instead to provide a faithful mapping between externally-observable events and Adya's data-centric definitions, which have become a lingua franca in the database community. In so doing, we hope to build a bridge between two decades of scholarship on dependency graphs and emerging techniques for black-box database testing.
9. FUTURE WORK & CONCLUSIONS
Future Work
There are some well-known anomalies, like long fork, which Elle detects but tags as G2. We believe it should be possible to provide more specific hints to users about what anomalies are present. Ideally, we would like to tell a user exactly which isolation levels a given history does and does not satisfy.

Our approach ignores predicates and deals only in individual objects; we cannot distinguish between repeatable read and serializability. Nor can we detect anomalies like predicate-many-preceders. We would like to extend our model to represent predicates, and prove how to infer dependencies on them. One could imagine a system which somehow generates a random predicate P, in such a way that any version of an object can be classified as in P or not, and then uses that knowledge to generate dependency edges for predicate-based reads.

Conclusions
We present Elle: a novel theory and tool for experimental verification of transactional isolation. By using datatypes and generating histories which couple the version history of the database to client-observable reads and writes, we can extract rich dependency graphs between transactions. We can identify cycles in this graph, categorize them as various anomalies, and present users with concise, human-readable explanations as to why a particular set of transactions implies an anomaly has occurred.
Elle is sound. It identifies G0, G1a, G1b, G1c, G-single, and G2 anomalies, as well as inferring cycles involving per-process and real-time dependencies. In addition, it can identify dirty updates, garbage reads, duplicated writes, and internal consistency violations. When Elle identifies an anomaly in an observation of a database, it must be present in every interpretation of that observation.
Elle is efficient. It is linear in the length of a history and effectively constant with respect to concurrency. It can quickly analyze real-world histories of hundreds of thousands of transactions, even when processes crash, leading to high logical concurrency. We see no reason why it cannot handle more. It is dramatically faster than linearizability checkers [22] and constraint-solver serializability checkers [23].
Elle is effective. It has found anomalies in every database we've checked, ranging from internal inconsistency and aborted reads to anti-dependency cycles.

Elle is general. Unlike checkers which hard-code a particular example of an anomaly (e.g. long fork), Elle works with arbitrary patterns of writes and reads over different types of objects, so long as those objects and transactions satisfy some simple properties: traceability and recoverability. Generating random histories with these properties is straightforward; list append is broadly supported in transactional databases. Elle can also make limited inferences from less informative datatypes, such as registers, counters, and sets.
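The list-append construction can be made concrete with a small sketch. Because every write appends a unique element, a read of a key yields that key's entire version history, and any two reads of the same key must be prefix-related; from the recovered order, together with a record of which transaction appended which element, write-write dependency edges follow directly. The function names and history encoding here are illustrative, not Elle's actual API:

```python
def recover_order(reads):
    """Given several observed reads of one list-append key, check that
    each read is a prefix of the longest (as traceability requires) and
    return the inferred total order of appends to that key."""
    longest = max(reads, key=len, default=[])
    for r in reads:
        if r != longest[:len(r)]:
            raise ValueError(f"reads disagree on version order: {r} vs {longest}")
    return longest

def ww_edges(order, writer):
    """writer maps each appended element to the transaction that wrote it
    (known, since the checker generated the workload). Adjacent elements
    in the version order yield write-write dependency edges."""
    return [(writer[a], writer[b]) for a, b in zip(order, order[1:])]

# Two reads of key x observed as [1, 2] and [1, 2, 3]: element 3's append
# must follow element 2's, which must follow element 1's.
order = recover_order([[1, 2], [1, 2, 3]])
edges = ww_edges(order, {1: "T1", 2: "T2", 3: "T3"})
assert order == [1, 2, 3]
assert edges == [("T1", "T2"), ("T2", "T3")]
```

A register, by contrast, only reveals its latest value on read, which is why it supports the weaker, more limited inference mentioned above.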
Elle is informative. Unlike solver-based checkers, Elle's cycle-detection approach produces short witnesses of specific transactions. Moreover, it provides a human-readable explanation of why each witness must be an instance of the claimed anomaly.

We are aware of no other checker which combines these properties. Using Elle, testers can write a small test which verifies a wealth of properties against almost any database. The anomalies Elle reports can rule out (or tentatively support) that database's claims for various isolation levels. Moreover, each witness points to particular transactions at particular times, which helps engineers investigate and fix bugs. We believe Elle will make the database industry safer.
The authors wish to thank Asha Karim for discussions leading to Elle, and Kit Patella for her assistance in building the Elle checker.
10. REFERENCES

[1] A. Adya. Weak consistency: A generalized theory and optimistic implementations for distributed transactions. Technical report, MIT, 1999.
[2] A. Adya, B. Liskov, and P. E. O'Neil. Generalized Isolation Level Definitions. ICDE '00, 2000.
[3] A. Athalye. Porcupine. https://github.com/anishathalye/porcupine, 2017-2018.
[4] P. A. Bernstein and N. Goodman. Concurrency Control in Distributed Database Systems. ACM Computing Surveys, 13(2), 1981.
[5] P. A. Bernstein, D. W. Shipman, and W. S. Wong. Formal Aspects of Serializability in Database Concurrency Control. IEEE Transactions on Software Engineering, 5(3), 1979.
[6] R. Biswas and C. Enea. On the complexity of checking transactional consistency. Proceedings of the ACM on Programming Languages, 3(OOPSLA), 2019.
[7] L. Brutschy, D. Dimitrov, P. Müller, and M. Vechev. Serializability for Eventual Consistency: Criterion, Analysis, and Applications. POPL 2017, 2017.
[8] L. Brutschy, D. Dimitrov, P. Müller, and M. Vechev. Static Serializability Analysis for Causal Consistency. PLDI 2018, 2018.
[9] S. Burckhardt, C. Dern, M. Musuvathi, and R. Tan. Line-up: A Complete and Automatic Linearizability Checker. PLDI '10, 2010.
[10] S. Burckhardt, D. Leijen, M. Fähndrich, and M. Sagiv. Eventually consistent transactions. In ESOP 2012, 2012.
[11] M. J. Cahill, U. Röhm, and A. D. Fekete. Serializable Isolation for Snapshot Databases. ACM Transactions on Database Systems, 34(4), 2009.
[12] A. Cerone, G. Bernardi, and A. Gotsman. A Framework for Transactional Consistency Models with Atomic Visibility. In , 2015.
[13] K. Claessen and J. Hughes. QuickCheck: A Lightweight Tool for Random Testing of Haskell Programs. ICFP '00, 2000.
[14] J. C. Corbett, J. Dean, M. Epstein, A. Fikes, C. Frost, J. Furman, S. Ghemawat, A. Gubarev, C. Heiser, P. Hochschild, et al. Spanner: Google's globally-distributed database. In Proceedings of OSDI, volume 1, 2012.
[15] N. Crooks, Y. Pu, L. Alvisi, and A. Clement. Seeing is Believing: A Client-Centric Specification of Database Isolation. PODC '17, 2017.
[16] K. Daudjee and K. Salem. Lazy database replication with snapshot isolation. VLDB '06, 2006.
[17] A. Fekete. Allocating Isolation Levels to Transactions. PODS '05, 2005.
[18] P. B. Gibbons and E. Korach. Testing shared memories. SIAM Journal on Computing, 26(4), 1997.
[19] M. P. Herlihy and J. M. Wing. Linearizability: A Correctness Condition for Concurrent Objects. ACM Transactions on Programming Languages and Systems, 12(3), 1990.
[20] F. Inc. FaunaDB. https://fauna.com, 2019.
[21] Y. Inc. YugabyteDB. https://yugabyte.com, 2019.
[22] K. Kingsbury. Knossos. https://github.com/jepsen-io/knossos, 2013-2019.
[23] K. Kingsbury. Gretchen. https://github.com/aphyr/gretchen, 2016.
[24] K. Kingsbury and K. Patella. Jepsen (reports). http://jepsen.io/analyses, 2013-2019.
[25] K. Kingsbury and K. Patella. Jepsen (software library). https://github.com/jepsen-io/jepsen, 2013-2019.
[26] M. Kleppmann. Hermitage: Testing transaction isolation levels. https://github.com/ept/hermitage, 2014-2019.
[27] D. Labs. Dgraph. https://dgraph.io, 2020.
[28] G. Lowe. Testing and Verifying Concurrent Objects. Concurrency and Computation: Practice and Experience, 29(4), 2017.
[29] R. Majumdar and F. Niksic. Why is Random Testing Effective for Partition Tolerance Bugs? POPL 2018, 2018.
[30] C. H. Papadimitriou. The serializability of concurrent database updates. Journal of the Association for Computing Machinery, 26(4), 1979.
[31] D. Peng and F. Dabek. Large-scale incremental processing using distributed transactions and notifications. In OSDI, 2010.
[32] PingCAP. TiDB. https://pingcap.com/en, 2019.
[33] O. Shacham, F. Perez-Sorrosal, E. Bortnikov, E. Hillel, I. Keidar, I. Kelly, M. Morel, and S. Paranjpye. Omid, Reloaded: Scalable and Highly Available Transaction Processing. USENIX Conference on File and Storage Technologies, 2017.
[34] R. Tarjan. Depth-first Search and Linear Graph Algorithms. SWAT '71, 1971.
[35] G. Team. Gecode: Generic constraint development environment, 2005.
[36] A. Thomson, T. Diamond, S.-C. Weng, K. Ren, P. Shao, and D. J. Abadi. Calvin: Fast Distributed Transactions for Partitioned Database Systems. SIGMOD '12, 2012.
[37] J. M. Wing and C. Gong. Testing and Verifying Concurrent Objects. Journal of Parallel and Distributed Computing, 17(1-2), 1993.
[38] C. Yao, D. Agrawal, P. Chang, G. Chen, B. C. Ooi, W. Wong, and M. Zhang. DGCC: A new dependency graph based concurrency control protocol for multicore database systems.