Transactional Partitioning: A New Abstraction for Main-Memory Databases
Vivek Shah (supervised by Marcos Vaz Salles)
University of Copenhagen, Copenhagen, Denmark
[email protected]
ABSTRACT
The growth in variety and volume of OLTP (Online Transaction Processing) applications poses a challenge to OLTP systems to meet performance and cost demands in the existing hardware landscape. These applications are highly interactive (latency sensitive) and require update consistency. They target commodity hardware for deployment and demand scalability in throughput with increasing clients and data. Currently, the OLTP systems used by these applications trade off performance against ease of development across a variety of applications. In order to bridge the gap between performance and ease of development, we propose an intuitive, high-level programming model which allows OLTP applications to be modeled as a cluster of application logic units. By extending transactions guaranteeing full ACID semantics to provide the proposed model, we maintain ease of application development. The model allows the application developer to reason about program performance, and to influence it without the involvement of OLTP system designers (database designers) and/or DBAs. As a result, the database designer is free to focus on the efficient execution of programs to ensure optimal cluster resource utilization.
1. INTRODUCTION
OLTP applications have grown in volume, variety, and performance demands [20]. In addition to the classic banking, reservation, and order entry systems, novel applications include massively multiplayer online worlds (MMOW) or virtual worlds, massively multiplayer online role playing games (MMORPG), financial applications (e.g., online trading), telecommunication applications, and information visualizations. These applications demand scalability in throughput with growing data and clients. They exhibit high interactivity (latency sensitivity) while maintaining consistency on updates. Furthermore, they typically target commodity hardware for deployment. We will refer to them as HICCUP (Highly-Interactive Commodity-Hardware Consistent-on-Update) applications.
Stonebraker et al. have shown that OLTP systems can be deployed in main memory on a shared-nothing cluster of machines at modest cost [21]. This leads to order-of-magnitude gains in performance over classical relational database management systems (RDBMS). With the reduction in prices of commodity hardware and the growth of cloud computing [1], a cluster-based infrastructure consisting of shared-memory commodity hardware nodes increasingly provides flexible main-memory deployment options to meet HICCUP application requirements.

In order to build HICCUP applications for cluster-based architectures, there are currently two dominant programming models exposed to the application developer. The first approach abstracts the distributed architecture into a uniform shared-memory space, which is then presented using a unified data model, e.g., the relational model under strong consistency semantics (ACID transactions). The second approach abstracts the distributed architecture into a low-level, distributed, storage-oriented model, e.g., key-value stores with no or loose consistency semantics.

Although the first approach provides ease of development, performance varies across applications. This is because the approach is sensitive to data and code partitioning, which is needed to reduce the impact of distributed transactions [11]. In the second approach, performance is controlled by the application developer, but development is extremely hard due to the lack of consistency semantics and the low-level, storage-oriented programming model. There are in-between approaches that partition the unified data model and provide consistency semantics within partitions [2, 19]. However, in these systems, the clients must specify these partitions and handle consistency across partitions. This increases application complexity, especially for HICCUP applications, which cannot always be architected to eliminate inter-partition accesses.

We want to address this gap by exploring a middle ground between the two extreme approaches. Strong consistency semantics (ACID transactions) provide ease of application development. Sacrificing consistency semantics is not enough to guarantee OLTP application performance [10]. We want to build a transactional system that maintains global consistency semantics in the cluster to enable ease of development while guaranteeing optimal cluster performance. In order to do so, we propose to extend the transactional abstraction with a logical distribution abstraction. This enables application developers to reason about the performance of their application in terms of distributed transactional logic units and thus to write efficient programs.

While the logical distribution abstraction hides the low-level distributed architecture, it allows application developers to understand program performance behavior in a cluster setup and to influence it. Database designers can then focus on running the programs efficiently for optimal cluster resource utilization. This approach ensures that application developers can participate in improving the performance of the application. It also reduces the criticality of data and code partitioning. As a result, it removes database designers from the critical path of performance improvement over a variety of applications by separating the concerns of writing efficient programs (application developers) and running them efficiently (database designers).
Global consistency guarantees in the cluster free the developer from worrying about consistency issues, thus reducing program complexity and development costs.

The remainder of the paper is organized as follows. In Section 2, we provide background for the project by laying down the design requirements for OLTP systems and reviewing the state of the art with respect to these requirements. In Section 3, we describe our approach in greater detail with an example OLTP application (the TPC-C benchmark). Finally, in Section 4, we outline the challenges and roadmap over the course of the project, with some initial thoughts on how to tackle them.
2. BACKGROUND
In this section, we provide background for the Ph.D. project by first isolating the OLTP design requirements in the current hardware and software landscape and then reviewing the state of the art with respect to these design requirements.
We isolate the following design requirements:

1. In order to meet the requirements of a growing variety of HICCUP applications, OLTP systems must expose a programming model which allows application developers to reason about program performance in a cluster-based setup and to improve the performance of their architected applications (programs).

2. A lack of global consistency semantics and a low-level, distributed, storage-oriented programming model make application development extremely expensive and error-prone, and limit the adoptability of such a system [22]. OLTP systems must provide a high-level, intuitive programming model with strong global consistency semantics over the distributed cluster setup to ease application development.

3. To take advantage of advances in commodity hardware and the growth of cloud computing [1], OLTP systems must target a hardware ecosystem of shared-memory nodes in a shared-nothing setup, allowing flexible deployments varying in performance and cost. OLTP systems must also be designed to utilize the multi-processor, multi-core, and multi-tiered cache architecture of individual cluster nodes to ensure optimal hardware performance [13].

In recent times, there have been numerous implementations of systems that can be used by HICCUP applications. There has been an explosion in the implementation of key-value stores, which provide varying performance, scalability, and availability characteristics. Amazon Dynamo [8], Google BigTable [3], Yahoo! PNUTS [5], Apache HBase and Cassandra [15], Amazon S3, Google Cloud Storage, and Windows Azure Storage are some examples of distributed key-value stores. These systems target commodity hardware in a cluster-based architecture and provide high availability and scalability at the cost of weaker consistency semantics and a low-level, storage-oriented programming model. In order to build HICCUP applications using these systems, applications must implement multi-key transactions and manage the low-level key-value infrastructure, which is extremely complex and error-prone, thus limiting ease of development [22].

In order to fix these problems, transactional capabilities were introduced in distributed data stores, such as Google Spanner [6] and Megastore [2], or exposed using client libraries [9, 18]. Megastore [2] provides full ACID semantics but only within partitions of the data, which limits the ease of application development and flexible deployments. Warp [9] and Percolator [18] provide multi-key transactions using client libraries, but their storage-oriented programming model and client-based execution make it difficult to reason about application performance for varying cluster-based deployments.

On the other end, classical RDBMS provide a high-level, uniform shared-memory abstraction, which makes application development extremely easy. However, these systems face performance scalability issues. In order to meet these challenges, Shore-MT [13] and DORA [16] target shared-memory architectures for high performance. Shore-MT optimizes a classical RDBMS engine by reducing or removing contention points for multi-core and multi-processor hardware. DORA proposes a novel methodology to partition data amongst the processing threads and migrates transactions between threads based on the partitioning. Although both these systems ease application development, reasoning about program performance remains extremely hard.
These systems are also not suited for cluster-based deployment unless a shared-memory layer is built underneath them.

H-Store was one of the first systems to target cluster-based deployments for OLTP applications [14]. In order to provide a high-level relational abstraction, H-Store relies on inferring an optimal data partitioning from the application logic to minimize the impact of distributed transactions [7]. This makes it extremely difficult for application developers to reason about program performance over a variety of applications and increases the performance reliance of the system on automatic data [17] and code partitioning [4], or on database administrators (DBAs). This puts database designers and/or DBAs in the critical path of performance improvements for varying OLTP applications and ignores application developers as a valuable resource.

Our approach is to provide a high-level, intuitive programming model over a cluster-based architecture where the uniform shared-memory space is partitioned by application logic. This partitioning is done by the application developers, which allows them to reason about the performance of their programs and to improve them. By extending transactions to provide this abstraction with global ACID semantics, our approach maintains ease of application development.
3. APPROACH
In this section, we explain our approach in detail with an example HICCUP application (the TPC-C benchmark) and demonstrate how it meets the design requirements outlined in Section 2.
In a cluster-based architecture, the programming model must provide an abstraction that balances ease of development and utilization of cluster resources. We want to explore the middle ground between the two current dominant approaches by allowing programmers to reason about the performance of their programs in the distributed setup while maintaining ease of use. We do so by introducing the notion of logical partitions. A logical partition forms the unit of logical code isolation. A programmer architects his application to consist of multiple application logic units, which communicate with each other, and establishes control flow dependencies between them. An application logic unit must be associated with a logical partition by the programmer. Communication between application logic units is expensive, which encourages developers to think about the performance of the constructed program and to make it more efficient. In order to provide ease of development, ACID semantics are maintained globally across logical partitions via transactions, which form the unit of code construction in this model.

In our model, a program (transaction) begins execution on a logical partition. During its execution, the program can only access data on that logical partition. If it needs to access data on another logical partition, it must explicitly do so by invoking a program on the other logical partition, which is executed as a subtransaction [25, p. 253]. When a transaction or subtransaction is invoked, the logical partition where it must be executed must be specified as well. Program code is pre-compiled on each logical partition and executed within the database process for performance.

In contrast to partitioning data, our model forces the application developer to partition his application logic. The programmer must construct his application in terms of distributed logical partitions by associating application logic with them. Data partitioning is a consequence of the code partitioning formulated by the application developer. Existing OLTP systems provide the view of either a single data partition or fragmented data partitions, but not partitioning in terms of application logic. This increases the reliance of these systems on an optimal code and data partition layout, thus making it hard for application developers to reason about the performance of a variety of applications using these systems.

In summary, our model allows the application developer to reason about program performance in terms of distributed application logic units by exposing the coordination costs between logical partitions. The database designer can focus on providing an efficient mapping of logical partitions to physical partitions (machines) for optimal cluster resource utilization. Our model allows ease of construction and deployment of different HICCUP applications across various cluster configurations.

  txn new_order(w_id, d_id, c_id, order) {
    // Get the order id from the district and insert the
    // next order id to be used in it
    <wh, dist, cust> = gen_order_id(w_id, d_id, c_id, order);
    total = 0;
    for (ord_item in order.items) {
      // Get the amount for the number of items ordered
      amount = get_amount(ord_item);
      total += amount;
      // Get the district information of the stock from the
      // supplier warehouse
      stock_info = get_dist_info_stock(ord_item);
      // Update the stock of the supplier warehouse
      update_stock(ord_item, amount);
      add("order_line", dist.order_id, w_id, d_id, stock_info, amount, ...);
    }
    total_pay = (1 + wh.tax + dist.tax) * total * (1 - cust.discount);
    return total_pay;
  }

Figure 1: New order transaction
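To make the execution model concrete, below is a minimal sketch, in Python, of how a runtime could pin pre-compiled programs to logical partitions and route subtransaction invocations between them. Everything here (the Runtime and LogicalPartition classes, exec_on, the toy account programs) is an illustrative assumption rather than the system's actual API, and concurrency control and atomic commit are deliberately omitted.

  class LogicalPartition:
      """A logical partition: locally owned data plus pre-registered programs."""
      def __init__(self, pid):
          self.pid = pid
          self.data = {}       # data reachable only from this partition's code
          self.programs = {}   # pre-compiled transaction logic, keyed by name

  class Runtime:
      """Routes (sub)transaction invocations to their target logical partition."""
      def __init__(self, n):
          self.partitions = [LogicalPartition(p) for p in range(n)]

      def register(self, name, fn):
          # Program code is installed on every partition ahead of time.
          for part in self.partitions:
              part.programs[name] = fn

      def exec_on(self, pid, name, *args):
          # A (sub)transaction executes on exactly one logical partition and
          # sees only that partition's local data; remote data is reached by
          # a nested exec_on call, i.e., a subtransaction.
          part = self.partitions[pid]
          return part.programs[name](self, part, *args)

  # Usage: a transaction on partition 0 reaches partition 2 via a subtransaction.
  def read_balance(rt, part, key):
      return part.data.get(key, 0)

  def combined_balance(rt, part, key, remote_pid):
      local = part.data.get(key, 0)
      remote = rt.exec_on(remote_pid, "read_balance", key)  # subtransaction
      return local + remote

  rt = Runtime(4)
  rt.register("read_balance", read_balance)
  rt.register("combined_balance", combined_balance)
  rt.partitions[0].data["acct"] = 10
  rt.partitions[2].data["acct"] = 5
  assert rt.exec_on(0, "combined_balance", "acct", 2) == 15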
Modeling hard-to-partition applications.
The model is ideally suited for applications where partitioning is implicit in the application logic. Fortunately, this property holds for a large class of HICCUP applications. Although applications that are hard to partition, e.g., social networks, would not be a compelling use case for our model, there is no reason for them to perform worse under our model than with an RDBMS or key-value store. On the contrary, the model provides an analytical playground for application developers to experiment with different partition layouts. This allows them to adopt the appropriate model (relational or key-value store) for their application depending on its performance needs and development costs. The goal of the model is not to infer a partitioning layout for non-partitioned transactions but to provide a framework for the application developer to specify transactions with respect to a partitioning layout.
System Initialization.
The model deterministically assigns transactions to logical partitions. Since the model always starts with an initially empty database state, the user application must provide initialization transactions to create an initial database state. Since those transactions are written in the proposed model, data distribution is always deterministic and unambiguous. It is worth noting that command logging to provide durability can be interpreted as an automatic program that generates a given desirable initial database state. This highlights the focus of the approach on logically partitioning code, not data.
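As a rough, self-contained illustration of this point, the sketch below treats a command log as a program: replaying the logged transaction invocations through the same deterministic mapping function reproduces the same data placement on every run. The mapping function, the insert_item program, and the log format are all hypothetical.

  # Sketch: command logging as an automatic program that regenerates a
  # desired initial database state. All names are illustrative assumptions.
  NUM_PARTITIONS = 4
  partitions = [dict() for _ in range(NUM_PARTITIONS)]  # one store per logical partition

  def map_partition(w_id):
      # Deterministic mapping function from call parameters to a partition id.
      return w_id % NUM_PARTITIONS

  def insert_item(pid, key, value):
      partitions[pid][key] = value

  # The command log records logical invocations, not physical pages.
  command_log = [
      ("insert_item", 1, ("stock_1", 100)),  # (program, w_id, args)
      ("insert_item", 7, ("stock_7", 250)),
  ]

  def replay(log):
      for name, w_id, args in log:
          pid = map_partition(w_id)          # same route on every replay
          if name == "insert_item":
              insert_item(pid, *args)

  replay(command_log)
  assert partitions[3]["stock_7"] == 250     # 7 mod 4 == 3, deterministic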
The ease of partitioning application logic will determine the ease of applicability of this approach. HICCUP applications that have simple application logic are well suited for this approach, as we demonstrate by explaining our approach using the new_order transaction class from the TPC-C benchmark as an example. We will not go into a detailed explanation of the benchmark, which is available in its specification.

  txn new_order_update_stock(order) {
    result = <>;
    for (ord_item in order.items) {
      amount = get_amount(ord_item);
      stock_info = get_dist_info_stock(ord_item);
      update_stock(ord_item, amount);
      append(result, <stock_info, amount>);
    }
    return result;
  }

Figure 2: Transaction to read and update stock information

  PARTITIONING FUNCTION map(w_id) { return w_id; };
  new_order PARTITION MAPPER map;
  new_order_update_stock PARTITION MAPPER map;

Figure 3: Mapping transactions to logical partitions

We begin by expressing the new_order application logic without any notion of partitioning. This is followed by the transformation of the program into the proposed model by partitioning the new_order transaction by warehouses. Finally, we show how developers can reason about the performance of this transaction and rewrite it in order to make it more efficient.

The pseudocode for the new_order transaction is outlined in Figure 1. The pseudocode preserves the data dependencies in the transaction class but abstracts away other details of the implementation, including data types. In the transaction logic, the order is first sanity-checked and an order id is generated. For each item in the order, the amount for the number of ordered items is computed. The stock of the supplier warehouse is then updated, and an order line is inserted for each item along with the district information of the stock. Finally, the total amount to be paid by the customer is computed and returned by the transaction. It is important to note that the fields retrieved by the get_dist_info_stock method from the stock relation, which are used to insert a row in the order_line relation, are never updated by the update_stock method. The price for each item is present in the item relation, which is never updated and never grows as more warehouses are added.

The TPC-C benchmark was designed to scale in the number of warehouses, i.e., with additional warehouses there are more customers who request a greater number of new_order transactions. Warehouses form the unit of distribution and scale-up in the benchmark. Many HICCUP applications have similar properties, where a distribution unit leads to growth in both data and client requests and, importantly, application logic is mostly affine to the distribution unit.
Existing programming models fail to take advantage of this property. Our approach, which provides a high-level, intuitive, partitioned programming model with ACID guarantees, is ideally suited for this class of applications.
In the TPC-C benchmark, a warehouse is the unit of logical partitioning. This follows intuitively because transaction logic is warehouse-affine and both data and the number of client transactions increase with additional warehouses. In order to allow orders to be entirely supplied by a customer warehouse, each warehouse replicates the item relation.

  txn new_order(w_id, d_id, c_id, order) {
    <wh, dist, cust> = gen_order_id(w_id, d_id, c_id, order);
    results[order.num_s_w_id];
    i = 0;
    for (s_id in order.supplier_w_id) {
      // Execute the remote transaction for all the order
      // items requested of the remote supplier warehouse
      results[i++] = EXEC new_order_update_stock(subset(order, s_id))
                     ON PARTITION (s_id);
    }
    // Use results to add order_line and compute total_pay
    total = 0;
    for (result in results) {
      for (item_result in result) {
        total += item_result.amount;
        add("order_line", dist.order_id, w_id, d_id, item_result, ...);
      }
    }
    total_pay = (1 + wh.tax + dist.tax) * total * (1 - cust.discount);
    return total_pay;
  }

Figure 4: New order transaction partitioned by warehouses

Under this logical partitioning model, the new_order transaction code is mostly local to the customer warehouse (w_id). In order to execute the get_dist_info_stock and update_stock methods, the transaction needs to access a remote partition, since these methods operate on the supplier warehouse. We therefore group these methods under a new_order_update_stock transaction, as shown in Figure 2, so that they can be invoked on the remote supplier warehouse logical partition.

In the partitioned model, each logical partition is identified by a unique logical partition identifier. Without loss of generality, the logical partitions can be identified by unique natural numbers. When a transaction is invoked, the logical partition where it will be executed must be specified, which establishes the relationship between the application logic unit and a logical partition in the model. This is done by using a mapping function that operates on the call parameters, or any available context, and returns an appropriate partition identifier. Each transaction class declares and defines the mapping function it uses; however, all mapping functions output partition numbers in the same domain of logical partitions. For our example, the warehouse id can be used as the logical partition identifier. The mapping function is declared and defined as shown in Figure 3. Now, a new_order transaction can be invoked as:
  EXEC new_order(w_id, ...) ON PARTITION (w_id)
Finally, we re-partition the original new_order transaction as shown in Figure 4, making it aware of logical partitions by invoking new_order_update_stock on the remote supplier warehouse logical partitions. Since transactions guarantee full ACID semantics even across logical partitions, programmers do not have to worry about consistency issues.
A closer look at the partitioned new_order transaction shown in Figure 4 exposes several performance issues owing to control flow dependencies upon remote transaction results. There is a dependency on the results from the remote new_order_update_stock transaction in order to insert order_line entries. Since the item relation is already replicated on each logical partition, the amount can be computed locally. The real dependency of the order_line entries is on getting the district information of the stock from the remote supplier warehouse. The district information fields are never updated, so these fields can be replicated on each logical partition. Then, the new_order_update_stock transaction can be modified so that it does not have to return a result, and the new_order transaction can be rewritten as shown in Figure 5.

  txn new_order(w_id, d_id, c_id, order) {
    <wh, dist, cust> = gen_order_id(w_id, d_id, c_id, order);
    // Execute the remote transaction for all the order items
    // requested of the supplier warehouses in parallel
    PARALLEL EXEC new_order_update_stock(subset(order, s_id))
             ON PARTITION (s_id) for s_id in order.supplier_w_id;
    total = 0;
    for (ord_item in order.items) {
      // Compute the pay amount
      amount = get_amount(ord_item);
      total += amount;
      // Get the district information of the stock using the
      // replicated stock table
      stock_info = get_dist_info_stock(ord_item);
      add("order_line", dist.order_id, w_id, d_id, stock_info, amount, ...);
    }
    total_pay = (1 + wh.tax + dist.tax) * total * (1 - cust.discount);
    return total_pay;
  }

Figure 5: New order transaction with further optimizations

In order to invoke subtransactions in parallel, we provide the PARALLEL EXEC construct. To use the construct, the transaction to be executed and the iterator to be used must be specified. Using this construct in Figure 5, "EXEC new_order_update_stock(order) ON PARTITION (s_id)" is invoked in parallel for all iterator elements (s_id) in the order.supplier_w_id list.

Since new_order transactions constitute ∼43% of the workload mix, this optimization might be worthwhile. As the number of warehouses increases, however, the size of the replicated stock relation with the district information fields increases as well. For a large number of warehouses, this optimization may thus not be viable. Nevertheless, it is important to note that the model allowed the application developer to reason about the performance of the developed programs (the new_order programs) and to improve them using the mentioned optimizations. Consequently, OLTP systems exposing a logical partitioning model can provide performance guarantees over varying classes of applications.
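One plausible realization of PARALLEL EXEC, sketched below under assumed helpers (exec_on_partition standing in for the remote invocation, subset as in the figures), fans one subtransaction out per iterator element on a thread pool and joins before the transaction continues. Distributed atomic commit across the invoked partitions is omitted; this only illustrates the fan-out/join control flow.

  from concurrent.futures import ThreadPoolExecutor

  def exec_on_partition(s_id, order_part):
      # Stand-in for: EXEC new_order_update_stock(order_part) ON PARTITION (s_id)
      return {"partition": s_id, "items": len(order_part)}

  def subset(order, s_id):
      # The items of the order supplied by warehouse s_id.
      return [it for it in order["items"] if it["s_w_id"] == s_id]

  def parallel_exec(order):
      supplier_ids = {it["s_w_id"] for it in order["items"]}
      with ThreadPoolExecutor() as pool:
          futures = [pool.submit(exec_on_partition, s_id, subset(order, s_id))
                     for s_id in supplier_ids]
          # Join: the surrounding transaction resumes only after every
          # subtransaction has finished.
          return [f.result() for f in futures]

  order = {"items": [{"s_w_id": 1}, {"s_w_id": 1}, {"s_w_id": 3}]}
  print(parallel_exec(order))  # one subtransaction per distinct supplier warehouse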
In a cluster-based setup, it is important to separate the concerns of writing efficient programs and ensuring their efficient execution by mapping them to physical machines. This is done by exposing a programming model based on logical partitioning, which allows application developers to reason about program performance. HICCUP applications, which have an implicit distribution unit, can be intuitively modeled in the proposed programming model, as shown in the TPC-C example. Strong global consistency semantics ensure ease of program development.
4. CHALLENGES AND ROADMAP
In this section, we outline the challenges that need to be tackled during the course of the project. We also provide some initial thoughts on how to address them.
Logical to Physical Mapping.
In order to deploy a set of logical partitions over a set of physical partitions (physical machines with varying hardware characteristics), the logical partitions need to be mapped to the set of physical partitions. Each physical partition needs to run a partition executor that is responsible for handling transactions. This gives rise to the following sub-challenges, which we plan to investigate: (1) How do we reuse a transaction processing system that meets performance demands on modern main-memory multi-core machines and extend it with the logical programming abstraction? (2) How do we map a set of logical partitions to a set of physical partitions to ensure optimal resource usage of the cluster?

To meet the first challenge, we plan to use a state-of-the-art transaction processing system designed for main-memory multi-core machines [24] and extend it to operate in a distributed setup providing the logical partitioning abstraction. To meet the second challenge, we plan to investigate cost models that approximate transaction execution costs in a physical cluster setup, as sketched below. This involves identifying and characterizing the metrics in the cost model. The cost model can then be used to produce an initial mapping, which can be changed if workload characteristics change. We also plan to investigate transaction scheduling strategies that do not violate performance constraints, which would help in better resource utilization.
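As a first cut at such a cost model, the sketch below greedily assigns logical partitions to machines: it balances estimated per-partition execution load while discounting placements that co-locate partitions with heavy subtransaction traffic between them. The load and affinity statistics, and the scoring function itself, are assumptions for illustration rather than a worked-out model.

  # Sketch: greedy logical-to-physical mapping driven by a toy cost model.
  # 'load' approximates per-partition transaction execution cost and
  # 'affinity' approximates subtransaction traffic between partition pairs.
  def map_partitions(load, affinity, num_machines, alpha=1.0):
      machines = [set() for _ in range(num_machines)]
      usage = [0.0] * num_machines
      # Place the heaviest logical partitions first.
      for p in sorted(load, key=load.get, reverse=True):
          def score(m):
              # Balance load, but favor machines already holding partitions
              # this one communicates with heavily (fewer remote calls).
              colocated = sum(affinity.get((p, q), 0) for q in machines[m])
              return usage[m] + load[p] - alpha * colocated
          best = min(range(num_machines), key=score)
          machines[best].add(p)
          usage[best] += load[p]
      return machines

  load = {0: 5.0, 1: 3.0, 2: 4.0, 3: 1.0}
  affinity = {(0, 1): 2.0, (1, 0): 2.0, (2, 3): 1.5, (3, 2): 1.5}
  print(map_partitions(load, affinity, num_machines=2))  # -> [{0, 1}, {2, 3}]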
Local Concurrency Control and Global Commit.
One of the simplest models for running multiple transactions is to run them serially, which is what H-Store advocates [14]. This simple model does not work for distributed transactions unless a global schedule is pre-decided [23]. Moreover, running transactions serially would waste the compute power of today's multi-core architectures. There is also a need to hide memory latencies, network latencies, and commit latencies. However, multi-threading with pessimistic concurrency control leads to contention bottlenecks in the lock manager [13] and to multi-phase commits, which hurt performance.

In a partitioned model, distributed optimistic concurrency control holds promise owing to its parallel nature and its lack of long critical sections and multi-phase commits [12]. It provides a global commit order in the distributed setup using local commit ordering decisions, without necessitating a multi-phase protocol or hindering local transactions. We want to investigate the performance of a distributed optimistic concurrency control mechanism in order to support the transactional semantics of our approach. We also want to investigate how transaction aborts hurt performance under contention, and whether hybrid concurrency control and/or program repartitioning would improve system performance.
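To fix intuition for why optimistic concurrency control avoids long critical sections, the sketch below shows classic backward validation as a single partition executor could run it locally: a transaction buffers its writes, and at commit its read set is checked against the writes of transactions that committed after it began; its position in the local commit order is its timestamp. The data structures are assumed for illustration, and the distributed commit ordering of [12] that the project targets is not implemented here.

  # Sketch: local backward validation for optimistic concurrency control
  # on one partition executor. Single-partition and single-threaded only.
  class PartitionExecutor:
      def __init__(self):
          self.store = {}
          self.commit_ts = 0       # local commit order
          self.commit_log = []     # (ts, write keys) of committed txns

      def begin(self):
          return {"start_ts": self.commit_ts, "reads": set(), "writes": {}}

      def read(self, txn, key):
          txn["reads"].add(key)
          return txn["writes"].get(key, self.store.get(key))

      def write(self, txn, key, value):
          txn["writes"][key] = value   # buffered until commit

      def commit(self, txn):
          # Backward validation: abort if any read key was overwritten by
          # a transaction that committed after this one started.
          for ts, keys in self.commit_log:
              if ts > txn["start_ts"] and keys & txn["reads"]:
                  return None          # abort; the caller may retry
          self.commit_ts += 1
          self.store.update(txn["writes"])
          self.commit_log.append((self.commit_ts, set(txn["writes"])))
          return self.commit_ts        # position in the local commit order

  ex = PartitionExecutor()
  t1, t2 = ex.begin(), ex.begin()
  ex.write(t1, "x", 1)
  _ = ex.read(t2, "x")             # t2 reads x before t1 commits
  assert ex.commit(t1) == 1
  assert ex.commit(t2) is None     # t2's read of x was invalidated: abort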
Evaluation.
In order to evaluate the system implementation, we plan to measure the scalability in transaction throughput, latency, abort rates, and cost per transaction with growing data and numbers of clients. We are particularly interested in evaluating these metrics for various physical machine configurations and logical-to-physical partition mappings. We plan to evaluate the system using the TPC-C benchmark. We are also interested in looking at other, non-standard application benchmarks, which could provide clues on the usability of the proposed model. We plan to compare the results with those of H-Store, a commercial RDBMS, and Silo to characterize the system performance and understand the architectural design trade-offs.
The cloud computing infrastructure provides a flexible ecosystem for deployment of the system. In order to target the cloud computing infrastructure and satisfy varying HICCUP application requirements, we plan to investigate the set of self-tuning tools necessary to make the system cloud-ready. Deployment advisors that self-manage diverse, changing resources [26] and adapt to the cloud infrastructure [27] have received a lot of attention lately. We plan to develop a deployment advisor that generates an optimal cluster configuration in the cloud from an application's requirements (e.g., throughput, latency, cost constraints and goals) and programs by constructing and solving an optimization problem, as sketched below. We plan to investigate techniques to allow the deployment advisor to adapt to the variance and workload mix in the cloud infrastructure.
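One simple instantiation of such an optimization problem, assuming a toy catalog of node types with estimated cost and performance: pick the cheapest configuration whose estimated throughput and latency meet the application's goals. The catalog numbers and the estimator are illustrative assumptions; a real advisor would need calibrated performance models and would have to cope with cloud variance, as noted above.

  import math

  node_types = {
      # type: (hourly_cost_usd, est_txns_per_sec_per_node, est_latency_ms)
      "small": (0.10, 2000, 12.0),
      "large": (0.40, 9000, 8.0),
  }

  def advise(target_tps, max_latency_ms, max_nodes=32):
      best = None
      for name, (cost, tps, lat) in node_types.items():
          if lat > max_latency_ms:
              continue                         # latency goal rules this type out
          n = math.ceil(target_tps / tps)      # nodes needed for throughput goal
          if n > max_nodes:
              continue
          total = round(n * cost, 2)
          if best is None or total < best[2]:
              best = (name, n, total)
      return best                              # (node type, node count, hourly cost)

  print(advise(target_tps=50000, max_latency_ms=10.0))  # -> ('large', 6, 2.4)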
5. CONCLUSION
Designing main-memory OLTP systems that provide high performance, scalability, and ease of development over a variety of HICCUP applications is an open challenge. In this paper, we identify the need to allow application developers to reason about the performance of their programs while maintaining ease of use. We propose an extension of the transactional abstraction that provides a high-level, intuitive, distributed logical programming model with strong global consistency semantics to the application developer. We have also shown how the model can be used intuitively by the application developer, using TPC-C as an example. We plan to evaluate the potential of this approach to meet the requirements of a variety of HICCUP applications.
6. REFERENCES

[1] M. Armbrust, et al. A view of cloud computing. Commun. ACM, 53(4):50–58, 2010.
[2] J. Baker, et al. Megastore: Providing scalable, highly available storage for interactive services. In CIDR, pages 223–234, 2011.
[3] F. Chang, et al. Bigtable: A distributed storage system for structured data. ACM Trans. Comput. Syst., 26(2), 2008.
[4] A. Cheung, et al. Automatic partitioning of database applications. PVLDB, 5(11):1471–1482, 2012.
[5] B. F. Cooper, et al. PNUTS: Yahoo!'s hosted data serving platform. PVLDB, 1(2):1277–1288, 2008.
[6] J. C. Corbett, et al. Spanner: Google's globally-distributed database. In OSDI, pages 261–264. USENIX Association, 2012.
[7] C. Curino, et al. Schism: A workload-driven approach to database replication and partitioning. PVLDB, 3(1):48–57, 2010.
[8] G. DeCandia, et al. Dynamo: Amazon's highly available key-value store. In SOSP, pages 205–220. ACM, 2007.
[9] R. Escriva, B. Wong, and E. G. Sirer. Warp: Lightweight multi-key transactions for key-value stores. Technical report, Cornell University, Ithaca, 2013.
[10] A. Floratou, et al. Can the elephants handle the NoSQL onslaught? PVLDB, 5(12):1712–1723, 2012.
[11] P. Helland. Life beyond distributed transactions: An apostate's opinion. In CIDR, pages 132–141, 2007.
[12] M. Herlihy. Apologizing versus asking permission: Optimistic concurrency control for abstract data types. ACM Trans. Database Syst., 15(1):96–124, 1990.
[13] R. Johnson, et al. Shore-MT: A scalable storage manager for the multicore era. In EDBT, volume 360, pages 24–35. ACM, 2009.
[14] R. Kallman, et al. H-Store: A high-performance, distributed main memory transaction processing system. PVLDB, 1(2):1496–1499, 2008.
[15] A. Lakshman and P. Malik. Cassandra: A decentralized structured storage system. Operating Systems Review, 44(2):35–40, 2010.
[16] I. Pandis, et al. Data-oriented transaction execution. PVLDB, 3(1):928–939, 2010.
[17] A. Pavlo, C. Curino, and S. B. Zdonik. Skew-aware automatic database partitioning in shared-nothing, parallel OLTP systems. In SIGMOD Conference, pages 61–72, 2012.
[18] D. Peng and F. Dabek. Large-scale incremental processing using distributed transactions and notifications. In OSDI, pages 251–264. USENIX Association, 2010.
[19] J. Shute, et al. F1: A distributed SQL database that scales. PVLDB, 6(11):1068–1079, 2013.
[20] M. Stonebraker. New opportunities for New SQL. Commun. ACM, 55(11):10–11, 2012.
[21] M. Stonebraker, et al. The end of an architectural era (it's time for a complete rewrite). In VLDB, pages 1150–1160. ACM, 2007.
[22] D. Terry. Replicated data consistency explained through baseball. Commun. ACM, 56(12):82–89, 2013.
[23] A. Thomson, et al. Calvin: Fast distributed transactions for partitioned database systems. In SIGMOD Conference, pages 1–12, 2012.
[24] S. Tu, et al. Speedy transactions in multicore in-memory databases. In SOSP, pages 18–32. ACM, 2013.
[25] G. Weikum and G. Vossen. Transactional Information Systems: Theory, Algorithms, and the Practice of Concurrency Control and Recovery. Morgan Kaufmann, San Francisco, CA, USA, 2002.
[26] Q. Yin, et al. Rhizoma: A runtime for self-deploying, self-managing overlays. In Middleware, volume 5896, pages 184–204. Springer, 2009.
[27] T. Zou, et al. ClouDiA: A deployment advisor for public clouds. PVLDB, 6(2), 2012.