Migration in the Stencil Pluralist Cloud Architecture

Abstract

A debate in the research community has buzzed in the background for years: should large-scale Internet services be centralized or decentralized? Now-common centralized cloud and web services have downsides -- user lock-in and loss of privacy and data control -- that are increasingly apparent. However, their decentralized counterparts have struggled to gain adoption, suffer from their own problems of scalability and trust, and eventually may result in the exact same lock-in they intended to prevent. In this paper, we explore the design of a pluralist cloud architecture, Stencil, one that can serve as a narrow waist for user-facing services such as social media. We aim to enable pluralism via a unifying set of abstractions that support migration from one service to a competing service. We find that migrating linked data introduces many challenges in both source and destination services as links are severed. We show how Stencil enables correct and efficient data migration between services, how it supports the deployment of new services, and how Stencil could be incrementally deployed.

Tai Liu

Tencent America

Zain Tariq

New York University Abu Dhabi

Barath Raghavan

University of Southern California

Jay Chen

International Computer Science Institute

CCS CONCEPTS

• Networks → Network architectures.

KEYWORDS

Pluralist Architecture; User Lock-in; Data Migration

The centralization of Internet services and cloud applications has been a boon to users worldwide in recent years. However, with this centralization comes the loss of privacy, the lack of autonomy, and application/service lock-in [33]. In recent years, some researchers have argued against centralized approaches and for decentralized alternatives, but they have yet to reach even broad agreement on what should be done [1, 3, 4, 8, 16, 18, 20, 22, 23, 27, 34, 36, 37, 39, 42, 43, 50, 51, 54, 55, 58, 61, 64]. Little progress has been made on the deployment and adoption of decentralized platforms.

There are different flavors of decentralization. Replacing a centralized application or system with a single decentralized alternative is scarcely better: while centralized systems have a host of problems, selecting a single decentralized system is not a viable alternative, as it may lead to re-centralization. However, the natural conclusion of this reasoning (a proliferation of decentralized systems, many of which will overlap in their offerings) is significantly harder both for users to navigate and for application developers, who seemingly have to build their systems from scratch for each platform.

There are many important technical design challenges in decentralized infrastructure, including security, naming, and more. Our goal is to narrow the focus to the key issues that the architecture, as opposed to an individual application or service, must adjudicate. We argue that we need a pluralist architecture: one that allows the co-existence of applications and seamless migration between them. Not only can such an architecture prevent user lock-in, but it can also ease the pain of developing decentralized applications. Put another way, a pluralist architecture is one that picks no winners: instead, it allows a marketplace of services to develop, and it provides enough scaffolding and restrictions to ensure that the landscape does not become balkanized.

We describe Stencil, a pluralist cloud architecture that enables migration between cloud services such as social media. In this context, Stencil addresses the twin problems that a) social media apps make user data hard to download and b) once downloaded, the data is no longer useful because it is no longer linked (to other data held by that user or others). Further compounding this challenge are network effects, which keep new applications relatively low in value until enough users have bootstrapped the system. For example, Diaspora [13] has long sought to provide an alternative to Facebook; if Alice wants to try it, she will find, when she downloads all her Facebook data, that it cannot be easily imported into Diaspora, and even if it could, the data would have lost its meaning without its links to the data of others who did not migrate.

The existence and growth of user lock-in are not entirely due to technical factors; they also stem from the financial incentive to collect and monetize user data. Changing the prevalent profit or business model is out of the scope of this paper, but policy measures such as the General Data Protection Regulation (GDPR) have forced the hand of large-scale services, ensuring some degree of data portability as a side effect of privacy objectives [19]. Indeed, the providers of some large-scale services are beginning to recognize this same issue: in 2018, Google, Facebook, Microsoft, and Twitter introduced a data transfer initiative called the Data Transfer Project (DTP) [12] to enable data migration between the various services of these organizations.
However, DTP does not consider preserving context or reconciling semantic differences during and after migration between different services, which could eventually render migrated data useless or cause anomalies. Stencil is designed to solve the inherent technical problems of service-to-service portability in such applications.

Stencil’s pluralism is rooted in the challenge applications face in agreeing upon data formats and in coordinating any kind of updates or changes. In Stencil, applications are bridged by schema mappings that capture the semantic translation of data from one application to the other. Data can thus be migrated from one application to another through schema-mapping translation. A naive implementation of migration is likely to produce numerous database- and application-level anomalies, as table rows and data fields are deleted from the source application and inserted into the destination application. Locking all migrating data can prevent transient anomalies of this nature, but locks increase service downtime, and even after migration completes, persistent anomalies must still be handled through additional mechanisms that may not exist natively.

To solve this problem, Stencil requires application writers to specify two things: 1) a directed acyclic graph (DAG) that describes the semantic relationships of data in their own application, and 2) a schema mapping from a source application to their own (destination) application. Stencil uses the DAG to enforce correctness at both the source and destination applications, and uses the schema mapping to map data from the source to the destination application.

A migration system must protect the security and privacy of user data as the data passes through the system. Furthermore, migration or removal of shared data may raise many privacy and ownership questions: 1) Can Alice migrate Bob’s messages in the same message group?
2) Even if a migration system allows Bob to share his messages with Alice, does Bob still own his messages after Alice has migrated them to a new application? 3) What if Bob later wants to unshare messages that have already been migrated? Stencil addresses security and privacy challenges using permissions, standard authentication and encryption techniques, and policy definitions. We defer more subtle issues to future work; e.g., Bob’s posts can only be viewed by Alice in the source application, but the destination application may make Bob’s posts public because it lacks fine-grained visibility.

This paper makes the following contributions. First, we define data migration in a pluralist architecture, identify the inherent technical challenges, and decompose the types of migration-induced anomalies. Second, we present the design and implementation of Stencil, which aims to resolve the fundamental challenges in this space; we have built a prototype of Stencil in 26,329 lines of Go. The key, however, is that applications can build upon the Stencil infrastructure and need not make significant changes to their codebases. Third, we evaluate Stencil by integrating four social network applications with Stencil, each of which requires only a few hundred lines of JSON specification, and we demonstrate that Stencil can seamlessly migrate user data without anomalies.

Here we present three high-level examples of migration that are not possible today. From these examples, we motivate the definition of three types of migration.

1. Alice uses Facebook but is upset about the Cambridge Analytica scandal [24] and decides to migrate to Twitter. She wants to delete everything, including her posts, comments, events, etc., from Facebook, and to automatically migrate as much as possible to Twitter instead of rebuilding her online life all over again.

2. Alice decides to migrate data from Facebook to Twitter since she will mainly be using Twitter.
However, she cannot completely leave Facebook because most of her friends and colleagues are still there. She wants to keep everything in Facebook and to copy as much data as possible from Facebook to Twitter automatically.

3. Alice creates an event in Facebook and wants to share the event on LinkedIn, Instagram, and Snapchat as well. She wants to keep the event consistent across all applications (e.g., if she changes the event date in Facebook, she wants the change to be applied to the other instances of the event automatically). Furthermore, she wants users’ replies or comments on the event in Facebook to appear in all the other applications.

Based on these use cases, we define three types of migration: deletion migration, independent migration, and consistent migration. Deletion migration entails moving data from source applications to destination applications. Independent migration entails copying data from source applications to destination applications. Consistent migration entails copying data from source applications to destination applications and keeping the data in source applications consistent with the copied data in destination applications at all times. Our current implementation of Stencil handles deletion and independent migrations. Consistent migration requires significant additional support from application developers (e.g., synchronizing data between corporations or unified data stores).

In any type of migration, moving data from one service to another faces two main challenges: preserving context and reconciling semantic differences. User data is important and valuable for many reasons.
The content itself is the creative product generated by each individual while on the platform, the links represent social relationships, and the interactions between users elaborate upon these simple connections to form shared contexts between users. Today, large-scale services allow users to download their data; re-uploading content or importing contacts to a new service is also relatively simple. However, these methods for moving user data break the links between users and the relationships between their data. Thus, the data becomes inert and loses much of its value. A common data migration substrate is needed to preserve the relationships between data as users move between services.

Semantic differences must be negotiated if data is migrated from one application to another.

Application semantics include not only the data models in which an application stores and manages its data but also the rules by which an application implements its services to serve users. Consider a shared conversation between Alice and Bob. If Alice moves her data to another service, where does the conversation belong? [ownership] Who can still read it? [permissions] Who is Bob on the new service? [identity] If Bob also moves to the new service later, how is the conversation restored? [re-integration] These kinds of issues are greatly eased if there is a common data format or centralized architecture, where they could be dealt with by defining additional permissions and rules. However, a pluralist design by definition involves different applications and application semantics. A systematic approach to data migration is necessary to resolve these differences.
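The three migration types defined above differ only in what happens to the source copy and whether the copies stay linked afterwards. The Go sketch below (Go because the Stencil prototype is written in Go) encodes that difference as a small enum; the names MigrationType, RemovesSourceData, KeepsInSync, and Supported are hypothetical illustrations, not identifiers from Stencil's code.

```go
package main

import "fmt"

// MigrationType enumerates the three kinds of migration defined in the text.
// The names are illustrative, not Stencil's actual identifiers.
type MigrationType int

const (
	Deletion    MigrationType = iota // move: data is removed from the source
	Independent                      // copy: source data is left untouched
	Consistent                       // copy, then keep both copies in sync
)

func (m MigrationType) String() string {
	switch m {
	case Deletion:
		return "deletion"
	case Independent:
		return "independent"
	case Consistent:
		return "consistent"
	}
	return "unknown"
}

// RemovesSourceData reports whether migration deletes the source copy. Per the
// text, deletion migration can create anomalies at the source application,
// while independent migration cannot.
func (m MigrationType) RemovesSourceData() bool { return m == Deletion }

// KeepsInSync reports whether source and destination copies stay consistent.
func (m MigrationType) KeepsInSync() bool { return m == Consistent }

// Supported mirrors the current prototype, which implements deletion and
// independent migration but not consistent migration.
func (m MigrationType) Supported() bool { return m == Deletion || m == Independent }

func main() {
	for _, m := range []MigrationType{Deletion, Independent, Consistent} {
		fmt.Printf("%-11s removesSource=%-5v inSync=%-5v supported=%v\n",
			m, m.RemovesSourceData(), m.KeepsInSync(), m.Supported())
	}
}
```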

Migrating data between Internet applications can create a wide range of potential anomalies, depending on the semantics of the applications as well as on how the migration process is implemented. To explore the problem space, we constructed dozens of examples that would produce migration-related issues and classified them into six broad categories: dangling data, data loss, data ownership, incoherent data, data confusion, and service interruption. Here, we first introduce a simple canonical scenario of each type. From these scenarios, we distill three central design challenges that Stencil, or any other system that supports migration, must resolve.

In each of these scenarios, Alice wants to migrate her posts and messages from application X to application Y.

Scenario 1: Dangling Data.

Alice migrates her post from X to Y, but the system does not allow Eve’s comments on Alice’s post to be migrated to Y. Eve’s comments become “dangling data”, which we define as data that loses semantic meaning because it is missing some other data that it depends on. As a result, X does not display Eve’s comments without the corresponding post, and Eve’s comments are also useless without the post.

Observations:

Assuming for the moment that the application semantics on either or both ends of the migration make it acceptable to remove the dangling data, a naive solution to the dangling data problem is to identify and delete dangling data (i.e., garbage collection).
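The naive garbage-collection fix can be sketched in a few lines of Go. This is a hypothetical illustration of the approach the text goes on to critique, not Stencil code: a store maps node IDs to nodes, each node lists the IDs it depends on, and a node is dangling when any of those IDs is missing. Deletion repeats until no dangling node is left, because removing a comment can in turn orphan its replies. The Node type and the FindDangling and CollectGarbage functions are our own names.

```go
package main

import (
	"fmt"
	"sort"
)

// Node is a unit of application data (a post, comment, like, ...) together
// with the IDs of the nodes it depends on. Hypothetical type for illustration.
type Node struct {
	ID        string
	DependsOn []string
}

// FindDangling returns, sorted, the IDs of nodes that depend on at least one
// node that is not present in the store.
func FindDangling(store map[string]Node) []string {
	var dangling []string
	for id, n := range store {
		for _, dep := range n.DependsOn {
			if _, ok := store[dep]; !ok {
				dangling = append(dangling, id)
				break
			}
		}
	}
	sort.Strings(dangling)
	return dangling
}

// CollectGarbage is the naive fix: repeatedly delete dangling nodes until
// none remain, since deleting a comment can orphan its replies in turn.
func CollectGarbage(store map[string]Node) []string {
	var removed []string
	for {
		batch := FindDangling(store)
		if len(batch) == 0 {
			break
		}
		for _, id := range batch {
			delete(store, id)
		}
		removed = append(removed, batch...)
	}
	sort.Strings(removed)
	return removed
}

func main() {
	// Application X after Alice's post1 has been migrated away.
	x := map[string]Node{
		"comment1": {ID: "comment1", DependsOn: []string{"post1"}}, // Eve's comment
		"reply1":   {ID: "reply1", DependsOn: []string{"comment1"}},
		"post2":    {ID: "post2"},
	}
	fmt.Println(FindDangling(x))   // [comment1]
	fmt.Println(CollectGarbage(x)) // [comment1 reply1]
	fmt.Println(len(x))            // 1
}
```

Run at Y in the middle of a migration, the same pass would just as readily delete likes that arrive before their post, which is the data loss described in Scenario 2.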

Scenario 2: Data Loss.

Alice migrates her post from X to Y; the post has attached likes from other users that the application semantics allow to be migrated. During migration, these likes may arrive at Y earlier than the post; since Y considers these likes dangling data, they are deleted.

Observations:

We could perhaps solve this issue by waiting for some time before deleting the data, but this can cause additional problems, as we will see later. Scenarios 1 and 2 suggest that we must consider how two pieces of data are related and how they should be handled before, during, and after migration.

Scenario 3: Data Ownership.

Alice migrates the whole message history of a message group to Y, including Eve’s messages. This scenario is similar to how Facebook allows users to download whole message histories of conversations, including messages from others. Eve is surprised to see her messages in Y because she only sent these messages in X.

Observations:

A naive solution to this problem is to allow a piece of data to be migrated only by its owner (or creator).

Scenario 4: Incoherent Data.

In a message group in X, Alice is booking tickets for her friends and asks if anyone can’t go. Eve says she can’t go. Alice misses Eve’s reply because she is migrating to Y. If the migration only allows Alice’s own data to be migrated, Eve’s reply cannot be migrated to Y. Thus, Alice ends up buying too many tickets.

Observations:

In social media applications, it is essential to read data from different users to understand the whole context. In this example, ownership is “correct”, but the context is lost. Scenarios 3 and 4 show the importance of considering how ownership of shared data is handled across applications.

Scenario 5: Data Confusion.

Alice migrates her long post, with all the comments replying to it, to Y, but because her post contains a large video, it arrives at Y much later than the corresponding comments. Y allows migrated data to be displayed as soon as it arrives. The comments on the “missing” post in Y give rise to confusion.

Observations:

This data confusion is short-term. A naive solution is to lock Alice’s data until all of it arrives in Y.

Scenario 6: Service Interruption.

Alice migrates her post and the corresponding comments, and her data is locked until all of it arrives (the naive solution to Scenario 5). One comment contains a large video and arrives much later than the preceding comments and the post, which could already be displayed without any confusion, so there is an unnecessary delay in service.

Observations:

This service interruption is unnecessary according to the semantics of application Y. As more data is migrated, service interruptions may become more common or longer in duration. Scenarios 5 and 6 illustrate the importance of considering how migration is implemented in relation to application semantics.

The scenarios highlighted here have been simplified to illustrate the problems that can arise during migration, but more pernicious scenarios are possible, particularly in relation to security and privacy. For example, Alice blocks Eve from viewing her posts in X and migrates to Y. Eve wants to exploit the delayed garbage collection suggested for Scenario 2, so she waits for some period of time. Y eventually garbage-collects the block that prevents Eve from viewing Alice’s posts, because Y has not seen Eve before and thus considers the block dangling. Finally, Eve migrates to Y and is able to see the posts she could not see in X.

Altogether, these scenarios outline the design space of migration-aware systems. Each of the three scenario pairs (Scenarios 1 and 2, 3 and 4, and 5 and 6) represents the results of diametrically opposed design decisions that do not explicitly take migration into account. We distill these scenarios into three key migration challenges:

C1. How to define, identify, and act upon the relationships between data before, during, and after migration. (Scenarios 1 and 2)

C2. How to handle the ownership of data in shared contexts. (Scenarios 3 and 4)

C3. How to migrate data in the face of application-level semantics while ensuring service uptime. (Scenarios 5 and 6)

In this section, we describe Stencil’s overall design and how it addresses the migration challenges from the previous section.

Data in social media applications is heavily interlinked. For example, in Facebook, posts may have many comments, and comments may have their own replies. A reply cannot be displayed without its corresponding comment. We define a data dependency as a relationship between data in which one piece of data depends on another to function. In social media applications, replies, comments, and posts are generally authored by specific users; we thus define data ownership as a relationship between data and a user, which represents that a piece of data belongs to that user. Finally, we define data sharing relationships as situations in which data is shared with a user who is not its owner.

Data dependencies, ownership, and sharing relationships together form the complete context. In other words, these are the relationships between data that are necessary to preserve the semantics of social media applications. Individual pieces of data (e.g., a comment) have very little value in social media applications without the corresponding replies, likes, and retweets. Thus, one of our first design decisions is to make these three types of data relationships explicit. Once defined, these relationships must then be preserved across application boundaries during and after migration.

Figure 1: DAG of dependencies, ownership, and sharing

These relationships linking data and users form DAGs (extending Stencil to general graphs would require atomic migration of cycles), and a DAG node, such as Post1 in Figure 1, is a logical and atomic unit of migration and display. Application writers need to decide how to group data into nodes based on their application logic, because the granularity of grouping affects not only correctness but also migration latency: grouping all data into one node means locking all data before migration, whereas dividing an atomic node into multiple nodes violates atomicity. A root node is a special node containing basic information about a user, such as a profile. Figure 1 illustrates data dependencies; e.g., Post1 depends on Party Group.

Application writers should specify data ownership because it is hard for Stencil to infer data ownership simply by examining application-specific attribute names. For instance, Mastodon [38] uses resource_owner_id, owner_id, account_id, etc. in different tables to represent data belonging to users, whereas Diaspora [13] uses author_id and user_id. Stencil currently asks application writers to define DAGs in JSON, but we envision semi-automated approaches to streamline this process, such as using foreign keys to make inferences or statically analyzing database queries.

In DTP [12], the project creators envision a structured data model that serves as a common intermediary for migrating data; thus, application writers need to write data adapters to the common data model, and all participating services must agree to adopt this unified data model. This requirement creates a barrier to adoption. Rather than the centralized approach taken by DTP, Stencil uses Pairwise Schema Mapping (PSM) [21], which can be defined as follows:

$$a \rightarrow b,\; b \rightarrow c \;\Rightarrow\; a \rightarrow c \qquad (1)$$

where a, b, and c represent compatible parts of data schemas in three different applications, and → represents transformation. PSM allows an application to transform data into other applications’ formats without direct transformation specifications by transforming through compatible data schemas transitively. Since each application writes mappings to only a few other, similar applications, this path to adoption is decentralized and incremental. PSM also reduces the work of defining mappings, because some mappings can be obtained automatically from transitive mappings.

Due to the heterogeneity and complexity of data models, PSM cannot completely eliminate the task of writing mappings for every other application, since each pairwise transformation is not guaranteed to be lossless.
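Equation (1) can be read as function composition over attribute mappings. As a hedged sketch (our assumption, since the text does not show Stencil’s PSM code), the Go program below models a pairwise mapping as a map from source attributes to destination attributes, finds the shortest chain of registered mappings with breadth-first search, and composes along it. The application names A, B, and C and all attributes other than Post.text and Status.body are hypothetical, as are the Mapping, Registry, Compose, and Derive names.

```go
package main

import (
	"fmt"
	"sort"
)

// Mapping is a hypothetical pairwise schema mapping from attributes of one
// application (e.g. "Post.text") to attributes of another ("Status.body").
type Mapping map[string]string

// Compose derives a→c from f (a→b) and g (b→c). Attributes whose image under f
// is not covered by g are dropped: this is how a lossy hop loses data.
func Compose(f, g Mapping) Mapping {
	out := Mapping{}
	for src, mid := range f {
		if dst, ok := g[mid]; ok {
			out[src] = dst
		}
	}
	return out
}

// Registry holds the mappings applications have written: r[from][to].
type Registry map[string]map[string]Mapping

// Derive searches breadth-first for the shortest chain of pairwise mappings
// from src to dst (src != dst) and composes it. It reports false if none exists.
func (r Registry) Derive(src, dst string) (Mapping, bool) {
	type hop struct {
		app string
		m   Mapping // composed mapping from src to app
	}
	visited := map[string]bool{src: true}
	queue := []hop{{app: src}}
	for len(queue) > 0 {
		cur := queue[0]
		queue = queue[1:]
		// Visit neighbours in sorted order so the chosen chain is deterministic.
		next := make([]string, 0, len(r[cur.app]))
		for app := range r[cur.app] {
			next = append(next, app)
		}
		sort.Strings(next)
		for _, app := range next {
			if visited[app] {
				continue
			}
			visited[app] = true
			m := r[cur.app][app]
			if cur.app != src {
				m = Compose(cur.m, m)
			}
			if app == dst {
				return m, true
			}
			queue = append(queue, hop{app: app, m: m})
		}
	}
	return nil, false
}

func main() {
	reg := Registry{
		"A": {"B": {"Post.text": "Status.body", "Post.tags": "Status.tags"}},
		"B": {"C": {"Status.body": "Note.content"}},
	}
	m, ok := reg.Derive("A", "C")
	fmt.Println(ok, m) // true map[Post.text:Note.content]
}
```

Post.tags reaches B but is dropped on the way to C because the B→C mapping does not cover Status.tags, a concrete instance of a lossy intermediate mapping.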
Thus, transitive mappings in PSM might suffer translation errors, and in such cases intermediate applications cannot act perfectly as a ‘bridge’ to other applications. Despite this limitation, PSM allows application developers to select and map to desirable applications with similar data models and thus reduces the more tedious mapping work. Data that cannot be mapped across applications must also be handled properly. In Stencil, this data is stored as a ‘bag’ that the owner can elect to discard, download, and potentially re-integrate later as more compatible mappings become available (Section 5.3).

So far, we have not distinguished whether migration-induced anomalies occur at the source application or at the destination application, but anomalies may occur at either or both ends of the migration. Each of the three types of migration (i.e., deletion, independent, and consistent migration) affects application endpoints differently, but predictably. Deletion migration, for example, has the potential to create anomalies at the source application, but independent migration does not, since data at the source application is untouched. Our current implementation does not handle consistent migration, but consistent migration is identical to independent migration except that the original and migrated data are kept consistent. All three types of migration introduce new data in the destination application. Stencil uses two mechanisms to enforce application semantics and optimize for service uptime.

The first mechanism is a migration order that “deletes” data from a source application during migration and minimizes issues such as data incoherence and service interruption.
Stencil uses the DAG to determine the migration order as follows: in the source application, Stencil always migrates a piece of data after migrating the data that depends on it, and it copies the migrating user’s root node first, so that migrated data can be displayed in the destination application, but deletes it last.

The second mechanism preserves destination application semantics: Stencil adds a migration flag to the migrated data in the destination service, which exposes the migration process to the destination application. Destination application developers are then free to decide how the system and users can interact with migrated data during migration by annotating the DAG with validation rules. Stencil follows the destination application’s DAG, validates migrated data based on the rules defined by application developers, and removes migration flags from valid data to allow it to be accessed by the application. For example, in Figure 1, without posts, comments may not be eligible for display; without the root nodes, the data owned or shared by the roots may not be displayed.

The order of deleting data from a source application and the order of adding data to a destination application may conflict (assuming similar application semantics). For example, a source application wants to delete comments before posts, while a destination application wants to add comments after posts. Thus, if we preserve application semantics, there is a tension between the source and destination applications with respect to service interruption. Stencil’s design preserves service continuity at the source application. Other designs for negotiating this tradeoff are a topic for future work.

Stencil allows users to migrate data owned (or created) by other users in the same application only if those users explicitly ‘share’ their data with them.
For instance, if Alice migrates to a new application, her post can be migrated with her because she is its owner, but Bob’s comment on her post will only be migrated with Alice if Bob has allowed Alice to do so.

Migration or deletion of shared data may lead to privacy or ownership violations. We describe Stencil’s methods of migrating data in shared contexts by discussing three important questions: 1) What shared-data rules regarding migration can data owners specify, and at what granularity? 2) What happens when different migration types and sharing rules conflict? 3) Will data ownership change after shared data is migrated?

The granularity of shared-data rules depends on the application. Some applications may allow users to choose each specific piece of data that they want to share with other users. Other applications may only allow high-level specifications (e.g., allowing all comments to be shared with the friends group). Stencil provides applications with low-level APIs to store the migration instructions and mark shared data. By default, a piece of data has no sharing specification, in which case it cannot be migrated by anyone other than its owner in any type of migration.

While Stencil provides a general way for applications to specify sharing policies, selecting the right policy may not always be straightforward, especially when applications have different semantics. For example, if Alice wants to migrate a post shared with a group of users to another application that does not support group sharing, but only global visibility or invisibility, then this migration could violate the privacy setting of the shared post in the source application. Stencil places on application developers the onus of negotiating these differences in application semantics when writing schema mappings.
However, we expect this translation burden to be alleviated as cross-application migration becomes more common and therefore standardized through increased adoption of Stencil.

Sharing data with other users for migration may result in different behaviors in different types of migration. In deletion migration, a comment owned by Bob and shared with Alice will be deleted from the source application if Alice migrates from that application to a new one. In independent and consistent migrations, the comment will stay in both the source and destination applications. Stencil allows users to specify the types of migration in which their data may be migrated. For example, Alice allows her friends to migrate her posts only if they are doing independent migration. If Bob wants to migrate using a different type of migration, Alice’s posts will not be migrated.

Figure 2: Stencil architecture

After shared data has been migrated to a new application, Stencil enables the original owners of the data to retain their ownership by tracking data relationships before migration, maintaining data transformations (including user and data identity changes), and preserving data relationships after migration. Even if data owners have not yet migrated to the new application, Stencil allows applications to use placeholders to re-link data with owners if they join later.
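The sharing rules described in this subsection can be summarized as a single permission check. The Go sketch below is a hypothetical model (the SharingRule, Data, and CanMigrate names are ours, not Stencil’s low-level APIs, whose signatures the text does not give): owners may always migrate their own data, data without a sharing specification is owner-only, and otherwise the migrating user must be in the shared-with set and must use an allowed migration type.

```go
package main

import "fmt"

// MigrationType mirrors the three migration types in the text.
type MigrationType int

const (
	Deletion MigrationType = iota
	Independent
	Consistent
)

// SharingRule is a hypothetical per-item sharing specification: who the owner
// shared the data with, and which migration types those users may use.
type SharingRule struct {
	SharedWith   map[string]bool
	AllowedTypes map[MigrationType]bool
}

// Data is a piece of application data with its owner and optional rule.
type Data struct {
	ID    string
	Owner string
	Rule  *SharingRule // nil: no sharing specification
}

// CanMigrate reports whether migrator may take d along in a migration of type t.
func CanMigrate(d Data, migrator string, t MigrationType) bool {
	if d.Owner == migrator {
		return true // owners can always migrate their own data
	}
	if d.Rule == nil {
		return false // default: only the owner may migrate the data
	}
	// Lookups in nil maps return false, so a missing set denies access.
	return d.Rule.SharedWith[migrator] && d.Rule.AllowedTypes[t]
}

func main() {
	// Alice lets her friend Bob migrate her post, but only in independent migration.
	post := Data{ID: "post1", Owner: "alice", Rule: &SharingRule{
		SharedWith:   map[string]bool{"bob": true},
		AllowedTypes: map[MigrationType]bool{Independent: true},
	}}
	comment := Data{ID: "comment1", Owner: "bob"} // Bob has not shared this

	fmt.Println(CanMigrate(post, "bob", Independent))      // true
	fmt.Println(CanMigrate(post, "bob", Deletion))         // false
	fmt.Println(CanMigrate(comment, "alice", Independent)) // false
	fmt.Println(CanMigrate(comment, "bob", Deletion))      // true
}
```

Keeping the migration type in the rule, rather than a plain shared/not-shared flag, is what lets an owner accept copies of her data while refusing moves that would delete it from the source.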

Stencil allows applications to use their own data models, with data stored in their own storage systems. Stencil uses their data APIs to migrate data between their storage systems. Figure 2 shows Stencil’s architecture, which consists of three layers: a communication layer, a control layer, and a storage layer.

The communication layer manages secure communication and authentication between the different parties. To minimize barriers to adoption, our current implementation allows users to interact with Stencil through a client running on their local machines. The client authenticates and communicates with one of the Stencil migration servers and authorizes the servers to perform migrations for users. Since these servers store data relationships and are responsible for migrating user data (although they do not store user data or data bags), users and applications should, for the sake of data privacy and security, only use trusted Stencil migration servers. We expect to bootstrap the overall system by running the initial Stencil migration servers, but we envision these servers eventually being run by users themselves, by trusted third parties, or, most likely, within the applications themselves as a migration service. Stencil validation servers should be run by destination applications, since these servers run a validation algorithm to preserve the semantics of destination applications.

The control layer consists of migration and validation controllers. The controllers run on each Stencil server to create and manage migration and validation threads and relationship trackers locally, and to coordinate with other controllers. Based on queries and application specifications, migration threads perform different types of migration, and validation threads validate data so that only migrated data that correctly obeys destination application semantics is displayed. These threads identify, collect, and re-integrate dangling data.
Relationship trackers are responsible for preserving data relationships after migration. The control layer passes down to the storage layer the data relationships before migration, how data changes during migration, and what data becomes dangling after migration.

The storage layer primarily consists of two tables and data bags. Stencil servers use the reference table and the attribute table to store data relationships before migration and data changes during migration, respectively. The data in these tables is necessary for the control layer to re-link migrated data and re-integrate dangling data. Dangling data is stored in data bags, which can be downloaded and stored by data owners or left in applications according to owner preferences.

We implemented Stencil in 26,329 lines of Go. The implementation consists of several key components: the migration controller (5,192 lines), the validation controller (3,797 lines), the relationship tracker (2,070 lines), and PSM (1,381 lines). We use goroutines and channels for migration and validation thread parallelism, and use PostgreSQL in the storage layer to store data. Stencil currently uses a JSON-based format for the specifications of DAGs and schema mappings, and does not handle data model changes. Throughout this section, we use a running example (Figure 3) to explain how migration is implemented in Stencil.
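The migration order described earlier for the source application (copy the root node first, migrate each node only after all nodes that depend on it, delete the root last) amounts to a topological sort over reversed dependency edges. The Go sketch below uses Kahn’s algorithm to produce such a plan for a deletion migration run by a single thread. It is our illustration, not the prototype’s code: it assumes the DAG is given as a map from each node to the nodes it depends on, the MigrationOrder name and the plan’s string format are hypothetical, and it ignores the concurrency of Stencil’s migration threads.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// MigrationOrder plans a deletion migration over a source DAG. deps maps each
// node to the nodes it depends on (e.g. a comment depends on its post). The
// plan copies root first, migrates every other node only after all nodes that
// depend on it, and deletes root last.
func MigrationOrder(deps map[string][]string, root string) ([]string, error) {
	// pending[n] counts nodes that depend on n and have not been migrated yet.
	pending := map[string]int{}
	for n, ds := range deps {
		if n == root {
			continue
		}
		if _, ok := pending[n]; !ok {
			pending[n] = 0
		}
		for _, d := range ds {
			if d != root {
				pending[d]++
			}
		}
	}
	var ready []string
	for n, c := range pending {
		if c == 0 {
			ready = append(ready, n)
		}
	}
	plan := []string{"copy " + root}
	for len(ready) > 0 {
		sort.Strings(ready) // deterministic choice among ready nodes
		n := ready[0]
		ready = ready[1:]
		delete(pending, n)
		plan = append(plan, "migrate "+n)
		for _, d := range deps[n] {
			if d == root {
				continue
			}
			pending[d]--
			if pending[d] == 0 {
				ready = append(ready, d)
			}
		}
	}
	if len(pending) > 0 {
		return nil, errors.New("dependencies contain a cycle")
	}
	return append(plan, "delete "+root), nil
}

func main() {
	// Hypothetical source DAG: Alice's post with a comment, a reply, and a like.
	deps := map[string][]string{
		"post1":    {"alice"},
		"comment1": {"post1"},
		"reply1":   {"comment1"},
		"like1":    {"post1"},
	}
	plan, err := MigrationOrder(deps, "alice")
	if err != nil {
		fmt.Println(err)
		return
	}
	for _, step := range plan {
		fmt.Println(step)
	}
}
```

For this input the program prints copy alice, migrate like1, migrate reply1, migrate comment1, migrate post1, and delete alice, one per line: the post leaves the source only after everything attached to it has gone, so the source never shows a reply without its comment. The cycle check reflects why Stencil restricts relationships to DAGs.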

Schema mappings define how data can be converted between different schemas in different applications. The Schema Mappings box in Figure 3 shows mappings from three nodes in the source application (rounded rectangles) to three nodes in the destination application (rectangles); unmapped attributes, such as Post.loc, are not shown. In this example, most data does not change much during migration (e.g., from Post.text to Status.body), except that the id of each node has to be regenerated with newID(), provided by the destination application, to avoid id conflicts. For attributes that refer to a changed id, such as Reply1.comment_of, Stencil uses a relationship tracker to update them accordingly (Section 5.5).

Migration controllers can start many migration threads that migrate data concurrently. Each thread runs the migration algorithms independently and synchronizes with the others implicitly through an application’s storage system rather than by sending messages to each other directly, which simplifies the design of the algorithms. Because Stencil keeps application services running during migration, migrating all data could take an arbitrarily long time, especially if new data is continually being created. In this situation, the application could allow users to migrate only data created before a specified time. Stencil currently supports deletion and independent migrations, but does not implement consistent migration; in our current architecture, consistent migration would require significant application changes to support data consistency across applications.

To preserve the semantics of the source application’s services, a migration thread runs the deletion migration algorithm to 1) move data out of the source application’s DAG in an order in which a node is migrated only after the nodes depending on it have been migrated, and 2) identify dangling data and put it into data bags. During a user’s deletion migration, Stencil prevents that user from starting new concurrent deletion migrations from either the source or the destination application to other applications, to avoid incomplete migrations due to conflicts.

The example in Figure 3 shows how Stencil performs deletion migration. In this example, Alice migrates from the source application to the destination application. The DAG (Source) box on the left presents the state of the DAG in the source application before migration, and the dashed rectangles indicate the migrated nodes. The DAG (Destination) box on the right presents the state of the DAG in the destination application after migration.

A migration thread starts migration by copying Alice’s Root2 in DAG (Source) on the left to Alice’s Root3 in DAG (Destination) on the right, so that validation threads can validate data in the destination application. It then traverses DAG (Source) by following the data relationships until it reaches Comment3, which has no next node. The thread migrates Comment3, owned by Alice, to Reply2, but Reply2 ends up in Alice’s Data Bags on the right, because the data it depends on is never migrated. The thread then goes back to either Comment2 or Alice’s Root2, depending on how it traverses the DAG. Supposing it goes back to Comment2, it cannot migrate this node, as the node belongs to Bob and is not shared with Alice. The thread then goes back to Comment1 and finds that it can migrate Comment1. Before migrating Comment1, it needs to put Comment2 into Bob’s Data Bags on the right, since Comment2 depends only on Comment1 to function and will become dangling once Comment1 is migrated. In general, before migrating a node or putting a node into data bags, a migration thread needs to put every other node that depends only on that node into data bags.

After migrating Comment1 to Reply1, supposing the thread goes back to Post1, it can migrate Post1 since the node is shared with Alice. However, there is no mapping for the attributes lang and loc in Post1, so part of Post1 is put into Alice’s Data Bags and the other part is migrated to Status1. (In future work, rather than always deleting a shared node from the source application, we expect to let owners choose to keep a shared node in place when others migrate it in a deletion migration; however, this will lead to multiple copies of data and thus requires version control.) Finally, after migrating Status2 in Alice’s Data Bags (Section 5.3), the thread deletes Alice’s Root2. The DAG (Source) on the right shows that only Bob’s Root1 is still in the DAG of the source application after migration.

In independent migration, as data is not deleted in the source application, data will not become dangling due to missing related data. For example, in Figure 3, the DAG (Source) after independent migration would be identical to the DAG (Source) before migration, rather than to the DAG (Source) after deletion migration shown in the figure. The DAG (Destination) after independent migration would be identical to the DAG (Destination) after deletion migration. However, after independent migration, the Data Bags will contain only the unmapped part of Post1 and Reply2: Post1 because it lacks schema mappings for migration to the destination application, and Reply2 because it is still dangling at the destination application. Also, Bob’s Comment2 will not be in Bob’s Data Bags after migration, because it does not become dangling in the DAG (Source) after independent migration.

Independent migration does not need to preserve the migration order or migrate data at node granularity; instead, to accelerate migration, the independent migration algorithm simply follows data ownership and sharing relationships to copy all data to the destination application. Because migration threads run concurrently, migrated data in the source application needs to be marked as migrated to avoid being migrated more than once. If users migrate back to an application, there could be duplicate data with different identities, which we expect to address in future work using version control.

Figure 3: A simplified migration example. Alice migrates from the source application to the destination application using Stencil’s deletion migration. The Schema Mappings box shows the mappings from the source application to the destination application. Below the Schema Mappings box, the states of the DAGs and Data Bags before and after migration are presented.

Stencil stores dangling data in data bags. Data bags are an abstraction composed of dangling-data metadata, stored on Stencil servers, and the dangling data itself, which is stored by data owners themselves or left in applications according to the owners’ preferences. Migration threads migrate data not only in the DAG, as examined in Section 5.2, but also in data bags. Since migrating data from data bags deletes the data in the bags regardless of migration type, Stencil servers allow a user to perform only one migration at a time, to avoid conflicts. Migrating data bags has two phases.

In the first phase, migration threads attempt to migrate data by merging data in data bags with the DAG of the source application, to preserve the relationships between data in the two places. For instance, in Figure 3, if Alice later wants to migrate back to the source application, migration threads will merge Post1 in Alice’s Data Bags with Status1, which was migrated from Post1 earlier, and eventually migrate a complete Post1 back to the source application. To achieve this, when data becomes dangling, node IDs are stored with the data in data bags. While migrating a node, a migration thread tracks data changes (e.g., Post1 with id … is migrated to Status1 with id …) … Status2, which was put in Alice’s data bags in an earlier migration, is migrated from Alice’s data bags to Status2 in the DAG (Destination) in this phase.
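The first phase of data bag migration relies on dangling data keeping its original node ID. The sketch below illustrates only the merge step, with hypothetical `BagEntry` and `mergeFromBag` names, and assumes the record being migrated back (Status1) has already been mapped to the source schema and resolved to its original id.

```go
package main

import "fmt"

// BagEntry is dangling data kept in a data bag, stored together with the
// node ID it had when it became dangling.
type BagEntry struct {
	NodeID string
	Attrs  map[string]string
}

// mergeFromBag returns a copy of rec with the bag's attributes for the same
// node ID added in, plus the bag without the merged entries (migrating data
// out of a bag removes it from the bag).
func mergeFromBag(nodeID string, rec map[string]string, bag []BagEntry) (map[string]string, []BagEntry) {
	merged := map[string]string{}
	for k, v := range rec {
		merged[k] = v
	}
	var rest []BagEntry
	for _, e := range bag {
		if e.NodeID != nodeID {
			rest = append(rest, e)
			continue
		}
		for k, v := range e.Attrs {
			merged[k] = v
		}
	}
	return merged, rest
}

func main() {
	// Alice's bag holds the unmapped part of Post1 (original id "p1") and
	// an unrelated entry left over from an earlier migration.
	bag := []BagEntry{
		{NodeID: "p1", Attrs: map[string]string{"lang": "en", "loc": "NYC"}},
		{NodeID: "s2", Attrs: map[string]string{"body": "older status"}},
	}
	// Status1 after being mapped back to the source schema.
	back := map[string]string{"id": "p1", "text": "hello"}
	post, rest := mergeFromBag("p1", back, bag)
	fmt.Println(post, len(rest))
}
```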

As with migration, validation controllers can start many validation threads that run the validation algorithm independently on Stencil servers. To preserve the semantics of the destination application, a validation thread checks migrated data and allows it to be displayed in the following order: a node is displayed before the nodes depending on that node (exceptions can be specified in validation settings), and the migrating user’s root node is always allowed to be displayed regardless of other nodes. A validation thread puts data that fails validation into data bags. To minimize service interruption for users, the validation algorithm has two phases. The first phase runs continuously until the end of a migration, so that the migrating user can use the destination application’s services for already migrated data during the migration. The second phase runs after a migration is completed; validation threads validate the remaining undisplayed data in just one round, since there is no longer any unmigrated data to wait for.

Following the example in Figure 3, a validation thread randomly picks (more advanced picking rules could be used) and validates migrated but undisplayed data. The numbers in the dotted circles indicate the sequence in which nodes arrive at the destination application. Even though Reply1 is migrated to the destination application before Status1, a validation thread makes Reply1 eligible for display only after Status1 is allowed to be displayed, since Reply1 depends on Status1 to function. According to the validation rules in this example, even though Bob is not in the destination application, Status1 is valid for display once it arrives at the destination application because its sharing relationship is satisfied (Alice’s Root3 arrives first). Status2 is also valid for display once it arrives, because its ownership relationship is satisfied. All the migrated nodes except Reply2 can be displayed after passing validation in the first phase; Reply2 is eventually put into Alice’s data bags in the second phase because the data it depends on is missing.

Due to differences in application semantics, migration could break data relationships. For instance, in Figure 3, Comment1 is migrated to Reply1. If Comment1.reply_to were copied directly to Reply1.comment_of, then, because ids are regenerated during migration, Reply1.comment_of would still hold the source id of Post1 rather than the new id of Status1, and Reply1 would reply to the wrong Status. This kind of problem exists not only for data dependencies but also for ownership and sharing relationships.

Stencil uses a relationship tracker to handle changing identities in these scenarios. To preserve data relationships after migration, the relationship tracker runs during migration and validation, maintaining data state changes so that data can be re-linked after migration. There could also be cases where related data is not in the same application. Stencil allows applications to specify placeholders in data attributes to indicate that related data is in another application. For example, if a comment is migrated without its corresponding post, there could be a placeholder in the reply_to attribute of the comment.
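A relationship tracker can be pictured as a map from source ids to destination ids that is consulted whenever a reference is copied. This sketch uses hypothetical `Tracker`, `Record`, `Relink`, and `Placeholder` names; the placeholder value is an assumption, since the text says applications specify their own. When the referenced node has not been migrated, the sketch returns the placeholder rather than a stale id.

```go
package main

import "fmt"

// Placeholder is a hypothetical marker meaning "the related data lives in
// another application".
const Placeholder = "<elsewhere>"

// Tracker remembers how node ids changed during migration.
type Tracker struct {
	newIDs map[string]string // source id -> destination id
}

func NewTracker() *Tracker { return &Tracker{newIDs: map[string]string{}} }

// Record notes that the node with source id oldID now has destination id newID.
func (t *Tracker) Record(oldID, newID string) { t.newIDs[oldID] = newID }

// Relink translates a reference copied from the source application. If the
// referenced node was not migrated, a placeholder is returned instead of the
// old id, which could collide with an unrelated destination node.
func (t *Tracker) Relink(ref string) string {
	if id, ok := t.newIDs[ref]; ok {
		return id
	}
	return Placeholder
}

func main() {
	tr := NewTracker()
	tr.Record("post-17", "status-501") // Post1 -> Status1
	tr.Record("cmt-40", "reply-502")   // Comment1 -> Reply1
	// Comment1.reply_to held "post-17"; Reply1.comment_of gets the new id.
	fmt.Println(tr.Relink("post-17"), tr.Relink("post-99"))
}
```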

Since we want a migration to be all-or-nothing, implementing migration as a traditional cross-database transaction would satisfy the requirement. However, this could block reads and writes while migrating bulky data, which interrupts ongoing services and increases user-perceived migration latency. Thus, we implement a user’s migration as a migration transaction.

In a migration transaction, all migration and display operations are performed in separate, individual database transactions. This ensures the atomicity and durability of migrating each node and also allows users to continue interacting with their data during migration. As we cannot rely on a database to roll back a committed database transaction, Stencil keeps its own write-ahead log to ensure the atomicity of a migration transaction, logging rollback information for each migration operation. To perform a rollback, the migration and validation threads are first stopped. Stencil then uses the transaction log while traversing the DAG of the destination application to identify migrated data and roll back the migration. Our current implementation does not handle dirty reads at the destination application.
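The rollback idea can be illustrated with a plain undo log. In this hypothetical sketch, each migration step appends its rollback information before mutating state, and `Rollback` replays the log in reverse. It simplifies heavily: both applications are in-memory maps, the log is not persisted (a real write-ahead log would be flushed to durable storage before each step), ids do not change, and there is no DAG traversal as in Stencil.

```go
package main

import "fmt"

// LogEntry is one rollback record: the node moved and its source copy.
type LogEntry struct {
	NodeID string
	Before string // source value, saved so the deletion can be undone
}

// MigrationTxn groups many small, individually committed steps and keeps its
// own log, since committed steps cannot be rolled back by the database.
type MigrationTxn struct {
	Src, Dst map[string]string // stand-ins for the two applications' data
	log      []LogEntry
}

// Migrate logs the undo information first, then copies the node to the
// destination and deletes it from the source.
func (t *MigrationTxn) Migrate(id string) {
	t.log = append(t.log, LogEntry{NodeID: id, Before: t.Src[id]})
	t.Dst[id] = t.Src[id]
	delete(t.Src, id)
}

// Rollback undoes the logged steps in reverse order. Migration and
// validation threads are assumed to be stopped before it runs.
func (t *MigrationTxn) Rollback() {
	for i := len(t.log) - 1; i >= 0; i-- {
		e := t.log[i]
		delete(t.Dst, e.NodeID)
		t.Src[e.NodeID] = e.Before
	}
	t.log = nil
}

func main() {
	txn := &MigrationTxn{
		Src: map[string]string{"Post1": "hello", "Comment1": "hi"},
		Dst: map[string]string{},
	}
	txn.Migrate("Comment1")
	txn.Migrate("Post1")
	txn.Rollback() // suppose a later step failed
	fmt.Println(txn.Src, txn.Dst)
}
```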

Since defining schema mappings and DAGs requires access to application data schemas, we were not able to evaluate Stencil with closed-source applications. Instead, we selected and deployed three open-source applications: Mastodon [37], Diaspora [13], and GNU Social [20]. In addition, we implemented a Twitter clone. We integrated each application with Stencil by writing DAG specifications and schema mappings.

Since DTP [12] is not yet released, as a baseline for comparison we implemented a naive migration system without the correctness guarantees and other features of Stencil. The naive system migrates data using the same schema mappings that we use for Stencil. It migrates in arbitrary but intuitive orderings; for example, migration from Diaspora is in the following order: posts, likes, comments, conversations, messages, and users’ other data.

Figure 4: Anomalies prevented by Stencil in the source (Diaspora) and destination (Mastodon) applications

To drive our evaluation, we implemented a social-network data generator in 6,508 lines of Go to generate synthetic data. We generated synthetic datasets based on prior work analyzing social network data [29, 35, 62]. Each user was assigned a popularity score drawn from a Pareto distribution [2]. We then probabilistically assigned posts, likes, comments, etc. to users in proportion to their popularity. Conversation pairs were assigned only between friends. We generated four Diaspora datasets with 1,000 users (Diaspora 1K dataset), 10,000 users (Diaspora 10K dataset), 100,481 users (Diaspora 100K dataset), and 1,008,108 users (Diaspora 1M dataset). We also generated three datasets, each with 10,000 users, for the other three applications (Mastodon 10K, GNU Social 10K, and Twitter 10K datasets). The Diaspora 1M dataset contains 2,676,829 follows (81,169 friends), 7,562,681 posts, 30,626,969 likes, 13,481,411 comments, 81,119 conversations, 5,400,995 messages, 46,785,209 notifications, and 3,692,680 photos.
In total, the Diaspora 1M dataset has about 51 GB of data in the database and 10 TB of media on disk. Unless otherwise stated, all evaluations used a single migration thread and the Diaspora 1M dataset.

We performed the evaluation in a cloud environment similar to one in which we expect Stencil would be deployed. Specifically, we performed migrations in the standalone databases of the test applications, and moved data between databases from a virtual machine (VM) with 128 GB memory and 2 cores to a blade server with 32 GB memory and 16 cores. The two machines were located in a campus data center.

As the primary focus of Stencil is to address the anomalies considered in Section 2.2, in our first evaluation we show what happens when the naive migration system migrates 1,000 randomly selected users from the Diaspora 1K dataset to Mastodon, one at a time. The system deletes dangling data after each migration, since it cannot re-integrate dangling data.

Figure 4 shows the anomalies produced by the naive system over the course of migration until all users are migrated. The total object count in Diaspora is defined as the number of objects before all migrations, and the total object count in Mastodon is the number of objects after all migrations. The percentage of dangling objects is the number of dangling objects produced in an application divided by the corresponding total object count. We observe that the percentage in Diaspora increases to nearly 40%. During a user’s migration, other users’ data may become dangling; e.g., Bob’s comments become dangling without Alice’s post. Also, some data becomes dangling because there is no mapping from Diaspora to Mastodon. The percentage in Mastodon increases to over 70% because some migrated data fails validation due to lacking the data it depends on; e.g., Alice’s migrated likes on Bob’s post become dangling because Bob’s post is still in Diaspora. Stencil prevents all anomalies in both applications by identifying all the dangling data, storing it in data bags, and re-integrating it in future migrations.

Table 1: Lines of code (LOC) in JSON for DAG specifications

Diaspora    Mastodon    Twitter    GNU Social
383         275         254        195

Table 2: LOC in JSON for schema mappings between applications without PSM (rows: source application; columns: destination application)

From \ To     Diaspora    Mastodon    Twitter    GNU Social
Diaspora      -           282         197        156
Mastodon      284         -           162        138
Twitter       228         172         -          101
GNU Social    185         181         107        -

Table 3: Maximum LOC in JSON for schema mappings between applications with PSM (rows: source application; columns: destination application; * marks a bootstrap pair, whose mappings are not derived with PSM)

From \ To     Diaspora    Mastodon    Twitter    GNU Social
Diaspora      -           154         87         *
Mastodon      154         -           *          67
Twitter       139         *           -          5
GNU Social    *           65          22         -

A key requirement for the adoption of Stencil is the ease with which cloud services can be integrated with the Stencil architecture. For each of the applications, two of the student authors wrote DAG specifications and schema mappings. Table 1 summarizes the LOC required to write DAG specifications for the test applications. Each row in Table 2 shows the LOC needed to directly define schema mappings from the application in that row to the other applications.

To evaluate how PSM helps ease Stencil integration, we began an experiment by selecting a ‘new’ application to be integrated into Stencil and assuming that the other three applications already had mappings between each other. We then selected a bootstrap application, namely the application most similar to the new one, since it makes sense to write mappings between applications with similar data models. Among our test applications, Diaspora is most similar to GNU Social, and Mastodon is most similar to Twitter. We then used PSM to derive mappings between the new application and the other applications, and counted the LOC needed to change the derived mappings so that they are identical to the mappings obtained without PSM. We repeated the experiment four times, treating each test application as the new application in turn. Table 3 summarizes the maximum LOC required. Overall, we find that PSM can reduce the task of specifying schema mappings by roughly half.

Figure 5: Percentage of data objects originally in each of the four applications that are successfully migrated back to those applications after migrating through a series of three other applications

Figure 6: Migration rates of different migration types
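The text does not spell out how PSM derives mappings. Assuming it composes pairwise mappings through an intermediate application, as the description of MWEAVER in the related work suggests, the core step might look like this sketch. `AttrMap`, `compose`, and all attribute names are invented for illustration; attributes with no path through the bootstrap application would be candidates for the hand edits that Table 3 counts.

```go
package main

import (
	"fmt"
	"sort"
)

// AttrMap maps attributes of one application's schema to another's.
type AttrMap map[string]string

// compose derives an A->C mapping from A->B and B->C mappings by following
// each attribute through the intermediate application B. Attributes with no
// path through B are returned so a developer can map them by hand.
func compose(ab, bc AttrMap) (AttrMap, []string) {
	ac := AttrMap{}
	var missing []string
	for a, b := range ab {
		if c, ok := bc[b]; ok {
			ac[a] = c
		} else {
			missing = append(missing, a)
		}
	}
	sort.Strings(missing) // map iteration order is random
	return ac, missing
}

func main() {
	// Hand-written mapping from a new application to its bootstrap
	// application, and an existing bootstrap-to-Mastodon mapping.
	newToBootstrap := AttrMap{"post.body": "notice.content", "post.mood": "notice.mood"}
	bootstrapToMastodon := AttrMap{"notice.content": "status.text"}
	derived, missing := compose(newToBootstrap, bootstrapToMastodon)
	fmt.Println(derived, missing)
}
```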

One of the major benefits of Stencil is the ability to identify, store, and re-integrate dangling data. Figure 5 shows how Stencil re-integrated dangling data using data bags after 100 users migrated from each of the four test applications (10K datasets), through the other three applications in turn, and eventually back to the original application. We observe that Stencil with data bags can effectively re-integrate dangling data and migrate the vast majority of user data back to the original applications. When the starting application is Diaspora, GNU Social, or Twitter, some data cannot be migrated back because our current Stencil implementation does not support Mastodon’s DAG specification of owner-less nodes that are shared by multiple users. This causes some nodes to get stuck in Mastodon when data from other applications passes through Mastodon. We plan to address this issue in future work.

Figure 7: Cumulative distributions of the percentage of time when data is unavailable in Stencil compared to Naive+

Figure 6 compares migration rates when migrating the same 100 users in the same order using Stencil deletion migrations with and without the validation algorithm, Stencil independent migrations, and the naive system. We observe that running the validation algorithm adds little overhead to migrations while preserving the semantics of the destination application. Independent migrations are about 30 times faster than deletion migrations, mainly because they only need to traverse the data ownership and sharing relationships to copy data. Independent migrations achieve nearly the same performance as naive-system migrations while ensuring correctness.

Figure 7 compares how Stencil deletion migration and a naive migration system affect application service continuity. We define the downtime of a data object as the period between when the object becomes inaccessible in the source application and when it passes validation and is allowed to be displayed in the destination application. The total time of a migration is the period between when the first object is deleted from the source application and when the last object is allowed to be displayed in the destination application. The percentage of time when data is unavailable is the downtime of a data object divided by the total migration time. We augmented the naive system with a correctness guarantee in the destination application, and call it Naive+. Specifically, we locked migrated data to prevent the destination application from displaying it until a migration was complete, and then validated and displayed the data using the validation algorithm. We migrated the same 100 users from Diaspora to Mastodon using the two systems to plot Figure 7.

We observe that, compared with Naive+, Stencil effectively reduces the percentage of time when data is unavailable. In Stencil, the two-phase validation algorithm continuously allows valid migrated data to be displayed during migration. There is a long tail, however, since some data cannot be displayed until its related data eventually arrives.

Figure 8 shows how the deletion and independent migration algorithms and the validation algorithm scale with the number of DAG nodes and edges. We obtained the figure by migrating the same 100 users using Stencil deletion and independent migrations. Before migration, we calculated the number of nodes and edges of each user to be migrated, for the migration algorithms. We disabled the validation process until a migration was complete to avoid influencing migration times. When a migration was complete, we calculated the nodes and edges of the migrated user in the destination application, for the validation algorithm. A user always has fewer nodes and edges after migration, as some data cannot be migrated due to lacking mappings, and a user also has fewer connections with other users in the new application. After that, we started validation threads to validate the data there. We can see that all three algorithms scale linearly with the number of nodes and edges.

Figure 8: Migration and validation times with different numbers of DAG nodes and edges, and each linear regression line fitting the corresponding set of points

Table 4: Total migration times in the four Diaspora datasets

1K dataset    10K dataset    100K dataset    1M dataset

To evaluate how dataset size may influence performance, we selected 100 users with the same number of nodes and similar migration sizes across the four Diaspora datasets, and migrated them using deletion migrations. Table 4 shows the total migration times obtained in the four datasets. We observe that when the dataset size is below 1 million users, it does not have much influence on the total migration time, but when the dataset size reaches 1 million users, the total migration time is about three times that in the 100K dataset. This is because basic database operations, such as JOIN and SELECT queries, take much longer as the dataset grows large.
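The regression lines in Figure 8 are least-squares fits of time against graph size. A minimal sketch of such a fit follows, with hypothetical data points rather than the measurements behind Figure 8; `fitLine` assumes at least two distinct x values.

```go
package main

import "fmt"

// fitLine returns the ordinary least-squares slope a and intercept b of
// y = a*x + b. It assumes at least two distinct x values.
func fitLine(xs, ys []float64) (a, b float64) {
	n := float64(len(xs))
	var sx, sy, sxx, sxy float64
	for i := range xs {
		sx += xs[i]
		sy += ys[i]
		sxx += xs[i] * xs[i]
		sxy += xs[i] * ys[i]
	}
	a = (n*sxy - sx*sy) / (n*sxx - sx*sx)
	b = (sy - a*sx) / n
	return a, b
}

func main() {
	// Hypothetical (nodes+edges, seconds) points, not the paper's data.
	xs := []float64{100, 200, 400, 800}
	ys := []float64{1.1, 2.0, 4.1, 7.9}
	a, b := fitLine(xs, ys)
	fmt.Printf("time ~ %.4f*size + %.3f\n", a, b)
}
```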

Systems research and practice have cycled between centralized and decentralized designs several times over the past 50 years. Recent calls for a transition back to decentralization have often focused on infrastructure, i.e., low-level systems (e.g., distributed ledgers [31, 59]). Some attention has been paid to applications, i.e., user-facing code achieving some specific functionality, such as distributed private messaging systems [28, 56, 57]. However, not only have these discussions typically occurred in isolation, but they have also been premised on being the sole replacement for a prior (current) centralized system.

The history of popular Web platforms should teach us that picking winners is incredibly difficult. Web search evolved through a bevy of services, from WebCrawler to AltaVista to Google. Social media went through a similar transition, from SixDegrees to Friendster to Myspace to Facebook. The recent fragmentation of the latter (younger users tending to use a variety of services, including Snapchat, Instagram, YouTube, and TikTok) presages the move to pluralism we argue for, but not in the manner one would hope. Pluralist architectures have been explored in Internet architecture in the past [10, 40, 47], where they have been seen as a way to prevent lock-in to any replacement for the IP-based narrow waist. One layer up the stack, pluralism has been explored in Internet routing, with the aim of enabling end hosts to select among many possible paths through the Internet rather than a single default path [5, 49, 53, 63]. However, this concept has yet to be extended to the cloud, where pluralism is harder because applications are far more diverse from the perspective of the underlying architecture.

Data migration in the context of databases, key-value stores, and similar domains has been studied extensively in various parts of the literature [9, 11, 12, 14, 15, 25, 26, 30, 32, 41, 44, 45]. Kwon and Moon migrate the state of a web application from one device to another by reconstructing JavaScript closures [26]. Rocksteady [25] is a migration technique for the RAMCloud scale-out in-memory key-value store; it aims to migrate data quickly while minimizing the impact on response time. Squall [14] aims to achieve fine-grained reconfiguration in partitioned main-memory DBMSs by interleaving data migration with executing transactions. Like Stencil, such work has aimed to handle common problems that arise while migrating data among distributed storage systems, such as live migration (i.e., moving data from the source to the destination while keeping the service alive). However, these systems mainly migrate data within a single application rather than across different applications with different semantics. Stencil migrates data between different applications by reconciling semantic differences and preserving context.

Tanon et al. designed a tool called Primary Sources [46] to migrate data between collaborative knowledge bases (i.e., from Freebase [17] to Wikidata [60]). Like Stencil, Primary Sources reconciles the differences between two application data models, but it also considers non-technical aspects (e.g., community culture, licensing, and data requirements). Primary Sources is a crowd-sourced human curation tool to verify data from outside datasets and display it in Wikidata, whereas Stencil is a general architecture designed to migrate data between different data models in an ecosystem of social media applications.

The closest related work to ours, DTP [12], attempts to enable users to migrate their data between applications.
Unlike Stencil, however, DTP advocates using a single, standard data model, and requires application writers to write data and authentication adapters that translate a given provider’s APIs into that standard model. Such a standard data model would likely lead to limited and inconsistent feature support, and DTP does not consider the anomalies that could arise during migration.

Solid [52] aims to provide data independence from applications. Users store their data in online storage spaces called personal online datastores (pods). However, pods and applications must be developed by strictly following a number of Solid protocols, which combine several Web standards such as WebID, to enable users to switch between applications. Solid does not directly examine how users’ switches between applications affect application services. BSTORE [7] and Oort [6] also achieve some degree of data independence from applications by decoupling data storage from applications and providing unified interfaces for applications to access data, but they do not provide ways for users’ data to be reused by applications with different semantics, such as different data models.

MgCrab [30] uses determinism to maintain the consistency of data on the source and destination nodes. However, it can only be used in a deterministic database system, and cannot easily be applied to data migration between different applications. Similar to how Stencil uses PSM, MWEAVER [48] applies the concept of PSM in sample-driven schema mapping to generate complete schema mapping paths between tuples through possible pairwise mapping paths between samples.

In this paper, we have proposed a pluralist architecture that enables the coexistence of social network applications and seamless data migration between them. We systematically considered a wide range of anomalies that may arise during migration, motivating our design of a migration-aware system.
We implemented a prototype of Stencil using a combination of mechanisms to realize migration while handling various possible anomalies. We evaluated Stencil in terms of its correctness, ease of integration, performance, and scalability. Based on our evaluation, Stencil can be incrementally deployed for real applications, allowing the seamless migration of user data between cloud services.

REFERENCES [1] Muneeb Ali, Ryan Shea, Jude Nelson, and Michael J Freedman. 2017. Blockstack:A new decentralized internet.

Whitepaper, May (2017).[2] BC Arnold. 1983. Pareto Distributions, International Cooperative PublishingHouse.[3] beaker 2017. Beaker: A peer-to-peer Web browser. https://github.com/beakerbrowser/beaker.[4] Juan Benet. 2014. Ipfs-content addressed, versioned, p2p file system. arXivpreprint arXiv:1407.3561 (2014).[5] Ignacio Castro, Aurojit Panda, Barath Raghavan, Scott Shenker, and SergeyGorinsky. 2015. Route bazaar: Automatic interdomain contract negotiation. In

Workshop on Hot Topics in Operating Systems .[6] Tej Chajed, Jon Gjengset, M Frans Kaashoek, James Mickens, Robert Morris, andNickolai Zeldovich. 2016. Oort: User-Centric Cloud Storage with Global Queries.(2016).[7] Ramesh Chandra, Priya Gupta, and Nickolai Zeldovich. 2010. Separating webapplications from user data storage with BSTORE. In

Proceedings of the 2010USENIX conference on Web application development . USENIX Association, 1–1.[8] cjdns 2017. An encrypted IPv6 network using public-key cryptography foraddress allocation and a distributed hash table for routing. https://github.com/cjdelisle/cjdns.[9] Christopher Clark, Keir Fraser, Steven Hand, Jacob Gorm Hansen, Eric Jul, Chris-tian Limpach, Ian Pratt, and Andrew Warfield. 2005. Live migration of virtualmachines. In

Proceedings of the 2nd conference on Symposium on Networked Sys-tems Design & Implementation-Volume 2 . 273–286.[10] Jon Crowcroft, Steven Hand, Richard Mortier, Timothy Roscoe, and AndrewWarfield. 2003. Plutarch: an argument for network pluralism.

ACM SIGCOMMComputer Communication Review

33, 4 (2003), 258–266.[11] Sudipto Das, Shoji Nishimura, Divyakant Agrawal, and Amr El Abbadi. 2011.Albatross: lightweight elasticity in shared storage databases for the cloud using live data migration.

Proceedings of the VLDB Endowment

4, 8 (2011), 494–505.[12] datatransferprojectoverview 2018. Data Transfer Project Overview and Funda-mentals. https://datatransferproject.dev/dtp-overview.pdf.[13] diaspora 2020. The diaspora* Project. https://diasporafoundation.org/.[14] Aaron J Elmore, Vaibhav Arora, Rebecca Taft, Andrew Pavlo, Divyakant Agrawal,and Amr El Abbadi. 2015. Squall: Fine-grained live reconfiguration for partitionedmain memory databases. In

Proceedings of the 2015 ACM SIGMOD InternationalConference on Management of Data . ACM, 299–313.[15] Aaron J Elmore, Sudipto Das, Divyakant Agrawal, and Amr El Abbadi. 2011.Zephyr: live migration in shared nothing databases for elastic cloud platforms.In

Proceedings of the 2011 ACM SIGMOD International Conference on Managementof data

Handbook of Peer-to-Peer Networking

Proceed-ings of the 26th Symposium on Operating Systems Principles . ACM, 390–405.[26] Jin-woo Kwon and Soo-Mook Moon. 2017. Web application migration withclosure reconstruction. In

Proceedings of the 26th International Conference onWorld Wide Web . 133–142.[27] Protocol Labs. 2017. Filecoin: A Decentralized Storage Network. https://filecoin.io/filecoin.pdf.[28] David Lazar, Yossi Gilad, and Nickolai Zeldovich. 2018. Karaoke: DistributedPrivate Messaging Immune to Passive Traffic Analysis. In

First Monday 18, 5 (2013).
[30] Yu-Shan Lin, Shao-Kan Pi, Meng-Kai Liao, Ching Tsai, Aaron Elmore, and Shan-Hung Wu. 2019. MgCrab: transaction crabbing for live migration in deterministic database systems. Proceedings of the VLDB Endowment 12, 5 (2019), 597–610.
[31] Joshua Lind, Oded Naor, Ittay Eyal, Florian Kelbert, Emin Gün Sirer, and Peter Pietzuch. 2019. Teechain: a secure payment network with asynchronous blockchain access. In Proceedings of the 27th ACM Symposium on Operating Systems Principles. 63–79.
[32] Haikun Liu, Hai Jin, Xiaofei Liao, Liting Hu, and Chen Yu. 2009. Live migration of virtual machine based on full system trace and replay. In Proceedings of the 18th ACM International Symposium on High Performance Distributed Computing. 101–110.
[33] Tai Liu. 2020. Re-democratizing the Internet in an Era of Feudalism. Ph.D. Dissertation. New York University Tandon School of Engineering.
[34] Tai Liu, Zain Tariq, Jay Chen, and Barath Raghavan. 2017. The barriers to overthrowing internet feudalism. In Proceedings of the 16th ACM Workshop on Hot Topics in Networks. ACM, 72–79.
[35] Yao Lu, Peng Zhang, Yanan Cao, Yue Hu, and Li Guo. 2014. On the frequency distribution of retweets. Procedia Computer Science 31 (2014), 747–753.
[36] maidsafe 2017. MaidSafe - The New Decentralized Internet. https://maidsafe.net/.
[37] mastodon 2020. Giving social networking back to you - the Mastodon Project. https://joinmastodon.org/.
[38] mastodonschema 2020. Mastodon. https://github.com/tootsuite/mastodon/blob/master/db/schema.rb.
[39] matrix 2017. Matrix.org. http://matrix.org/.
[40] James McCauley, Yotam Harchol, Aurojit Panda, Barath Raghavan, and Scott Shenker. 2019. Enabling a permanent revolution in internet architecture. In Proceedings of the ACM Special Interest Group on Data Communication. ACM, 1–14.
[41] Takeshi Mishima and Yasuhiro Fujiwara. 2015. Madeus: Database live migration middleware under heavy workloads for cloud environment. In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data. ACM, 315–329.
[42] namecoin 2020. Namecoin. https://namecoin.org/.
[43] nextcloud 2017. Nextcloud. https://nextcloud.com/.

[44] John Ousterhout, Parag Agrawal, David Erickson, Christos Kozyrakis, Jacob Leverich, David Mazières, Subhasish Mitra, Aravind Narayanan, Guru Parulkar, Mendel Rosenblum, et al. 2010. The case for RAMClouds: scalable high-performance storage entirely in DRAM. ACM SIGOPS Operating Systems Review 43, 4 (2010), 92–105.
[45] Ippokratis Pandis, Pinar Tözün, Ryan Johnson, and Anastasia Ailamaki. 2011. PLP: page latch-free shared-everything OLTP. Proceedings of the VLDB Endowment

Proceedings of the 25th International Conference on World Wide Web. 1419–1428.
[47] Lucian Popa, Ali Ghodsi, and Ion Stoica. 2010. HTTP as the narrow waist of the future Internet. In Proceedings of the 9th ACM SIGCOMM Workshop on Hot Topics in Networks. ACM, 6.
[48] Li Qian, Michael J Cafarella, and HV Jagadish. 2012. Sample-driven schema mapping. In Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data. ACM, 73–84.
[49] Barath Raghavan, Patrick Verkaik, and Alex C Snoeren. 2008. Secure and policy-compliant source routing. IEEE/ACM Transactions On Networking 17, 3 (2008), 764–777.
[50] ring 2017. Ring. https://ring.cx/en/news.
[51] riot 2017. Riot - open team collaboration. https://about.riot.im/.
[52] Andrei Vlad Sambra, Essam Mansour, Sandro Hawke, Maged Zereba, Nicola Greco, Abdurrahman Ghanem, Dmitri Zagidulin, Ashraf Aboulnaga, and Tim Berners-Lee. 2016. Solid: A platform for decentralized social applications based on linked data. MIT CSAIL & Qatar Computing Research Institute, Tech. Rep. (2016).
[53] Stefan Savage, Thomas Anderson, Amit Aggarwal, David Becker, Neal Cardwell, Andy Collins, Eric Hoffman, John Snell, Amin Vahdat, Geoff Voelker, and J. Zahorjan. 1999. Detour: Informed Internet routing and transport. IEEE Micro

Proceedings of the 26th Symposium on Operating Systems Principles. 423–440.
[57] Jelle Van Den Hooff, David Lazar, Matei Zaharia, and Nickolai Zeldovich. 2015. Vuvuzela: Scalable private messaging resistant to traffic analysis. In Proceedings of the 25th Symposium on Operating Systems Principles

Washington, DC: Pew Internet & American Life Project. Retrieved May

Proceedings of ACM SIGCOMM.
[64] zeronet 2017. ZeroNet: Decentralized websites using Bitcoin crypto and the BitTorrent network. https://zeronet.io/.