Continuous Prefetch for Interactive Data Applications
Technical Report*
Haneen Mohammed‡, Ziyun Wei§, Eugene Wu‡, Ravi Netravali†
‡Columbia University, §Cornell University, †UCLA
ABSTRACT
Interactive data visualization and exploration (DVE) applications are often network-bottlenecked due to bursty request patterns, large response sizes, and heterogeneous deployments over a range of networks and devices. This makes it difficult to ensure consistently low response times. Khameleon is a framework for DVE applications that uses a novel combination of prefetching and response tuning to dynamically trade off response quality for low latency.
Khameleon exploits DVE's approximation tolerance: immediate lower-quality responses are preferable to waiting for complete results. To this end,
Khameleon progressively encodes responses, and runs a server-side scheduler that proactively streams portions of responses using available bandwidth to maximize user-perceived interactivity. The scheduler involves a complex optimization based on available resources, predicted user interactions, and response quality levels; yet, decisions must also be made in real time. To overcome this,
Khameleon uses a fast greedy heuristic that closely approximates the optimal approach. Using image exploration and visualization applications with real user interaction traces, we show that, across a wide range of network and client resource conditions, Khameleon outperforms existing prefetching approaches that benefit from perfect prediction models:
Khameleon always lowers response latencies (typically by 2-3 orders of magnitude) while keeping response quality within 50-80%.
1. INTRODUCTION
Interactive data visualization and exploration (DVE) applications, such as those in Figure 1, are increasingly popular and used across sectors including art galleries [15], earth science [41], medicine [11], finance [49], and security [29]. Like typical web services, DVE applications may be run on heterogeneous client devices and networks, with users expecting fast response times under 100ms [18, 44, 63]. However, the resource demands of DVE applications are considerably magnified and highly unpredictable, making it difficult to achieve such interactivity.

Traditional interactive applications are based on point-and-click interfaces such as forms or buttons, where there may be seconds or minutes of delay between user requests. In contrast, DVE applications update the visual interface continuously as the user drags, hovers, or otherwise manipulates the interface [20] (Figure 1). For example, all of the charts in Figure 1b are updated continuously as the user drags and resizes range filters.

(* This is an extended version of a paper that appeared in VLDB 2020.)

Figure 1: Example interactive DVE applications. (a) Image exploration application; hovering over the mosaic of thumbnails on the left loads the high resolution image on the right. (b) Falcon interactive visualization [53]; users drag and resize range filters on any subset of charts, which triggers updates to the others. We evaluate these applications with and without Khameleon in our evaluation.

These interactions are bursty [8], generating a huge number of back-to-back requests with nearly no "think time" between them, and they issue data-dense requests for tens of kilobytes to megabytes of data in order to render detailed statistics or high-resolution images [16].

As a result of these combined factors, DVE applications place considerable and unpredictable pressure on both network and server-side data processing resources. Fortunately, the database and visualization communities have made considerable progress in reducing server-side [49, 86, 45, 14, 81] and client-side [29, 45] data processing and rendering latencies. However, network bottlenecks still persist, and can cripple user-facing responsiveness even if server- and client-side overheads are eliminated. Addressing network bottlenecks is becoming paramount with the continued shift towards cloud-based DVE applications that must remain responsive across a wide variety of client network conditions (e.g., wireless, throttled).

The primary approach to masking network delays in interactive applications is prefetching [23, 6, 39, 22, 78, 13, 3], where responses for predicted future requests are proactively cached on the client before the user requests them. Prefetching benefits inherently depend on prediction accuracies, but sufficiently high accuracies have remained elusive, even for well-studied classic applications such as web pages [66]. DVE applications pose an even more challenging setting for several reasons. Their bursty request patterns, when combined with data-dense responses, can easily exceed the bandwidth capacities of existing networks and cause persistent congestion. At the same time, the massive number of potential requests makes building a near-perfect predictor that can operate over long time horizons infeasible; developing such oracles is an open problem.
Thus, prefetching for DVE applications is either ineffective or wastes precious network resources which, in turn, can cause detrimental cascading slowdowns on later user requests.

In this paper, we depart from traditional prefetching frameworks that hope to accurately predict a small number of future requests to prefetch, towards a framework that continuously and aggressively hedges across a large set of potential future requests. Such a framework should also decouple request burstiness from resource utilization so that the network does not get overwhelmed at unpredictable intervals, and instead can consistently employ all available network resources for subsequent prefetching decisions.

A trivial, but undesirable, way to meet these goals is to limit the user's ability to interact with the interface, thereby reducing the burstiness and scope of possible requests. Instead, we leverage the fact that DVE applications are approximation tolerant, meaning that response quality can be dynamically tuned [47, 85, 27] to enable more hedging within the available resources (at the expense of lower response quality). Of course, this introduces a fundamental tradeoff: the prefetching system can focus on low-quality responses for many requests to ensure immediate responses, or high-quality responses for a few requests at the risk of more cache misses and slow responses. Balancing this tradeoff requires a joint optimization between response tuning and prefetching, which, to date, have only been studied independently. This involves a novel and challenging scheduling problem, as the optimization space needs to consider the likelihood of the user's future access patterns over a large number of possible requests, application preferences between response quality levels and responsiveness, and limited resource conditions.
At the same time, the scheduler must run in real time.

We present Khameleon, a novel prefetching framework for DVE applications that are bottlenecked by request latency and network transfer.
Khameleon dynamically trades off response quality for low latency by leveraging two mechanisms that collectively overcome the aforementioned joint optimization challenges. First, we leverage progressive encoding to enable fine-grained scheduling: each response is encoded as an ordered list of blocks, where any prefix can be decoded into a (possibly lower-quality) response. (Progressive encoding is distinct from progressive computation, such as online aggregation [34], which returns full, yet approximate, responses by processing a sample of the database.) Second, a server-side scheduler uses predicted request probabilities to decide which response blocks to proactively stream to the client.

Khameleon is a framework that is compatible with existing DVE applications.
Khameleon transparently manages the request-oriented communication between the DVE client and server, and shields developers from the challenges of the joint optimization problem. Developers can instead focus on high-level policies, such as determining their preference between latency and quality, and developing application-specific progressive encoding schemes and prediction models. We later describe how to integrate Khameleon with an existing DVE application.

We evaluate
Khameleon using the two representative DVE applications in Figure 1. Our experiments consider a broad range of network and client resource conditions, and use real user-generated interaction traces. Across these conditions, we find that
Khameleon is able to avoid the network congestion and degraded user-facing responsiveness that arise from indiscriminate prefetching (even if that prefetching uses a 100% accurate predictor). For instance, for the image exploration application,
Khameleon (using a simple predictor [77]) reduces response latencies by up to 3 orders of magnitude (from over 10s), and Khameleon's progressive encoding improves response latencies on average by 4x and improves response quality by up to 1.6x. Our experimental setup also reveals that porting existing applications to use Khameleon entails minimal burden. For example, modifying Falcon to use
Khameleon as the communication and prefetching layer required fewer than 100 lines of code to issue requests to the
Khameleon client library and use a formal predictor.

To summarize, our contributions include: 1) the design and implementation of
Khameleon, a framework that combines real-time prediction, progressive encoding, and server-side scheduling to address the diverse challenges of interactive DVE applications; 2) the formalization of the server-side scheduling optimization problem that explicitly balances the quality and likelihood of a request, along with a fast greedy heuristic implementation; and 3) an extensive evaluation using two interactive applications that highlights the benefits of the
Khameleon design.

2. DVE APPLICATIONS
Cloud-based DVE applications are information dense, in that they render hundreds or thousands of data items (e.g., records, images) that users can directly interact with. The request patterns from these interactions are bursty, with negligible think time between requests. These characteristics lead to a rate of requests that stresses the network, often exceeding the available capacity and resulting in congestion. To address these potential network bottlenecks,
Khameleon leverages two key properties of DVE applications. First, interactions are preemptive: since responses can arrive out of order (e.g., due to network or server delays), the client renders the data for the most recent request and (silently) drops responses from older requests to avoid confusing the user [82, 83]. Second, they are approximation tolerant: it is preferable to quickly render a low-quality response (e.g., fewer points [64] or coarser bins [45]) than to wait for a full-quality response. As concrete examples, consider the following two DVE applications, which exhibit these properties; we use both in our evaluation.

Large-scale image exploration.
Scientists and users increasingly wish to interactively explore massive image datasets of, e.g., art [15], satellite data [41], cellular microscopy [11], and maps [32]. Along these lines, we developed an image gallery DVE application (Figure 1a). The user's mouse hovers over a dense array of 10,000 image thumbnails on the left (akin to a zoomed-out view) to view the full resolution 1.3-2MB image of the hovered-over thumbnail on the right (akin to a zoomed-in tile).

We consider this an exemplar and difficult DVE application to evaluate, because it has a high request rate and large response sizes, and with 10K thumbnails, it is difficult to build an accurate predictor for. For instance, from the user traces used in our experiments, clients request up to 32 images per second (32-64 megabytes (MB)/s), not including any prefetching requests. (For reference, streaming HD and 4K video typically requires 5-20 megabits (Mb)/s.) In addition, this application imposes fewer interaction restrictions than existing exploration applications that are plagued by prefetching inefficiencies. For instance, applications like Google Maps only let users pan to adjacent tiles and incrementally zoom; this both simplifies prediction and limits the rate of images that the user can request at a time.

Interactive data visualizations.
Falcon [53] is a state-of-the-art interactive visualization application specifically optimized for prefetching (Figure 1b). As the user selects and resizes range selections in any of the charts, the other non-selected charts immediately update their statistics to reflect the conjunction of all of the selections. The challenge is that the space of possible combinations of selections is exponential in the number of charts, and is infeasible to fully precompute and send to the client up front. Yet even movements across a single pixel trigger many requests to update the charts.

To minimize interaction delays, the Falcon developers [53] manually implemented prefetching to mask request latencies. They observed that the user can only interact with one chart at a time, and in the meantime, selections in the other charts are fixed. When the user's mouse moves onto chart A, Falcon sends SQL queries to a backend database to compute low dimensional data cube slices between chart A and each of the other charts. Once these slices are constructed, user interactions in chart A are handled instantaneously.

Falcon's predictor prefetches data slices when the user hovers over a chart, and it progressively encodes the data slices as cumulative counts. However, these policies are hardcoded in a monolithic codebase, making it challenging to improve the predictor (e.g., to estimate the chart the mouse will interact with, rather than wait for a hover event), the response encoding (e.g., pixel resolution and a coarse resolution), or user preferences (e.g., which attributes they favor). In our evaluation, we port Falcon to use Khameleon as the communication layer, and switch its database from OmniSci to PostgreSQL.
3. KHAMELEON OVERVIEW
This section first describes traditional prefetching architectures and their limitations for DVE applications. It then provides a high-level design overview of
Khameleon, explaining its individual components (we elaborate on each in the following sections), how they collectively overcome the limitations of prior architectures, and how existing DVE applications can be seamlessly migrated to use
Khameleon.

Figure 2(a) depicts the workflow of a common prefetching architecture for an image exploration DVE application. In this application, a user interacts with a grid composed of 16 image thumbnails such that mousing over an image enlarges its display size. As the user moves the mouse, the local cache manager receives requests and immediately responds if the data is cached, or forwards the request to the server. In parallel, the gaussian distribution representing predictions of the mouse's future location is updated based on the mouse's current position (to improve prediction accuracy), and is used to pick a set of requests to prefetch; these requests are issued to the cache manager in the same way.

In this example, the user's mouse has moved quickly along the yellow path, triggering requests for images 16 and 11. Given the bursty nature of these requests, the corresponding responses are still being transmitted as the next user-generated request is issued (for image 7). Unfortunately, because the full response data for images 16 and 11 fully utilizes the available network bandwidth, the response for image 7 must contend with those old responses, delaying its arrival. To make matters worse, the client prefetching mechanism will concurrently issue requests for the k most likely next requests (2 and 6 in this example).
Limitations.
The problem here is that the information that enables the most accurate predictions (i.e., the mouse's current position) is available exactly when the user issues bursts of requests that already congest the network. This has two consequences. First, it largely eliminates any prefetching benefits, and highlights prefetching's drawbacks: accurate prefetching requests are unlikely to return to the cache before the user explicitly requests them (recall that DVE applications experience low user think times), and inaccurate prefetching requests add unnecessary network congestion that slows down explicit user-generated requests and future prefetching. Second, it is difficult to know what requests should be prefetched during the times between user interactions,
Figure 2: Comparing Khameleon to a traditional prefetching architecture for an image exploration DVE application. The interface is a 4x4 grid of images. Khameleon separates the predictor from the cache, sends probability distributions instead of explicit requests, and uses a scheduler to determine the sequence of small request blocks to send to the client.

because the user, by definition, is not generating events; unfortunately, prefetching everything is impractical given the high data footprint of DVE applications.

Khameleon (Figure 2(b)) consists of client-side and server-side libraries that a cloud-based DVE application can import and use to manage data-intensive network communication. These components operate as follows to overcome the aforementioned limitations of traditional prefetching architectures.

The client-side library serves to decouple prefetching requests from the (bursty) network utilization triggered explicitly by the user. User-generated requests are not sent out on the network, and instead are registered with the local
Cache Manager. The Cache Manager waits until there is cached data to fulfill the request, and then makes an application upcall to update the interface with that data. This approach helps absorb high user request rates. As this happens, client events (e.g., mouse movements) and requests are also passed to an application-provided
Predictor Manager that continually updates a distribution of predicted future requests and sends a summary of that distribution (e.g., the parameters of a gaussian distribution) to the server.

The server-side library uses intelligent push-based scheduling and progressive encoding of responses to make the most of the available network resources, i.e., balancing user-perceived latency and response utility while hedging across potential future requests. The
Scheduler continually maintains a schedule of response blocks to push to the client; the set of blocks covers a wide range of explicit and anticipated requests, e.g., images 11, 7, etc. in Figure 2(b). The specific sequence of blocks depends on the predicted request probabilities received from the client, as well as an optional application-provided
Utility Function that quantifies the "quality" of a response based on the number of prefix blocks available. Note that a single block is a complete response, with additional blocks improving "quality" according to the Utility Function. A separate
Sender thread reads the schedule and retrieves blocks from backend systems. For example, the file system could be pre-loaded with the blocks for progressively encoded images, or a database could dynamically execute queries and progressively encode the results before returning the subset of required blocks to the Sender. Finally, the server streams the sequence of response blocks to the client, which updates its local cache accordingly. As we describe later, the Khameleon architecture is agnostic to how the application client interprets and decodes the blocks, as well as to the specific backend system that is employed.
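For concreteness, the Sender's read-fetch-stream loop, together with the server-side mirroring of the client's deterministic FIFO cache state, can be sketched as follows. This is a minimal sketch under assumed names (`sender_loop`, `backend_fetch`, `send_to_client` are illustrative, not Khameleon's actual API):

```python
def sender_loop(schedule, backend_fetch, send_to_client, cache_size):
    """Stream scheduled blocks in order and mirror the client's FIFO cache
    server-side: because the client stores the i-th received block in slot
    i % C, the server can track cache contents without coordination.

    schedule: iterable of (request_id, block_index) pairs chosen by the scheduler.
    backend_fetch(request_id, block_index) -> bytes of the encoded block.
    send_to_client(request_id, block_index, payload) pushes it onto the network.
    """
    simulated_cache = [None] * cache_size
    for i, (request_id, block_index) in enumerate(schedule):
        payload = backend_fetch(request_id, block_index)
        send_to_client(request_id, block_index, payload)
        # Mirror the client's deterministic ring-buffer replacement.
        simulated_cache[i % cache_size] = (request_id, block_index)
    return simulated_cache  # consulted when planning the next batch
```

The returned cache image is what lets the scheduler reason about which previously sent blocks are still resident when planning the next batch.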
Predictor Manager.
This client-side component relies on an application-provided predictor to forecast future requests (as a probability distribution over the possible requests), and periodically sends those predictions to the server. Predictors must satisfy two properties. First, at time t, the predictor must return a probability distribution P_t(q | Δ) over requests q and future times t + Δ. Second, it must be anytime, so that the Predictor Manager can ask for distributions of predicted requests to send to the server at any time during system operation. It is also important that the predictor's state is efficient to maintain, and that the distributions can be compactly represented for transmission to the server. These mechanisms enable the Predictor Manager to control policies for how often to make and send distributions.
Progressive Results and Caching.
Each request's progressively encoded response is modeled as an ordered list of fixed-size blocks; any prefix is sufficient to render a (possibly lower quality) result, and the full set of blocks renders the complete result. Smaller blocks can be padded if block sizes differ. Our client-side cache implementation uses a ring buffer (FIFO replacement policy) for its simplicity and determinism; in particular, this simplifies the server-side scheduler's ability to track cached state at the client, since the FIFO policy can be simulated without explicit coordination. (Other deterministic replacement policies are possible; incorporating them is left for future work.) During operation, the cache puts the i-th block received from the server into slot i % C, where C is the cache size. The cache responds to a request if there is at least one of its prefix blocks cached.
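The ring-buffer cache described above can be sketched as follows. This is a simplified illustration (class and method names are assumptions, not Khameleon's client library):

```python
class RingBufferCache:
    """Client-side block cache with deterministic FIFO replacement:
    the i-th block received from the server goes into slot i % C."""

    def __init__(self, capacity):
        self.capacity = capacity          # C: number of block slots
        self.slots = [None] * capacity    # each slot: (request_id, block_index, payload)
        self.received = 0                 # total blocks received so far

    def put(self, request_id, block_index, payload):
        # FIFO replacement: overwrite slot i % C for the i-th received block.
        self.slots[self.received % self.capacity] = (request_id, block_index, payload)
        self.received += 1

    def prefix_blocks(self, request_id):
        # Longest contiguous prefix of cached blocks for this request.
        blocks = {b: p for (r, b, p) in filter(None, self.slots) if r == request_id}
        prefix, i = [], 0
        while i in blocks:
            prefix.append(blocks[i])
            i += 1
        return prefix

    def can_answer(self, request_id):
        # Any non-empty prefix renders a (possibly lower-quality) result.
        return len(self.prefix_blocks(request_id)) > 0
```

Because replacement depends only on arrival order, the server can replay the same sequence of `put` calls to know exactly which blocks remain cached.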
Utility Functions.
In practice, the first few blocks of a response are likely to contribute more than subsequent
Figure 3:
Utility functions for the image exploration application, which uses structural similarity (red), and the visualization application, which uses the system-default linear function (blue).
Utility Function
U : [0, 1] → [0, 1], which maps the percentage of data blocks for a request to a utility score. A score of 0 means most dissimilar, and 1 means identical to the full result, in expectation. By default,
Khameleon conservatively assumes a linear utility function. As an example, Figure 3 plots the utility curve for the image exploration application, which is based on the average visual structural similarity measure [76] between the progressive result and the full image.
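A utility function of this shape can be derived from sampled (prefix fraction, similarity) measurements by monotone piecewise-linear interpolation, falling back to the default linear utility when no samples are given. A sketch with an assumed helper name (`make_utility`); the sample values in the test are illustrative, not measurements from the paper:

```python
def make_utility(samples=None):
    """Build a monotonically increasing utility U: [0,1] -> [0,1].

    samples: optional list of (fraction_of_blocks, similarity) pairs,
    e.g., measured structural similarity for different prefix sizes.
    With no samples, fall back to the default linear utility U(x) = x.
    """
    if not samples:
        return lambda x: x
    # Anchor the curve at U(0) = 0 and U(1) = 1, then interpolate.
    pts = sorted([(0.0, 0.0)] + list(samples) + [(1.0, 1.0)])

    def U(x):
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if x0 <= x <= x1:
                # Linear interpolation between neighboring samples.
                return y0 if x1 == x0 else y0 + (y1 - y0) * (x - x0) / (x1 - x0)
        return 1.0
    return U
```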
Scheduler and Backends.
The Scheduler decides the sequence of blocks to push to the client. It schedules in batches of C blocks (the client cache size) because the client's ring buffer will overwrite itself once it is filled. A given batch (a schedule) is chosen to maximize the user's expected utility with respect to the probability distribution over future requests. The separate
Sender thread reads the schedule to learn the order in which blocks should be retrieved from the backend and placed onto the network. The backend may be a file system, a database engine, a connection pool, or any service that can process requests and return progressively encoded blocks. Note that, given a progressive encoder, any backend can be retrofitted by encoding its results. We retrofit PostgreSQL for our visualization experiments.

By default, we assume that retrieving blocks from the backend incurs a predictable delay. In addition, we assume that the backend is scalable, in that the delay does not increase considerably when more concurrent queries are issued (e.g., speculatively for prefetching). This is natural for precomputed responses or backends such as a file system or key-value store. In cases where the backend can only scale to a limited number of requests,
Khameleon employs a heuristic to limit the amount of speculation in accordance with the supported scalability.

This subsection describes how a DVE application (image exploration in this case) can be easily and incrementally adapted to use
Khameleon. Recall that the application issues an image request when the user's mouse hovers over a thumbnail; the server retrieves the full-sized image from the file system and sends it back to the client.

To use
Khameleon, the application should provide a progressive encoding of its responses, a utility function, and a predictor. Since traditional requests and responses are special cases of
Khameleon's predictor and encoder, we start with generic defaults for these components. The generic encoder treats each image as a response with a single block, and the predictor treats each request as a point distribution. By specifying this, an immediate benefit is that the scheduler will use the point distributions to select the full requested image (as in the existing application), and use the remaining bandwidth to push random images for the client to cache.

We now show how a developer, Jude, can improve each component for her application. A benefit of
Khameleon's modular design is that the components can be improved independently.
Improve the Encoder:
Finer-grained blocks improve the scheduler's ability to hedge across many requests given finite bandwidth resources. Since JPEG supports progressive encoding, Jude replaces the naive encoder with a JPEG encoder and configures the system with the block size. Further, she can adjust the JPEG encoding parameters to create finer-grained block sizes, or switch to an alternative progressive encoding altogether [60].
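Splitting a progressively encoded byte stream into the fixed-size blocks that the block model requires can be sketched as follows; per the design above, a short final block is padded to the common size. This is an illustrative helper (`to_blocks` is an assumed name), and a real encoder might instead align blocks to JPEG scan boundaries:

```python
def to_blocks(progressive_bytes, block_size):
    """Split a progressively encoded response (e.g., a progressive JPEG
    byte stream) into fixed-size blocks; any prefix of the stream decodes
    to a lower-quality result, so any prefix of blocks is renderable.
    The final block is zero-padded so all blocks are uniform units."""
    blocks = [progressive_bytes[i:i + block_size]
              for i in range(0, len(progressive_bytes), block_size)]
    if blocks and len(blocks[-1]) < block_size:
        blocks[-1] = blocks[-1] + b"\x00" * (block_size - len(blocks[-1]))
    return blocks
```

Smaller `block_size` values give the scheduler more granularity to hedge, at the cost of more scheduling decisions per response.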
Improve the Utility Function:
By default,
Khameleon uses the linear utility function, where each block contributesthe same additional utility to the user. Jude computesthe structural similarity [76] for different prefix sizes overa sample of images, and uses this to derive a new utilityfunction (e.g., Figure 3).
Improve the Predictor:
Jude now uses her application expertise to incrementally improve the predictor. One direction is to weight the point distribution with a prior based on historical image access frequency. Alternatively, she could directly estimate the user's mouse position using a variety of existing approaches [59, 79, 80, 31, 5]. She can assess the benefits of any modifications to the predictor based on its empirical accuracy over live or historical user traces, or higher-level metrics such as cache hit rates and the number of blocks available for each request;
Khameleon reports both. In our evaluation, we show that the application can be ported to Khameleon with fewer than
100 LOC.
4. PREDICTOR MANAGER
The application-provided prediction model P_t(q | Δ, e_t) uses interaction events and/or requests e_t up until the current time t to estimate the probability of request q at Δ time steps in the future. Of course, a wide range of prediction models satisfy this definition, with the appropriate one varying on a per-application basis. For example, button- and click-based interfaces benefit from Markov-based models [33, 10, 19], whereas continuous interactions such as mouse- or hover-based applications benefit from continuous estimation models [5, 77, 59, 79]. Regardless of the prediction model used, a commonality with respect to Khameleon is that the events e_t (e.g., mouse movements, list of previous user actions) are generated on the client, whereas the predictions are used by the server-side scheduler.

Given these properties, Khameleon provides a generic API for applications to register their desired predictors;
Khameleon is agnostic to the specific prediction model being used. The API (described below) decomposes a predictor into client-side and server-side components, and
Khameleon's Predictor Manager handles the frequency of communication between the two components. The main requirement is that the predictor is usable at any time to estimate a probability distribution over possible requests at arbitrary time steps. We note that
Khameleon does not mandate a specific prediction accuracy. However,
Khameleon can report prediction accuracies, as well as application-level performance metrics resulting from those accuracies, based on live and historical user traces; developers can then use this feedback to continually improve their predictors.
Predictor decomposition.
Applications specify the predictor P_t as server and client components:

P_t(q | Δ, e_t) = P_t^s(q | Δ, s_t) · P_t^c(s_t | Δ, e_t)

The client component P_t^c collects user interaction events and requests e_t and translates this information into a byte array that represents the predictor state s_t. s_t may be the most recent request(s), model parameters, the most recent user events, or simply the predicted probabilities themselves. The server uses s_t as input to P_t^s in order to return future request probabilities for the Khameleon scheduler's joint optimization between prefetching and response tuning.

Importantly, this decomposition is highly flexible and can support a variety of different configurations for predictor components. For example, a pre-trained Markov model [33, 10, 19] may be instantiated on the server as P_t^s, and the client may simply send each event to the server (s_t = e_t). Alternatively, the Markov model could be placed on the client as P_t^c, with the state sent being a list of the top-k most likely requests, and the server component assuming that all non-top-k requests have probability ≈ 0.
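The decomposition above can be sketched with the simplest configuration discussed in the text, where the client sends the most recent request as the state s_t and the server turns it into a point distribution. The class and method names here are assumptions for illustration, not Khameleon's actual API:

```python
class ClientPredictor:
    """P_t^c: translates interaction events into a compact state s_t."""
    def __init__(self):
        self.last_request = None

    def observe(self, event):
        self.last_request = event      # here, s_t is just the latest request

    def state(self):
        return self.last_request       # sent to the server by the Predictor Manager


class ServerPredictor:
    """P_t^s: maps state s_t to a distribution over future requests."""
    def __init__(self, all_requests):
        self.all_requests = all_requests

    def distribution(self, s_t, delta_ms):
        # Point distribution on the last request; uniform if no state yet.
        # (A richer predictor would also use delta_ms, the look-ahead time.)
        if s_t is None:
            p = 1.0 / len(self.all_requests)
            return {q: p for q in self.all_requests}
        return {q: (1.0 if q == s_t else 0.0) for q in self.all_requests}
```

Swapping in a Markov model on either side only changes what `state()` returns and how `distribution()` interprets it; the interface stays the same.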
Devising A Custom Predictor.
We now walk through the design of a custom predictor for interfaces with static layouts, i.e., the two example DVE applications in Figure 1. These are the predictors that we use in our experiments. A static layout l lets us directly translate a distribution of mouse locations P_t^s(x, y | Δ, s_t) into a distribution over requests:

P_t(q | Δ, e_t) = P_l(q | Δ, x, y, l) · P_t^s(x, y | Δ, s_t) · P_t^c(s_t, l | Δ, e_t)

We model P_t^s(x, y | Δ, s_t) as a gaussian distribution represented by the centroid and a 2x2 covariance matrix, and pick a small set of Δ values (50, 150, 250, 500ms in our experiments) to predict over, linearly interpolating between these times. Thus, the state s_t only consists of 6 floating point values for each Δ, which we estimate using a naive Kalman Filter [77] on the client, and decode into a request distribution on the server.

Although it may appear challenging to devise a custom predictor, we note that any cloud application that wishes to use prefetching will need to develop or adapt a predictor. Further, our results show that Khameleon is effective in spite of the generic Kalman Filter described above. Indeed, the fundamental challenge that
Khameleon solves is determining how to explicitly and robustly account for the predictions in its joint scheduling problem.
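The server-side decoding step, turning a gaussian mouse-position prediction into a distribution over thumbnail requests via the static layout, can be sketched as follows. This is a hypothetical helper that simplifies the 2x2 covariance to independent per-axis standard deviations; it is not Khameleon's code:

```python
import math

def request_distribution(mu, sigma, grid, cell):
    """Translate a gaussian mouse prediction (mean mu=(mx,my), per-axis
    std sigma=(sx,sy)) into probabilities over a grid x grid layout of
    cell x cell pixel thumbnails, by integrating the gaussian over each tile."""
    def cdf(z):  # standard normal CDF via the error function
        return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

    probs = {}
    for r in range(grid):
        for c in range(grid):
            # Probability mass of the gaussian falling inside this tile.
            px = cdf(((c + 1) * cell - mu[0]) / sigma[0]) - cdf((c * cell - mu[0]) / sigma[0])
            py = cdf(((r + 1) * cell - mu[1]) / sigma[1]) - cdf((r * cell - mu[1]) / sigma[1])
            probs[(r, c)] = px * py
    total = sum(probs.values()) or 1.0
    return {q: p / total for q, p in probs.items()}   # renormalize to the layout
```

Tiles near the predicted mouse position receive most of the mass, while distant tiles receive small but nonzero probability, which is exactly what lets the scheduler hedge.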
Figure 4: Setting for Khameleon's scheduling problem.
5. SCHEDULER
Khameleon's server-side scheduler takes as input a utility function U and a probability distribution over future requests, and allocates finite network bandwidth and client cache resources across progressively encoded data blocks to maximize the expected user utility. Ultimately, it balances competing objectives: ensuring high utility for high-probability requests and hedging for lower-probability requests (i.e., sending some blocks for a low-quality response).

Developing this scheduler is challenging for several reasons. First, the scheduler must keep track of previously sent blocks and ensure that they are not evicted from the client's circular buffer cache by the time they are needed. Second, the scheduler needs to make decisions in real time in order to not block data transmission, yet must still adjust its scheduling decisions quickly when new predictions arrive from the client. This section presents the formal scheduling problem description, an ILP-based solution, and a greedy approximation.
Let time be discretized such that each time interval [t, t+1) is the time it takes for the server to add one block of a response onto the network. In this problem definition, we assume that each response is progressively encoded into N_b equal-sized blocks.

Let Q = q_1, ..., q_n be the set of all possible requests. In Figure 4, there are n = 16 possible image requests with ids 1 to 16. The gaussian parameters estimated at time t are the state s_t. The scheduler has received predictor state s_t, which lets it estimate the probability P(q_i | Δ, s_t) of q_i being issued at Δ time steps in the future. Let us assume that at the start of scheduling, t = 0.

The client cache can hold C blocks, and the network bandwidth is w blocks per time interval. The cache at time t contains B_i^t blocks for q_i. In the example, the cache holds the first block of image 11 (B_11^t = 1) and the first two blocks of image 7 (B_7^t = 2). Thus, B^{t+1} = {B_1^{t+1}, ..., B_n^{t+1}} is the allocation at the end of the interval [t, t+1].

Problem 1 (Server-side Scheduling).
Find the best next allocation B^{t+1} that maximizes V(B^t), given the cache B^t and predictor state s_t:

V(B^{t+Δ}) = max_{B^{t+Δ+1}} { Σ_i U(B^{t+Δ+1}_i) P(q_i | Δ+1, s_t) + γ V(B^{t+Δ+1}) }    (1)

Our objective function V includes two terms (colored in formula and text). The first term is the expected user utility at the next timestep t + Δ + 1. It weighs the utility of B^{t+Δ+1}_i blocks (using the utility function) for request i by its probability. The second term recursively computes the future benefits. This enforces the dependency between time intervals—it accounts for the long term and steers the scheduler towards a global optimum. γ ∈ [0, 1] is a discount on the future. γ = 0 means we only care about the next timestep, and γ = 1 means we care about all timesteps equally.

In Figure 4, the scheduler computes the best allocation for the next three time steps as the requests 2, 7, 6. The client cache's deterministic replacement policy lets the sender push the appropriate block sequence for 2, 7, then 6.

Equation 1 is intractable because it contains an infinite recursion, as well as terms that are unknown at prediction time t. However, due to the design of the client cache as a circular buffer, the cache will overwrite itself after every C blocks. Thus, we approximate the solution by optimizing over a finite horizon of C blocks:
V(s_t, B^t) = max_{B^{t+1}, ..., B^{t+C}} Σ_{k=1}^{C} ( γ^{k−1} Σ_{i=1}^{n} U(B^{t+k}_i) P(q_i, k) )    (2)

This formulation is a Markov Decision Process [62], where actions (chosen block) in each state (contents of the cache) receive a reward (user utility). We now describe an ILP-based scheduler, followed by a fast real-time approximation. In §8, we discuss the relationship with reinforcement learning and future extensions.
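To make Equation 2 concrete, the following sketch evaluates the finite-horizon objective for two fixed allocation traces over a C = 3 horizon. The utility curve, probabilities, and allocations are illustrative toys of our own, not values from the system:

```python
# Evaluate Equation 2 for a fixed allocation trace: sum over k = 1..C of
# gamma^(k-1) * sum_i U(B_i at step k) * P(q_i | k).
def objective(allocs, probs, utility, gamma):
    total = 0.0
    for k, B in enumerate(allocs, start=1):   # B: blocks held per request
        step = sum(utility(b) * probs[i][k - 1] for i, b in enumerate(B))
        total += gamma ** (k - 1) * step
    return total

def utility(b):                # toy concave-ish utility in number of blocks
    return min(1.0, 0.6 * b)

probs = [[0.8, 0.7, 0.6],      # P(q_0 | k) for k = 1..3
         [0.2, 0.3, 0.4]]      # P(q_1 | k)

# Two candidate traces over the horizon: pile all blocks onto q_0, or hedge.
front = [[1, 0], [2, 0], [2, 0]]
hedge = [[1, 0], [1, 1], [2, 1]]
print(round(objective(front, probs, utility, 0.9), 3),
      round(objective(hedge, probs, utility, 0.9), 3))  # → 1.596 1.7
```

With these toy probabilities, the hedged allocation scores higher; this is exactly the incentive that leads the scheduler to send some blocks for lower-probability requests.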
Objective Function.
Equation 2 can be expressed as an integer linear programming (ILP) problem. ILP problems require a linear objective, but the utility function U could be arbitrarily concave. We linearize it by approximating U with a step function ˜U, defined such that ˜U(0) = 0 and ˜U(b) = Σ_{i=1}^{b} g(i), where:

g(i) = U(i / N_b) − U((i − 1) / N_b),   i ∈ [1, N_b]

This approximation has no impact on the final result because U is already discrete due to discrete block sizes.

Let U^t_{i,j} denote the expected utility gain of the j-th block for q_i sent during time interval [t − 1, t], where t ∈ [1, C]. Because this block is guaranteed to stay in the client cache until timestep C, it will provide a constant amount of utility gain through time interval [t, C]. Note that we dropped s_t from P since, from the perspective of the scheduler, it is a fixed prediction.

U^t_{i,j} = Σ_{k=t}^{C} γ^{k−1} P(q_i | k) g(j)

We denote by f^t_{i,j} a binary variable that indicates if the j-th block of q_i is sent at time interval [t−1, t]. With this notation, we can transform the objective into a linear function:

Σ_{k=1}^{C} ( γ^{k−1} Σ_{i=1}^{n} U(B^k_i) P(q_i | k) ) = Σ_{i=1}^{n} Σ_{j=1}^{N_b} Σ_{k=1}^{C} f^k_{i,j} U^k_{i,j}    (3)

Constraints.
Our ILP program must account for three constraints. The ring buffer's limited capacity is implicitly encoded in our choice of maximum time horizon C in the objective. The ILP hard constraints ensure that (1) the network bandwidth is not exceeded, and that (2) each block is only sent once:

∀k: Σ_{i,j} f^k_{i,j} ≤ w        ∀i,j: Σ_k f^k_{i,j} ≤ 1

Limitations.
The LP scheduler is very slow because the LP problem size, as well as the cost to compute the utility gain matrix U^t_{i,j}, increases with the time horizon (cache size), the interaction space (number of possible requests), and the granularity of the progressive encoding (number of blocks). For instance, if the image application (10k possible requests) has a cache size of 5k blocks, and 10 blocks per request, the LP will contain 0.5 billion variables. Simply constructing the problem in the appropriate format for a solver is too expensive for a real-time system, and further, this cost is incurred for every C blocks to be sent. §A.1 reports our micro-experiments, including comparisons with the fast, greedy scheduler described next.
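To make the linearization and constraints concrete, this sketch derives g(i) from a toy concave utility and brute-forces the objective in Equation 3 on a tiny instance (2 requests, 2 blocks each, C = 2, w = 1); a real deployment would hand the same formulation to an ILP solver rather than enumerate. All numbers are invented for illustration:

```python
from itertools import product

# Toy instance: n=2 requests, Nb=2 blocks each, horizon C=2, bandwidth w=1.
n, Nb, C, w, gamma = 2, 2, 2, 1, 0.9

def U_cont(x):            # concave utility over the fraction of blocks held
    return x ** 0.5

# Step-function linearization: g(j) = U(j/Nb) - U((j-1)/Nb)
g = {j: U_cont(j / Nb) - U_cont((j - 1) / Nb) for j in range(1, Nb + 1)}

P = {(0, 1): 0.7, (0, 2): 0.6,    # invented P(q_i | k), i in {0,1}, k in {1,2}
     (1, 1): 0.3, (1, 2): 0.4}

def U_gain(i, j, t):
    # U^t_{i,j}: constant gain of block j of q_i if sent in interval t
    return sum(gamma ** (k - 1) * P[(i, k)] * g[j] for k in range(t, C + 1))

cells = [(i, j, t) for i in range(n) for j in (1, 2) for t in (1, 2)]
best, best_f = -1.0, None
for bits in product((0, 1), repeat=len(cells)):  # brute-force the 0/1 vars
    f = dict(zip(cells, bits))
    ok_bw = all(sum(f[(i, j, t)] for i in range(n) for j in (1, 2)) <= w
                for t in (1, 2))                 # for all t: sum_ij f <= w
    ok_once = all(sum(f[(i, j, t)] for t in (1, 2)) <= 1
                  for i in range(n) for j in (1, 2))  # each block sent once
    if ok_bw and ok_once:
        obj = sum(f[c] * U_gain(*c) for c in cells)   # Equation 3
        if obj > best:
            best, best_f = obj, f

print(sorted(c for c in cells if best_f[c]))
# → [(0, 1, 1), (1, 1, 2)]
```

Even on this toy, the optimum hedges: after sending the high-probability request's first block, the remaining bandwidth goes to the other request's first block rather than a second block of the first.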
This subsection describes Khameleon's fast greedy scheduler (Listing 1). The main design consideration is that it can rapidly make scheduling decisions as the client sends distributions at unpredictable rates, and without blocking (or being blocked by) the sender. We first describe the scheduler design, and then discuss the interaction between the scheduler and the sender. §A.2 describes the formal semantics of a correct schedule, given a sequence of distributions sent from the client.
Our greedy scheduler uses a single-step horizon (first term in Equation (1)). It computes the expected utility gain from giving one block to each request (accounting for the number of blocks that have already been scheduled), and samples a request q_i proportional to its utility gain. The next block is allocated to q_i. It schedules batches of C blocks to fully fill the client cache, then it resets its state and repeats.

State.
The algorithm keeps three primary pieces of state that vary over time. The first is the number of blocks assigned to each request B = [b_1, ..., b_n]. This is used to estimate the utility gain for the next scheduled block, and is reset after a full schedule (C blocks) has been scheduled. The second state materializes g() as an array. The third state precomputes the matrix P_{i,t} = ∫_t^{C−1} P(q_i | k) dk that represents the probability that the user will request q_i over the rest of the batch. This is estimated as a Riemann sum via the Trapezoidal Rule (lines 8-11).

Scheduling is now a small number of vectorized operations. The expected utility gain at timestep t is the dot product P_t · g[B], where P_t = [P_{1,t}, ..., P_{n,t}] and g[B] = [g(b_1), ..., g(b_n)] are vectorized lookups (line 16).

Scheduler Algorithm.
The client is allowed to send new probability distributions at arbitrary times. If a new distribution arrives, we wish to use its more accurate estimates, but also do not wish to waste the resources used for prior scheduling work. Further, the scheduler should progress irrespective of the rate at which the client sends distributions.

To make progress, each iteration schedules up to a batch of bs blocks at a time (default of 100). After each batch, it checks whether a new distribution has arrived, and if so, recomputes the P_{i,t} matrix (lines 6-11). Since t blocks may already have been chosen for the current schedule, we only need to materialize the time slots for the rest of the schedule (P_{i,t'} where t' ∈ [t + 1, C − 1]). After sending the scheduled blocks to the sender, it resets t and B if a full schedule has been allocated (lines 21-23).

1   C, g, n          // cache size, utility gain array, number of requests
2   bs = min(C, bs)  // blocks to schedule per iteration
3   B = [0, .., 0]   // blocks scheduled per request
4   t = 0
5   while True:
6     if received_new_distribution():
7       dist = get_new_distribution()
8       for i ∈ [1, n]:
9         P_{i,C} = dist(i, C)
10        for t' ∈ [C-1, t]:
11          P_{i,t'} = P_{i,t'+1} + (dist(i, t') + dist(i, t'+1)) / 2
12
13    S = []          // generated batch of blocks
14    while t < C-1 and |S| < bs:
15      t += 1
16      u = P_t · g[B]
17      q = sample requests proportional to u
18      S.append(q)
19      B[q] += 1
20    send S to sender
21    if t == C:      // reset after a full schedule
22      t = 0
23      B = [0, .., 0]

Listing 1: Pseudocode of the greedy scheduler algorithm.

Optimizations.
We employ several straightforward optimizations beyond the pseudocode in Listing 1. The main one avoids materializing the full P_{i,t} matrix when the number of requests is very high: most requests have approximately the same (near-zero) probability, so their entries can be shared rather than materialized individually. Using this concept to further stratify the probability distribution may further reduce runtime, but we find that this optimization is sufficient in our experiments.

Our current prototype assumes that the client and server clocks are synchronized, so that the server has sufficient confidence in predictions, and that the Sender thread can be preempted. When a new prediction arrives at the scheduler, it identifies the position i of the sending thread in the current batch, and reruns the scheduler from i to C. The blocks for 0 to i do not change since they have already been sent. This is analogous to setting t = i. The scheduler sends this partial schedule to the sending thread, which in the meantime, may have sent an additional h blocks. Thus, it simply starts sending using the partial schedule at i + h. Concurrently, the scheduler begins scheduling the next batch using the updated predictions. Note that the scheduler may be modified to match a different client-cache replacement strategy; we leave this to future work.
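For illustration, here is a runnable Python rendering of the inner sampling loop of Listing 1 (the helper name, toy gain array, and seeded RNG are our own; the real scheduler operates on vectorized state):

```python
import random

def schedule_batch(P, g, B, t, C, bs, rng):
    """Greedy step from Listing 1: for each slot, sample the next block's
    request proportional to its expected utility gain P[i][t] * g[B[i]]."""
    S = []
    n = len(B)
    while t < C - 1 and len(S) < bs:
        t += 1
        u = [P[i][t] * g[B[i]] for i in range(n)]  # expected gain per request
        q = rng.choices(range(n), weights=u, k=1)[0]
        S.append(q)
        B[q] += 1   # account for blocks already scheduled
    return S, t

# Toy inputs: 3 requests, horizon C=4; g decays with blocks already held.
C, bs = 4, 3
g = [0.6, 0.3, 0.1, 0.0]               # marginal gain of the next block
P = [[0.7] * C, [0.2] * C, [0.1] * C]  # P[i][t]: prob. of request i at slot t
B = [0, 0, 0]
S, t = schedule_batch(P, g, B, 0, C, bs, random.Random(0))
print(S, B)   # blocks allocated this batch, and per-request block counts
```

The proportional sampling, rather than an argmax, is what lets the scheduler hedge: lower-probability requests still occasionally receive a block.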
Bandwidth Estimation.
The sender thread and scheduler require knowledge of the available network bandwidth, and aim to run at a rate that will not cause congestion on the network. Khameleon is agnostic to the specific end-host bandwidth estimation (and prediction) technique that is used to compute this information [85, 84, 38]. Further, note that Khameleon can alternatively be configured to use a user-specified bandwidth cap (e.g., to comply with limited data plans). In our implementation, the Khameleon client library periodically sends its data receive rate to the server; the server uses the harmonic mean of the past 5 rates as its bandwidth estimate for the upcoming timestep, and aims to saturate the link. This approach capitalizes on recent observations that bandwidth can be accurately estimated over short time scales, particularly in backlogged settings that avoid transport layer inefficiencies (e.g., TCP slow-start-restart) [48] that mask the true available bandwidth at the application layer [85].
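A minimal sketch of this estimator (the class and method names are ours; only the 5-sample harmonic mean comes from the text):

```python
from collections import deque

class BandwidthEstimator:
    """Harmonic mean over the client's last 5 reported receive rates.
    The harmonic mean is conservative: it is dominated by the slowest
    recent sample, which dampens transient rate spikes."""

    def __init__(self, window=5):
        self.rates = deque(maxlen=window)

    def report(self, rate_mb_s):
        if rate_mb_s > 0:
            self.rates.append(rate_mb_s)

    def estimate(self):
        if not self.rates:
            return 0.0
        return len(self.rates) / sum(1.0 / r for r in self.rates)

est = BandwidthEstimator()
for r in [8.0, 10.0, 2.0, 10.0, 10.0]:  # MB/s samples; one congested dip
    est.report(r)
print(round(est.estimate(), 3))  # → 5.405, well below the arithmetic mean 8.0
```
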
Backend Scalability.
This work assumes that backend query execution is scalable, i.e., data stores can execute many concurrent speculative requests without performance degradation. This is often true for key-value-oriented backends [81] or cloud services, but may not hold for other data stores. For instance, databases such as PostgreSQL have a concurrency limit, after which per-query performance suffers. Thus, it is crucial for the scheduler to avoid issuing so many speculative requests that the backend becomes a bottleneck in response latency.

Although formally addressing this interaction challenge between Khameleon and data processing backends is beyond the scope of this paper, we use a simple heuristic to avoid triggering congestion in the backend. We benchmark the backend offline to measure the number of concurrent requests C that it can process scalably. Let n be the number of requests the backend is currently processing; we post-process schedules to ensure that they do not refer to blocks from more than C − n distinct requests. In essence, we treat backend request limits in the same way as network constraints.
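The schedule post-processing can be sketched as follows (a hypothetical helper; it preserves block order and drops blocks of requests beyond the C − n budget):

```python
def limit_distinct_requests(schedule, max_distinct):
    """Keep schedule order, but drop blocks whose request would push the
    number of distinct in-flight requests beyond the backend budget."""
    allowed, out = set(), []
    for block in schedule:           # block = (request_id, block_index)
        req = block[0]
        if req in allowed or len(allowed) < max_distinct:
            allowed.add(req)
            out.append(block)
    return out

# Backend handles C=3 concurrent requests, n=1 already running -> budget 2.
schedule = [(7, 1), (2, 1), (7, 2), (6, 1), (2, 2)]
print(limit_distinct_requests(schedule, 3 - 1))
# → [(7, 1), (2, 1), (7, 2), (2, 2)]
```
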
6. EXPERIMENTS
We evaluate Khameleon on the DVE applications described in §2. Our experiments use real user interaction traces and a wide range of network and client cache conditions. We compare against idealized classic prefetching and response tuning approaches, highlight the benefits provided by each of Khameleon's features, and investigate Khameleon's sensitivity to environmental and system parameters. The results comparing Khameleon with the baselines are consistent across the applications. Thus, we primarily report results for the image application, and use Falcon to illustrate how Khameleon goes well beyond the state-of-the-art hand-optimized implementation (§ ).

Our prototype uses a Typescript client library and a Rust server library. The client periodically sends predictions to the server every 150ms. Each prediction consists of a distribution over the possible requests at timesteps Δ = {50, 150, 250, 500ms} from the time that the prediction is made; the 500ms values follow a uniform distribution.

Figure 5: CDF of think times (time between consecutive requests) over interaction traces for the image and vis applications (§2).

For the image application, we collected mouse-level interaction traces from 14 graduate students that freely used a version of the application that was configured with no response latency. Each trace is 3 minutes long, with 20ms average think time. For Falcon, we used the 70 traces from [7]. The interface used to collect these traces differs from the interface in the Falcon paper [53] by one chart (a bar chart instead of the heat map in [53]). Thus we translated the interactions over the bar chart to generate semantically equivalent requests consistent with [53]. In this way, the performance numbers are comparable with [53]. We find that increasing the number and length of traces doesn't affect our results; Figure 5 reports the think-time distributions.
Performance metrics:
Khameleon balances response latency and quality for preemptive interactions. However, due to the bursty nature of interactions, some requests (and their response data) may be preempted when later requests receive responses sooner. Thus, we report the percentage of preempted requests. For the non-preempted requests, we measure the cache hit rate as the fraction of requests that have blocks in the cache at the time of the request, the response latency as the time from when a request is registered with the cache to the upcall when one or more blocks are cached, and the response utility at that time. We will also evaluate how quickly the utility is expected to converge to 1 (all blocks). We use the utility curves in Figure 3. The image application's utility function is described in § .

Environment parameters:
Our experiments consider a wide range of network and client-side resource scenarios. We first use netem [1] to consider fixed bandwidth values between 1.5–15MB/s (default 5.625MB/s) and request latencies between 20–400ms (default 100ms); note that because we precompute all responses, request latency is meant to include both network latency (between 5–100ms) and simulated backend processing costs (15–300ms). We vary the client's cache size between 10–100MB (default 50MB). We also use the Mahimahi network emulator [57] to emulate real Verizon and AT&T LTE cellular links; in these experiments, the minimum network round trip time was set to 100ms [58]. We simulate varying think time between requests from 10–200ms, which is favorable to the baseline approaches described below. Figure 5 shows CDFs of think times in our user traces. (We report the bandwidth as MB/s instead of Mb/s to use the same units as block sizes.)

Figure 6: Idealized prefetching baselines and Khameleon across varying cache and network bandwidths (x-axis). Pane headers list the metric for each y-axis. Latency charts in all figures render a black dashed line at 100ms.
Performance baselines:
Baseline is a standard request-response application with no prefetching. Progressive mimics Baseline, but only retrieves the first block of any response—this is intended to reduce network congestion but does not use prefetching to mask request latency.

Prefetching techniques primarily focus on prediction accuracy and the number of parallel requests to make. Modern predictors exhibit ≤70% accuracy [6]. To create strong baselines, we evaluate ACC-0.8-1, ACC-1-1, and ACC-1-5 (following [6]). All baselines use an LRU cache. We also evaluate an Oracle version of Khameleon where the predictor knows the exact position of the mouse after Δ milliseconds (by examining the trace).

We first compare Khameleon with the aforementioned baselines including no prefetching, and ACC-0.8-1, ACC-1-1, and ACC-1-5. Recall that these are upper bounds for existing prefetching approaches—typical predictors have accuracies of <70% [6].

Varying Bandwidth and Cache Resources:
We faithfully replayed the user traces, and varied the cache size (10–100MB) and bandwidth resources (1.5–15MB/s), while keeping request latency fixed to 100ms. The top two rows in Figure 6 report the percentage of requests for which one or more blocks are present in the cache at the time of request (i.e., % Cache Hits), and the percentage of preempted requests. Khameleon increases the cache hit rate by 23.38–256.73× above Baseline, and by 1.11–16.12× above the idealized prefetching baselines (ACC-*-*). Khameleon reduces the number of preempted requests by 3× in low bandwidth settings, and has a slightly higher preemption rate than ACC-*-* at higher bandwidths because its high cache hit rate causes more out-of-order responses. The ACC-*-* baselines have lower cache hits because think times are lower than request latency, thus the user has moved on by the time the prefetched data arrives.

The bottom two rows plot the utility and user-perceived response latency for requests that are not preempted. We see that the baselines consistently prioritize full responses—their utilities are always 1 at the expense of very long response latencies (note latencies are log scale). In contrast, Khameleon gracefully tunes the utility based on resource availability—all the while maintaining consistently low average response latencies that do not exceed 14ms across the bandwidth and cache size configurations. On average, across different cache resources and bandwidth limits, Khameleon has up to 16× better cache hit rates than ACC-*-*, resulting in orders-of-magnitude lower response times.

Figure 7: Response time vs utility for the prefetching baselines and Khameleon. Size denotes bandwidth; upper left is better. Black dotted line shows the 100ms latency threshold.

To better illustrate the tradeoff between resources, responsiveness, and utility, Figure 7 compares average response latency (across all requests) and the response utility, for every condition (shape, color), bandwidth (size), and cache size; upper left means faster response times and higher utility. Across all conditions, increasing the bandwidth improves the response times. However, the baselines remain at perfect utility and have high latencies. In contrast, Khameleon always has latencies below the 100ms threshold.

Request latency:
We now fix network bandwidth (15MB/s) and cache size (50MB), and vary request latency (20–400ms). Recall that request latency includes both network and server processing delays. Figure 8 shows that Khameleon consistently achieves higher cache hit rates than the prefetching baselines. As request latencies grow, Khameleon degrades response utility to ensure that response latencies remain low (on average 11ms), and has on average 3× more preempted requests than the baselines because of the out-of-order responses that result from its higher cache hit rate. In contrast, the alternatives pursue perfect utilities at the detriment of responsiveness. When the request latency is 400ms, Khameleon performs 79× faster than Baseline, and 37× faster than ACC-*-*. The baselines become highly congested as the request latency increases.

Figure 8: Khameleon vs prefetch baselines across varying request latencies; request latency includes both network and server processing delays.
Think time:
So far, we have faithfully replayed the user traces. Now we synthetically vary the think times in the traces between 10–200ms to assess their effect. We fix request latency to 100ms, and use three resource settings: low (bandwidth=1.5MB/s, cache=10MB), medium (5.625MB/s, 50MB), and high (15MB/s, 100MB).

Figure 9 indeed shows that high think times improve all prefetching methods by reducing congestion and giving more time to prefetch. This is most notable in the high resource setting, where the Baseline response latency converges to the cost of the network latency plus the network transmission time. ACC-1-* has high response latency when the think time is short due to congestion, but the cache hit rate increases to 75–100% with high think time and high resources. With low resources and low think times, Khameleon achieves low latency by hedging, as shown by the low utility values. Despite this, the next experiment shows that Khameleon converges to full utility faster than the baselines. With more resources, Khameleon shifts to prioritize and improve utility. We find that Khameleon is close to Oracle, except in high resource settings, where perfect prediction can better use the extra resources and further reduces latency by 2×. Khameleon maintains near-instant response latencies, and uses the additional think time to increase the response utility. This highlights Khameleon's efficacy for DVE applications with low think times relative to request latency, i.e., where there is not enough time to prefetch between requests, even with perfect prediction.
Convergence:
Although trading utility for responsiveness is important, the response should quickly converge to the full utility when the user pauses on a request. We now pause a user trace at a random time, and track the upcalls until the utility reaches 1. We use the high, medium, and low resource settings described above. Figure 10 reports the average and standard deviation of the utility after waiting an elapsed time after the pause. Khameleon consistently converges to a utility of 1 faster than all of the baselines, in expectation. This is explained by the additional congestion incurred due to the high rate of requests issued by the two baselines. We expect that better application-specific predictors [6] can greatly improve convergence for Khameleon.

Figure 9: Varying think time between consecutive requests. Comparing Khameleon vs ACC using perfect and kalman filter predictors for 1 and 5 request horizons.

Figure 10: Convergence rate of utility: average utility (y-axis) over time after the user stops on a request.

We now perform an ablation study, and vary system configurations to better understand Khameleon.

Ablation Study.
To show the importance of jointly optimizing resource-aware prefetching and progressive response encoding, we perform an ablation study. Starting with a non-prefetching Baseline, we add the kalman filter and joint scheduler but without progressive encoding (Predictor), and we add progressive encoding but without prefetching to show the benefits of cache amplification (Progressive). For reference, we compare with ACC-1-5. We use a bandwidth of 15MB/s, cache size of 50MB, and vary request latencies.

Figure 11 shows that as the request latency increases, the cache hit rate for all approaches decreases (a negligible decrease for Khameleon). Predictor improves over Baseline because the joint scheduler pushes predicted requests proactively (thus increasing the cache hit rate) without increasing network congestion.
Progressive improves over Baseline by reducing the network transmission time (the baselines always have utility of 0 or 1, so we only report the average)
Figure 11: Results of ablation study.
Log Latency, (ms) Avg Utility% Cache Hits % Preempted
Khameleon: Kalman Khameleon: Oracle Khameleon: Uniform ACC − − Figure 12: Varying
Khameleon predictors.and alleviating congestion, yet its utility is also the lowest.The combination of the two optimizations are needed in
Khameleon to achieve higher utility with consistently low response latency.

Sensitivity to Predictors.
Figure 12 assesses the impact of the predictor by comparing the Uniform predictor, Kalman, and the Oracle predictor as the upper bound. We fix request latency to 100ms, and include ACC-1-5 and Baseline as references. At low bandwidth, simply using the Khameleon framework already improves latency compared to ACC-1-5; and Kalman further improves on top of Uniform and is close to Oracle. As bandwidth increases, a more accurate predictor better uses the resources to push more blocks for more likely requests. Thus, Oracle further reduces response times by 1.7–5.7× compared to Kalman.

System Parameters and Bandwidth Overheads.
We evaluated the frequency at which the prediction distributions are sent to the server, and find that Khameleon is robust to frequencies between 50–350ms, but deteriorates when frequencies are lower. We also measured the percentage of blocks that were pushed to the client but unused by the application (overpushed blocks): we find that Khameleon overpushes 50–75% of the blocks, as compared to 35–45% for ACC-1-5. Since the user can limit the amount of bandwidth allocated to prefetching, we believe these rates are acceptable given the orders of magnitude lower latency. More details are in §B.

Figure 13: Comparing Khameleon with baselines on time-varying cellular networks.

Real Network Traces.
On the real Verizon LTE and AT&T LTE cellular network traces, with a fixed 100ms request latency and 50MB cache size, Figure 13 shows that Khameleon considerably outperforms ACC-1-5. The cache hit rate is over 10× higher on AT&T, and the latency is lower by 348.36–430.12×.

We now adapt Falcon [53] to Khameleon, and show that the ability to easily change the predictor and introduce progressive encoding lets Khameleon further improve over the already-optimized Falcon.
Porting Falcon:
We modified the Typescript client to register requests to the Khameleon client library. Originally, when the user hovers over a chart, Falcon issues 5 separate SQL queries to the backend database to generate a data slice for each of the other five charts (we call this group of queries a single request). We simulate this with a predictor that assigns a probability of 1 to the currently hovered upon view, and 0 to all others. Similarly, when the scheduler allocates one block for a given request, the sender issues 5 queries to the query backend (PostgreSQL database), and progressively encodes the combined query results into blocks. In contrast to the image application, the backend only scales to 15 concurrent queries before query performance degrades considerably. Thus, prefetching even 3 requests can issue enough queries to saturate the backend.

Adapting the client required ≈50 LOC—mostly decoding blocks into Falcon's expected format. The code to encode query results into blocks on the server required ≈60 LOC.
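The on-hover predictor described above is simply a point-mass distribution; a minimal sketch (function name and view names are our own, not Falcon's):

```python
def on_hover_prediction(hovered_view, all_views):
    """Falcon-style predictor: probability 1 for the hovered view's
    request, 0 for every other candidate request."""
    return {view: (1.0 if view == hovered_view else 0.0)
            for view in all_views}

# Hypothetical chart names for the flights dashboard.
views = ["distance", "delay", "time", "date", "carrier", "origin"]
dist = on_hover_prediction("delay", views)
print(dist["delay"], sum(dist.values()))  # → 1.0 1.0
```
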
Experiment Setup:
We create two databases using subsets of the flights dataset from Falcon [53]; Small has 1M records with query latencies of ≈ ; Big has 7M records with latencies of 1.5–2.5s. We verified that the ported version performed the same as the original Falcon, so we report metrics based on varying the ported version.
Predictor and Progressive Encoding:
We change the predictor from Falcon's "on-hover" (dashed lines) to the kalman filter (solid lines) used in earlier experiments. The x-axis varies the number of blocks that each request is encoded into (each block has fewer result records). The red lines in Figure 14 (PostgreSQL) show that Kalman improves over OnHover, delivering 1.4× more cache hits, 5× lower latency on average, lower preemption rate, and higher utility across the two datasets, particularly as the number of blocks increases.
Figure 14: Ported Falcon system on Big (7M) and Small (1M) datasets with varying number of blocks/request (x-axis), and predictors (line) using Postgres (red line) and a simulated scalable database backend (blue line).
Scalable Backend:
We now simulate a scalable database (blue lines). We first precompute and log each query's execution times when running in isolation. The backend answers queries from a cache and simulates the latency. Compared to the PostgreSQL backend, Kalman response latencies improve on average by 2×, and OnHover by 1.2×. Kalman still outperforms OnHover with higher utility because it hedges more aggressively without congesting the backend.
7. RELATED WORK
DVE application optimizations.
Many existing approaches reduce DVE application response latencies by addressing server-side data processing and client-side rendering costs. Query processing in backend datastores can be completed in tens of milliseconds using a combination of precomputation (of datacubes [30], samples [21], indexes [24]), hardware [49, 29] or vectorized [86] acceleration, and parallelization [81]. Client-side computation and rendering delays can be similarly reduced using techniques such as datacubes [43] and GPU rendering and processing acceleration [29, 45, 50].

Khameleon is complementary to the above approaches and focuses on reducing the network bottlenecks (not client- or server-side computation delays) in DVE applications. Indeed, Khameleon could be used as the communication layer to progressively encode and push optimized data structures and query results based on anticipated user interactions. Our current implementation makes the simplifying assumption that data processing and progressive encoding incur a fixed cost; leveraging the above query processing and response encoding optimizations will require incorporating data processing costs into the scheduler, a promising future direction.
Caching and Prefetching.
Interactive data exploration has studied various caching architectures [72], as well as prefetching techniques to pre-populate the caches [13, 23, 3, 6]. ATLAS [13] extrapolates scrollbar movements to prefetch subsequent requests; ForeCache [6] extends this to prefetching tiles of array data by forecasting likely future queries based on past user actions and statistical features of the underlying data. These approaches are crafted for their specific visualization interface, and leverage restrictions on the user's allowed interactions (often to linear slider actions or panning to nearby tiles) that help improve the prediction accuracy. Falcon [53] prefetches when a user hovers over a chart, and uses the time between hovering and interacting with the chart to prefetch datacube structures. Database query prefetching typically relies on Markov models that update when a new query is issued [65, 69, 39, 12], which assumes longer think times between queries, while web page requests [22, 54, 78] use mouse predictors similar to the Kalman Filter used in the experiments.

Though these techniques are able to perform backend query computation early, they do not incorporate server push mechanisms or progressive response encoding, limiting their impact on alleviating network bottlenecks. Khameleon borrows similar prediction information, but replaces explicit requests and responses with probability distributions and a fine-grained scheduler for push-based streaming that accounts for request probabilities and response quality.
Progressive Encoding.
Progressive encoding ensures that a small prefix of the overall response data is sufficient to render an approximate result, while scanning additional blocks improves result quality (ultimately converging to the fully accurate version) [35]. This has been applied to a wide range of data, including images [68, 71], layered encodings [37, 70, 28, 17], visualization data [6], and even web page resources [56, 55, 67]. Khameleon lets applications provide progressively encoded responses [68, 71, 6], which enables the scheduler's joint optimization to dynamically trade off response quality for low latency in DVE applications.
Progressive Computation.
Online aggregation [34, 2, 42, 4, 64] and progressive visualization [25, 52, 26, 73, 61] seek to quickly return approximate results whose estimates improve over time, and could be used as backends in Khameleon. DICE [40] also uses speculation to accelerate visualization exploration. DICE bounds the query space to faceted data cube navigation, speculatively executes approximate queries for neighboring facets across a sharded database, and allocates sampling rates to the queries based on the expected accuracy gains. These progressive computation techniques are fundamentally different from the progressive encoding techniques considered by Khameleon. In progressive computation, each improvement is a separate result set that requires sending more data and using more network resources. Progressive encoding in Khameleon could instead be used to encode each such result set. Thus, although both techniques achieve a similar effect of progressively enhancing the rendered data, the mechanisms are different and complementary.
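The distinction can be seen in a toy online-aggregation loop: each refinement below is a complete, self-contained result that must be re-sent to the client, whereas with progressive encoding later blocks only refine earlier ones. The chunked running-mean estimator is an illustrative stand-in for a real online aggregation backend.

```python
import random

# Illustrative online aggregation: emit a fresh estimate after each chunk.
def online_mean(rows, chunk=100):
    """Yield successive estimates of the mean as more rows are scanned."""
    total, n = 0.0, 0
    for i in range(0, len(rows), chunk):
        for v in rows[i:i + chunk]:
            total += v
            n += 1
        yield total / n   # a new, complete result set each time

random.seed(0)
rows = [random.random() for _ in range(1000)]
estimates = list(online_mean(rows))
# estimates[-1] is the exact mean; earlier entries are approximations.
```

Every yielded estimate here costs a full transmission; encoding one final result progressively would instead let early blocks stand in for the early estimates.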
8. DISCUSSION AND CONCLUSION
Khameleon is a dynamic prefetching framework for data visualization and exploration (DVE) applications that are approximation tolerant. Rather than focusing solely on predicting requests to prefetch or adapting response quality to available resources, Khameleon uses a server-side scheduler to jointly optimize across these techniques. Responses are progressively encoded into blocks and proactively streamed to the client cache based on request likelihoods.
Khameleon consistently achieves sub-30ms response times even when requests take 400ms, and outperforms existing prefetching techniques, often by orders of magnitude, while gracefully using spare resources to improve quality. To best leverage Khameleon, each component of the system (backend scalability, network bandwidth, and the degree of speculative prefetching) should be matched so that data is produced and consumed at the same rates.
Learning Improved Policies.
This work deliberately used simple prediction models and scheduling policies, rather than sophisticated ones, to argue for the effectiveness of a continuous push framework. As expected, we found a considerable gap from an optimal predictor, and we likewise expect better scheduling policies to exist. One extension is to adapt a reinforcement learning framework to improve the scheduler's policy. For instance, we could log explicit reward signals in the client and use Q-learning to better estimate future rewards. We could also unroll beyond a single step and use policy gradient methods [74] to learn a higher-quality policy function that accounts for deployment idiosyncrasies. The challenge is to balance this added sophistication with the need to schedule the next block in real time (microseconds).
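The Q-learning extension sketched above could look roughly as follows: the client logs a reward (here, a hypothetical +1 when a pushed block serves an upcall), and the server maintains a small Q-table over coarse scheduler states and strategies. The states, actions, and reward structure are all illustrative placeholders, not a proposal for the actual reward design.

```python
import random

ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1
ACTIONS = ["push_top1", "hedge_top3", "deepen_current"]
Q = {}  # (state, action) -> estimated future reward

def choose_action(state):
    """Epsilon-greedy selection over scheduling strategies."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q.get((state, a), 0.0))

def update(state, action, reward, next_state):
    """Standard one-step Q-learning update from a logged client reward."""
    best_next = max(Q.get((next_state, a), 0.0) for a in ACTIONS)
    old = Q.get((state, action), 0.0)
    Q[(state, action)] = old + ALPHA * (reward + GAMMA * best_next - old)

# Toy environment: when the predictor is "confident", pushing its top request
# pays off; when "uncertain", hedging across requests pays off.
random.seed(1)
for _ in range(2000):
    s = random.choice(["confident", "uncertain"])
    a = choose_action(s)
    r = 1.0 if (s, a) in {("confident", "push_top1"),
                          ("uncertain", "hedge_top3")} else 0.0
    update(s, a, r, random.choice(["confident", "uncertain"]))
```

A table lookup and a single arithmetic update per block keeps the policy cheap enough for the microsecond scheduling budget mentioned above; the open question is whether such coarse states capture enough of the deployment's behavior.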
Acknowledgements:
Thanks to Dan Rubenstein and Adam Elmachtoub for advice on the ILP formulation, and to Thibault Sellam and Mengyang Liu for work on early system versions. This work was supported by NSF IIS 1845638, 1564049, 1527765, and CNS-1943621.
9. REFERENCES
[1] Netem - network emulator. https://man7.org/linux/man-pages/man8/tc-netem.8.html, 2011.
[2] S. Agarwal, A. Panda, et al. Blink and it's done: Interactive queries on very large data. In VLDB, 2012.
[3] Z. Ahmed and C. Weaver. An adaptive parameter space-filling algorithm for highly interactive cluster exploration. In VAST 2012, pages 13–22. IEEE, 2012.
[4] D. Alabi and E. Wu. Pfunk-h: Approximate query processing using perceptual models. In HILDA@SIGMOD, 2016.
[5] G. A. Aydemir, P. M. Langdon, and S. Godsill. User target intention recognition from cursor position using kalman filter. In Conf. on Universal Access in Human-Computer Interaction, pages 419–426, 2013.
[6] L. Battle, R. Chang, and M. Stonebraker. Dynamic prefetching of data tiles for interactive visualization. In SIGMOD, 2016.
[7] L. Battle, P. Eichmann, M. Angelini, T. Catarci, G. Santucci, Y. Zheng, C. Binnig, J.-D. Fekete, and D. Moritz. Database benchmarking for supporting real-time interactive querying of large data. In SIGMOD 2020. ACM, 2020.
[8] L. Battle and J. Heer. Characterizing exploratory visual analysis: A literature review and evaluation of analytic provenance in Tableau. Comput. Graph. Forum, 38:145–159, 2019.
[9] L. Battle, M. Stonebraker, and R. Chang. Dynamic reduction of query result sets for interactive visualization. In IEEE Big Data 2013, pages 1–8. IEEE, 2013.
[10] R. Begleiter, R. El-Yaniv, and G. Yona. On prediction using variable order markov models. Journal of Artificial Intelligence Research, 22:385–421, 2004.
[11] Human cell atlas. https://chanzuckerberg.com/science/programs-resources/humancellatlas/, 2018.
[12] U. Cetintemel, M. Cherniack, J. DeBrabant, Y. Diao, K. Dimitriadou, A. Kalinin, O. Papaemmanouil, and S. B. Zdonik. Query steering for interactive data exploration. In CIDR, 2013.
[13] S.-M. Chan, L. Xiao, J. Gerth, and P. Hanrahan. Maintaining interactivity while exploring massive time series. In VAST 2008, pages 59–66. IEEE, 2008.
[14] Y. Chen, A. Rau-Chaplin, et al. cgmOLAP: Efficient parallel generation and querying of terabyte size ROLAP data cubes. ICDE, 2006.
[15] Art Institute of Chicago: The collection, 2018.
[16] C. Cruz-Neira, D. Sandin, and T. A. DeFanti. Surround-screen projection-based virtual reality: the design and implementation of the CAVE. In SIGGRAPH, 1993.
[17] D. Dardari, M. G. Martini, M. Mazzotti, and M. Chiani. Layered video transmission on adaptive OFDM wireless systems. EURASIP J. Adv. Sig. Proc., 2004:1557–1567, 2004.
[18] J. Deber, R. Jota, C. Forlines, and D. J. Wigdor. How much faster is fast enough? User perception of latency & latency improvements in direct and indirect touch. In CHI, 2015.
[19] J. DeBrabant, C. Wu, U. Cetintemel, and S. Zdonik. Seer: Profile-driven prefetching and caching for interactive visualization of large multidimensional data. Technical report, Brown University, 2015.
[20] E. Dimara and C. Perin. What is interaction for data visualization? IEEE Transactions on Visualization and Computer Graphics, 26:119–129, 2020.
[21] B. Ding, S. Huang, S. Chaudhuri, K. Chakrabarti, and C. Wang. Sample + seek: Approximating aggregates with distribution precision guarantee. In SIGMOD 2016, pages 679–694. ACM, 2016.
[22] J. Domènech, J. A. Gil, J. Sahuquillo, and A. Pont. Web prefetching performance metrics: A survey. Performance Evaluation, 2006.
[23] P. R. Doshi, E. A. Rundensteiner, and M. O. Ward. Prefetching for visual data exploration. In DASFAA. IEEE, 2003.
[24] M. El-Hindi, Z. Zhao, C. Binnig, and T. Kraska. VisTrees: fast indexes for interactive data exploration. In HILDA, 2016.
[25] J.-D. Fekete. ProgressiVis: A toolkit for steerable progressive analytics and visualization. In DSIA, 2015.
[26] D. Fisher, I. Popov, S. Drucker, et al. Trust me, I'm partially right: incremental visualization lets analysts explore large datasets faster. In CHI 2012, pages 1673–1682. ACM, 2012.
[27] Geotiff. http://trac.osgeo.org/geotiff/, 2018.
[28] M. M. Ghandi, B. Barmada, E. V. Jones, and M. Ghanbari. H.264 layered coded video over wireless networks: Channel coding and modulation constraints. EURASIP J. Adv. Sig. Proc., 2006, 2006.
[29] Graphistry: Find the stories in your data, 2018.
[30] J. Gray, S. Chaudhuri, A. Bosworth, A. Layman, D. Reichart, M. Venkatrao, F. Pellow, and H. Pirahesh. Data cube: A relational aggregation operator generalizing group-by, cross-tab, and sub-totals. Data Mining and Knowledge Discovery, 1995.
[31] Q. Guo and E. Agichtein. Exploring mouse movements for inferring query intent. In SIGIR, 2008.
[32] M. Haklay and P. Weber. OpenStreetMap: User-generated street maps. IEEE Pervasive Computing, 7:12–18, 2008.
[33] Q. He, D. Jiang, Z. Liao, S. C. Hoi, K. Chang, E.-P. Lim, and H. Li. Web query recommendation via sequential query prediction. In ICDE 2009, pages 1443–1454. IEEE, 2009.
[34] J. M. Hellerstein, P. J. Haas, and H. J. Wang. Online aggregation. In ACM SIGMOD Record, 1997.
[35] H. Hoppe. Progressive meshes. In SIGGRAPH 1996, pages 99–108. ACM, 1996.
[36] J. Jablonský et al. Benchmarks for current linear and mixed integer optimization solvers. Acta Universitatis Agriculturae et Silviculturae Mendelianae Brunensis, 63(6):1923–1928, 2015.
[37] S. Jakubczak and D. Katabi. A cross-layer design for scalable mobile video. In MobiCom, 2011.
[38] J. Jiang, S. Sun, V. Sekar, and H. Zhang. Pytheas: Enabling data-driven quality of experience optimization using group-based exploration-exploitation. In NSDI 2017. USENIX Association, 2017.
[39] N. Kamat, P. Jayachandran, K. Tunga, and A. Nandi. Distributed and interactive cube exploration. In ICDE, 2014.
[40] N. Kamat, P. Jayachandran, K. Tunga, and A. Nandi. Distributed and interactive cube exploration, pages 472–483, 2014.
[41] NASA: Landsat science. https://landsat.gsfc.nasa.gov, 2018.
[42] F. Li, B. Wu, K. Yi, and Z. Zhao. Wander join: Online aggregation via random walks. In SIGMOD 2016, pages 615–629. ACM, 2016.
[43] L. Lins, J. T. Klosowski, and C. Scheidegger. Nanocubes for real-time exploration of spatiotemporal datasets. TVCG, 2013.
[44] Z. Liu and J. Heer. The effects of interactive latency on exploratory visual analysis. Vis, 2014.
[45] Z. Liu, B. Jiang, and J. Heer. imMens: Real-time visual querying of big data. In Computer Graphics Forum, 2013.
[46] H. S. Malvar. Fast progressive wavelet coding. In DCC 1999, pages 336–343. IEEE, 1999.
[47] H. Mao, R. Netravali, and M. Alizadeh. Neural adaptive video streaming with Pensieve. In SIGCOMM, 2017.
[48] H. Mao, R. Netravali, and M. Alizadeh. Neural adaptive video streaming with Pensieve. In SIGCOMM 2017, pages 197–210. ACM, 2017.
[49] OmniSci Immerse, 2018.
[50] L. A. Meyerovich, M. E. Torok, E. Atkinson, and R. Bodík. Superconductor: A language for big data visualization. In Workshop on Leveraging Abstractions and Semantics in High-Performance Computing, pages 1–2, 2013.
[51] H. Mohammed, Z. Wei, E. Wu, and R. Netravali. Continuous prefetch for interactive data applications. In ArXiv, 2020.
[52] D. Moritz, D. Fisher, B. Ding, and C. Wang. Trust, but verify: Optimistic visualizations of approximate queries for exploring big data. In CHI 2017, pages 2904–2915. ACM, 2017.
[53] D. Moritz, B. Howe, and J. Heer. Falcon: Balancing interactive latency and resolution sensitivity for scalable linked visualizations. 2019.
[54] A. Nanopoulos, D. Katsaros, and Y. Manolopoulos. A data mining algorithm for generalized web prefetching. TKDE, 2003.
[55] R. Netravali, A. Goyal, J. Mickens, and H. Balakrishnan. Polaris: Faster page loads using fine-grained dependency tracking. In NSDI, 2016.
[56] R. Netravali, V. Nathan, J. Mickens, and H. Balakrishnan. Vesper: Measuring time-to-interactivity for modern web pages. In NSDI, 2018.
[57] R. Netravali, A. Sivaraman, S. Das, A. Goyal, K. Winstein, J. Mickens, and H. Balakrishnan. Mahimahi: Accurate record-and-replay for HTTP. In USENIX ATC, 2015.
[58] R. Netravali, A. Sivaraman, J. Mickens, and H. Balakrishnan. WatchTower: Fast, secure mobile page loads using remote dependency resolution. In MobiSys 2019, pages 430–443. ACM, 2019.
[59] P. T. Pasqual and J. O. Wobbrock. Mouse pointing endpoint prediction using kinematic template matching. In CHI, 2014.
[60] M. Pintus, G. Ginesu, L. Atzori, and D. D. Giusto. Objective evaluation of WebP image compression efficiency. In International Conference on Mobile Multimedia Communications, pages 252–265. Springer, 2011.
[61] M. Procopio, C. Scheidegger, E. Wu, and R. Chang. Load-n-go: Fast approximate join visualizations that improve over time. In DSIA@InfoVis, 2017.
[62] M. L. Puterman. Markov decision processes: Discrete stochastic dynamic programming. In Wiley Series in Probability and Statistics, 1994.
[63] P. Rahman, L. Jiang, and A. Nandi. Evaluating interactive data systems. The VLDB Journal, pages 1–28, 2019.
[64] S. Rahman, M. Aliakbarpour, H. K. Kong, E. Blais, K. Karahalios, A. Parameswaran, and R. Rubinfield. I've seen enough: incrementally improving visualizations to support rapid decision making. Proceedings of the VLDB Endowment, 10(11):1262–1273, 2017.
[65] K. Ramachandran, B. Shah, and V. V. Raghavan. Dynamic pre-fetching of views based on user-access patterns in an OLAP system. SIGMOD, 2005.
[66] L. Ravindranath, S. Agarwal, J. Padhye, and C. Riederer. Give in to procrastination and stop prefetching. In HotNets-XII. ACM, 2013.
[67] V. Ruamviboonsuk, R. Netravali, M. Uluyol, and H. V. Madhyastha. Vroom: Accelerating the mobile web with server-aided dependency resolution. In SIGCOMM, 2017.
[68] D. Salomon and G. Motta. Handbook of Data Compression. Springer Science & Business Media, 2010.
[69] C. Sapia. PROMISE: Predicting query behavior to enable predictive caching strategies for OLAP systems. DaWaK, 2000.
[70] C. A. Segall and G. J. Sullivan. Spatial scalability within the H.264/AVC scalable video coding extension. IEEE Transactions on Circuits and Systems for Video Technology, 17:1121–1135, 2007.
[71] J. M. Shapiro. Embedded image coding using zerotrees of wavelet coefficients. IEEE Transactions on Signal Processing, 41(12):3445–3462, 1993.
[72] R. Sisneros, C. Jones, J. Huang, J. Gao, B.-H. Park, and N. F. Samatova. A multi-level cache model for run-time optimization of remote visualization. IEEE Transactions on Visualization and Computer Graphics, 13, 2007.
[73] C. D. Stolper, A. Perer, and D. Gotz. Progressive visual analytics: User-driven visual exploration of in-progress analytics. IEEE Transactions on Visualization and Computer Graphics, 20(12):1653–1662, 2014.
[74] R. S. Sutton, D. A. McAllester, S. P. Singh, and Y. Mansour. Policy gradient methods for reinforcement learning with function approximation. In NIPS, 1999.
[75] G. K. Wallace. The JPEG still picture compression standard. IEEE Transactions on Consumer Electronics, 38(1):xviii–xxxiv, 1992.
[76] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli. Image quality assessment: from error visibility to structural similarity. IEEE Transactions on Image Processing, 2004.
[77] G. Welch, G. Bishop, et al. An introduction to the Kalman filter. 1995.
[78] R. W. White, F. Diaz, and Q. Guo. Search result prefetching on desktop and mobile. ACM Transactions on Information Systems (TOIS), page 23, 2017.
[79] J. O. Wobbrock, J. Fogarty, S.-Y. S. Liu, S. Kimuro, and S. Harada. The angle mouse: target-agnostic dynamic gain adjustment based on angular deviation. In CHI, 2009.
[80] J. O. Wobbrock, A. D. Wilson, and Y. Li. Gestures without libraries, toolkits or training: a $1 recognizer for user interface prototypes. In UIST, pages 159–168. ACM, 2007.
[81] C. Wu, J. Faleiro, Y. Lin, and J. Hellerstein. Anna: A KVS for any scale. IEEE Transactions on Knowledge and Data Engineering, 2019.
[82] Y. Wu, J. M. Hellerstein, and E. Wu. A devil-ish approach to inconsistency in interactive visualizations. In HILDA@SIGMOD, 2016.
[83] Y. Wu, L. Xu, R. Chang, J. M. Hellerstein, and E. Wu. Making sense of asynchrony in interactive data visualizations. CoRR, abs/1806.01499, 2018.
[84] Q. Xu, S. Mehrotra, Z. Mao, and J. Li. Proteus: Network performance forecast for real-time, interactive mobile applications. In MobiSys 2013. ACM, 2013.
[85] X. Yin, A. Jindal, V. Sekar, and B. Sinopoli. A control-theoretic approach for dynamic adaptive video streaming over HTTP. Computer Communication Review, 45:325–338, 2015.
[86] M. Zukowski and P. A. Boncz. Vectorwise: Beyond column stores. IEEE Data Eng. Bull., 35:21–27, 2012.
APPENDIX

A. SCHEDULER DETAILS

A.1 Microexperiments for LP and Greedy Schedulers

We implemented the LP-based scheduler in Rust using Gurobi, a state-of-the-art linear program (LP) solver [36]. Figure 15 reports the runtime when varying the number of possible requests between 5–15, the cache size from 10–30 blocks, and the number of blocks per request from 5–15. The LP scheduler is very expensive, as the size of the LP increases proportionally with the number of possible requests, the cache size, and the number of blocks. Even for such trivial scenarios, generating a schedule can take up to 30 minutes.

Figure 15: LP scheduler runtime.

Figure 16 shows the performance of the (optimized) greedy scheduler as the cache size, number of blocks, and number of possible requests vary; in particular, the figure lists the scheduler's runtime and the fraction of requests that have non-uniform probabilities and thus must be materialized. As shown, the running time is independent of the number of blocks, and it increases proportionally with the number of requests and the cache size. The bulk of the cost is in precomputing the probability matrix (Listing 1, lines 8-11). Consequently, the running time of the scheduler is heavily dependent on the fraction of requests with non-uniform probabilities. In our experience, this fraction is low for many DVE applications; for example, for the image gallery application with 10k requests, the fraction was less than [value lost in extraction]. Moreover, the greedy algorithm offers the flexibility to configure the batch size so as to produce a schedule in real time, at a small fraction of the LP scheduler's runtime [factors lost in extraction].

A.2 Details of Multiple Prediction Distributions
This subsection describes the formal semantics of a schedule when the client sends a sequence of prediction distributions to the scheduler.

Formally, consider an analysis session of n timesteps, where each timestep corresponds to the time to place one block on the network. Suppose the one-way network latency is fixed at l, and the client sends k predictions to the server, where prediction P_j() is sent at time t_j - l and arrives at the server at t_j. Let the global schedule be s = [b_1, ..., b_n], where the server simply sends each block in sequence. Let b_i^j be the block that the scheduler would pick for timestep i if it used P_j to schedule the block, under the following conditions:
• if i < t_1, then assume a uniform probability distribution,
• if i < t_j, then b_i^j is null,
• if i ≥ t_j, then b_i^j is computed as part of batch m = ⌊i/C⌋ (i.e., timesteps [mC, (m + 1)C]), and is the (i - mC)-th block in the batch.

Figure 16: Runtime of the greedy scheduler across varying cache size (100, 500, 5000), number of requests, number of blocks per request (N Blocks), and the percentage of the requests that must be materialized (mat. probs) in the matrix P_{i,t}.

Figure 17: Greedy and LP schedule utilities.

Given this, b_i = b_i^j, where j = max{j ∈ [0, k] | t_j ≤ i} is the most recent prediction to arrive prior to timestep i. In Figure 18, the scheduler creates a schedule for the first batch of C = 5 blocks using a uniform distribution. When P_1 arrives, it is used to reschedule b in the first batch, all blocks in batch 2, and blocks in batch 3. However, P_2 arrives before timestep 13, so the rest of batch 3 is rescheduled using P_2. The superscript for each block denotes the prediction that the scheduler used. Naturally, this is an idealized setting where there are no scheduler delays, no network variance, and no other timing nondeterminism.

B. ADDITIONAL EXPERIMENTS

B.1 System Parameters
We varied the frequency at which the client library sends predictions to the server from every 50–350ms across the low, medium, and high resource settings (recall that the default value used in our experiments thus far is 150ms). Overall, varying the frequency has a minor effect on the reported metrics (cache hit rate, response latency, utility, and preemption rate), and is negligible compared to the effects of other settings (e.g., network or cache resources). The one exception is the low cache size and low bandwidth setting, where sending predictions less frequently ( >

Figure 18: Semantics of the idealized scheduler as new predictions P_j arrive. The blocks have the same color hue as the prediction they rely upon.

Figure 19: Overpush rate: percentage of data pushed to the client that was not used in an upcall to the application.

B.2 Bandwidth Overheads
It is clear that prefetching trades bandwidth for responsiveness. We measured the percentage of blocks sent by the server that were not involved in upcalls to answer application requests (called overpushed blocks); these statistics were collected during the think-time experiments. Overall, at most 75% of the blocks prefetched by Khameleon are overpushed. This is typically expected, particularly if the intention is to hedge across many possible future requests.
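The overpush metric reported above can be computed directly from push and upcall logs; the block identifiers below are illustrative.

```python
# Overpush rate: the fraction of pushed blocks that never served an upcall.
def overpush_rate(pushed_blocks, used_blocks):
    """Percentage of pushed blocks that were not used in an upcall."""
    unused = set(pushed_blocks) - set(used_blocks)
    return 100.0 * len(unused) / len(pushed_blocks)

# e.g. 8 blocks pushed, 2 consumed by application requests -> 75% overpush
rate = overpush_rate([f"b{i}" for i in range(8)], ["b0", "b3"])
```

A high rate is not necessarily waste: the unused blocks are precisely the hedge against requests the user could have issued but did not.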