BrowseLite: A Private Data Saving Solution for the Web
Conor Kelton
Stony Brook University
Matteo Varvello
Nokia Bell Labs
Andrius Aucinas
Brave Software
Benjamin Livshits
Brave Software and Imperial College London
ABSTRACT
The median webpage has increased in size by more than 80% in the last 4 years. This extra complexity allows for a rich browsing experience, but it hurts the majority of mobile users, who still pay for their traffic. This has motivated several data-saving solutions, which aim at reducing the complexity of webpages by transforming their content. Despite each method being unique, they either reduce user privacy by further centralizing web traffic through data-saving middleboxes, or introduce web compatibility (Web-compat) issues by removing content that breaks pages in unpredictable ways. In this paper, we argue that data-saving is still possible without impacting either user privacy or Web-compat. Our main observation is that Web images make up a large portion of Web traffic and have negligible impact on Web-compat. To this end we make two main contributions. First, we quantify the potential savings that image manipulation, such as dimension resizing, quality compression, and transcoding, enables at a large scale: 300 landing and 880 internal pages. Next, we design and build BrowseLite, an entirely client-side tool that achieves such data savings by opportunistically instrumenting existing server-side tooling to perform image compression, while simultaneously reducing the total amount of image data fetched. The effect of BrowseLite on the user experience is quantified using standard page load metrics and a real user study of over 200 users across 50 optimized web pages. BrowseLite allows for savings similar to middlebox approaches, while offering additional security, privacy, and Web-compat guarantees.
CCS CONCEPTS
• Networks → Network performance analysis; Network experimentation.
KEYWORDS
Web, Browsers, Performance, Optimizations, Images, Data saving
ACM Reference Format:
Conor Kelton, Matteo Varvello, Andrius Aucinas, and Benjamin Livshits. 2021. BrowseLite: A Private Data Saving Solution for the Web. In Proceedings of the Web Conference 2021 (WWW '21), April 19–23, 2021, Ljubljana, Slovenia. ACM, New York, NY, USA, 12 pages.
This paper is published under the Creative Commons Attribution 4.0 International (CC-BY 4.0) license. Authors reserve their rights to disseminate the work on their personal and corporate Web sites with the appropriate attribution.

WWW '21, April 19–23, 2021, Ljubljana, Slovenia
© 2021 IW3C2 (International World Wide Web Conference Committee), published under Creative Commons CC-BY 4.0 License.
ACM ISBN 978-1-4503-8312-7/21/04.
1 INTRODUCTION

A multitude of studies from academia and industry alike suggest up-trends in the complexity and size of the mobile Web [9, 11, 15, 19, 39], so much so that the median page has now reached 2 MB, up over 80% from 2016 [15]. While this complexity has undoubtedly created a richer browsing experience, it has downsides for mobile users. Byte-heavy pages are responsible for frustratingly slow browsing experiences on slower networks, along with significant monetary costs for users on limited mobile data plans. In Canada, for example, the median webpage costs $0.24 to load [34]. As a result, emphasis has been placed on changing Web browsing to consume less data [9, 11, 44, 46, 51]. While there has been some public confusion over the exact workings and reach of such data-saving methods [33], recent studies of their actual implementations [31, 37, 54] reveal a few key shortcomings.

First and foremost, these solutions impose various privacy concerns on their users when compared to regular Web browsing. Some are deployed as middlebox services which either transparently proxy the user's unencrypted traffic [11, 51], or apply URL redirection [23] or man-in-the-middle proxies [44, 46, 58] to also operate on encrypted traffic (HTTPS) [37, 54]. Given the rise of HTTPS [28], the former sees limited use, while the latter breaks the end-to-end principles of TLS, exposing potentially private or personalized Web contents to third parties [37, 54].

Further, the exact measures these systems take to actually save data are often cryptic [31, 33, 37, 54]. As determining Web-compat issues, i.e., quantifying broken webpages, is an open problem that requires a large amount of manual effort [41], it is hard to determine how and when these solutions actually break webpages; though when they do, there is usually public outcry [12].

The goal of this work is to devise a data-saving solution which addresses the above shortcomings. Our intuition is twofold. First, such a solution needs to be client-side in order to eliminate the privacy and reach concerns of middlebox approaches; namely, a client-side-only approach has the potential to save data for personalized webpage content without exposing it to third parties. Second, by being image-centric, it can impose virtually no impact on Web-compat, in comparison to, for example, JavaScript code elision [22, 23, 44]. As the median webpage is comprised of 900 KB of images (or about 44% of its total size) [15], an image-based solution still has high potential for data-savings.

Many of the aforementioned middlebox techniques [11, 51] automatically apply popular image manipulation techniques (resizing, quality compression, and transcoding) to save data before the last mile (see Section 2). While the weight of images across webpages is known to be quite high, it is unclear which fraction of pages in the wild can benefit from such techniques. Our first contribution is to quantify the impact such middlebox techniques have when optimizing images. This quantification represents an upper bound on the data savings which a more private, client-side solution can potentially obtain. We analyze such techniques by compressing webpages across a crawl of ∼300 landing and ∼880 internal pages.
Next, we design and build BrowseLite, an entirely client-side tool that realizes such savings. Its first technique, URL Rewriting, opportunistically instruments existing server-side image services to deliver lighter images; while effective, this technique applies to only ∼16% of webpages from our crawls. From there, BrowseLite takes a second approach to save data for more general pages. This approach, which we call image fetch reduction, uses a component of the HTTP standard known as range requests to fetch less data for all Web images, thereby reducing their network data usage. To alleviate the impact on the user's quality of experience (QoE) caused by rendering only the requested portions of images, BrowseLite differentiates between two standard image types on the Web, baseline and progressive images, during the page load. We show that progressive images can be rendered almost completely with only a fraction of their data requested. For baseline images, we introduce our own technique, reflection, to make the empty spaces left on the webpage by these partial images appear less visually broken. We show that such techniques incur only a modest trade-off in user QoE, using both systematic metrics [16] and a real user study across 50 of our pages with 200 crowdsourced users (see Section 5). Our experiments show that BrowseLite reduces the bandwidth consumption of pages from the same crawls by ∼25% at the median, savings comparable to those of middlebox approaches (see Section 5).

2 RELATED WORK

The research community has dedicated significant effort to designing page load optimizations. These works [36, 42, 49, 57] aim at speeding up the load times of Web pages by optimizing the order of Web object retrieval over the network. Such reordering schemes require server-side knowledge or assistance, and/or are unlikely to help in terms of bandwidth savings [44]. Aside from load performance, data-saving is also a largely explored topic, with several commercial solutions already available (e.g., [23, 46]). State-of-the-art data-saving methods can be categorized as: 1) middlebox transformations, 2) server-side resource optimizations, and 3) entirely client-side data-saving approaches. In the following, we discuss each category in detail.
Middlebox Transformations. These methods rely on transparent HTTP proxies or middleboxes [11, 39, 51] to offer data savings via resource transformations. By operating transparently on path, they do not require server-side support. However, they cannot be used in the presence of encrypted content (HTTPS), which is nowadays used by most websites [28, 55]. Popular resource transformations adopted by middleboxes include plain-text gzip compression, image downsizing from native dimensions to the rendered dimensions in the client's viewport, and content transcoding to formats which offer higher compression, e.g., WebP.

Flywheel [11], Google's Web compression proxy, is perhaps the seminal work in the data-saving space. This proxy claims up to 80% data savings via transcoding of image formats and gzip compression of JavaScript and CSS. FlexiWeb [51] is a follow-up implementation of Flywheel which leverages machine learning to optimize the trade-off between data savings and user quality of experience (QoE). Work from Alibaba [39] extends Flywheel's techniques to any mobile application by intercepting (unencrypted) mobile traffic. Given the near-ubiquitous adoption of end-to-end encryption [28, 55], the potential for implementing such transformations via transparent middleboxes remains in question. Further, privacy is a concern for these methods, given that they require third-party access to the contents of webpages, much of which may be personalized.
Server-Side Resource Optimizations. These methods rely on some server-side support to allow clients to fetch only a minimalist version of the page [9, 18, 22, 29, 44, 46]. The methods proposed include removal and reorganization of various Web objects on the page [9, 44, 46], detection and elision of unnecessary code [18, 22, 29], and URL rewriting based on content similarity to enable smarter and more effective caching at the client [43]. Understanding which portions of pages to remove, or which content to rewrite, requires knowledge of the page state gained by completely loading the webpage before it is sent to the client, which is why these solutions require server-side support.

To allow for data-savings without explicit server-side control, many of the above methods [44, 46] can be implemented as man-in-the-middle proxies which either break TLS or leverage URL redirection, i.e., serve other Web pages' contents directly from their servers. For example, Google's Web Light [9] redirects the initial request for a webpage through its servers, without sharing the page's cookies with Google's servers. While this is a plus for privacy, as no client-side state can be inferred, it limits the reach of the approach, given that personalized content cannot be optimized [54]. Further, such implementations leak information about which URLs are requested to third-party servers, providing an opportunity to build full browsing profiles of end users [54]. Recent work [37] has highlighted the above privacy risks of such approaches, and has also shown that many current implementations are built on outdated
software, use substandard TLS certificate validation, and/or use weak TLS cipher suites, opening up users to additional security risks.

Last but not least, these methods often make use of complicated rules for replacing or removing JavaScript [9, 22, 29, 58]. Other solutions label dead code based on offline randomized user input testing [18]. This implies that the efficacy of such solutions in terms of Web-compat remains quite uncertain and hard to measure, often requiring much manual effort to quantify [41]. Furthermore, when pages do break, there is often backlash from users and Web developers alike due to the lack of transparency of these systems [12].
Client-Side Only Solutions. These methods run fully in the client, with no server support (either direct or via redirection) and no support from on-path middleboxes. While this approach offers the highest privacy guarantees, it is limited in which data-saving strategies can be adopted, since the actual contents of Web pages are unknown to clients until retrieved.
Content blocking is the most common client-side data-saving solution. This strategy simply blocks resources which can be identified as non-useful to users at the time of their request, such as advertisements in the case of ad-blocking. While more sophisticated solutions that block potentially useful page components, e.g., JavaScript, are available [22], they require a priori knowledge of page contents gained from observing the page load over a period of time.

Content blocking is also made available by Chromium under its Data-Saving mode to block all Web page images [40], replacing them with a single placeholder image. Image blocking saves users data, but it has a drastic impact on the user experience of Web pages. While users are allowed to download these images individually, they have little to no context as to which images may be important to them. BrowseLite balances data-savings and the user's browsing QoE; further, no user action is required. Our user study using BrowseLite revealed that users tend to rate pages without images as completely broken (1 on a 1–5 scale at the median), whereas pages with data-optimized images received much more favorable responses (3 at the median).

The work in [59] outlines what can be done from the client in terms of page load optimization, but focuses mainly on speculative caching (similar to [43]), as well as content pre-fetching to improve latency rather than data-savings. While caching clearly reduces data sent over the network, it does not offer data savings under cold connections, whose prevalence and importance the same work stresses [59]. BrowseLite is primarily designed to save users data under cold caches, though our results (Section 5) highlight how BrowseLite does not waste data in the presence of hot caches. Further, recent works have shown security implications of too-liberal caching policies, such as enforcing the caching of similar content between domains [53].
3 QUANTIFYING POTENTIAL IMAGE SAVINGS

We begin our measurement by determining the potential for savings made available through less private, middlebox-based optimization of images. Our analysis serves to create a baseline against which to compare the data-savings of our more private, exclusively client-side BrowseLite. While the weight of images across webpages is known to be quite high (44% as per HTTP Archive [15]), it is unclear how much of this image weight could be saved using the middlebox approaches of image resizing, quality compression, and transcoding (see Section 2).
Methodology. We resort to Web crawling [56] to collect a representative dataset on the current status of image usage in the wild. To obtain a set of domains to crawl, we use the Majestic list [1], which contains the top million domains with the most referring subnets. We chose the Majestic list, as opposed to the more popular Alexa list [2], as it is a free alternative that is still exclusively based on Web browser traffic [50]. We crawl 3 buckets of webpage rankings from the Majestic list (top100, apr50k, and apr100k) [1]. For each bucket, we select 100 websites, e.g., pages ranked 100–200 for the top100 bucket and pages ranked 50,000–50,100 for the apr50k bucket. Given the importance of covering internal pages [14, 56], we crawl the first 10 links to the same domain, if available, from each top-level domain. Our crawls produced a dataset encompassing ∼300 landing and ∼880 internal pages.
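To make the crawling procedure concrete, the sketch below shows the core of such a crawler, assuming Puppeteer and its bundled device descriptors; the function and field names are ours, and storage, timeouts, and retries are elided.

```typescript
import puppeteer from 'puppeteer';

// Minimal sketch of the crawl described above: load a landing page with a
// cold cache, record every image response, scroll to trigger lazy loading,
// and collect up to 10 same-domain internal links for the second-level crawl.
async function crawlDomain(domain: string) {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.emulate(puppeteer.devices['Pixel 2']); // 411x731 viewport

  const images: { url: string; bytes: number; headers: Record<string, string> }[] = [];
  page.on('response', async (res) => {
    if (res.request().resourceType() !== 'image') return;
    const body = await res.buffer().catch(() => Buffer.alloc(0));
    images.push({ url: res.url(), bytes: body.length, headers: res.headers() });
  });

  await page.goto(`https://${domain}`, { waitUntil: 'networkidle2' });
  await page.evaluate(() => window.scrollTo(0, document.body.scrollHeight));

  const internal = (await page.$$eval('a[href]', (links) =>
    links.map((a) => (a as HTMLAnchorElement).href)
  ))
    .filter((href) => href.includes(domain))
    .slice(0, 10);

  await browser.close();
  return { images, internal };
}
```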
We use Lighthouse to load a page and collect various network and in-page statistics, such as the network bytes associated with each resource requested, and the final rendered locations and sizes associated with all images from the Cascading Style Sheets (CSS) of the webpage. We use Puppeteer to obtain the response bodies of all network requests, and to scroll through the page to capture information about images which may be lazy-loaded, or added only in the presence of user interactions. We ensure each webpage (landing and internal) is loaded using a cold cache. However, we also save the necessary HTTP headers (i.e., Cache-Control, Expires, Last-Modified, and Etag) to implement the browser logic for determining whether a given request is cacheable [20]. This allows us to simulate the load of these pages with caching enabled, offline after our crawls. This analysis is particularly important for internal pages, where many resources may be cached from the landing page.
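Below is a minimal sketch of the cacheability logic this enables, under simplified RFC 7234 freshness rules (no-store/no-cache, max-age, Expires, then heuristic freshness); it illustrates the idea rather than reproducing the exact heuristics of any particular browser.

```typescript
// Decide offline whether a response observed in the crawl could have been
// served from the browser cache on a later load. Header keys are assumed
// lowercase (as returned by Puppeteer's response.headers()).
function isCacheable(headers: Record<string, string>): boolean {
  const cc = (headers['cache-control'] ?? '').toLowerCase();
  if (cc.includes('no-store') || cc.includes('no-cache')) return false;

  // Explicit freshness lifetime wins over everything else.
  const maxAge = cc.match(/max-age=(\d+)/);
  if (maxAge) return parseInt(maxAge[1], 10) > 0;

  // No max-age: fall back to an explicit Expires date...
  if (headers['expires']) {
    return new Date(headers['expires']).getTime() > Date.now();
  }
  // ...or to heuristic freshness when a validator is present.
  return Boolean(headers['last-modified'] || headers['etag']);
}
```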
Lighthouse currently provides rough estimates of the bytes wasted by a page failing to implement standard image data-saving optimizations (see Section 2). For our measurements, we extend these estimates in two ways. In the following, we detail both extensions to the Lighthouse reports and, for each, provide an analysis based on the pages from our crawls.

Image compression pipeline. We estimate the potential data waste in images by manipulating all the HTTP response bodies of image requests through 3 image optimization techniques recommended by Lighthouse and employed by proxy-based approaches (e.g., Flywheel [11]): image resizing, quality compression, and transcoding. Our first extension to the Lighthouse data-saving measurements is to pipeline images through these 3 optimizations, as Lighthouse currently only applies them individually; that approach can underestimate savings, since these optimizations compound to save data [11, 51]. We employ two versions of the pipeline, standard and extreme, which trade off savings against potential impact on the user's QoE. To quantify the reduction in QoE caused by image savings, we use the structural similarity metric (SSIM) [60], a full-reference image metric commonly used to measure quality degradation in images due to transformations (blurring, compression, color reduction, etc.).

For the resizing component of the pipeline, in standard mode we resize the image as sent over the network to its CSS width and height attributes; in extreme mode we resize the image to half of these values. We note that the CSS attributes depend on the size of the viewport, and hence smaller viewports may achieve higher relative savings for the same image at the same perceptual quality. In all our experiments, we instrument Chromium to emulate a Pixel 2, which has a viewport size of 411x731 (5.5 inches), the most popular size in 2019 [17]. Following image resizing, all jpeg, tiff, png, bmp, and gif images are transcoded to WebP, a "next generation" image format which offers higher compression with visual quality comparable to the other formats [13]. In standard mode, WebP images are compressed by reducing their quality setting to 85 (out of 100), as this is reported as the best trade-off between savings and SSIM degradation by previous works [11, 51]. In extreme mode, we aggressively reduce image quality to 10 (out of 100).
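Expressed with an off-the-shelf image library, this pipeline reduces to a few lines. The sketch below uses the sharp library for Node purely as an illustrative stand-in for our tooling; the helper names (compress, wastedBytes) are ours, and error handling is elided.

```typescript
import sharp from 'sharp';

// Standard/extreme compression pipeline: resize to the CSS dimensions
// (halved in extreme mode), transcode to WebP, and reduce quality.
async function compress(
  original: Buffer,
  cssWidth: number,
  cssHeight: number,
  mode: 'standard' | 'extreme'
): Promise<Buffer> {
  const scale = mode === 'standard' ? 1 : 0.5;
  return sharp(original)
    .resize(Math.round(cssWidth * scale), Math.round(cssHeight * scale), { fit: 'inside' })
    .webp({ quality: mode === 'standard' ? 85 : 10 })
    .toBuffer();
}

// Potential savings for one image: bytes on the wire vs. optimized bytes.
async function wastedBytes(body: Buffer, w: number, h: number): Promise<number> {
  const optimized = await compress(body, w, h, 'standard');
  return Math.max(0, body.length - optimized.length);
}
```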
Figure 1: Room for bandwidth savings by adjusting image sizes. Shown in (a) and (c) are CDFs of potential raw and normalized savings, respectively, across pages of different ranks from our crawls. Normalizations in (c) are against page weights with and without browser caching. As shown in (b), pages in the 90th percentile see room for over 7MB of savings.

Results from the image compression pipeline are shown in Figure 1 (a) and (b). Pictured first is the CDF of savings, in KBytes, across pages from our crawls. We observe that lower-ranked pages generally see higher savings, with medians of 152KB, 292KB, and 308KB in standard mode for top100, apr50k, and apr100k respectively, and 189KB, 375KB, and 390KB in extreme mode for the same. This result is intuitive, as more popular pages are expected to be better optimized. While the savings offered for the median page are rather modest, the distribution of savings is quite long-tailed, as shown in Figure 1 (b): even top-ranked pages see savings of over 3MB at the 95th percentile. Further, the gap between the standard and extreme modes is modest, being at most 100KB at the median page across the ranks. This is likely because the savings offered by resizing and quality reduction begin to diminish when configured past our standard level.

Beyond raw bytes, we compare savings as a fraction of the total page weight. Figure 1 (c) provides the normalized savings, for our standard mode, across our crawls broken up by page rank. As before, the lower-ranked pages offer more potential savings, with 10.3%, 20.1%, and 21.9% of the median page being saved for top100, apr50k, and apr100k respectively. This jumps to 30.8%, 38.9%, and 44.5% at the 75th percentile across the respective crawls.

We also analyze the savings normalized by the weight of pages under caching. Specifically, we do not count URLs (including images) that are a) found on both top-level and second-level pages, and b) marked as cacheable by their HTTP headers [20] towards the total weight and savings of second-level pages. Note that we assume a double-keyed HTTP cache [26, 30]; it follows that, when determining repeat URLs, the same image on different domains is counted for savings on both pages. Figure 1 (c) also shows this result, where we can observe that savings increase in the presence of warm caches; specifically, savings of up to 14.2%, 21.6%, and 36.8% are observed for the median page in top100, apr50k, and apr100k respectively. This is because many Web resources other than images are shared between landing and inner pages, thus increasing the relative weight of data-saving techniques.

Figure 2: The resulting trade-offs in image quality, measured via SSIM, across the different levels of potential savings from Figure 1.

Moving to the SSIM analysis, we observe that standard mode provides rather mild QoE trade-offs, while the trade-off for extreme mode is harsher. Figure 2 shows paired boxplots of the resulting SSIM of images, optimized by both standard and extreme modes and bucketed by their resulting percentile of data-savings. While the median SSIM of images for all levels of savings in standard
mode does not fall below .95 (.83 at the 25th percentile), the median SSIM of extreme mode sits at .74 across all levels of savings. Finally, we also observed that the images that benefit most from data-savings see the smallest reductions in visual quality. This analysis suggests that high data-savings are possible with modest QoE impact, with standard mode offering the more preferable trade-off.

Figure 3: Log-scaled difference in savings (KBytes) when (a) not considering CSS background images and (b) when failing to appropriately calculate savings for CSS sprites.
CSS sprites. Lighthouse ignores potential savings from resizing images which are embedded by CSS, or CSS background images. This is because a typical use for background images is CSS sprites [25], i.e., images that consist of multiple smaller images embedded in one parent file (or sprite sheet). The sprite sheet is then dynamically cropped by the webpage to render the component images. Measuring savings for CSS sprites correctly requires accounting for the final size and location of all sprites, not just the original image. Due to this complexity, Lighthouse currently ignores data-savings for all background images, thus potentially underestimating savings.

We identify CSS sprites, or more generally any images that are cropped dynamically using the CSS property background-position, and separate these from normal CSS background images. We compute savings for normal background images using the pipeline previously discussed in this section. To calculate savings for CSS sprites, we compare the total area used by sprite sheets on the page with the total area of these images as sent over the network. Figure 3 (a) shows the Cumulative Distribution Function (CDF) of the data-savings which are currently missed by Lighthouse due to ignoring background images. Figure 3 (b) shows a CDF of the overestimation in savings caused by incorrectly resizing entire CSS sprite sheets to their component sprite size. The CDFs refer to the full set of pages from our dataset, and show the change in KBytes of savings in log scale. The graphs start at (a) 0.875 and (b) 0.6 for visibility, as only 12.5% and 40% of pages were affected, respectively. While not accounting for background CSS images misses only a small number of KBs of savings for 88% of pages, there are 5% of pages for which considerably more is missed. Resizing a CSS sprite sheet to the size of a single sprite can cause an overestimation of 100KB and 2MB of savings for the pages at the 60th and 90th percentiles, respectively.
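The sprite accounting can be summarized as follows, under the simplifying assumption that an image's bytes scale linearly with its pixel area; the names (spriteSavings, SpriteCrop) are illustrative.

```typescript
// One rendered crop of a sprite sheet (derived from background-position
// and the CSS size of the element displaying it).
interface SpriteCrop { width: number; height: number }

// Estimate the savings available on a sprite sheet: compare the total area
// rendered on the page with the area sent over the network, assuming (as a
// simplification) that bytes scale linearly with pixel area.
function spriteSavings(
  transferBytes: number,
  naturalWidth: number,
  naturalHeight: number,
  crops: SpriteCrop[]
): number {
  const naturalArea = naturalWidth * naturalHeight;
  const usedArea = crops.reduce((sum, c) => sum + c.width * c.height, 0);
  const usedFraction = Math.min(1, usedArea / naturalArea);
  return Math.round(transferBytes * (1 - usedFraction));
}
```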
4 BROWSELITE

The measurements from the previous section are motivating evidence that image optimizations contribute significant bandwidth savings for Web browsing, especially in the upper tail of less-optimized pages, which see savings on the order of MBytes. However, such savings represent a bound attained either by privacy-invasive approaches, e.g., middleboxes, TLS interception, or URL redirection, or by solutions which require some form of server collaboration.

In this section, we introduce BrowseLite, a collection of techniques which realize image data-savings directly at the client (e.g., a browser), thus offering higher privacy guarantees. BrowseLite consists of two main techniques, URL Rewriting and image fetch reduction, both described in the following.

4.1 URL Rewriting

The key challenge for a client-side data-saving approach is that it cannot apply transformations in the same way as middleboxes; by the time images are received by the browser, the user's data has already been spent. Instead, the client needs a way to ask the server for a more compressed version of each image. Our intuition is that this is possible thanks to the recent proliferation of image services run by popular content delivery networks (CDNs) like Fastly [4], Akamai [5], and CloudFlare [6]. These image services offer means similar to middleboxes for reducing image file sizes. Such savings are typically made accessible by configuring parameters in the URL of the image being served by the CDN.

However, these image services are not always configured optimally. For one, images uploaded to these services may be resized in a "one-size-fits-all" manner, for convenience, even though mobile pages, for instance, are accessed from a large diversity of device types and screen sizes. For smaller devices, this implies images can be further resized without observable quality degradation. Further, browser fragmentation means modern image formats (e.g., WebP and AVIF) are not always supported. For this reason, image services can be configured in an overly conservative manner that never delivers these modern formats, missing out on significant savings (see Figure 1). Lastly, image services may be configured to deliver images at higher visual quality (as determined, for example, by SSIM, discussed in Section 3), rather than higher data-savings. However, many image formats keep much of their visual quality even at significant levels of compression; e.g., jpeg and WebP images show no noticeable visual quality reduction at 85% compression levels [11] (see Section 3). As we will show later (see Figure 4), these configurations of image services can miss up to 70% of savings across real pages.

We design BrowseLite to detect the use of such image services, and to uncover potential data-savings in their configurations. Specifically, BrowseLite detects whether or not an image server supports the same transformations as the standard mode outlined in Section 3, that is, CSS right-sizing, quality reduction, and format transcoding. If such support is detected, BrowseLite modifies HTTP requests for these images in real time to automatically apply such transformations, thereby optimizing bandwidth consumption. We call this component of our work URL Rewriting.
To identify the presence or absence of an image service associated with a given image, BrowseLite searches for parameters in the image's URL that might relate to the image's dimensions, compression level, and format. For example, in the URL https://static.wixstatic.com/media/98a2de_37749ccfe79f48d1a977af77d1c2bd0e~mv2.jpg/v1/fill/w_400,h_52,al_c,q_100,usm_0....jpg, the parameters w_400 and q_100 and the extension .jpg correspond to the actual dimensions, quality compression level (out of 100), and format of the downloaded image. This equality between the image data and the URL parameters suggests that the image can be dynamically resized, compressed, and transcoded just by changing such parameters on the fly.

Editing URL parameters is not without risk, as a URL may simply be statically defined, with no image service available, and should thus not be edited. This kind of false positive can, at the very least, cause an extra unnecessary round trip that mitigates bandwidth savings, and at worst actually hurt bandwidth savings.

To avoid latency- and bandwidth-harming requests as much as possible, BrowseLite takes a two-step approach to URL Rewriting. First, we rewrite any value in the URL that matches a native size, quality, or format property of the image. Intuitively, this method achieves a high true positive rate, but also a high false positive rate. Thus, second, we generate a series of rules from the true positives to increase precision; instead of rewriting any location in the URL matching a property, we only rewrite URL parameters matching such rules (e.g., w_ in our example above). We further extend these rules by manually exploring mobile vs. non-mobile versions of pages and the image service APIs of 12 popular CDNs (our rules and manual efforts are documented at https://tinyurl.com/86srq141).
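The sketch below illustrates this second, rule-driven step on the example parameters above (w_ and q_); the example URL and fallback logic are hypothetical stand-ins, and the deployed rule set contains 50 such rules (see Figure 4).

```typescript
// Illustrative rewrite rules in the style described above. The parameter
// names (w_, q_) are examples drawn from the URL shown in the text.
function rewriteImageUrl(url: string, cssWidth: number): string {
  let out = url;
  // Right-size: rewrite a width parameter to the element's CSS width.
  out = out.replace(/\bw_\d+/, `w_${cssWidth}`);
  // Quality: request quality 85, the standard-mode setting from Section 3.
  out = out.replace(/\bq_\d+/, 'q_85');
  // Transcode: ask the image service for WebP output.
  out = out.replace(/\.(jpe?g|png)$/i, '.webp');
  return out;
}

// Hypothetical usage: only issue a new request if the URL actually changed;
// on a 404 or a larger response, fall back to the original URL.
const original = 'https://example-cdn.test/media/photo/v1/fill/w_400,q_100.jpg';
const lighter = rewriteImageUrl(original, 180);
if (lighter !== original) {
  // fetch(lighter) ... and re-fetch `original` if the result is missing or bigger
}
```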
We create such rules across the ≈10k images obtained from the crawls outlined in Section 3. In the end, we chose the subset of rules which gave the best trade-off in terms of true positive and true negative rates across all observed images.

Figure 4 summarizes the final results of the URL Rewriting process. We can observe that a total of 50 unique rules were utilized, affecting 16.1% of images with an error rate of 7.2%. Breaking down the error rate, 6.6% of images returned an HTTP 404 status code, implying the need for a single re-fetch when running BrowseLite. The remaining 0.6% returned a size greater than the original. Overall, 69.9% of the original size of the affected images was saved on average. We discuss the impact URL Rewriting has on page data-savings in the following section.
4.2 Image Fetch Reduction

The HTTP standard outlines the ability to request arbitrary bytes of an HTTP response over the network. This is achieved by attaching an HTTP range header to a request, indicating which bytes of the resource should be sent by the server. Assuming this capability is properly implemented at the server, a 206 (Partial Content) status is returned for the request, with the requested subset of bytes as the response body. As Web image formats were designed to render even under slow or lossy networks, not only can images be partially requested, but major image codecs (e.g., libpng, libjpeg, libwebp) support partial rendering as well. As 96% of the servers in our crawls supported range requests, our rationale is to combine the use of range requests with URL Rewriting to achieve savings on a more general set of webpages. We call this process, through which BrowseLite requests less data for images, its image fetch reduction component.
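The sketch below shows the basic mechanics of such a partial fetch, assuming a fetch-style API and previewing the two-request pattern detailed in Section 4.3 (a small probe to learn the total size, then the budgeted remainder); fetchPartial is an illustrative name.

```typescript
// Fetch only the first `budget` fraction of an image via HTTP range requests.
// Returns null when the server does not support ranges (no 206 status or
// Content-Range header), so the caller can fall back to a full download.
async function fetchPartial(url: string, budget: number): Promise<Uint8Array | null> {
  // Step 1: request the first 2KB; the Content-Range reply carries the total size.
  const head = await fetch(url, { headers: { Range: 'bytes=0-2047' } });
  const range = head.headers.get('content-range'); // e.g. "bytes 0-2047/81344"
  if (head.status !== 206 || !range) return null;
  const first = new Uint8Array(await head.arrayBuffer());

  const total = parseInt(range.split('/')[1], 10);
  const end = Math.floor(total * budget) - 1;
  if (end <= 2047) return first; // budget already satisfied by the probe

  // Step 2: request the remainder of the budgeted range and concatenate.
  const rest = await fetch(url, { headers: { Range: `bytes=2048-${end}` } });
  const tail = new Uint8Array(await rest.arrayBuffer());
  const out = new Uint8Array(first.length + tail.length);
  out.set(first);
  out.set(tail, first.length);
  return out;
}
```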
In Section 5 we compare the effectiveness of the combined URL Rewriting and image fetch reduction components of BrowseLite to existing approaches, e.g., middleboxes and Google Web Light [23], in terms of data-savings.

While requesting 50% of the bytes of an image clearly ushers in 50% data-savings, what remains is the impact of semi-complete images on user QoE. Generally, the first X% of the bytes of a Web image can be used to render the topmost X% of its pixels. This implies that only the top half (approximately, given compression) of an image will be displayed given a 50% range request, making most images appear broken.

However, there are some factors which can alleviate this impact. First, progressiveness is a rendering mechanism of the jpeg and png formats in which image data is encoded such that it can be rendered in layers, as opposed to top-to-bottom. This implies that an image can be rendered in its entirety, albeit at a lower quality (SSIM), given only a fraction of its bytes. As discussed in Section 3, much of the potential data-savings for images comes from the fact that they are sent at a larger size than they will be rendered at the client. Due to this downsizing, certain progressive images can be rendered at high quality at the client even with only a fraction of their payloads requested.

For non-progressive images, when applying image fetch reduction, BrowseLite performs a visual trick to make pages appear less visually broken while still attaining meaningful savings. This technique takes the partial image data obtained over the network, and fills the broken gaps of empty space with reflected and blurred content of the image, which we call reflections. The idea stems from a popular technique on the Web known as image previews [27], employed by many popular Web services (e.g., Facebook [35] and Medium [48]), where pages display small and blurred versions (on the order of bytes) of images before the full versions are downloaded, as opposed to empty spaces or placeholders. However, since BrowseLite has no server-side control, and thus no access to the full image data, we cannot pre-process images offline to make previews. Instead, we use the partial data from the range request to create reflections on the fly, at the client.

Figure 5: The visual completeness of pages under image fetch reduction with (a) reflection and (b) progressive images. The left pair shows the state of a page with 50% of the image data requested: the page with reflection is 93% visually complete, while the page without is only 78% visually complete. The right pair shows the state of a page with a large progressive image: the page is already 95% visually complete with only 15% of the data requested (99% with 80% requested).

Figure 5 shows a visualization of image fetch reduction for regular (a) and progressive (b) images. In Figure 5 (a), both versions show the page with 50% of the image data requested. When reflection is applied (leftmost), the page is 95% visually complete (according to the SpeedIndex metric; see Section 5.2 for more details), but only 73% visually complete without reflection. Figure 5 (b) shows a page with 15% (leftmost) and 80% (rightmost) of the image data requested, respectively. Since the progressive image is sent over the network at dimensions much larger than its final rendered ones, the page is still 95% visually complete with only 15% of the data fetched.
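A reflection of this kind can be produced with the browser's standard canvas APIs alone. The following sketch illustrates the idea in a simplified form (draw the decoded rows, then fill the gap with a vertically mirrored, blurred copy of the lowest rows that arrived); it is not our exact processing.

```typescript
// Fill the missing bottom of a partially decoded image with a blurred,
// vertically mirrored copy of its lowest decoded rows ("reflection").
// `partial` holds the rows that arrived; `fullHeight` is the final height.
function reflect(partial: HTMLImageElement, fullHeight: number): HTMLCanvasElement {
  const canvas = document.createElement('canvas');
  canvas.width = partial.width;
  canvas.height = fullHeight;
  const ctx = canvas.getContext('2d')!;

  // Top: the decoded portion of the image, drawn as-is.
  ctx.drawImage(partial, 0, 0);

  // Bottom: mirror the lowest decoded rows across the break line, blurred so
  // the seam and repeated detail are less noticeable. (If less than half of
  // the image arrived, the mirrored band is clamped to what is available.)
  const missing = Math.min(fullHeight - partial.height, partial.height);
  ctx.save();
  ctx.filter = 'blur(8px)';
  ctx.translate(0, 2 * partial.height); // y -> 2*h - y reflects across y = h
  ctx.scale(1, -1);
  ctx.drawImage(
    partial,
    0, partial.height - missing, partial.width, missing, // source: bottom rows
    0, partial.height - missing, partial.width, missing  // dest: mirrored below
  );
  ctx.restore();
  return canvas;
}
```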
4.3 Implementation

We implement URL Rewriting and image fetch reduction as a Puppeteer [8] application. While we test with Chromium version 83, our application can function out of the box on any Chromium-based browser, or on any browser which supports the Chrome Debugging Protocol (e.g., Brave, Edge, and Opera [21]).
| Image Optimization | Unique Rewrite Rules | TPR | FPR (404) | FPR (No Savings) | Savings |
| Right-Sizing | 39 | 15.4% | 6.1% | 0.6% | 66.2% |
| Quality Reduction | 5 | 3.7% | 1.2% | 0.01% | 53.7% |
| Format Transcoding | 6 | 3.9% | 0.1% | 0.01% | 44.1% |
| Any Transformation | 50 | 16.1% | 6.6% | 0.6% | 69.9% |

Figure 4: Quantifying the effectiveness of instrumenting image services for better image optimizations from the client. Savings is the average reduction in size of the affected images. While only a small fraction of images can be rewritten in this way, the relative savings are quite large.

Further, while BrowseLite is prototyped as an external application, many of its components use internal browser APIs. We discuss the potential for BrowseLite to be fully integrated with the browser in Section 6.

To perform URL Rewriting, BrowseLite intercepts all HTTP requests associated with images, as defined by the Chromium network stack. Each request URL is associated with a DOM node of the Web page, from which the image's CSS width and height can be extracted as needed for URL Rewriting. Next, the request URL is run through a regex to reconfigure the URL parameters and fetch a lighter version of the image, if possible, in the same manner as discussed in Section 3 and Figure 4.

Moving to image fetch reduction, since BrowseLite does not assume server cooperation, it does not know the file size of an image a priori, which is needed to form a range request. For this reason, BrowseLite assigns a range header to instead request the first 2KB of every image. Contained in the server's response is the image's full size in bytes, along with the metadata for the image. This metadata is immediately passed to the browser, since it is used to facilitate the final layout of the page [10], which should not be delayed. Once the full image size is known, a second range request is immediately issued to obtain the lighter version of the image as a fraction of its known total size.

Following the request procedure, the resulting image is built in memory by concatenating the data from the first and second range requests, to prevent wasted bandwidth. To display the image from memory, the image is inlined into the Web page using a dataURI [24] on the associated DOM element obtained previously. The metadata from the initial 2KB of the image is used to determine, on the fly, whether it is progressive. If the image is progressive, the dataURI simply consists of the data requested over the network. If the image is not progressive, the image data is decoded, reflected, blurred, and re-encoded into a final dataURI.

Finally, fallback cases are necessary for both the URL Rewriting and image fetch reduction components of BrowseLite. If the initial 2KB request returns a 404, then URL Rewriting is aborted and only image fetch reduction is used. In case of a false positive, e.g., the returned image is bigger than the original, BrowseLite likewise proceeds with image fetch reduction only. For image fetch reduction, range request support is determined on the fly: if the server does not respond with the expected initial 2KB, or does not return the expected response headers notifying BrowseLite of the image's total size, the full image, after having been subjected to URL Rewriting, is downloaded in its entirety. As discussed before, BrowseLite sees only about 7% false positives when URL Rewriting, and only 4% of servers do not support the range requests needed for image fetch reduction.
5 EVALUATION

We now evaluate BrowseLite in terms of its impact on user QoE, data-savings, and page load performance. We also compare the data-savings of BrowseLite to the less private middlebox optimizations of Section 3 and to Google Web Light.
5.1 Data Savings from URL Rewriting

We begin by analyzing the potential data-savings offered through URL Rewriting.

Figure 6: Page savings recovered by manipulating server-side compression via URL Rewriting. As ≈16% of images are optimized, savings are shown from the 80th percentile.

Figure 6 shows the CDF of the fraction of page bytes saved, per the URLs in each bucket of our crawls (Section 3). The figure shows that URL Rewriting can only be used on 10–20% of the URLs in the two more popular buckets (top100 and apr50k), while it has little effect on less popular URLs (apr100k). This is because higher-ranked pages are more likely to make use of the image services which URL Rewriting opportunistically exploits. For the pages where URL Rewriting can be used, it offers significant data savings. For example, 5% of pages in apr50k and top100 see savings of 20–30%, or 400KB saved per page, on average. With respect to user experience, the rewritten images have a median SSIM value of .97 (see Figure 2), a negligible quality reduction which is not detectable by a user. These high SSIM values are observed because the images are rewritten using standard means of compression.

5.2 Data Savings from Image Fetch Reduction

Next, we quantify the data savings and QoE impact of image fetch reduction. While SSIM is useful for quantifying the visual impact of transformations that affect the entire contents of an image (e.g., blurring and color compression), it was not intended to reference the quality of full images against partially complete images, as produced by image fetch reduction. We instead quantify the visual impact on pages using visual completeness, a component of SpeedIndex [16, 45], a Web performance metric describing the average time at which Web pages are rendered. Visual completeness is the comparison mechanism used by SpeedIndex to determine the fraction of a loading Web page that is rendered at a given point in time, allowing it to reasonably measure the visual impact of partially complete images and pages. Under the hood, visual completeness compares color histograms from screenshots of the Web page at points during its load against a screenshot of its fully rendered state. The visual completeness at a given time is the fraction of pixels with colors matching the final state.
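For reference, the sketch below shows the histogram comparison at the core of this metric, in the simplified form just described: completeness is the overlap between the current frame's color histogram and that of the final frame. Production SpeedIndex tooling uses a more refined variant.

```typescript
// Color histogram of a frame: counts per quantized RGB bucket. `pixels`
// is RGBA data, e.g. from canvas getImageData().data.
type Histogram = number[];

function histogram(pixels: Uint8ClampedArray, buckets = 16): Histogram {
  const h: Histogram = new Array(buckets * buckets * buckets).fill(0);
  for (let i = 0; i < pixels.length; i += 4) {
    const r = Math.floor(pixels[i] / (256 / buckets));
    const g = Math.floor(pixels[i + 1] / (256 / buckets));
    const b = Math.floor(pixels[i + 2] / (256 / buckets));
    h[(r * buckets + g) * buckets + b]++;
  }
  return h;
}

// Visual completeness of `frame` against the fully rendered `final` frame:
// the histogram intersection between the two color distributions.
function visualCompleteness(frame: Uint8ClampedArray, final: Uint8ClampedArray): number {
  const a = histogram(frame);
  const b = histogram(final);
  let overlap = 0;
  let total = 0;
  for (let i = 0; i < a.length; i++) {
    overlap += Math.min(a[i], b[i]);
    total += b[i];
  }
  return overlap / total;
}
```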
Figures 7 (a) and 7 (b) compare the visual completeness values of pages with various amounts of image data requested, in 10% increments, relative to the same pages with 100% of the image data requested. Each point on the CDF represents the visual completeness reached by 25%, 50%, and 75% of pages. Figure 7 (a) pictures all pages together, and (b) pages with a supermajority (>=60%) of progressive images (≈11% of pages in our dataset). We can observe that, across all pages, the median page is still 90% visually complete with only 50% of its image data requested. Likewise, pages at the 75th percentile are 90% visually complete with only 30% of the data requested.
The median page with a supermajority of progressive images remains 90% visually complete even with only 10–15% of the image data requested.

To expand on this result in terms of data saved, Figures 7 (c) and (d) show CDFs of savings across pages assuming various levels of image data requested, distinguishing between all pages (c) and the subset of pages hosting a supermajority of progressive images (d). Savings for general pages in (c) are shown for 25%, 50%, and 75% of the image data requested. Given that Figure 7 has shown that, for progressive images, image fetch reduction can be more aggressive while reaching higher visual completeness, savings in (d) are shown for 10%, 25%, and 50% of the image data fetched. We can observe that the median page sees a 28% reduction in size when requesting 50% of image data. This jumps to 42% savings with 25% of image data requested. The progressive pages saw ≈40% savings with 10% of data requested while remaining at least 90% visually complete.
5.3 User Study

While visual completeness is a useful proxy for quantifying the impact BrowseLite has on the user experience at scale, it is not a substitute for feedback from real users. Thus, we performed a user study to investigate how BrowseLite affects the end user.

We selected 40 pages with regular images and 10 pages with supermajority-progressive images to show users in a crowdsourcing study, run via the Microworkers [7] platform. Our study was a simple Web page that contained screenshots of the first viewport of the pages compressed via BrowseLite. For the screenshots, the image fetch reduction component of BrowseLite was configured such that 50% of the image data was requested across all pages. The 40 regular pages were chosen randomly from 4 buckets of visual completeness relative to their original counterparts. These buckets were chosen based on the distribution of pages in Figure 7 (a), i.e., VC >= 95%, 90% <= VC < 95%, 85% <= VC < 90%, and VC < 85%, for an average visual completeness of 89%. The 10 progressive pages had a mean visual completeness of 97%. On the study Web page, we showed users the compressed version of each page side by side with the original page. The formal question we asked users was: 'How would you rate the quality of this compressed page, which can extend your mobile data plan so you can browse more?'
Figure 7: CDFs of savings offered by BrowseLite as well as the trade-off in visual rendering quality. Plots (a) and (b) quantify visual completeness values against the percent of image data requested, for all pages and for pages with a supermajority of progressive images, respectively. Plots (c) and (d) convey normalized savings for the same, broken up by three levels of image data requested. For (d) the levels show less data requested, since pages with progressive images are more visually complete with the same amount of data requested.
We provided the user a quality scale from 1–5, with the meaning of each choice given in Figure 8.

For comparison, we also showed users screenshots of the same pages optimized with the Google Web Light tool [23] (see Section 2). While Web Light acts as a good upper bound on what can be done in terms of data-savings, it is a less private approach, and not easily quantified in terms of Web-compat. We showed 41 total Web Light pages (by forcefully navigating pages through Web Light's servers at http://googleweblight.com/i?u=URL [23, 54]), as 9 of the pages in our set of 50 could not be optimized by Web Light and simply redirected to the original page, a behavior documented by recent analyses [54]. Not including these 9 pages, the Web Light pages had a mean visual completeness of 73%.

Participants in our study were shown 20 of the 50 pages; for each of the 20, either our version or the Web Light version (if available) was randomly shown, to avoid bias from having users compare two versions of the same URL during the study. Two controls were also shown: one with a perfect rendition of a page (visual completeness of 100), and one with a page where all images were replaced by image placeholder icons (visual completeness of 29). We only accepted users who rated these as 4–5 and 1–2, respectively, which filtered out approximately 35% of the responses to our study. We collected ratings from 200 Microworkers users (after filtering), giving each page approximately 40 ratings. We provide a link to an anonymized video of the study, which shows pages optimized both by BrowseLite and by Web Light, as seen by our Microworkers users (https://streamable.com/e89yji).

| Method | Broken (1) | Poor (2) | Usable (3) | Good (4) | Very Good (5) |
| BrowseLite (Reflections) | 0 | 8 | 21 | 8 | 3 |
| BrowseLite (Progressive) | 0 | 0 | 0 | 3 | 7 |
| Web Light | 0 | 2 | 11 | 21 | 7 |

Figure 8: User responses from our crowdsourced user study. The rows represent the type of optimization: BrowseLite on pages with regular images, BrowseLite on the progressive-image pages, and Web Light optimized pages.

Figure 8 shows the median rating of each page calculated over the ≈40 user ratings collected per page. For BrowseLite, we distinguish between the pages with supermajority-progressive images and the pages where the reflection trick was used (see Section 4.2). From Figure 8 we draw a few observations. The first is that users rated all 10 progressive pages very highly (mostly very good and a few good), corroborating that progressive images can see higher data-savings for the same trade-off in user experience, as discussed in Figure 7. The second is that the majority of pages with reflected images (21 out of 40) were rated as usable by most users. Though none of the pages were rated as broken, about 20% were given a poor rating. Upon further investigation of these pages, we observed that this rating was typically given if a) human faces were distorted or b) actual text embedded within the image data was reflected (and thus unreadable). Conversely, pages with text overlaid on the image, and not part of the image data, were rated quite positively (4 or 5). While we do not take measures to prevent such cases in BrowseLite, we believe this to be a good direction for future work (see Section 6). Further, the only page that received a median score of 1 was our control page with broken placeholder images. This result suggests that reflections are generally more favorable for the user experience than the placeholder images used by Chromium under its Data-Saving mode [40].

Finally, Figure 8 shows that Web Light, while removing and rearranging much of the page contents and having a much lower visual completeness on average, was rated generally highly by users (21 good pages). However, as Google's servers have access to page contents before they are sent to the client, Web Light has more context, and time, to analyze the important parts of the page, factors not available to BrowseLite.
5.4 Interplay with Caching

One caveat in applying range requests to save data is their potential impact on caching. For example, assume a user of BrowseLite loads a page over a cellular connection, and then later loads the same page over a Wi-Fi connection, where no data-saving is necessary. Ideally, the portions of the images fetched using range requests are not re-fetched when opting out of BrowseLite.
The heuristics that browsers employ for caching content ranges are not well documented. We thus set up an experiment on Chromium (version 83) to determine how caching rules for range requests are currently handled. First, we instrument Chromium to fetch real pages and images from our study under a cold cache. Next, we perform the following experiments:

(1) We issued two successive range requests for an image, with ranges 0–10KB and 10–20KB, and found they were both cached on subsequent requests, suggesting that range requests for the same ranges are cacheable.

(2) We issued two range requests with overlapping ranges and found that the browser rewrites the second request to only fetch the remaining data; e.g., a request for 0–20KB of an image following a range request for 0–10KB of the same was rewritten to request only the range not yet in the cache: 10–20KB.

(3) We issued a request for a full image following a range request and found that no data was wasted: first requesting 0–10KB of an image, followed by requesting the full image, only results in the remaining portion of the image being requested by the browser.

(4) The reverse of the above also holds: we issued a request for a full image followed by a range request for 0–10KB of the same, and found the range request was served from the cache.

These results add to the realism of a data-saving approach that leverages range requests, in that switching between BrowseLite and normal browsing does not adversely affect HTTP caching, and hence does not waste data. This further implies that: a) flexible ranges can be used, say, if network conditions change, and b) BrowseLite can be turned off, presumably by honoring existing signals like the HTTP Save-Data header [31], with no excess data waste.

Figure 9: The CDF in (a) shows that SpeedIndex is inflated by 70ms for the median page on Wi-Fi and 25ms on 4G. The boxplots in (b) show the relative contributions of each source of delay caused by BrowseLite to the SpeedIndex.
5.5 Page Load Overhead

BrowseLite was designed with data-savings, rather than speed, as its main goal. However, to be usable, it is still important that pages are not significantly slowed down, which we investigate here.

BrowseLite introduces a few extra operations to page loads which can potentially slow them down. The first is the extra range
request used to fetch image metadata. The second is the search of the DOM to associate an image request with an object on the page. The third is the extra processing needed to create the reflections of non-progressive images. However, the data saved by BrowseLite has the potential to offset such overheads. Note that while the extra request caused by an erroneous rule from URL Rewriting also adds some latency, this occurs relatively infrequently, as shown in Figure 4.

In order to quantify these additions to a Web page load, we measure the average overhead in the rendering time of pages as given by the SpeedIndex [16] metric. Figure 9 shows the CDF of the change in SpeedIndex when loading pages normally and with BrowseLite across the pages of our crawls, over both a Wi-Fi connection (measured at 40Mbps up, 10Mbps down, 10ms RTT) and a Verizon 4G LTE connection (4Mbps up, 3Mbps down, 40ms RTT). We benchmark BrowseLite with a visual completeness budget of 90%, implying that 50% of the image data is requested (see Figure 7).
We can observe that ≈80% of pages on Wi-Fi experience overheads of <500ms. Further, while 20% of pages experience a more noticeable delay (>500ms), 41–48% of pages actually see an improved SpeedIndex, rendering sooner since only 50% of the image content needs to be requested. While the fact that 20% of pages experience noticeable slowdowns is significant, we note that the primary objective of BrowseLite is to save bandwidth, and we expect users will tolerate a slight delay in their pages in exchange for data savings.

For the pages that see an increased SpeedIndex, Figure 9 (b) breaks down the 3 main causes of overhead from BrowseLite by their relative contributions. We can observe that the largest fraction of overhead is indeed due to the extra RTT (causing 50% of the inflation on the median page), followed closely by transformation times using the browser's native image canvas APIs (45% on the median page). The DOM search is relatively fast in comparison (5% on the median page). The contributions to overhead were similar between the Wi-Fi and 4G experiments, with the extra range request taking up 5% more overhead on 4G than on Wi-Fi. In Section 6 we outline a few ways the transformation and search overheads could be reduced going forward, mainly through a tighter integration of BrowseLite with the browser.

5.6 Comparison with Existing Approaches

Finally, we compare the data-savings of BrowseLite (the combined image fetch reduction and URL Rewriting components) to the bandwidth savings made available through middlebox approaches, as well as those offered by Google Web Light. For BrowseLite, we target a visual completeness budget of at least 90% and thus assume that 50% of the image data should be fetched. For middleboxes, we use the savings of the standard mode from our measurements in Section 3. For Web Light, we navigated all the pages from our crawls through the Web Light system and derived savings by comparison with the original versions of the same pages. All savings are in terms of relative page weight saved, including hot caches for inner pages as described in Section 3.

Figure 10: CDF of the data-savings of middlebox approaches and Google Web Light compared to those of BrowseLite.

Figure 10 shows the CDF of data-savings offered by BrowseLite, by middlebox approaches acting as (a) MITM proxies and (b) HTTP-only proxies, and by Google Web Light, across the pages of our crawls. The figure shows that Web Light acts as an upper bound on the potential savings, with the median page seeing up to 90% savings. As noted in Section 2, since all content is served through Web Light's servers, more resources (beyond images), and even the actual style of the page, can be directly manipulated. This comes not only at the cost of privacy concerns, but also with significant impact on Web-compat; in our experiments, pages served via Web Light achieve only 60% visual completeness relative to their originals, on average. Web Light also failed to optimize ≈
10% of pages, as observed in our user study and in existing work [54]. While middlebox approaches see about 4% less savings than BrowseLite at the median (21.6% vs. 25.4%), the upper percentiles see up to 30% more savings. While middlebox approaches are also limited to images, they intercept content before it reaches the client, allowing for more optimization opportunities at the cost of privacy concerns similar to those of Web Light. Further, if we look at only the HTTP images from our dataset (we attempted HTTPS connections for all pages), the median savings of middlebox approaches drops to 0%, and to only 20% at the 90th percentile, suggesting that transparent HTTP-only proxies can no longer deliver meaningful savings on today's largely encrypted Web.

6 DISCUSSION

This section discusses some subtle privacy concerns of BrowseLite as well as the complexities of an in-browser implementation, along with some future work based on the results of our user study.
Privacy considerations. While BrowseLite is designed with privacy in mind, one subtle privacy concern lies in the caching of range requests. The current version of Chromium modifies range requests based on information in its cache, so as to fetch only the next required portion of the range (see Section 5). Since resources in the HTTP cache can be hit across domains, a range request initiated on one domain can be resumed on another, leaking information about which other sites have been visited previously.

This attack, known as a cross-site leak or XS-Leak [52], is not specific to range requests. Many browsers have begun discussing the implementation of (or, in the case of Safari, have already implemented [26, 30]) dual-keyed caching. This policy prevents access to cross-origin resources in the HTTP cache, with the main intent of stopping XS-Leaks. As this feature will also prevent such XS-Leaks via range requests, we expect the image fetch reduction feature of BrowseLite to remain available and safe in the future.

While BrowseLite is implemented entirely client-side for privacy, its savings techniques, such as image fetch reduction and more curated rules for URL Rewriting, could also be implemented privately at a CDN. However, the fact that 80% of landing pages and 60% of internal pages do not leverage CDNs implies that a client-side intervention is currently the more broadly applicable option [15].
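To illustrate why dual-keyed caching closes this side channel, the following sketch contrasts a single-keyed cache lookup with a dual-keyed one, in which the key also includes the top-level site that triggered the request. This is a simplified model of the policy discussed above, not Chromium's or Safari's actual cache implementation.

```typescript
// Simplified model of HTTP-cache keying (illustrative; not a real browser API).
type CacheKey = string;

// Single-keyed: any site probing https://a.com/img.jpg hits the same entry,
// so a resumed range request reveals that another site fetched it first.
function singleKey(resourceUrl: string): CacheKey {
  return resourceUrl;
}

// Dual-keyed: entries are partitioned by top-level site, so a range request
// started on site A cannot resume from an entry written while browsing
// site B, closing the XS-Leak described above.
function dualKey(topLevelSite: string, resourceUrl: string): CacheKey {
  return `${topLevelSite} ${resourceUrl}`;
}

// The same resource yields distinct entries under dual keying.
console.assert(singleKey("https://a.com/img.jpg") ===
               singleKey("https://a.com/img.jpg"));
console.assert(dualKey("https://news.example", "https://a.com/img.jpg") !==
               dualKey("https://shop.example", "https://a.com/img.jpg"));
```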
Browser implementation. One concern for the adoption of BrowseLite is the performance impact of >500 ms for 20% of pages (Figure 9(a)). While the current version of BrowseLite is implemented as a Puppeteer application, a native implementation in the browser has the potential to eliminate the overhead caused by image processing (for reflections) and DOM searches, which combined account for up to 50% of overhead (see Figure 9(b)). A native implementation can associate DOM elements directly with network requests, eliminating the need for a DOM search after the initial range request. Further, native use of image libraries (e.g., libpng, libwebp) bundled with the browser would allow for faster reflections than our current use of the high-level canvas APIs and conversion of images to data URIs. If BrowseLite were to see adoption, we believe these to be the next steps in its performance improvement. We do not, however, see a way to avoid the extra range request needed to discover image metadata, which is a required step for BrowseLite to work properly.
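To ground the discussion above, the sketch below illustrates the two pieces of the current pipeline: the metadata-discovering range fetch (the extra RTT we cannot avoid) and the canvas-based reflection with a data-URI swap (the part a native implementation could hand to the browser's bundled codecs). All function names, the 1 KB probe, and the fixed 50% budget are illustrative assumptions, not BrowseLite's actual interface.

```typescript
// Illustrative sketch of the client-side pipeline, under our own naming
// assumptions; not the tool's actual code.
async function fetchImagePrefix(url: string, budget = 0.5): Promise<Uint8Array> {
  // Unavoidable extra range request: a 206 response exposes the total
  // size via Content-Range (e.g., "bytes 0-1023/48211").
  const probe = await fetch(url, { headers: { Range: "bytes=0-1023" } });
  const total = Number(probe.headers.get("Content-Range")?.split("/")[1] ?? 0);
  const head = new Uint8Array(await probe.arrayBuffer());
  if (!total || head.length >= total * budget) return head;

  // Second range request: fetch only up to the budgeted fraction.
  const end = Math.floor(total * budget) - 1;
  const rest = await fetch(url, { headers: { Range: `bytes=${head.length}-${end}` } });
  const tail = new Uint8Array(await rest.arrayBuffer());
  const out = new Uint8Array(head.length + tail.length);
  out.set(head);
  out.set(tail, head.length);
  return out;
}

// Canvas-based reflection: decode the truncated bytes (codecs tolerate
// truncated baseline/progressive streams to varying degrees), redraw, and
// swap a data URI into the element. These are the high-level steps a
// native build could replace with direct pixel writes.
async function reflectImage(img: HTMLImageElement, bytes: Uint8Array) {
  const bitmap = await createImageBitmap(new Blob([bytes]));
  const canvas = document.createElement("canvas");
  canvas.width = bitmap.width;
  canvas.height = bitmap.height;
  canvas.getContext("2d")!.drawImage(bitmap, 0, 0);
  img.src = canvas.toDataURL("image/jpeg", 0.9); // data-URI conversion step
}
```

The re-encode to a data URI is the main cost a native code path would avoid, since decoded pixels could be handed to the renderer directly.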
User Studies and Quality of Experience. From our user study analyses, we observed that reflections were poorly rated when image fetch reduction produced images with distorted faces or text. In the future, we wish to test this hypothesis with additional user studies. If it holds, however, it remains an open question what can be done to alleviate this impact on the user experience. One possibility we deem worth exploring is to run facial or textual detection models (e.g., CNNs [38]) on the partially downloaded image to identify the presence of such features; upon a successful detection, an extra range request can be issued to better complete the image. Another approach is to experiment with context encoders [47], which can potentially complete our partially rendered images at no additional data cost. Either way, the trade-offs that these proposed techniques impose on load times and bandwidth will need to be explored.
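As a sketch of the first idea, the snippet below gates an extra range request on a detector run over the partially downloaded image. The FeatureDetector interface and maybeCompleteImage are hypothetical stand-ins (e.g., for a CNN-based face or text detector [38]); the sketch only shows the control flow we propose exploring.

```typescript
// Hypothetical detector over decoded pixels (stand-in for a CNN model);
// resolves to true if faces or text are visible in the partial image.
type FeatureDetector = (pixels: ImageData) => Promise<boolean>;

// If sensitive features are present, spend extra bytes to complete the
// image; otherwise keep the budgeted rendering. All names are illustrative.
async function maybeCompleteImage(
  url: string,
  partial: ImageData,
  bytesFetched: number,
  totalBytes: number,
  detect: FeatureDetector,
): Promise<Uint8Array | null> {
  if (!(await detect(partial))) return null; // budgeted rendering is fine

  // Extra range request for the remaining bytes of the image.
  const res = await fetch(url, {
    headers: { Range: `bytes=${bytesFetched}-${totalBytes - 1}` },
  });
  return new Uint8Array(await res.arrayBuffer());
}
```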
7 CONCLUSION

Given the increased complexity and size of webpages, the web community has invested enormous effort in reducing the data costs that the modern web imposes on end users. However, existing approaches all trade off user privacy or web compatibility in exchange for such data savings. This paper presents BrowseLite, a private data-saving solution that focuses on the optimization of images during browsing. BrowseLite reduces the strain images place on the network by reducing their data requirements, through auto-configuring image services and by replacing standard HTTP requests for images with range requests. As shown through our experimentation, BrowseLite achieves 25% data savings on the median webpage, with only minor overhead on page load time. Further, BrowseLite reduces data requirements while keeping the median webpage usable, as reported by real users. In future work, we plan to pursue a tighter implementation of BrowseLite in modern browsers and to explore a more advanced image processing pipeline to further reduce potentially negative impacts on the end-user experience.
REFERENCES
[1] [n.d.]. https://majestic.com/.
[2] [n.d.]. https://alexa.com/topsites.
[3] [n.d.]. https://developers.google.com/.
[4] [n.d.]. https://fastly.com/.
[5] [n.d.]. https://akamai.com/.
[6] [n.d.]. https://cloudflare.com/.
[7] [n.d.]. https://microworkers.com/.
[8] [n.d.]. Puppeteer. https://github.com/puppeteer/puppeteer.
[9] 2020. Chrome Lite Mode. tinyurl.com/1h8p6ykr.
[10] Addy Osmani. 2020. Optimize Cumulative Layout Shift. tinyurl.com/2hcsh4zu.
[11] Victor Agababov, Michael Buettner, Victor Chudnovsky, Mark Cogan, Ben Greenstein, Shane McDaniel, Michael Piatek, Colin Scott, Matt Welsh, and Bolian Yin. 2015. Flywheel: Google's Data Compression Proxy for the Mobile Web. In Proceedings of the 12th USENIX Symposium on Networked Systems Design and Implementation (NSDI 2015).
[12] Ankur P. Agarwal. 2015. Google Web Lite: Move Fast, Break Things? https://pricebaba.com/blog/google-weblight-move-fast-break-things.
[13] Jyrki Alakuijala and Vincent Rabaud. 2017. Lossless and Transparency Encoding in WebP. https://developers.google.com/speed/webp/docs/webp_lossless_alpha_study.
[14] Waqar Aqeel, Balakrishnan Chandrasekaran, Anja Feldmann, and Bruce M. Maggs. 2020. On Landing and Internal Web Pages: The Strange Case of Jekyll and Hyde in Web Performance Measurement. In Proceedings of the ACM Internet Measurement Conference (IMC '20).
[18] Babak Amin Azad, Pierre Laperdrix, and Nick Nikiforakis. 2019. Less is More: Quantifying the Security Benefits of Debloating Web Applications. In USENIX Security 19. USENIX Association, Santa Clara, CA.
[19] Michael Butkiewicz, Daimeng Wang, Zhe Wu, Harsha V. Madhyastha, and Vyas Sekar. 2015. Klotski: Reprioritizing Web Content to Improve User Experience on Mobile Devices. In NSDI 15. USENIX Association, Oakland, CA.
[20] Paul Calvano. 2019. Web Almanac by HTTP Archive, Chapter 16: Caching. https://almanac.httparchive.org/en/2019/caching.
[21] Andrea Cardaci. 2020. chrome-remote-interface. https://github.com/cyrus-and/chrome-remote-interface.
[22] Moumena Chaqfeh, Yasir Zaki, Jacinta Hu, and Lakshmi Subramanian. 2020. JSCleaner: De-Cluttering Mobile Webpages Through JavaScript Cleanup. (2020).
[23] Google Chrome. [n.d.]. Web Light: Faster and Lighter Mobile Pages from Search. https://support.google.com/webmasters/answer/6211428.
[24] Chris Coyier. 2020. Data URIs. https://css-tricks.com/data-uris/.
[25] Chris Coyier. 2017. CSS Sprites: What They Are, Why They're Cool, and How To Use Them. https://css-tricks.com/css-sprites/.
[26] Andy Davies. 2018. Safari Caching and Third-Party Resources. https://bit.ly/3cLJW22/.
[27] Christoph Erdmann. 2019. Faster Image Loading with Embedded Image Previews. https://bit.ly/3kfeNoz.
[28] Adrienne Porter Felt, Richard Barnes, April King, Chris Palmer, Chris Bentzel, and Parisa Tabriz. 2017. Measuring HTTPS Adoption on the Web. In USENIX Security 17. USENIX Association.
[29] Utkarsh Goel and Moritz Steiner. 2020. System to Identify and Elide Superfluous JavaScript Code for Faster Webpage Loads. (March 2020).
[30] Google Groups. 2019. Intent to Implement Double-Keyed HTTP Cache. https://rb.gy/5ngyus.
[31] Simon Hearne. 2020. Who Opts-in to Save-Data? tinyurl.com/10bxrn7d.
[32] Katie Hempenius. 2019. CDNs for Optimizing Images. tinyurl.com/4n9g6qhp.
[33] Tim Kadlec. 2019. Making Sense of Chrome Lite Pages. tinyurl.com/26hzk53b.
[34] Tim Kadlec. 2020. What Does My Site Cost? https://whatdoesmysitecost.com/.
[35] Edward Kandrot. 2015. The Technology Behind Preview Photos. https://bit.ly/37lG9FV.
[36] Conor Kelton, Jihoon Ryoo, Aruna Balasubramanian, and Samir R. Das. 2017. Improving User Perceived Page Load Times Using Gaze. In NSDI 17. USENIX Association, Boston, MA.
[37] B. Kondracki, A. Aliyeva, M. Egele, J. Polakis, and N. Nikiforakis. 2020. Meddling Middlemen: Empirical Analysis of the Risks of Data-Saving Mobile Browsers. In 2020 IEEE Symposium on Security and Privacy (SP). IEEE Computer Society, Los Alamitos, CA, USA.
[38] Haoxiang Li, Zhe Lin, Xiaohui Shen, Jonathan Brandt, and Gang Hua. 2015. A Convolutional Neural Network Cascade for Face Detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
[39] Zhenhua Li, Weiwei Wang, Tianyin Xu, Xin Zhong, Xiang-Yang Li, Yunhao Liu, Christo Wilson, and Ben Y. Zhao. 2016. Exploring Cross-Application Cellular Traffic Optimization with Baidu TrafficGuard. USENIX Association, Santa Clara, CA.
[40] Scott Little. 2016. Image Replacement in Blink. https://rb.gy/lj4cdm.
[41] Robert Nyman and Mike Taylor. 2014. Introducing webcompat.com. https://hacks.mozilla.org/2014/06/introducing-webcompat-com/.
[42] Ravi Netravali, Ameesh Goyal, James Mickens, and Hari Balakrishnan. [n.d.]. Polaris: Faster Page Loads Using Fine-grained Dependency Tracking. In NSDI 16.
[43] Ravi Netravali and James Mickens. 2018. Remote-Control Caching: Proxy-Based URL Rewriting to Decrease Mobile Browsing Bandwidth. In HotMobile 2018: Proceedings of the 19th International Workshop on Mobile Computing Systems and Applications. Association for Computing Machinery.
[44] Ravi Netravali and James Mickens. 2019. Prophecy: Accelerating Mobile Page Loads Using Final-State Write Logs. In Proceedings of the 16th USENIX Symposium on Networked Systems Design and Implementation (NSDI 2019).
[45] Ravi Netravali, Vikram Nathan, James Mickens, and Hari Balakrishnan. 2018. Vesper: Measuring Time-to-Interactivity for Web Pages. In 15th USENIX Symposium on Networked Systems Design and Implementation (NSDI 18).
[47] Deepak Pathak, Philipp Krähenbühl, Jeff Donahue, Trevor Darrell, and Alexei A. Efros. 2016. Context Encoders: Feature Learning by Inpainting. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
[48] Jose M. Perez. 2015. Medium Progressive Image Loading. tinyurl.com/1l1ht86v.
[49] Vaspol Ruamviboonsuk, Ravi Netravali, Muhammed Uluyol, and Harsha V. Madhyastha. 2017. Vroom: Accelerating the Mobile Web with Server-Aided Dependency Resolution. In Proceedings of the Conference of the ACM Special Interest Group on Data Communication (SIGCOMM '17). Association for Computing Machinery, New York, NY, USA.
[50] Quirin Scheitle, Oliver Hohlfeld, Julien Gamba, Jonas Jelten, Torsten Zimmermann, Stephen D. Strowes, and Narseo Vallina-Rodriguez. 2018. A Long Way to the Top: Significance, Structure, and Stability of Internet Top Lists. CoRR abs/1805.11506 (2018).
[51] Shailendra Singh, Harsha V. Madhyastha, Srikanth V. Krishnamurthy, and Ramesh Govindan. 2015. FlexiWeb: Network-Aware Compaction for Accelerating Mobile Web Transfers. In Proceedings of the Annual International Conference on Mobile Computing and Networking (MobiCom 2015). Association for Computing Machinery.
[52] Avinash Sudhodanan, Soheil Khodayari, and Juan Caballero. 2019. Cross-Origin State Inference (COSI) Attacks: Leaking Web Site States through XS-Leaks. arXiv preprint arXiv:1908.02204 (2019).
[53] Erik Sy, Christian Burkert, Hannes Federrath, and Mathias Fischer. 2018. Tracking Users across the Web via TLS Session Resumption. CoRR abs/1810.07304 (2018).
[54] Ammar Tahir, Muhammad Tahir Munir, Shaiq Munir Malik, Zafar Ayyub Qazi, and Ihsan Ayyub Qazi. 2020. Deconstructing Google's Web Light Service. (2020).
[55] Jovi Umawing. 2017. Google Reminds Website Owners to Switch to HTTPS Before October Deadline. https://rb.gy/saxnj5.
[56] Tobias Urban, Martin Degeling, Thorsten Holz, and Norbert Pohlmann. 2020. Beyond the Front Page: Measuring Third Party Dynamics in the Field. In Proceedings of The Web Conference 2020.
[57] Xiao Sophia Wang, Aruna Balasubramanian, Arvind Krishnamurthy, and David Wetherall. 2013. Demystifying Page Load Performance with WProf. In NSDI 13. USENIX Association, Lombard, IL.
[58] Xiao Sophia Wang, Arvind Krishnamurthy, and David Wetherall. 2016. Speeding up Web Page Loads with Shandian. In NSDI 16. Santa Clara, CA.
[59] Zhen Wang, Felix Xiaozhu Lin, Lin Zhong, and Mansoor Chishtie. 2012. How Far Can Client-Only Solutions Go for Mobile Browser Speed? In Proceedings of the 21st International Conference on World Wide Web (WWW '12). Association for Computing Machinery, New York, NY, USA.
[60] Zhou Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli. 2004. Image Quality Assessment: From Error Visibility to Structural Similarity. IEEE Transactions on Image Processing 13, 4 (2004).