Designing for Tussle in (Encrypted) DNS

Abstract

Recent concerns over the privacy implications of the Domain Name System (DNS) have led to encrypting DNS queries and responses through protocols like DNS-over-HTTPS (DoH) and DNS-over-TLS (DoT). While the trend towards encryption is a positive development, the resulting centralization of the DNS has fomented tussles involving ISPs, browser and device vendors, content delivery networks, and users. Current deployment trends, should they continue, result in dynamics that will increase barriers to entry to competition and threaten consumer protection. This development makes it necessary for us to re-think name resolution to allow tussles to play out within the context of the design of the Internet architecture. This paper articulates several current DNS tussles and offers principles to guide system design and implementation such that all stakeholders in the space could participate. We then explore how a refactored client DNS mechanism can open up new possibilities for de-centralized name resolution, preserving the benefits of encrypted DNS while satisfying other architectural desiderata, including performance, resilience, and privacy.


D-DNS: Towards Re-Decentralizing the DNS

Austin Hounsel [email protected] University

Kevin Borgolte [email protected] University

Paul Schmitt [email protected] University

Nick Feamster [email protected] of Chicago

Abstract

Nearly all Internet services rely on the Domain Name System (DNS) to resolve human-readable names to IP addresses. However, the content of DNS queries and responses can reveal private information, from the websites that a user visits to the types of devices on a network. Industry and researchers have responded in recent years to the inherent privacy risks of DNS information, focusing on tunneling DNS traffic over encrypted transport and application protocols. One such mechanism, DNS-over-HTTPS (DoH), places DNS functionality directly in the web browser itself to direct DNS queries to a trusted recursive resolver (resolver) over encrypted HTTPS connections. The DoH architecture solves privacy risks (e.g., eavesdropping) but introduces new concerns, including those associated with the centralization of DNS queries to the operator of a single recursive resolver that is selected by the browser vendor. It also introduces potential performance problems: if a client’s resolver is not proximal to the content delivery network that ultimately serves the content, the CDN may fail to optimally localize the client. In this paper, we revisit the trend towards centralized DNS and explore re-decentralizing the critical Internet protocol, such that clients might leverage multiple DNS resolvers when resolving domain names and retrieving content. We propose and evaluate several candidate decentralized architectures, laying the groundwork for future research to explore decentralized, encrypted DNS architectures that strike a balance between privacy and performance.

The Domain Name System (DNS) serves the essential function of mapping human-readable names to IP addresses and is central to the operation of most Internet services. DNS is nearly 40 years old, and until recently DNS queries and responses have not been encrypted. Recent concerns over user privacy, however, have focused attention on the DNS and the various privacy risks associated with being able to observe DNS queries and responses. For example, DNS queries can reveal the websites (and webpages) that a user is visiting, the connected devices they own, and even how they interact with those devices in the physical world. Because DNS queries and responses have generally been transmitted in the clear, any entity who can observe the DNS could also gain insight into a wealth of private information about an individual.

In light of these concerns, recent proposals to encrypt DNS queries and responses have emerged, including transmitting DNS queries and responses over Transport Layer Security (DNS-over-TLS, or DoT) and Secure HTTP (DNS-over-HTTPS, or DoH). One approach to deployment has been to configure a client device with a single recursive resolver that is responsible for terminating the encrypted communications channel and resolving all of the client’s DNS queries. In the case of Mozilla’s Firefox browser, this resolver is called a trusted recursive resolver (TRR). The client software might be the operating system (e.g., the Android OS has a “private DNS” option that routes all DNS traffic to Google’s DoT resolver) or a browser (e.g., the Firefox browser has an option to enable DoH that results in all DNS traffic being exchanged with Cloudflare). These architectures cause all of a client’s DNS traffic to be exchanged with a single entity, even as that client changes networks and physical locations, potentially introducing new privacy and reliability concerns.

Yet, encrypting DNS should not require centralizing it.
In fact, as operating systems and browsers move to encrypt DNS, we are witnessing a proliferation of resolvers, run by a variety of independent entities. Distributing encrypted DNS queries across these resolvers could preserve the confidentiality benefits of encrypting DNS traffic without introducing new reliability and privacy concerns that would arise from centralizing it. Distributed resolvers, operated by independent organizations, can allow clients to achieve the privacy benefits of encrypting the DNS while also avoiding the risks of centralizing it.

Although distributing DNS queries across multiple resolvers has the potential to avoid privacy and reliability risks associated with centralization, the central question is how to do so. In particular, relying on multiple resolvers to resolve DNS queries requires a strategy for directing each DNS query to an appropriate resolver; different choices have implications for both privacy and performance. This paper does not aim to provide the final word on the “best” strategy for distributing DNS queries across resolvers; rather, we aim to show that doing so is possible architecturally, to show that there are many feasible strategies for doing so that involve tradeoffs between privacy and performance, and to take a first step towards quantifying the performance tradeoffs of different resolver selection strategies.

We evaluate two resolver selection strategies: (1) randomly distributing queries for DNS domain names across a set of proximal resolvers; and (2) sending DNS queries to the resolver associated with the content delivery network that is ultimately responsible for serving the associated web objects.
These strategies offer different privacy benefits: random assignment ensures that no single entity amasses extensive information about a client’s DNS query patterns, and assignment based on CDN affinity ensures that a CDN that operates a resolver learns no additional information about the domains that a client is visiting (since it must serve the associated objects, in any case).

An open question, however, is how these strategies affect performance, particularly page load time. Of particular concern are the effects of these strategies on the ability of a CDN to localize clients: specifically, sending DNS queries through a resolver that is not proximal to the client may affect client localization and thus the ability of a CDN to map the client to a nearby CDN cache node, thus slowing page loads. On the other hand, a CDN that both resolves DNS names and serves the associated objects may be able to better localize clients for those objects, and might even be able to proactively resolve DNS queries associated with future object requests. Our goal in this paper is to explore the performance implications of these two different architectures. In both cases, we compare web page load times against the baseline architectures of relying on a single trusted resolver, and against default DNS lookups.

To measure these effects, we instrumented Mozilla Firefox to resolve DNS queries using each of the above strategies and measured the corresponding page load times for each resolver selection strategy. Randomly selecting a proximal resolver for each domain name to resolve results in a median performance loss of tens of milliseconds versus simply using the default local resolver, suggesting a marginal performance loss for corresponding potential privacy benefits. Alternatively, sending DNS queries to a resolver that is co-located with the CDN that hosts the corresponding objects yields a median improvement in page load time of more than 100 milliseconds for both Google and Cloudflare.
In this paper, we focus on the performance effects of different selection strategies, using unencrypted DNS as the baseline for comparison.

This paper explores the architectural feasibility and performance implications of different strategies for distributing DNS queries across resolvers. It is, however, far from the last word on architectures and strategies for distributing DNS queries. First, although we evaluate two possible strategies for distributing DNS, many others are possible. Second, to facilitate browser-based measurements, we focus on the performance effects of distributing unencrypted DNS queries across resolvers; future work could extend this study to perform similar measurements with DoT and DoH as browser and device support for these protocols becomes more prevalent. To encourage the reproducibility of our study and facilitate these extensions to our work, we have released our complete test harness and measurements to the community.

In this section, we provide background on recent developments in DNS encryption (specifically DNS-over-TLS and DNS-over-HTTPS) as well as related previous work exploring the relationships between DNS and performance.

The Domain Name System (DNS) resolves human-readable domain names to IP addresses [15]. When a client application needs to resolve a domain name, its stub resolver typically issues a recursive DNS query to a local recursive resolver. Often, a client’s local recursive resolver is automatically configured at the same time as when it receives its IP address on the local network (e.g., using a configuration protocol such as the Dynamic Host Configuration Protocol, or DHCP). The local recursive resolver will either have the answer for the query cached (i.e., from a previous query), or it will perform a sequence of “iterative” queries to authoritative name servers to resolve the domain name, cache the resulting response, and return the response to the client.

DNS queries and responses have historically been unencrypted, which has garnered increasing concern in recent years, given various demonstrations that DNS traffic can be used to discover private information about users, ranging from the websites and webpages that they visit to the “smart” devices that they use (and how they operate them). Significant concern has been raised, for example, by the Federal Communications Commission (FCC) over an Internet service provider’s ability to observe its subscribers’ DNS traffic. A more common threat scenario, however, may be that of a user who associates to a wireless network for convenience (e.g., in a coffee shop, airport, or any public space) and subsequently sends DNS queries to an associated DNS resolver, in cleartext [3].

Increasing concern over these scenarios has led to various developments to encrypt DNS queries and responses. Two such developments are DNS-over-TLS (DoT) [10] and DNS-over-HTTPS (DoH) [8]. Many DNS services, including Google, Cloudflare, Quad9, and others, now provide services for both DoT and DoH. The challenge, naturally, concerns configuring clients to use these protocols.
Recent proposals from Mozilla and Google involve sending DoH queries directly from the browser to a “trusted” recursive resolver (resolver) as configured in the browser (perhaps even by default, although as of this writing the default settings have not yet changed). Similarly, the Android OS makes it possible to route all DNS queries via DoT to a Google-operated resolver [13].

To date, however, clients that are configured to use DoT or DoH operate using centralized architectures, whereby the client sends all DoT or DoH queries to a single recursive resolver. Such an architecture solves one privacy problem but creates a new one: a single entity now sees all DNS queries for a user, from all devices, for all networks and locations. In this sense, existing DoT and DoH architectures have not solved any privacy problem; they have simply moved the problem elsewhere, from one Internet entity (the ISP) to another (a content provider, ad network, etc.).

Furthermore, existing DoH and DoT architectures have introduced new performance concerns, since many CDNs map clients to nearby web caches based on the location of the client’s recursive resolver. Of course, if the CDN operator and the resolver operator are the same party (as may be the case for certain objects, in the case of Cloudflare and Google), this concern is somewhat mitigated, because the DNS operator sees the client’s IP address in any case. In cases where the CDN does not operate the resolver, however, there are some concerns that performance may suffer if the CDN mistakenly maps a client to a cache based on the location of a resolver that is not located within the client’s ISP. These relative performance effects have not been studied extensively in the context of various DoH and DoT architectures, which is one of the aims of our study.

In this section, we compare to related work on architectural changes to DNS, and we discuss related work on how DNS affects web performance.

DNS Privacy

Schmitt et al. proposed Oblivious DNS (ODNS), a new architecture that prevents recursive resolvers from associating queries with the clients that issue them [22]. ODNS does not require changes to recursive resolvers or authoritative servers, but the original design proposal does not involve distributed recursive resolvers. Pauly et al. proposed Adaptive DNS as a method to enable clients to send encrypted DNS queries to cloud-based resolvers and send queries for private domain names to local resolvers [12]. In Adaptive DNS, queries are selectively distributed to multiple resolvers; there have been no studies of the performance of Adaptive DNS.

DNS and Web Performance

Sundaresan et al. [24] measured and identified performance bottlenecks for web page load time in broadband access networks and found that page load times are influenced by slow DNS response times and can be improved by prefetching. An important distinction is that they define the DNS response time only as the response time for the first domain, while we consider the set of unique fully qualified domain names of all resources contained in a page. They investigate only nine high-profile websites, which stands in contrast to the 2,000 popular and normal websites that we analyze, and they estimate page load times through Mirage and validate their findings through the headless browser PhantomJS, while we utilize Mozilla Firefox, which is a full browser.

Wang et al. [26] introduced WProf, a profiling tool to analyze page load performance. They identified that DNS queries, in particular uncached, cold queries, can significantly affect web performance, accounting for up to 13% of the critical path delay for page load times. In 2012, Otto et al. [18] found that CDN performance was negatively affected when clients chose recursive resolvers that were geographically separated from CDN caches. We believe that this was due to the fact that resolvers did not support the EDNS Client Subnet (ECS) extension at the time (ECS was only introduced in January 2011, and standardized in May 2016) and CDNs only slowly started adopting anycast. Therefore, clients were likely redirected to a sub-optimal data center based on the resolver’s address, instead of the client’s address. We suspect that with the wide-spread adoption of ECS and anycast since 2012, CDN performance may not be as negatively affected by choosing a resolver that is geographically far away from a CDN.

Otto et al. introduced namehelp, a DNS proxy running on a client’s machine that improves web performance [19]. It is designed to help CDNs more accurately map clients to CDN-hosted content. First, the client queries their configured recursive resolver for a domain name.
When the client receives a DNS response from a recursive resolver, the proxy checks if there is a CNAME indicating that the query was redirected to a CDN. If so, the proxy queries the recursive resolver to look up the authoritative server for the CNAME. The proxy finishes by directly querying the authoritative server for the CNAME, which is operated by the CDN. This enables the CDN to directly use the client’s IP address to map the client to nearby servers, rather than relying on the recursive resolver, which may improve page load times. However, if the CNAME’s authoritative server is not already in the client’s cache, then namehelp induces additional delay in DNS resolution times.

D-DNS: Re-Decentralizing DNS

In this section, we present various approaches for distributing DNS queries from clients across sets of recursive resolvers. We explore two different approaches: (1) the client sends queries for each DNS domain name to a random resolver (Section 3.1); and (2) the client sends DNS queries to the resolver for the primary content delivery network corresponding to the website and associated objects for that site, and all other DNS queries to the client’s default local resolver (Section 3.2). In this section, we focus on describing the different strategies for decentralizing DNS queries across multiple resolvers; Section 4.1 describes in more detail how we implemented these strategies.

A simple approach to distributing DNS queries across resolvers is to send each DNS query to a random resolver. By randomly distributing n queries across K resolvers, each resolver receives n / K of the queries on average, in theory limiting the amount of information that any single entity learns from a client’s DNS queries. This approach is also easy to implement and can be deployed at a web browser, a stub resolver, or a DNS forwarder.

The random distribution we implement assigns each domain name to a resolver that is randomly selected from a pre-defined list. Queries for each domain are thus routed to a different randomly selected resolver. dnsmasq in principle allows all queries from a given 2LD to be routed to the same resolver, which could potentially result in fewer cache misses for authoritative DNS servers; on the other hand, different subdomains could be managed by different entities, and for simplicity of implementation we simply assigned each domain name from a complete list of objects corresponding to a corpus of websites for this study (i.e., the top 100 websites from the Tranco top list, as described in Section 4.1) to a different resolver.

There are other ways to distribute DNS queries randomly across resolvers that could be explored further. For example, another approach would be to map all subdomains of a 2LD to the same resolver, rather than distributing all subdomains randomly across resolvers [2]. Hoang et al. evaluate DNS performance using such an approach as well as how many unique domain names get mapped to each resolver with various sets of websites that a user may visit [7].

Much web content is hosted by content delivery networks (CDNs), which use the DNS to assign clients to nearby replicas of the web content. CDNs map queries for content to nearby cache nodes that host the content, relying on the location of the client’s recursive resolver to map the client to a nearby cache. A prominent concern with various proposals for DoH and DoT is that the resolver that a client uses may not be close to the client, thus causing the client to be mapped to a copy of the content that is far from the client itself. Increasingly, however, some CDNs, including Cloudflare and Google, are hosting their own resolvers, which mitigates this problem, since they could then see the queries and the corresponding IP address of the client, thus allowing the CDN to map the client’s location directly.

CDNs are increasingly hosting resolvers, with the assumed approach being that a client would send all of its DNS queries to that single entity, regardless of whether the CDN hosted the corresponding content. Such an approach would certainly facilitate encrypted DNS transport (e.g., via DoH), but it would also result in potential privacy leaks: the CDN operator would come to learn of DNS lookups for objects, devices, and other activities that are not associated with the content that it is serving to the client. An alternative approach, which we call the Single CDN distribution, is to direct only the DNS queries that are associated with the CDN-hosted content to the CDN’s resolver; the client sends all other DNS queries to the client’s local recursive resolver. Figure 1 shows the step-by-step operation of this approach.

Figure 1: CDN-Based Distribution.

Our hypothesis is that such an approach could improve performance over both a baseline approach of sending all queries to the local resolver and the random approach (Section 3.1), since the CDN that hosts the object would learn the location of the client from the queries it sends directly to the resolver. The approach also has better privacy properties than simply directing all queries to a single CDN’s resolver, since the CDN does not learn about DNS queries for content that it does not serve. In this sense, this approach results in no additional information leakage about the client’s browsing behavior to the CDN, since any information the CDN operator learns about the client’s behavior from the DNS queries it would already learn by serving the corresponding objects. Moreover, DNS queries that the client would have otherwise sent to an ISP resolver are instead encrypted and sent to the CDN, thus resulting in an overall reduction in information that is leaked via DNS queries.

The Single CDN approach is more complicated to implement since it requires the client to determine which queries to resolve at the resolver corresponding to the CDN-hosted objects. This approach essentially requires each domain name to be resolved in advance to determine which names resolve to objects that are hosted on the CDN. To do so, we sent queries for all names to a recursive resolver and subsequently performed a WHOIS/RDAP lookup on the addresses that are returned to determine the organization that owns each corresponding IP address.
(This initial mapping need only occur once; subsequent lookups can be sent directly to the CDN’s resolver, as appropriate.)

Although our approach to determining the mapping between domain names and CDNs is cumbersome, in-progress Internet drafts, including one authored by Cloudflare, aim to streamline this process in the future. In particular, the standard would allow a CDN to specify to clients which resolvers should be used for each object on a page [21], thus allowing clients to know in advance which domain names have content that is hosted by a CDN; this information could then be incorporated directly into a DNS forwarder such as dnsmasq. The draft has not yet been adopted by an IETF working group, however, and its status is thus somewhat unclear, especially in light of various security and privacy risks associated with the approach. It nonetheless offers a glimpse of how future Internet architectures might enable the approach we are proposing.
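The mapping step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes the WHOIS/RDAP lookups have already produced an organization name for each domain, and the function names (resolver_for, dnsmasq_lines) and all addresses are hypothetical placeholders taken from the documentation address range. The only external convention relied on is the dnsmasq rule form server=/domain/address, which forwards queries for a domain to a specific upstream server.

```python
from typing import Dict, List

# Placeholder resolver addresses (documentation range, RFC 5737); these are
# assumptions for illustration, not the addresses used in the study.
CDN_RESOLVERS: Dict[str, str] = {
    "cloudflare": "192.0.2.1",
    "google": "192.0.2.2",
}
LOCAL_RESOLVER = "192.0.2.53"


def resolver_for(org: str) -> str:
    """Return the CDN's resolver if the owning organization matches a known
    CDN, otherwise fall back to the local recursive resolver."""
    org_lower = org.lower()
    for cdn, resolver in CDN_RESOLVERS.items():
        if cdn in org_lower:
            return resolver
    return LOCAL_RESOLVER


def dnsmasq_lines(domain_to_org: Dict[str, str]) -> List[str]:
    """Render one dnsmasq 'server=/domain/address' rule per domain name."""
    return [
        f"server=/{domain}/{resolver_for(org)}"
        for domain, org in sorted(domain_to_org.items())
    ]


# Made-up organization names, shaped like the output of an RDAP lookup.
example = {
    "cdn.example.com": "Cloudflare, Inc.",
    "fonts.example.net": "Google LLC",
    "www.example.org": "Example Hosting Ltd.",
}
config = dnsmasq_lines(example)
print("\n".join(config))
```

Because the mapping only needs to be computed once per domain, the generated rules could be written to a configuration file and reused across page loads.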

In this section, we describe our results for page load times and query response times, and we compare the network distance to CDN-hosted content when the CDN-based approach and the local resolver were used. We performed our measurements between February 4th, 2020 and February 7th, 2020. Page loads and DNS measurements were performed back to back, and we did not introduce delay between successive measurements.

In the following, we describe the methodology underlying our experiments to measure how D-DNS approaches affect web performance. We first define the performance metrics and explain how we measure them, and we then describe our experiment setup and the limitations of our measurements.

Recursor          Min.    Avg.    Max.    Stdev.
Local Resolver    0.30    0.46    1.68    0.08
Quad9             2.24    2.99    5.80    0.21
Google            2.56    3.00    8.00    0.20
OpenDNS           2.44    3.16    7.77    0.22
Cloudflare        4.23    4.93   12.35    0.21

Table 1: Latencies (in milliseconds) for each recursive resolver, sorted by increasing average latency. We performed approximately 15,650 measurements to each resolver.

To understand how different distribution approaches affect web performance, we measure page load times, DNS resolution times, and object download times. We also analyze the effectiveness of the CDN-based approach by measuring the network distance to servers that host CDN content.

Page Load Time

We measure page load times of a website through a full browser, Mozilla Firefox 67.0.4, which we use in headless mode controlled by Selenium to record HTTP Archive objects (HARs) [25]. In particular, the HAR contains the onLoad timing, which measures the elapsed time between when a page load began and when the load event was fired. The load event is fired when a web page and all of its resources have completely loaded. It is specified in the HTML Living Standard and it is supported by all major browsers [17]. It has also been used to measure page load times in previous research on web performance [4, 6]. A related event is DOMContentLoaded, which is fired when a web page’s markup (HTML) has been completely loaded and parsed by the browser. Unlike the load event, the DOMContentLoaded event does not include the time for downloading and rendering each object on a web page. However, including this time is necessary to truly understand how selection of recursive resolvers affects page load times [16].

An alternative metric to page load time would be above-the-fold rendering time (AFT), which measures the time it takes to download and render content that is initially viewable. The motivation for measuring AFT is that users may perceive a page load to have finished before all the objects have actually been rendered. However, accurately measuring AFT is challenging: we would need to record the start time and end time of rendering within the browser’s dimensions for each page load [23], which would require invasive modifications to the browser that could themselves negatively affect true rendering times, or we would need to visually record the full rendering process and analyze the recorded video, which is prohibitively expensive. Moreover, although the AFT might indicate that a browser has finished loading the page, a user may not be able to interact with it yet, and may not actually perceive the website as having loaded. Correspondingly, we rely on page load time (time of load event) for our measurements.
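As an illustration of how the onLoad timing can be read from a recorded HAR, the following Python sketch assumes the HAR 1.2 layout, in which each page carries a pageTimings object with onLoad and onContentLoad values in milliseconds and -1 marks an unavailable timing. The helper name onload_ms and the sample data are hypothetical; this is not the measurement harness used in the study.

```python
import json
from typing import Optional


def onload_ms(har: dict) -> Optional[float]:
    """Return the onLoad timing (in ms) of the first page in a HAR,
    or None if the HAR has no page or the timing is unavailable."""
    pages = har.get("log", {}).get("pages", [])
    if not pages:
        return None
    timing = pages[0].get("pageTimings", {}).get("onLoad")
    # HAR 1.2 uses -1 when a timing does not apply or was not recorded.
    if timing is None or timing < 0:
        return None
    return float(timing)


# A tiny, made-up HAR fragment for demonstration.
sample_text = """
{"log": {"pages": [{"id": "page_1",
                    "pageTimings": {"onContentLoad": 480, "onLoad": 1234}}],
         "entries": []}}
"""
sample = json.loads(sample_text)
print(onload_ms(sample))
```

Returning None rather than a sentinel number keeps failed or incomplete page loads from silently skewing medians computed over many runs.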

DNS Query Response Time

We measure unencrypted DNS-over-UDP (which we define as Do53) query response times by using a custom tool developed by Hounsel et al. [9]. It uses the getdns C library to issue Do53 queries after each page load has completed. For each unique domain name contained in a HAR that we recorded, we query the resolvers of the approach with which we performed the page load and we record the query response times for all domains.

The HARs we collect also contain response times for each DNS query that was issued during the page load. However, as Hounsel et al. discovered, the HAR response times may be inaccurate depending on how the page load was performed. For example, the initial query in a HAR can have a response time of 0 ms, even in cases where this is impossible because the session started with an empty cache and the latency to the recursive resolver is larger. This can be the case because, depending on how a website issues HTTP redirects, the first query in the HAR is not actually the first query that the browser performed. Instead, the browser might have performed a variety of other HTTP requests and DNS queries before, which may still be in progress or already cached. Therefore, to ensure that the response times are accurate, we perform DNS queries after the page load has completed through the custom tool.
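The set of unique domain names contained in a HAR, which is the input to the post-page-load DNS queries described above, could be gathered roughly as follows. This is a sketch under the assumption that each HAR entry stores the request URL at entries[i].request.url (as in HAR 1.2); the function name unique_domains and the sample entries are illustrative only, and the actual tool from Hounsel et al. may work differently.

```python
from typing import List
from urllib.parse import urlsplit


def unique_domains(har: dict) -> List[str]:
    """Collect the unique host names of all requests recorded in a HAR."""
    names = set()
    for entry in har.get("log", {}).get("entries", []):
        url = entry.get("request", {}).get("url", "")
        host = urlsplit(url).hostname  # lower-cased by urlsplit, None if absent
        if host:
            names.add(host)
    return sorted(names)


# Made-up HAR fragment with a repeated host name.
sample_har = {
    "log": {
        "entries": [
            {"request": {"url": "https://www.example.com/index.html"}},
            {"request": {"url": "https://static.example.com/app.js"}},
            {"request": {"url": "https://WWW.example.com/logo.png"}},
        ]
    }
}
domains = unique_domains(sample_har)
print(domains)
```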

Network Distance

We hypothesize that the CDN approach will improve performance for CDN-hosted content because the CDN’s resolver will be able to better map clients to edge servers that host the content. For example, it could easily take into consideration caching at the edge, network capacity, or compute capacity. Therefore, in addition to page load times and DNS response times, we also evaluate the approaches by measuring how close CDN-hosted content is to the client when the CDN approach is used and when the local resolver is used. Specifically, we measure latency to the IP addresses of CDN-hosted resources returned by the recursive resolvers to represent network distance.

To understand the performance impact of D-DNS, we design an experiment to load web pages with various approaches and recursive resolvers. In this section, we describe our choices of approaches, recursive resolvers, and web pages, and we detail the hardware and software configuration we run our experiments on.

Recursive Resolvers

We measure two approaches of how D-DNS could be used: the random approach and the CDN-based approach (Section 3). As a baseline comparison, we also analyze how the local recursive resolver provided by the university network from which we conduct our measurements performs. For the CDN-based approaches, we analyze content hosted by Cloudflare and Google, and we thus use the recursive resolvers that they operate. For the random approach, we use four popular public recursive resolvers (Cloudflare, Google, Quad9, and OpenDNS) and the network’s local recursive resolver.

To construct the approaches, we use HARs of websites that we collected for a previous study (see Section 4.1.2 for details), and we extract all unique domains from them. For the random approach, we randomly assign each unique domain name to one of the five recursive resolvers. For the CDN approach, we assign the domain names for CDN-hosted objects to the resolver operated by the respective CDN, and we use the local recursive resolver for all other objects. For the baseline comparison, we perform all queries with the local resolver.

These approaches do not depend on any particular DNS protocol (encrypted or otherwise). As such, it is possible to configure these approaches with DoH, a new protocol that encrypts DNS requests and responses [8, 10]. This may enable clients to not only observe a performance benefit but also improve their privacy. In fact, taking into account the concept made popular by DoH, that is, “applications doing DNS,” D-DNS approaches could be constructed and deployed easily to end-users, as well as regularly optimized and easily updated. However, DoH is not natively supported by Debian, and it would require substantial engineering effort to perform the measurement through the DoH client of Mozilla Firefox itself, which is why we use Do53 to resolve names.
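The random assignment of unique domain names to the five resolvers can be sketched in a few lines of Python. The resolver labels mirror the operators named above, but the function name assign_randomly, the seed, and the synthetic domain names are assumptions made for illustration. The final count shows the n / K behavior discussed in Section 3.1: with n domains and K resolvers, each resolver receives roughly n / K names.

```python
import random
from collections import Counter
from typing import Dict, Iterable, Optional, Sequence

RESOLVERS = ["cloudflare", "google", "quad9", "opendns", "local"]


def assign_randomly(
    domains: Iterable[str],
    resolvers: Sequence[str],
    seed: Optional[int] = None,
) -> Dict[str, str]:
    """Assign each unique domain name to one uniformly chosen resolver.

    The assignment is fixed per domain, so repeated queries for the same
    name always go to the same resolver."""
    rng = random.Random(seed)
    return {name: rng.choice(resolvers) for name in sorted(set(domains))}


# Synthetic domain names; a real run would use names extracted from HARs.
names = [f"host{i}.example.com" for i in range(1000)]
assignment = assign_randomly(names, RESOLVERS, seed=7)
per_resolver = Counter(assignment.values())
print(per_resolver)  # each count should be near 1000 / 5 = 200
```

Fixing the seed makes an experiment repeatable; omitting it draws a fresh assignment on every run.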

Web Pages

We conduct our measurements with all three approaches (random approach, CDN approach, and local resolver) for two different lists of websites. First, we select the top 100 websites in the Tranco Top 1,000 list that use the most content hosted by Cloudflare [20]. Second, we select the top 100 websites in the Tranco Top 1,000 list that use the most content hosted by Google. This enables us to understand if each approach performs differently with different sets of CDN-hosted websites. We focus on popular websites because users are more likely to visit them and thus experience performance benefits.

We determine if a website uses CDN-hosted content by analyzing the IP addresses of all resources contained in HARs we collected for an earlier study for the Tranco Top 1,000 websites. We then perform RDAP lookups on the IP addresses and mark them as CDN-hosted if Cloudflare or Google are in the organization field [11].
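One plausible way to rank websites by how much CDN-hosted content they use is sketched below. It assumes that "most content" is measured as the number of resources whose RDAP organization field mentions the CDN; the text does not state the exact metric, so this is an assumption, as are the function names and the sample data.

```python
from typing import Dict, List


def cdn_resource_count(orgs: List[str], cdn: str) -> int:
    """Count resources whose owning organization mentions the CDN."""
    return sum(1 for org in orgs if cdn.lower() in org.lower())


def top_sites_by_cdn(
    site_orgs: Dict[str, List[str]], cdn: str, n: int = 100
) -> List[str]:
    """Return up to n sites with the most CDN-hosted resources,
    skipping sites that have none. Ties are broken by site name."""
    ranked = sorted(
        site_orgs,
        key=lambda site: (-cdn_resource_count(site_orgs[site], cdn), site),
    )
    return [s for s in ranked[:n] if cdn_resource_count(site_orgs[s], cdn) > 0]


# Made-up per-site organization lists, one entry per resource on the page.
sample_sites = {
    "a.example": ["Cloudflare, Inc.", "Cloudflare, Inc.", "Other Org"],
    "b.example": ["Google LLC"],
    "c.example": ["Cloudflare, Inc."],
}
cloudflare_top = top_sites_by_cdn(sample_sites, "Cloudflare", n=2)
google_top = top_sites_by_cdn(sample_sites, "Google")
print(cloudflare_top, google_top)
```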

We perform our experiments on a desktop-class computer with an 8th Generation Intel Core i7 CPU and 32 GiB of memory, running Debian stable (buster). The machine is connected over Gigabit Ethernet to the university network, and it runs our measurement suite to collect page load times and DNS response times. To deploy our measurement suite, we use a Docker container that enables us to easily clear local HTTP and DNS caches between page loads. Moreover, to enable reproducibility of our research and further open science, we will make our measurement suite, including the Docker container, publicly available at the time of publication.

We use dnsmasq, a DNS forwarder that runs on our local machine, to implement our approaches. We configure dnsmasq in such a way that queries are directed to different recursive resolvers based on their domain name. For example, we redirect all queries for google.com to Google’s recursive resolver. As such, we create one dnsmasq configuration file for each approach and each list of websites.

This design has two limitations that may affect the generalization of our results. First, we perform our measurements exclusively on the Debian operating system, which is based on Linux, which means that Linux’s networking stack and parameters for networking algorithms will affect our measurements. Networking stacks are heavily optimized though, which is why we expect our results to generalize across operating systems. Second, we conduct our experiments from a single computer connected to a university network, which means that we cannot easily generalize our results across other machines or other networks, like residential ISPs. However, the machine from which we perform our measurements is representative of an end-user computer.
Furthermore, university networks are typically very well connected, which means that any improvements we can observe on a university network are likely going to be a lower bound in terms of potential performance improvements for end-users relying on a residential ISP for network connectivity.
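The per-domain forwarding setup can be sketched as a small generator for a dnsmasq configuration file. It relies on dnsmasq's `server=/domain/address` option, which forwards queries for that domain and its subdomains to the given upstream server, and on `no-resolv`, which stops dnsmasq from reading upstream servers from /etc/resolv.conf. The function name and the resolver addresses in the usage note are placeholders from the documentation address ranges, not the addresses used in the measurements.

```python
def dnsmasq_config(assignment, resolver_ip, default_ip):
    """Render a dnsmasq configuration from a domain -> resolver-label assignment.

    assignment:  dict mapping domain -> resolver label.
    resolver_ip: dict mapping resolver label -> upstream IP address (placeholders here).
    default_ip:  upstream used for any domain without an explicit rule.
    """
    lines = [
        "no-resolv",             # ignore the upstream servers listed in /etc/resolv.conf
        f"server={default_ip}",  # fallback upstream resolver
    ]
    for domain in sorted(assignment):
        ip = resolver_ip[assignment[domain]]
        lines.append(f"server=/{domain}/{ip}")  # per-domain upstream resolver
    return "\n".join(lines) + "\n"
```

For instance, `dnsmasq_config({"google.com": "google"}, {"google": "192.0.2.8"}, "192.0.2.53")` produces a file that sends queries for google.com to the placeholder address 192.0.2.8 and everything else to 192.0.2.53.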

As DNS queries for CDN-hosted content are issued to different recursive resolvers, clients can get mapped to different edge servers. Accordingly, depending on which approach is used, clients may be able to download CDN objects more quickly. In turn, a browser can parse and render these objects earlier, and we expect that the CDN approach will cause page load times to be lower than with the local resolver. We also expect the random approach to perform similarly to the local resolver. We study page load times across our approaches for each set of websites (Section 4.1.2).

Figures 2 and 3 show CDFs of the differences in page load times between approaches for the websites that use the most Cloudflare-hosted and Google-hosted resources, respectively. The vertical line on each subplot indicates the median of the CDF. A median that is less than 0s on the x-axis is indicated in blue hues and means that the approach named on the left of the caption is faster than the approach named on the right. Correspondingly, a median that is greater than 0s on the x-axis is indicated in red hues and means that the approach named on the left of the caption is slower than the approach named on the right. Finally, a median that is close to 0s (between -30ms and 30ms, which is approximately one frame when the page is rendered at 30Hz, or two frames when the page is rendered at 60Hz) indicates that the two approaches perform similarly.

We find that the CDN approach outperforms the local resolver in terms of page load times for the webpages that include the most Cloudflare-hosted resources. The median difference in page load times between the CDN approach and the local resolver is 47ms. This improvement is intuitive: the CDN provider can point us to the best place to fetch the objects from based on edge server utilization and network conditions, among other metrics.
Interestingly, we also find that page loads with the random approach are faster than with the local resolver, with a median difference of 33ms. This may seem counter-intuitive at first, given that distributing queries to multiple recursive resolvers could result in fewer cache hits. However, the recursive resolvers that we distribute our queries to are some of the most popular, and thus likely have highly populated caches. Moreover, we always issue queries for a given domain to the same recursor, which places less cache pressure on each recursor. This can result in a higher cache hit ratio for each recursor, which, in turn, can lead to faster response times.

Comparing the differences in page load times for the webpages that use the most Google-hosted resources, we find again that page load times are lower with the CDN approach than with the local resolver, by 6ms in the median case. However, unlike for the top webpages that use Cloudflare resources, the random approach performs best, with a median page load time that is 19ms faster than with the CDN-based approach. We note that there is more variance among all three approaches with the websites that use the most Google resources, which may explain why the random approach performs the fastest. Nonetheless, we find that by distributing queries across multiple resolvers, websites can load faster.
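The comparison between two approaches reduces to the median of per-website differences, classified against the 30ms band used in the figures. The sketch below illustrates this under the assumption that each approach yields one page load time per website; the function names and the dictionary layout are hypothetical.

```python
from statistics import median


def paired_differences(left, right):
    """Per-website difference (left - right) in seconds for websites measured with both approaches."""
    return [left[site] - right[site] for site in sorted(left.keys() & right.keys())]


def summarize(left, right, threshold=0.030):
    """Return the median difference and a verdict.

    A median within (-threshold, +threshold) counts as "similar";
    a negative median means that the left approach loaded pages faster.
    """
    m = median(paired_differences(left, right))
    if m <= -threshold:
        verdict = "left faster"
    elif m >= threshold:
        verdict = "left slower"
    else:
        verdict = "similar"
    return m, verdict
```

With `left` holding the CDN approach and `right` the local resolver, a result of roughly `(-0.047, "left faster")` would correspond to the 47ms median improvement reported for Cloudflare-hosted websites.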

In this section we explore in further detail the effects that result in differences in page load time. We first explore the effects of different D-DNS strategies on DNS lookup times; then, we explore how D-DNS affects the ability of an authoritative DNS server to map a client to a nearby CDN replica.

Figure 2: Page load differences for different approaches (Cloudflare). Panels: (a) CDN - Local Resolver; (b) Random - Local Resolver; (c) CDN - Random.

Figure 3: Page load differences for different approaches (Google). Panels: (a) CDN - Local Resolver; (b) Random - Local Resolver; (c) CDN - Random.

Legend for Figures 2 and 3: Diff ≥ 1s; 0.1s ≤ Diff < 1s; 0.03s ≤ Diff < 0.1s; -0.03s < Diff < 0.03s; -0.1s < Diff ≤ -0.03s; -1s < Diff ≤ -0.1s; Diff ≤ -1s.

DNS lookup times are an important metric to understand how different approaches affect page load times. Web pages typically include many objects (e.g., images and scripts), which all must have their domain names resolved to IP addresses. Previous work has shown that DNS queries can cause performance bottlenecks on website page loads [26]. Thus, if certain approaches cause clients to query distant resolvers, then they may observe slower DNS lookup times. This in turn would cause slower page load times, as the client’s browser must wait for DNS lookups to finish before objects can be downloaded. We study query response times across our approaches with each set of websites.

Figures 4 and 5 show CDFs of the differences in DNS response times for the websites that use the most Cloudflare-hosted and Google-hosted resources, respectively. Importantly, DNS lookup time differences are shown in milliseconds. As before, a median that is less than 0ms on the x-axis is indicated in blue hues and means that the approach named on the left of the caption is faster than the approach named on the right. A median that is greater than 0ms on the x-axis is indicated in red hues and means that the approach named on the left of the caption is slower than the approach named on the right. Finally, a median that is close to 0ms (between -0.3ms and 0.3ms) indicates that a query for a given domain name performed similarly with the two approaches.

As with page load times, we find that the CDN approach outperforms the local resolver for the webpages that include the most Cloudflare-hosted resources. However, the difference is negligible, with a median improvement of 0.1ms. We also find that the random approach performs slower than the local resolver, with a difference of 3ms. These results align with our expectation that the random approach would result in longer DNS lookup times.
By distributing queries to the five different resolvers that we measured with, we may get fewer cache hits than when using a single resolver with a well-populated cache for all queries.

With the webpages that include the most Google-hosted resources, we also find that the CDN approach outperforms the local resolver. As with the CDN approach for Cloudflare-hosted resources, though, the difference is negligible, with an improvement of 0.24ms in the median case. The random approach performs even worse than the local resolver for Google-hosted resources, with a difference of 5.25ms.

Figure 4: DNS differences for different approaches (Cloudflare). Panels: (a) CDN - Local Resolver; (b) Random - Local Resolver; (c) CDN - Random.

Figure 5: DNS differences for different approaches (Google). Panels: (a) CDN - Local Resolver; (b) Random - Local Resolver; (c) CDN - Random.

Legend for Figures 4 and 5: Diff ≥ 10ms; 1ms ≤ Diff < 10ms; 0.3ms ≤ Diff < 1ms; -0.3ms < Diff < 0.3ms; -1ms < Diff ≤ -0.3ms; -10ms < Diff ≤ -1ms; Diff ≤ -10ms.

Put together, these results suggest that different approaches have significantly different effects on query response times. By using the CDN approach with a single CDN resolver, clients may observe few differences in DNS lookup times. As more resolvers are used to distribute queries, though, lookup times tend to increase. Despite this increase in lookup times, clients may still be able to download webpages with the random approach on par with the local resolver, as shown by the results in Section 4.2. DNS lookup times do not appear to have a significant effect on page load times for each approach. Thus, the random approach remains a feasible option for improving privacy.

We further investigate how each approach affects web performance by studying the distance to the CDN edge servers that each approach resolved and used to fetch the objects. This is particularly important for evaluating the CDN approach because we hypothesize that by distributing queries for CDN-hosted content to their respective resolvers (Section 3), CDNs will be able to better map clients to nearby edge servers. The CDNs can more accurately map clients to edge servers based on knowing the locations of their anycast-based recursive resolvers and edge servers, which is otherwise difficult without features such as EDNS Client Subnet (ECS), a feature that some of the most popular recursors no longer support because of privacy concerns. Correspondingly, one way to understand how well CDNs are mapping clients to nearby edge servers is by measuring the latency to the servers of CDN-hosted objects that each approach used.

We compare the localization of CDN-hosted content between approaches by measuring the client’s latency to the servers of CDN-hosted resources via ICMP ECHO (ping). First, we extract the URL and IP address returned for each CDN-hosted resource across all HARs based on our previous analysis of what content is hosted on a CDN (Section 4.1.2). We then ping each IP address five times and take the median to measure the latency to the server that hosts the resource. We compare our latency measurements for the resources that are present on the same website across approaches. For example, if the image resource identified by the URL https://cloudflare.com/image.png was present in a HAR for the CDN approach and the local resolver, then we would group the latencies using each approach for further analysis.

Figure 6: Comparison of latencies to the same CDN-hosted resources with different approaches (Cloudflare; random sample of 20% of points). Panels: (a) CDN vs. Local Resolver; (b) Random vs. Local Resolver; (c) CDN vs. Random.

Figure 7: Comparison of latencies to the same CDN-hosted resources with different approaches (Google; random sample of 20% of points). Panels: (a) CDN vs. Local Resolver; (b) Random vs. Local Resolver; (c) CDN vs. Random.

Figures 6 and 7 show scatter plots comparing the latency to shared resources in HARs between approaches for Cloudflare-hosted and Google-hosted resources, respectively. The x-axis measures latency to a given CDN-hosted resource with the approach indicated by the x-label. The y-axis measures latency to the same resource with the approach indicated by the y-label. Put together, these measurements constitute a data point on the scatter plot, which enables us to compare how different approaches map our client to edge servers that differ in distance from our client. The diagonal line represents the scenario where each approach results in a mapping to edge servers that are equidistant from our client.

We find that the CDN approach and the random approach significantly affect the edge servers that clients get mapped to for Cloudflare-hosted resources. Figure 6a and Figure 6b show that, when clients use the CDN approach and the random approach, they often get mapped to servers that are closer to the client than with the local resolver. We note that the difference in latency is typically within 5ms, but this may still be enough to improve page load times. For example, if clients get mapped to an edge server with the CDN approach that is 5ms closer than with the local resolver, then clients may be able to download dozens of resources more quickly. Thus, even if the edge server is only a few milliseconds closer to the client, this saving in time adds up over the course of a page load.

We find different results with the CDN approach and the random approach for Google-hosted resources. Across any pair of approaches, we find far fewer outliers in terms of latencies to edge servers that host a given resource.
Interestingly, however, the CDN approach does tend to map clients to edge servers that are further away than with the random approach. This result coincides with our findings in Section 4.2, in which page load times with the random approach were slightly faster than with the CDN approach. Although this result is counter-intuitive at first, we believe that Google’s recursive resolvers may not be as widely distributed as other anycast-based resolvers. Thus, if Google does not use the location of their recursive resolvers to perform better mappings, then clients may get mapped to closer edge servers by using other widely distributed resolvers.
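The latency comparison described in this section can be sketched as follows. The sketch assumes that the five ping round-trip times per resource have already been collected and are keyed by (website, URL); the function names and the data layout are hypothetical, and no probes are sent by the code itself.

```python
from statistics import median


def pair_latencies(x_run, y_run):
    """Build scatter-plot points for resources seen with both approaches.

    x_run, y_run: dict mapping (website, url) -> list of ping RTTs in ms to the
    server that the respective approach resolved for that resource.
    Returns a list of (x_ms, y_ms) median latencies, one per shared resource.
    """
    shared = sorted(x_run.keys() & y_run.keys())
    return [(median(x_run[key]), median(y_run[key])) for key in shared]


def fraction_closer(points):
    """Fraction of points below the diagonal, i.e. where the x-axis approach reached a closer server."""
    if not points:
        return 0.0
    return sum(1 for x, y in points if x < y) / len(points)
```

Taking the median of the five probes makes each point robust against a single slow reply, and points off the diagonal correspond to resources for which the two approaches were mapped to edge servers at different distances.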

Re-Decentralizing DNS in Practice

We have explored how different strategies for distributing DNS queries across different resolvers affect performance, ultimately finding that distributing DNS queries across different entities does not have detrimental effects on performance. In this section, we discuss next steps towards a full deployment of D-DNS, including the possible deployment avenues for incorporating distributed DNS queries with DoH, and various practical considerations, including privacy considerations, as well as the potential effects of D-DNS on security (e.g., DNS-based anomaly detection) and content delivery.

Ultimately, deploying D-DNS requires modifying client software to distribute queries to different recursive resolvers. This functionality can be implemented in either the application itself or in the stub resolver that is native to the client’s operating system. Alternatively, a DNS forwarder on the local network (e.g., at the client’s local resolver, such as in a home router) could forward DNS queries appropriately. Currently, some forms of centralized DNS, such as DoH, are implemented in the browser (e.g., Firefox, Chrome); such browser functionality could be augmented with the ability to distribute DNS queries to different recursive resolvers according to the types of strategies we introduced in Section 3. The same type of functionality could be implemented in a DNS forwarder, such as dnsmasq, as we implemented in the experiments in Section 4. In these cases, local resolvers could be augmented to support DoT or DoH functionality, with queries forwarded to the appropriate resolver depending on the distribution strategy.

Regardless of where D-DNS is deployed, the local resolver must know where to forward each DNS query. In the case of CDN-based distribution, the resolver must maintain a mapping between each domain name and the corresponding CDN-based recursive resolver, assuming that the corresponding object is hosted on a CDN. Maintaining this mapping and distributing it to clients presents a potential challenge; in this regard, a browser-based deployment of D-DNS may prove to be more practical, given that browsers are already equipped to receive updates for various domain name-based lists (e.g., safe browsing lists, ad blockers). In contrast, a dnsmasq-based deployment may prove to be more universal, but updating domain name mappings may be more complex and difficult to manage, particularly across heterogeneous devices and operating systems.
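As a rough illustration of the lookup that such a forwarder or browser component would perform, the sketch below selects a resolver by longest matching domain suffix and falls back to the local resolver. The mapping contents and the function name are hypothetical, and how the mapping is distributed and updated is left open, as in the discussion above.

```python
def pick_resolver(qname, suffix_map, default="local"):
    """Return the resolver label for a query name using longest-suffix matching.

    suffix_map: dict mapping a domain suffix (e.g. "google.com") -> resolver label.
    Matching is done on whole labels, so "notgoogle.com" does not match "google.com".
    """
    labels = qname.rstrip(".").lower().split(".")
    # Try the full name first, then progressively shorter suffixes.
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in suffix_map:
            return suffix_map[candidate]
    return default
```

For example, with `{"google.com": "google"}` as the mapping, a query for `www.google.com.` would be forwarded to the resolver labeled `google`, while any unmatched name goes to the local resolver.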

The ability to distribute DNS queries across multiple recursive resolvers can help ensure that no single entity sees the entirety of a client’s DNS queries. The results from Section 4 demonstrate that distributing DNS queries across multiple resolvers generally does not significantly harm performance, and in the case of CDN-based distribution, it can occasionally even improve page load time. These results demonstrate the potential to explore a variety of strategies for distributing DNS queries across resolvers.

We envision that D-DNS could incorporate a variety of strategies for distributing DNS queries, depending on the tradeoffs between privacy and performance, including those described by Arkko et al. [2]. Although various strategies for distributing DNS queries have been proposed, and this paper evaluates the feasibility and performance effects of some of these strategies, the privacy tradeoffs of different strategies have not yet been studied. For example, randomly distributing queries across a collection of resolvers might prevent any one resolver from seeing the entirety of client queries, but over time all resolvers might gain enough information about client activity to piece together sensitive information about a client. There are also tradeoffs to consider between the privacy benefits of distributing queries across different resolvers and potential effects on caching. Future work could consider how D-DNS and various strategies for distributing DNS queries could improve privacy by defending against various privacy attacks, including DNS-based web fingerprinting and device identification attacks.

As DNS becomes both encrypted and decentralized, certain security tasks may become more challenging. For example, many security appliances depend on the ability to observe unencrypted DNS traffic to detect compromised machines or other anomalies [1]. The inability to observe a client’s DNS traffic at a single vantage point may make a variety of conventional network management tasks, from device identification to malware detection, more challenging in the future. Additionally, other appliances, such as parental controls, rely on the local recursive resolver to filter or redirect traffic (many ISPs implement parental controls at their resolver, for example).

The proposed use of DNS canary domains may provide an interim solution for some of these network management challenges. When a client connects to a network, it could ask its local resolver for the addresses of one or more canary domains. The local resolver can then determine whether the client uses services that depend on the ability to see DNS queries. If so, the local resolver can send an answer such as NXDOMAIN as a signal, which could result in the client disabling D-DNS. Such canary domains are already in limited use for DoH: Firefox has implemented a canary domain to disable DNS-over-HTTPS in the presence of parental controls and malware filters [5]. Unfortunately, canary domains are fairly coarse-grained: they enable or disable D-DNS (or DoH) completely, without allowing a client to enable such a service for a subset of domains. Securing the use of canary domains is also important.

The technical design of DNS resolution is, in fact, a topic with complicated ethical issues. The introduction of centralized DoH, for example, arose out of concerns about user privacy and ISP surveillance; on the other hand, centralized DoH introduced new risks associated with security, privacy, and Internet censorship.
For example, among the concerns with centralized DoH is that it could become a choke point for data collection, coercion, or censorship by oppressive governments; various IETF working groups are discussing human rights considerations associated with centralized DoH. Additionally, because the designs of DoH and D-DNS have complicated implications for privacy (i.e., which entities ultimately can see a user’s DNS queries), the default settings in applications and devices also warrant consideration. For example, the Android OS currently presents an option for “Private DNS” to users, which ultimately enables DoT and sends all DNS queries from a user’s device to Google. Firefox and Chrome have also been rumored to be experimenting with default settings for DoH, which could route a large fraction of user DNS queries to a single centralized entity with a single software update. Moving forward, the designs of D-DNS, DoH, and related protocols, as well as their associated default settings and user interfaces, entail complicated ethical considerations that likely warrant separate detailed studies.
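The canary-domain check discussed earlier in this section amounts to a simple client-side decision, sketched below. The sketch assumes that the client has already queried its local resolver and recorded the DNS response code (RCODE) for each canary domain; the D-DNS canary name in the usage note is hypothetical, whereas use-application-dns.net is the canary domain that Firefox uses for DoH [5].

```python
NOERROR = 0   # DNS RCODE: query answered normally
NXDOMAIN = 3  # DNS RCODE: name does not exist, used here as the opt-out signal


def ddns_allowed(canary_rcodes):
    """Decide whether the client may keep distributing its queries.

    canary_rcodes: dict mapping canary domain -> RCODE returned by the local resolver.
    A single NXDOMAIN answer is treated as a request to disable D-DNS entirely,
    which mirrors the coarse-grained, all-or-nothing nature of canary domains.
    """
    return all(rcode != NXDOMAIN for rcode in canary_rcodes.values())
```

A client that receives `{"canary.d-dns.example": NXDOMAIN}` would fall back to sending all queries to the local resolver, so that filtering and monitoring on that network keep working.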

RFC 8484 recommends that DoH clients and resolvers use HTTP/2 as the minimum HTTP version to achieve comparable performance with Do53 [8]; HTTP/2 may enable DoH to perform comparably with Do53 through the use of a technology called server push, which RFC 7540 specifies as a way for servers to send content to clients before they request it [14]. In the context of DoH, recursive resolvers could predict DNS requests that clients might make based on past DNS requests and push the corresponding responses. In the case of a CDN that serves the content for the associated DNS names, the CDN may be able to proactively push DNS responses for DNS names that are referenced in the web objects that it serves, ultimately preventing the client from having to make additional DNS queries. For both security and privacy reasons, it may ultimately make sense for CDN-hosted recursive resolvers to only be permitted for domains that are associated with the content that they host (as in the CDN-based distribution approach described in Section 3).

Such transport-layer optimizations can further improve the performance of CDN-based distribution in D-DNS. Furthermore, if the CDN-hosted resolver only receives queries for domains associated with the content that it hosts, the client need not compromise privacy in exchange for these performance improvements, since the CDN already knows that it is serving these objects to the client and thus no additional information is leaked by resolving (or pushing) the associated DNS responses for the client.

New technologies such as DNS-over-HTTPS (DoH) are driving a trend towards centralization of DNS resolution. In this paper, we present D-DNS, a new client architecture for interacting with multiple recursive DNS resolvers to improve web performance and privacy. We implement D-DNS and study performance when selecting different resolvers based on the CDN that hosts content, as well as simply selecting resolvers at random, comparing those results to performance achieved using a local resolver. We measure DNS resolution times and find that the CDN-based approach performs similarly to a local resolver, while the random approach offers only slightly worse performance than relying on a default local resolver. Interestingly, when studying page load times, we find that both the CDN-based and random approaches either improve or result in little difference compared with a local resolver, depending on the CDN. Ultimately, our findings show that the D-DNS models we present can improve performance in some cases, and generally do not negatively affect performance in the worst case.

Our research demonstrates that the potential privacy and robustness benefits inherent in distributing DNS traffic across multiple resolvers can be realized without a significant performance penalty, pointing to promising avenues for future work in performance and privacy enhancements. To encourage others to build on and extend our results, including by replicating them, validating them in other network settings, exploring different strategies for distributing DNS queries across resolvers, and integrating D-DNS with other DNS privacy extensions (e.g., DoH, DoT), we will publicly release the D-DNS source code and all experiment code and results.

References

[1] M. Antonakakis, R. Perdisci, D. Dagon, W. Lee, and N. Feamster. Building a dynamic reputation system for DNS. In USENIX Security Symposium

Proceedings of the 2011 Internet Measurement Conference (IMC), pages 313–328, Berlin, Germany. Association for Computing Machinery (ACM), Nov. 2011. url: https://web.eecs.umich.edu/~harshavm/papers/imc11.pdf (visited on 10/02/2019).

[5] Firefox. Canary domain - use-application-dns.net. url: https://support.mozilla.org/en-US/kb/canary-domain-use-application-dnsnet (visited on 02/07/2020).

[6] M. Dhawan, J. Samuel, R. Teixeira, C. Kreibich, M. Allman, N. Weaver, and V. Paxson. Fathom: a browser-based network measurement platform. In R. Mahajan and A. Snoeren, editors, Proceedings of the 2012 Internet Measurement Conference (IMC)

Proceedings of the Applied Networking Research Workshop

Proceedings of the 2012 Internet Measurement Conference (IMC), Boston, MA, USA. Association for Computing Machinery (ACM), Nov. 2012. isbn: 978-1-4503-1705-4. doi: 10.1145/2398776.2398831.

[19] J. S. Otto, M. A. Sánchez, J. P. Rula, T. Stein, and F. E. Bustamante. namehelp: Intelligent Client-side DNS Resolution. In V. Padmanabhan and G. Varghese, editors, Proceedings of the 2012 ACM SIGCOMM Conference (SIGCOMM), Helsinki, Finland. Association for Computing Machinery (ACM), Aug. 2012. isbn: 978-1-4503-1419-0.

[20] V. L. Pochat, T. V. Goethem, S. Tajalizadehkhoob, M. Korczyński, and W. Joosen. Tranco: A Research-Oriented Top Sites Ranking Hardened Against Manipulation. url: https://tranco-list.eu/ (visited on 02/07/2019).

[21] D. Schinazi, N. Sullivan, and J. Kipp. DoH Preference Hints for HTTP. Technical report draft-schinazi-httpbis-doh-preference-hints-01, Jan. 2020. url: https://datatracker.ietf.org/doc/html/draft-schinazi-httpbis-doh-preference-hints-01. Work in Progress.

[22] P. Schmitt, A. Edmundson, A. Mankin, and N. Feamster. Oblivious DNS: Practical Privacy for DNS Queries. In C. Troncoso and K. Chatzikokolakis, editors, Proceedings of the 19th Privacy Enhancing Technologies, pages 228–244, Stockholm, Sweden, 19th edition. Sciendo, July 2019. doi: 10.2478/popets-2019-0028.

[23] M. Subramanian, E. Ye, R. Korlipara, and F. Smith. Techniques for measuring above-the-fold page rendering, 2014. US Patent 8,812,648.

[24] S. Sundaresan, N. Feamster, R. Teixeira, and N. Magharei. Measuring and Mitigating Web Performance Bottlenecks in Broadband Access Networks. In K. Gummadi and C. Partridge, editors, Proceedings of the 2013 Internet Measurement Conference (IMC), Barcelona, Spain. Association for Computing Machinery (ACM), Oct. 2013. isbn: 978-1-4503-1953-9. doi: 10.1145/2504730.2504741.

[25] W3C. HTTP Archive (HAR) Format. J. Odvarko, A. Jain, and A. Davies, editors. url: https://w3c.github.io/web-performance/specs/HAR/Overview.html (visited on 05/05/2019).

[26] X. S. Wang, A. Balasubramanian, A. Krishnamurthy, and D. Wetherall. Demystifying Page Load Performance with WProf. In N. Feamster and J. Mogul, editors,