Anatomy of the Third-Party Web Tracking Ecosystem
Marjan Falahrastegar, Hamed Haddadi, Steve Uhlig, Richard Mortier
Marjan Falahrastegar †, Hamed Haddadi †‡, Steve Uhlig †, Richard Mortier *
Queen Mary University of London †, Qatar Computing Research Institute ‡, University of Nottingham *
ABSTRACT
The presence of third-party tracking on websites has become customary. However, our understanding of the third-party ecosystem is still very rudimentary. We examine third-party trackers from a geographical perspective, observing the third-party tracking ecosystem from 29 countries across the globe. When examining the data by region (North America, South America, Europe, East Asia, Middle East, and Oceania), we observe significant geographical variation between regions and countries within regions. We find trackers that focus on specific regions and countries, and some that are hosted in countries outside their expected target tracking domain. Given the differences in regulatory regimes between jurisdictions, we believe this analysis sheds light on the geographical properties of this ecosystem and on the problems that these may pose to our ability to track and manage the different data silos that now store personal data about us all.
Keywords
Advertisement, Privacy, Analytics, Trackers
Categories and Subject Descriptors
K.4 [COMPUTERS AND SOCIETY]: Privacy
1. INTRODUCTION
The rise in the use of personal data and of sophisticated algorithms that operate on individuals’ online browsing behaviour and interests has led to an increasing presence of third-party advertising and analytics services on the Internet and the mobile web [1, 2, 3]. These services aim to build a user profile by collection, aggregation, and correlation of an individual’s browsing behaviour, demographics, interests, and temporal/spatial patterns of behaviour (e.g., through smartphone localisation, or location check-ins on Online Social Networks). While these services are vital for the online economy, there are complex debates over privacy issues which are caused directly or indirectly (e.g., misusing ad tracker cookies to identify individuals [4]) by such services. Despite legal or regulatory efforts on personal data collection and storage, the existence of thousands of these third-party services [3] across the world, under different names and legal constraints, makes holding individual businesses responsible for their actions a challenging task.

In this paper we investigate the geographic diversity and footprint of thousands of third-party domains across the world, using measurements from tens of vantage points distributed across nearly all continents. As a result, we shed light on the third-party tracking and analytics industry across legal and geographic boundaries. Several recent works have highlighted the footprint of companies such as Google as dominant corporations and the United States as a particularly prevalent country in the third-party tracking market [5, 2]. Considering the role of technology, legislation and economics in the privacy problem [6], we believe that understanding regional differences in the tracking ecosystem is essential for providing effective privacy regulations, in addition to the ability to understand the trade of personal data.

We gathered data from 28 countries using PlanetLab nodes (§2, §3, §4) to reveal a complex, interwoven set of cross-country relationships between third-party tracking services. We find an extensive presence of European third-party trackers in the popular websites of East Asia and the Middle East. In particular, German and Russian third-parties are present across popular websites of all investigated countries in our dataset. Similarly, third-parties based in North America, mostly in the US, are broadly embedded in popular websites of the Middle East. We hypothesise that the reasons for the observed cross-country tracking are related to the substantial differences in privacy regulation in these countries. After a discussion of related work (§6), we conclude (§7).

Region          Country
North America   Canada, US
South America   Argentina, Brazil, Ecuador
Europe          Belgium, France, Germany, Greece, Hungary, Italy, Netherlands, Norway, Russia, Slovenia, Sweden, United Kingdom
East Asia       China, Hong Kong, Japan, Korea, Taiwan
Middle East     Israel, Jordan, Qatar, Turkey
Oceania         Australia, New Zealand
Table 1: The countries for which we collected data and their assigned region.
2. DATA COLLECTION
In this section we describe our data collection methodology. We extended Krishnamurthy’s Firefox extension [7] to log cookies, ETags, and browser local storage. We then ran this script against the Alexa top-500 popular websites in each country listed in Table 1, storing details of the observed third-parties for later analysis. All our data was obtained between 28 March 2014 and 28 April 2014. To minimise pollution between consecutive visits, a bash script creates a new user profile and ensures that the local cache is cleared before each website is visited and the extension runs. We used nodes from PlanetLab’s infrastructure [8] in 28 different countries to gain access across the globe. In addition to the PlanetLab servers, we also ran our scripts on a computer located in Qatar, as PlanetLab’s coverage in the Middle East is not as strong as elsewhere. Unfortunately, owing to a paucity of PlanetLab nodes in Africa, coupled with the failure of our scripts to complete successfully on the few nodes in Africa that we could try, we cannot present data pertaining to Africa.

To identify third-party websites, our Firefox extension employs the combination of the domain and adns approaches explained by Krishnamurthy & Wills [2]. A third-party site is identified as one whose second-level domain and ADNS (Authoritative DNS) server differ from the second-level domain name and authoritative DNS server of the origin site. Use of the authoritative DNS server allows us to classify cases such as bbc.co.uk and bbci.co.uk correctly, observing that both belong to the same company even though the second-level domains are different.

In visiting the Alexa top-500 websites in 28 countries from different regions of the world, we visited a total of 6497 unique websites and identified 6817 third-party trackers. We observed the presence of third-parties on over 80% of the visited websites.
Qatar (814), Korea (769) and Hong Kong (726) are the top three countries in terms of number of third-party trackers, while the United Kingdom (397), Jordan (330) and Belgium (274) are the bottom three. We group countries into six geographical regions: North America, South America, Europe, East Asia, Oceania and the Middle East. Table 1 shows the investigated countries and the regions to which they belong. Overall we detect the highest numbers of third-party trackers in Europe (3378) and East Asia (2009). Normalising by the number of countries in each region, we see that North America, Oceania and the Middle East are the regions with the highest average numbers of third-party services.

Figure 1: The strength of countries in terms of number of local third-party services.
Figure 2: Web Index ranking vs locally hosted third-parties. (x-axis: No. of physically hosted third-parties; y-axis: Web Index Ranking.)
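The third-party identification rule described in this section (second-level domain and authoritative DNS server both differ from those of the origin site) can be sketched as follows. This is a minimal illustration, not the extension's actual code: the function names, the tiny `MULTI_LABEL_SUFFIXES` set, the injected `adns_of` lookup and the choice to treat an unknown ADNS as "different" are all assumptions made for the example.

```python
from typing import Callable, Optional

# Hypothetical, deliberately tiny list of multi-label public suffixes.
# A real implementation would need a complete public-suffix list.
MULTI_LABEL_SUFFIXES = {"co.uk", "com.au", "co.jp", "com.br"}


def second_level_domain(host: str) -> str:
    """Return the registrable domain of a host name (naive sketch)."""
    labels = host.lower().rstrip(".").split(".")
    if len(labels) >= 3 and ".".join(labels[-2:]) in MULTI_LABEL_SUFFIXES:
        return ".".join(labels[-3:])
    return ".".join(labels[-2:])


def is_third_party(
    origin_host: str,
    candidate_host: str,
    adns_of: Callable[[str], Optional[str]],
) -> bool:
    """Third party = different second-level domain AND different ADNS.

    `adns_of` maps a second-level domain to an identifier of its
    authoritative DNS server (or None if unknown). It is injected so the
    sketch does not depend on any particular DNS library.
    """
    origin_sld = second_level_domain(origin_host)
    candidate_sld = second_level_domain(candidate_host)
    if origin_sld == candidate_sld:
        return False  # same second-level domain: first party
    origin_adns = adns_of(origin_sld)
    candidate_adns = adns_of(candidate_sld)
    # Assumption: if either ADNS is unknown, treat the servers as different.
    if origin_adns is None or candidate_adns is None:
        return True
    return origin_adns != candidate_adns
```

With a lookup table in which bbc.co.uk and bbci.co.uk share one (made-up) ADNS identifier, `is_third_party("www.bbc.co.uk", "static.bbci.co.uk", lookup)` is false, while a host whose second-level domain and ADNS both differ is classified as a third party.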
3. REGIONAL ANALYSIS
We begin our regional analysis by counting the number of third-parties which are physically hosted in the countries of our study. We rely on the geoiplookup utility to determine the country in which each observed third-party resides. Our results are shown in Figure 1, with circle diameter representing the number of third-parties located in each country. We can see that locally hosted third-parties in countries across the Europe, East Asia and Oceania regions are relatively evenly distributed, whereas in North and South America and the Middle East there are substantial variations. For example, in the Middle East, Turkey and Israel have many more local third-parties than other countries in that region. In general we found North America, Germany and China to have the highest number of locally hosted third-parties.

After looking at this result, one natural question that may arise is the correlation between Web-based advancement and the number of locally hosted third-parties across countries. To answer this question, we used the Web Index [9] that is provided by the WWW Foundation led by Tim Berners-Lee. The index, first released in 2012 and updated in 2013, measures the contribution of the Web in 81 countries using four factors: “Universal Openness” for communication infrastructure, “Freedom and Openness” for citizen rights of information, opinion and online privacy, “Relevant content” for accessibility of relevant information based on gender and language, and “Empowerment” for impact of the Web on society, economy and politics. Figure 2 presents the scatterplot of Web Index ranking against locally hosted third-parties per country. We observe that the majority of highly ranked countries do have a high number of locally hosted third-parties. Turkey, Hungary, Russia and China constitute four exceptions, with over 100 locally hosted services while they are ranked below 50.

Figure 3: Distribution of aggregated domains presents the existence of companies with tens of third-party trackers. (x-axis: Number of Domains; y-axis: ECDF.)

Company (number of aggregated domains)
google family (42)         elender.hu (9)
verisigngrs.com (27)       gabia.com (9)
microsoft.com (19)         indom.com (9)
aol family (18)            schibsted-it.no (9)
sakura.ad.jp (17)          taobao.com (9)
firstns.de (15)            tonline.hu (9)
netnames.net (15)          iponweb.net (8)
transip.nl (15)            knet.cn (8)
yahoo family (14)          sanomaonline.hu (8)
register.it (13)           baidu family (7)
sina family (12)           adresseavisen.no (7)
conversant family (11)     atcom.gr (7)
qq.com (11)                bbend.com (7)
registercom (10)           comodogroup.com (7)
regtime.net (10)           ebay.com (7)
Table 2: Top-30 identified companies with the highest number of aggregated domains.

We carry on our analysis by identifying dominant third-parties in each region after aggregating third-parties within their parent company, identified through a combination of three methods. First, we used Collusion’s dataset [10] to detect third-parties belonging to the same company. We manually inspected this dataset for any changes using websites
Table 3: Top-20 companies and their third-party domains. (Columns: Company, Domain.)

[email protected] is the email address of all third-parties hosted on Amazon Web Services, and [email protected] is assigned for all services hosted on Google App Engine. We treated an email address as unhelpful if its domain name belongs to a known CDN or DNS service, or if keywords in the email domain indicate such a service. For these cases, we instead used the organization indicated in the whois records when available; otherwise we assumed the third-party has no parent company.

The distribution of aggregations we carried out is shown in Figure 3. The size of the parent companies varies considerably: some appear to own tens of third-party trackers while others have fewer than five. Table 2 shows the top-30 identified companies with the highest number of aggregated third-parties. The well-known advertising-related (e.g., analytics, ad trackers) companies are presented in “company name family” format. Moreover, the aggregated domains of such companies are listed in Table 3. We found, unsurprisingly, that Google, AOL and Yahoo appear to own the largest number of third-party trackers. We present the hierarchy of the top-four big companies, acquisitions and their trackers in Appendix 7.

As well as the well-known services such as Google, we also observed some less well-known third-party services, spread across almost all regions. We present the top-20 in each region in Figure 4. We found third-party services belonging to Google, Amazon and Facebook in roughly the same position throughout our investigated regions (top four) while Yahoo, compared with other regions, has a notably higher position (second place) in Europe (Figure 4(c)) and South America (Figure 4(b)). This difference in South America is due to the high number of occurrences of Yahoo third-party requests in Ecuador (4304; 10% of the third-party websites in South America).
Similarly, Slovenia, Norway and Hungary contribute most in Europe (Figure 4(c)).

Besides the most famous players, we identified other third-party services with extensive presence across all regions. For example, scorecardresearch (belonging to comScore Inc., an analytics company), netdna-cdn (belonging to NetDNA, a CDN company) and quantserve (belonging to QuantCast, a behavioural advertising company) appeared in almost all regions, the only exception being the absence of netdna-cdn in East Asia. This presence implies a growing competitiveness of such businesses across the regions.

In addition to the global third-party services, we observed a notable presence of local third-parties in specific regions such as East Asia and Europe. Recall that local third-parties are those services which are physically located in the related region. In East Asia (Figure 4(f)), 11 cases from the top-20 are based in this region (e.g., sina_family, taobaocdn.com); in Europe (Figure 4(c)), 4 services amongst those presented are mainly found in European countries (DE-based: adtech.de; FR-based: criteo.com, smartadserver.com; GB-based: badoocdn.com). On the other hand, in Oceania and South America, there are far fewer local third-parties (one out of the top-20) and in the Middle East there are none in the top-20.

We next examine the ecosystem of third-party trackers in different regions to find out what this ecosystem looks like if we put the dominant and popular players aside. We also investigate “unconventional” third-party services and their distribution in different regions. In our analysis, we excluded all those third-party websites that had a country code or a popular TLD name (com, org or net). This left us with about 4% (= 235) of the total identified third-party trackers. Table 4 shows the proportion of different TLDs amongst the third-parties.

Table 4: The proportion of different Top Level Domain names of the third-parties. (Columns: TLD, Number; surviving rows: .com, .net, other TLDs such as .biz and .asia (444), .org.)

Figure 5 shows the top-10 atypical third-parties in each region, which appeared in over 50% of the countries of that group. It is notable that amongst these atypical services, some are globally active. We identified three cases amongst the top ten, simpli.fi (a US-based ad tracker), trafficfactory.biz (a Netherlands-based advertising agency) and adap.tv (a US-based ad broker), that appeared in most of the regions of our dataset, denoted by (*) in Figure 5.

Atypical services are almost equally spread across countries in Oceania (Figure 5(d)), Europe (Figure 5(c)) and South America (Figure 5(b)), whereas in other regions the occurrence of small services is unequal amongst countries of each region. For example, in East Asia (Figure 5(f)) and the Middle East (Figure 5(e)), Taiwan and Qatar have a high occurrence of such services in comparison with other countries in their own group.

In terms of region-specific services, we identified a Denmark-based ad-serving third-party, emediate.eu, which mostly provides services in northern and central Europe, such as Sweden, Norway, and Germany (Figure 5(c)). We did not observe regional small services in other groups. However, the presence of some overseas services, such as a Spain-based web hosting service, abcimg.es, in South America (Figure 5(b)) and US governmental services such as usa.gov in Oceania (Figure 5(d)), is of interest.
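The selection of atypical third-parties described in this section can be sketched as a two-step filter: drop domains with a popular or country-code TLD, then keep those seen in more than half of a region's countries. The sketch below is illustrative only. In particular, it assumes that "country code" means the ccTLDs of the 28 study countries (our reading, since domains such as simpli.fi and adap.tv survive the filter in Figure 5); the function names and the `(country, domain)` input format are likewise hypothetical.

```python
from collections import defaultdict
from typing import Iterable, List, Set, Tuple

POPULAR_TLDS = {"com", "org", "net"}

# Assumption: only the ccTLDs of the 28 countries in Table 1 are excluded.
STUDY_CCTLDS = {
    "ca", "us", "ar", "br", "ec", "be", "fr", "de", "gr", "hu", "it", "nl",
    "no", "ru", "si", "se", "uk", "cn", "hk", "jp", "kr", "tw", "il", "jo",
    "qa", "tr", "au", "nz",
}


def is_atypical(domain: str) -> bool:
    """True if the domain's TLD is neither popular nor a study-country ccTLD."""
    tld = domain.lower().rstrip(".").rsplit(".", 1)[-1]
    return tld not in POPULAR_TLDS and tld not in STUDY_CCTLDS


def atypical_in_region(
    observations: Iterable[Tuple[str, str]],
    region_countries: Set[str],
    min_share: float = 0.5,
) -> List[str]:
    """Atypical domains seen in more than `min_share` of a region's countries.

    `observations` is an iterable of (country_code, third_party_domain).
    """
    countries_seen = defaultdict(set)
    for country, domain in observations:
        if country in region_countries and is_atypical(domain):
            countries_seen[domain].add(country)
    threshold = min_share * len(region_countries)
    return sorted(d for d, cs in countries_seen.items() if len(cs) > threshold)
```

For a two-country region such as Oceania, a domain must therefore be observed in both countries to pass the "over 50%" rule; a domain seen in only one of the two sits at exactly 50% and is dropped.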
Figure 4: Top-20 third-party websites by region. Occurrence count for each third-party is displayed above each bar. Panels: (a) North-America, (b) South-America, (c) Europe, (d) Oceania, (e) Middle-East, (f) East-Asia; y-axis: %.
Figure 5: Top-10 small third-party websites in East Asia and Middle East. Globally observed sites are indicated by *. Panels: (a) North-America, (b) South-America, (c) Europe, (d) Oceania, (e) Middle East, (f) East Asia; y-axis: Occurrences.
4. PER-COUNTRY ANALYSIS
Figure 6: Heatmaps showing locations of third-parties. The y-axis is the location of the first-party and the x-axis is the location of the third-party. Darker colours indicate greater presence, and the region of each country in the two left-most plots is depicted by the colour of the blue bars on the left and at the top. Panels: (a) x-axis: country code & y-axis: Alexa ranking; (b) x-axis: physical location & y-axis: Alexa ranking; (c) x-axis & y-axis: physical location.

In this section we look at the presence of overseas third-party trackers in each country and region. We focus on a specific subset of third-parties whose TLD is a country code (e.g., the TLD of yadro.ru is the country code of Russia); we refer to them as locally named third-parties. For this purpose, we exclude all third-parties with a non-Country Code TLD (CCTLD) as well as those whose CCTLD does not correspond to one of the countries in our list. Our method yields a lower bound for US-named third-parties because some TLDs, such as gov, are very popular for American domain names and we exclude them. We identified 1654 (24%) unique third-parties with a CCTLD amongst the other TLDs listed in Table 4.

We examine the penetration of locally named third-parties across popular websites (based on the Alexa top-500 ranking) of countries to which they do not belong. The heat map in Figure 6(a) demonstrates such presence. The y-axis shows the country in which the visited website is popular, while the x-axis corresponds to the CCTLD of the third-parties which appeared in the visited sites. For example, the third row shows that the websites which are popular in Turkey embed some third-party trackers from Germany, Russia, the United States and Turkey. In general, we found the United States, Russia and Germany to be the countries whose locally named third-parties appear across popular websites of almost all countries in our dataset.
Amongst other European countries, Norwegian and Swedish third-parties have a similar and notable presence in each other’s popular websites. One explanation for such presence is that a website can be popular in more than one country; therefore its locally named third-party is considered overseas in the other countries in which the website is popular. Another reason can stem from the difference between the country which a third-party’s CCTLD implies and the country where it is physically hosted. To investigate this possibility, we examined the correspondence between CCTLDs and the location of the third-parties. We found that most CCTLDs correspond to the actual physical location. Heat map 6(b) presents this result, which is very similar to heat map 6(a) except for those third-parties whose CCTLDs do not correspond to their physical location. For instance, the locally named third-parties of Italy and Canada amongst popular websites of Qatar in the first row of Figure 6(a) are not present in the corresponding row of Figure 6(b), since they were not physically hosted in Italy and Canada. On the other hand, US-based locally named third-parties appear stronger, which is due to the presence of some locally named third-parties whose TLD refers to a country other than the US. The presence of Sweden-based third-parties amongst popular websites in China is another interesting point revealed by this examination.

So far we have observed the considerable presence of locally named third-parties of some specific countries across popular websites of various other countries. We carry on our investigation to identify overseas third-parties using a different approach, based purely on the physical location of visited websites and locally named third-parties. That is, we assign countries to the visited websites using their physically hosted location instead of where they are popular, as we did so far. Similar to the previous examination, we identify the country of the locally named third-parties based on their physical location. The heat map of Figure 6(c) presents this result. Clearly, the US has a unique situation: while across the majority of countries, with few exceptions, there is a considerable number of locally named third-parties based in the US, there are also third-parties located in the majority of countries which are embedded inside US-based websites. In contrast to the US, in the countries of South America there are no overseas third-parties across the websites located there, nor are third-parties based in these countries present across websites of other countries, with a slight exception for the websites based in North America. Similar to both previous examinations, the presence of locally named third-parties hosted in Germany, North America and Russia is high amongst websites located in Israel and Turkey. However, we do not observe any third-party from these countries amongst websites located in the other countries of the Middle East (Qatar and Jordan), nor amongst other regions except European countries and the US in the North American region. On the other hand, we observe a slight presence of third-parties hosted in Middle East countries, except Qatar, across websites located in Great Britain.

We would like to clarify that the importance of the presence of overseas third-parties is due to their possible access to the information of users from various countries who are visiting those popular websites. Considering various studies reporting the access of third-party trackers to users’ personally identifiable information such as full name, email or even very sensitive information including the user’s health condition, in addition to the growing trend of Web data surveillance by different governments, learning about the countries which potentially have access to user information helps us understand the flow of data collection.
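The heat maps discussed in this section are, in essence, counts indexed by (first-party country, third-party country). A minimal sketch of how such a matrix could be assembled for locally named third-parties (the Figure 6(a) variant, where the third-party country comes from its CCTLD) is given below. The function names and the `(first_party_country, third_party_domain)` input format are assumptions for illustration; the mapping of .uk to GB follows the country codes used in the figures.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple

# ccTLDs of the 28 study countries (Table 1).
STUDY_CCTLDS = {
    "ca", "us", "ar", "br", "ec", "be", "fr", "de", "gr", "hu", "it", "nl",
    "no", "ru", "si", "se", "uk", "cn", "hk", "jp", "kr", "tw", "il", "jo",
    "qa", "tr", "au", "nz",
}


def cctld_country(domain: str) -> Optional[str]:
    """Country code implied by a domain's CCTLD, or None if not in the study."""
    tld = domain.lower().rstrip(".").rsplit(".", 1)[-1]
    if tld not in STUDY_CCTLDS:
        return None  # e.g. com, gov, or a ccTLD outside the study list
    return "GB" if tld == "uk" else tld.upper()


def presence_matrix(observations: Iterable[Tuple[str, str]]) -> Counter:
    """Count locally named third-parties per (first-party, third-party) country."""
    counts: Counter = Counter()
    for first_party_country, domain in observations:
        third_party_country = cctld_country(domain)
        if third_party_country is not None:
            counts[(first_party_country, third_party_country)] += 1
    return counts


def overseas_only(counts: Counter) -> Dict[Tuple[str, str], int]:
    """Keep only cells where the third-party country differs from the first party's."""
    return {cell: n for cell, n in counts.items() if cell[0] != cell[1]}
```

Replacing `cctld_country` with a geolocation-based lookup would give the Figure 6(b) variant, and additionally geolocating the first-party websites would give Figure 6(c); neither lookup is shown here.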
5. DISCUSSION
In the previous section, we observed the presence of overseas third-parties located in the US, Germany and Russia amongst websites of almost all countries. This implies, not surprisingly, access to and potentially storage and processing of users’ data from countries other than those where the first-party service is hosted. We now briefly examine one hypothesis for this behaviour: that it is driven by the differing data protection and online privacy rules in key countries in those regions. Afterwards, we discuss the role of such regulation in the presence of third-party services.

The EU regulatory framework has a clear and comprehensive set of rules for data protection. This means that all businesses located in any member of the European Union should comply with such regulations. In terms of online privacy, according to Directive 2002/58/EC and its amendment 2009/136/EC, known as the ePrivacy Directive, websites which use cookies or other technologies to collect user data should clearly inform users about this process and ask for their opt-in consent as soon as the website is loaded on the user’s machine. Similarly, in Australia, any entity that collects personal information should notify the user about that; however, the notification can be provided after the actual data collection [11]. In the US, in contrast to the EU and Australia, there is no single, specific law that regulates the collection and use of personal data. Therefore, the related laws vary between states as well as between business sectors. The situation in countries such as China and Turkey is more ambiguous, since there is no specific data protection law. We summarise the current regulatory frameworks regarding data protection and international data transfer in Table 5.

Despite the existence of data protection regulations in many countries, and growing attention to online privacy issues, the implementation and enforcement of such rules are not consistent with the laws in theory. For example, Germany has not implemented the ePrivacy Directive; therefore, there is no requirement for active opt-in consent, e.g., by clicking on a pop-up window. It is suggested that browser cookie settings would remain adequate. In the UK, by contrast, obtaining the user’s opt-in consent is mandatory. In Russia, data protection rules have many similarities with the EU directives; however, the inconsistencies and complexity in the regulations lead to very limited enforcement of the law. In the US, on the other hand, with its totally different regulatory approach (relying more on self-regulation and guidelines), monitoring businesses is more complicated. These differences suggest that the implementation and enforcement of law affect the high presence of third-party services in specific countries like the US, Germany and Russia.

We should note that other factors, such as technological advancement and political approaches, could be influential in this field. Germany acts as a commercial hub in Europe, as well as owning the most advanced digital infrastructure, such as DE-CIX, the largest Internet exchange point in the world. The presence of Russia-based third-parties in Asia and the Middle East is aligned with the general doctrine of the Russian government to broaden its relationship with these regions [12].
6. RELATED WORK
A number of studies have analyzed third-party trackers from different points of view. Krishnamurthy & Wills [2] investigated the expansion of third-party trackers from 2005 for a period of three years. They showed how tracking has changed with time and with the acquisitions of various companies. They had previously analysed the growing association between first-party and third-parties in [7]. In [13], they examined the access of third-parties to personal information based on the category of the first-party website in which they are embedded. They found that websites providing health and travel-related services disclose more information to third-parties than other types of websites.

Roesner et al. [14] proposed a framework for classifying the behaviour of web trackers based on the scope of the browsing profile they produce. They show the spread of the identified classes amongst the top 500 websites in the world. Gomer et al. [15] focused on the network aspects of third-party trackers in three search markets. They show a consistent network structure across different markets as well as high efficiency in exchanging information among third-parties. Mayer et al. [16] surveyed different techniques which are used by web trackers to collect user information.

While the above studies focused on the technical capabilities of specific types of third-party trackers, our study examines the presence of all third-parties across different regions of the world. Kulshrestha et al. [17] show the way in which users in the various parts of the world have different (local and global) interactions on the Twitter social network. Our work is closest to that of Castelluccia et al. [5], who analyse the top 100 most popular sites worldwide across a number of countries to assess their tracker behaviours. They focused on measuring the penetration of US-based services in different countries, whereas we focus on the regional presence of third-party trackers as well as less-known cases.

From a privacy point of view, Ur et al. [18] report users’ strong concerns about data collection done by ad trackers. Moreover, Bellman et al. [19] showed the significant effect of culture and national regulation on users’ privacy concerns and consequently suggest localized privacy policies. In our work, we also show the impact of regional characteristics on the structure of the third-party ecosystem, and suggest further investigation of how privacy policies in this ecosystem are affected by the regional regulatory frameworks in place.

Legend: Yes / No / Partial

Question: Existence of data protection laws
  US: Partial; DE: Yes; UK: Yes; AU: Yes; RU: Yes; CN: Partial; TR: No

Question: Coverage of privacy law
  US: Sectoral; DE: Comprehensive; UK: Comprehensive; AU: Comprehensive; RU: Comprehensive; CN: Sectoral; TR: Not applicable

Question: Effective regulator to enforce the privacy laws
  US: Sectoral regulation; DE: Sectoral regulation; UK: National regulation; AU: National regulation; RU: National regulation; CN: None; TR: None

Question: Cookie specific regulation
  US: No; DE: Yes; UK: Yes; AU: No; RU: No; CN: No; TR: No

Question: Dealing with non-essential cookies
  US: Informing user via site policy (guideline)
  DE: Opt-out mechanism (regulation)
  UK: Opt-in (regulation)
  AU: Notifying user before or after visiting a site
  RU: None
  CN: None
  TR: None

Question: Overseas transferal of data
  US: US entities are liable for adequate protection of data through security safeguards, protocols or contractual model, with different industrial sectors having different regulations
  DE: The data recipient must ensure an adequate level of data protection
  UK: Adequacy assessment of data protection law in the relevant country (outside EEA) is required, or the organisation must get their Binding Corporate Rules approved by the national Information Commissioner
  AU: Entity must take “reasonable steps” to ensure the principles are not breached overseas, e.g., if a cloud service provider is planning on sending data overseas, it should have a contract in place to make sure data will not be misused
  RU: Data can be transferred to Strasbourg Convention states or other states that ensure adequate protection of personal data
  CN: No specific regulation; based on the nature of data there are certain industrial regulations, e.g., information collected by commercial banks is not allowed to be transferred overseas
  TR: No specific regulation other than requiring consent, though individual cases may have additional requirements

Table 5: Comparison of data protection and data transferring across different countries.
7. CONCLUSIONS
In this paper, we presented a study of the geographic differences in the third-party ecosystem. We sampled the Alexa top-500 most popular websites in each of 28 countries across widely spread regions of the world: North America, South America, Europe, East Asia, Middle East and Oceania. We examined the global and regional presence of the large, small and atypical third-party services in each region, as well as their penetration into other regions. We exposed connections amongst countries within each region by examining the geographic presence of third-parties that serve a big share of today’s web.

Unsurprisingly, we found overall that a small number of international corporations are heavily dominant in all countries and regions. We observed significant differences in the numbers of observed third-parties across regions, with Turkey and Israel in the Middle East region standing out as having considerably more local third-parties than most countries. We observed considerably greater regional dominance of third-parties in Europe and East Asia, perhaps indicating greater commercial collaboration among companies in countries in those regions. In contrast, the presence in countries in North America is dominated by local third-parties.

We hope that the findings of our study will help to better understand the international trade of personal information and to adapt privacy protection solutions accordingly. Indeed, we highlighted the potential influence of regulatory constraints on the presence of third-parties. One example is Russia, where the complexity and ambiguity of privacy regulation limit its implementation. Our observations suggest that privacy regulation, particularly in the area of cloud computing, requires more attention from the regulatory community.

As further work, we would like to expand the analysis by examining the relationships between the third-party and the peering ecosystems, especially to shed light on the physical presence and deployment of third-parties. Another interesting direction is to examine in detail the types of services provided by third-parties in various regions of the world. Finally, we need to find some way to collect data about these relationships in Africa.
Acknowledgements
We acknowledge constructive feedback and advice from Balachander Krishnamurthy. This work was funded in part by Horizon Digital Economy Research, RCUK grant EP/G065802/1.
8. REFERENCES
[1] Narseo Vallina-Rodriguez, Jay Shah, Alessandro Finamore, Yan Grunenberger, Konstantina Papagiannaki, Hamed Haddadi, and Jon Crowcroft. Breaking for commercials: characterizing mobile advertising. In Proceedings of the ACM Internet Measurement Conference (IMC), 2012.
[2] Balachander Krishnamurthy and Craig Wills. Privacy diffusion on the web: a longitudinal perspective. In Proceedings of the 18th international conference on World Wide Web (WWW), pages 541–550, New York, NY, USA, 2009. ACM.
[3] Marjan Falahrastegar, Hamed Haddadi, Steve Uhlig, and Richard Mortier. The rise of panopticons: Examining region-specific third-party web tracking. In Alberto Dainotti, Anirban Mahanti, and Steve Uhlig, editors, Traffic Monitoring and Analysis, volume 8406 of Lecture Notes in Computer Science, pages 104–114. Springer Berlin Heidelberg, 2014.
[4] NSA using Google’s online ad tracking tools to spy on web users.
[5] Claude Castelluccia, Stephane Grumbach, and Lukasz Olejnik. Data Harvesting 2.0: from the Visible to the Invisible Web. In The 12th Workshop on the Economics of Information Security, Washington, DC, USA, June 2013.
[6] Balachander Krishnamurthy. I know what you will do next summer. SIGCOMM Comput. Commun. Rev., 40(5):65–70, October 2010.
[7] Balachander Krishnamurthy and Craig E. Wills. Generating a privacy footprint on the Internet. In Proceedings of the 6th ACM SIGCOMM conference on Internet measurement, IMC ’06, pages 65–70, New York, NY, USA, 2006. ACM.
[8] PlanetLab. Planetlab: An open platform for developing, deploying and accessing planetary-scale services.
[9] Web index. http://thewebindex.org/
[10] Collusion Firefox add-on. http://collusion.toolness.org/
[11] Australian privacy principle 5: notification of the collection of personal information.
[12] Richard Connolly. The other pivot to Asia and why success in China is not all it seems for Putin’s Russia. https://theconversation.com/the-other-pivot-to-asia-and-why/-success-in-china-is-not-all-it/-seems-for-putins-russia-27039/
[13] Balachander Krishnamurthy, Konstantin Naryshkin, and Craig Wills. Privacy leakage vs. protection measures: the growing disconnect. In Proceedings of the Web 2.0 Security and Privacy Workshop, 2011.
[14] Franziska Roesner, Tadayoshi Kohno, and David Wetherall. Detecting and defending against third-party tracking on the web. In USENIX Symposium on Networked Systems Design and Implementation (NSDI). USENIX, 2012.
[15] Richard Gomer, Eduarda Mendes Rodrigues, Natasa Milic-Frayling, and M.C. Schraefel. Network analysis of third party tracking: User exposure to tracking cookies through search. Web Intelligence and Intelligent Agent Technology, IEEE/WIC/ACM International Conference on, 1:549–556, 2013.
[16] Jonathan R. Mayer and John C. Mitchell. Third-party web tracking: Policy and technology. In Proceedings of the 2012 IEEE Symposium on Security and Privacy, SP ’12, pages 413–427, Washington, DC, USA, 2012. IEEE Computer Society.
[17] Juhi Kulshrestha, Farshad Kooti, Ashkan Nikravesh, and Krishna P. Gummadi. Geographic Dissection of the Twitter Network. In Proceedings of the 6th International AAAI Conference on Weblogs and Social Media (ICWSM), Dublin, Ireland, June 2012.
[18] Blase Ur, Pedro Giovanni Leon, Lorrie Faith Cranor, Richard Shay, and Yang Wang. Smart, useful, scary, creepy: Perceptions of online behavioral advertising. In Proceedings of the Eighth Symposium on Usable Privacy and Security, SOUPS ’12, pages 4:1–4:15, New York, NY, USA, 2012. ACM.
[19] Steven Bellman, Eric J. Johnson, Stephen J. Kobrin, and Gerald L. Lohse. International differences in information privacy concerns: A global survey of consumers. The Information Society, pages 313–324, 2004.
APPENDIX
Appendix A
The full list of the identified family companies is shown in Table 6.

Table 6: Hierarchical presentation of the top four big companies, their acquisitions and their third-party trackers.

Google
  Groups: DoubleClick, YouTube, Blogger, Other Acqn., Google Specific

AOL
  Groups: Advertising.com, Huffington Post, Other Acqn., AOL Specific
  Domains: advertising.com, huffingtonpost.com, 5min.com, aolcdn.com, advertising.aol.com, adsonar.com, huffpo.net, tacoda.com, aol.com, atwola.com, huffpost.com, goviral-content.com, srvntrk.com, blogsmithmedia.com, mirabilis.com, mqcdn.com, pictela.net, mapquestapi.com

Conversant (former ValueClick)
  Groups: CommissionJunction, Media Plex, Other Acqn., Conversant Specific
  Domains: yceml.net, mediaplex.com, dotomi.com, conversantmedia.com, ftjcfx.com, lduhtrp.net, apmebf.com, tqlkg.com, awltovhc.com, qksrv.net, kdukvh.com

Yahoo
  Groups: Flickr, Yield Manager, Other Acqn., Yahoo Specific
  Domains: flickr.com, yieldmanager.com, bluelithium.com, overture.com, yahooapis.com, yimg.com, staticflickr.com, yldmgrimg.net, maktoob.com, xtendmedia.com, yahoo.net, sstatic.net, zenfs.com

Appendix B
The heat map in Figure 7 shows that most third-parties with CCTLDs are physically located in the country to which their CCTLD points. The y-axis represents the location of a third-party according to its country code, and the x-axis represents the physical location of the third-party. Darker colours indicate greater presence.

[Figure 7: heat map. x-axis: Hosted Location; y-axis: CCTLD. Countries on both axes: QA, IL, TR, JO, US, CA, AU, NZ, TW, CN, HK, KR, JP, AR, EC, BR, RU, HU, GB, FR, IT, DE, GR, BE, SI, SE, NL, NO. Region groups: Middle East, US-Canada, Oceania, East Asia, Latin America, Europe.]
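The counts behind a heat map of this kind, CCTLD country against hosted country, can be sketched as follows. This is a minimal illustration, not the procedure used in the paper. The sample observations, the function names, and the rule that any two-letter top-level label is a country code are all assumptions made for the example. The hosting country is assumed to be already known, e.g. from an IP geolocation step that is not shown.

```python
from collections import Counter

# Hypothetical observations: (third-party domain, ISO country code of the host).
OBSERVATIONS = [
    ("tracker.example.de", "DE"),
    ("ads.example.de", "US"),
    ("stats.example.co.uk", "GB"),
    ("cdn.example.fr", "FR"),
    ("pixel.example.com", "US"),  # generic TLD, skipped below
]

# ccTLD labels that differ from the ISO 3166-1 code of their country
# (assumption: only .uk is needed for this sample).
CCTLD_TO_COUNTRY = {"uk": "GB"}


def cctld_country(domain):
    """Return the country code implied by the domain's ccTLD, or None."""
    tld = domain.rstrip(".").rsplit(".", 1)[-1].lower()
    # Assumption: a two-letter ASCII top-level label is treated as a ccTLD.
    if len(tld) != 2 or not (tld.isascii() and tld.isalpha()):
        return None
    return CCTLD_TO_COUNTRY.get(tld, tld.upper())


def heatmap_counts(observations):
    """Count (CCTLD country, hosted country) pairs, i.e. the heat map cells."""
    counts = Counter()
    for domain, hosted in observations:
        cc = cctld_country(domain)
        if cc is not None:
            counts[(cc, hosted)] += 1
    return counts


def diagonal_share(counts):
    """Fraction of ccTLD third-parties hosted in the country of their ccTLD."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    same = sum(n for (cc, hosted), n in counts.items() if cc == hosted)
    return same / total


counts = heatmap_counts(OBSERVATIONS)
share = diagonal_share(counts)
```

A dark diagonal in the heat map corresponds to a high value of `share`; off-diagonal cells such as `("DE", "US")` are the third-parties hosted outside the country their CCTLD points to.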