Reducing Misinformation in Query Autocompletions
Djoerd Hiemstra, Radboud University, The Netherlands
Abstract
Query autocompletions help users of search engines to speed up their searches by recommending completions of partially typed queries in a drop-down box. These recommended query autocompletions are usually based on large logs of queries that were previously entered by the search engine's users. Therefore, misinformation entered – either accidentally or purposely to manipulate the search engine – might end up in the search engine's recommendations, potentially harming organizations, individuals, and groups of people. This paper proposes an alternative approach for generating query autocompletions by extracting anchor texts from a large web crawl, without the need to use query logs. Our evaluation shows that even though query log autocompletions perform better for shorter queries, anchor text autocompletions outperform query log autocompletions for queries of 2 words or more.
INTRODUCTION
The brutal killing of George Floyd by Minneapolis police officers at the end of May 2020, while Floyd was already handcuffed, lying face down, and apparently not resisting arrest, became an immediate target of disinformation on the platforms run by Google and Facebook. Figure 1 shows Google's autocompletions for George Floyd in early June 2020. Although it is hard to prove deliberate manipulation of Google's autocompletions in this particular case, we show below that autocompletions based on previous user interactions have been shown to contain defamatory, racist, sexist and homophobic information, and there is increasing evidence that autocompletions are an easy target for spreading fake news and propaganda.

Figure 1: Example autocompletions

Search engines suggest completions of partially typed queries to help users speed up their searches, for instance by showing the suggested completions in a drop-down box. These query autocompletions enable the user to search faster, searching for long queries with relatively few key strokes. Jakobsson [12] showed that for a library information system, users are able to identify items using as little as 4.3 characters on average. Query autocompletions are now widely used, either with a drop-down box or as instant results [4]. While Jakobsson [12] used the titles of documents as completions of user queries, web search engines today generally use large logs of queries submitted previously by their users. Using previous queries seems a common-sense choice: The best way to predict a query? Use previous queries! Several scientific studies use the AOL query log provided by Pass et al. [21] to show that query autocompletion algorithms using query logs are effective [3, 8, 19, 24, 28].

∗ This study was done while the author worked at Searsia. Published at the 2nd International Symposium on Open Search Technology, 12-14 October 2020, CERN, Geneva, Switzerland.
† [email protected]
However, query autocompletion algorithms that are based on query logs are problematic in two important ways:
1. They return offensive and damaging results;
2. They suffer from a destructive feedback loop.
We discuss these two problems in the following sections.

Offensive queries and misinformation in query autocompletions
Query autocompletion based on actual user queries may return offensive results, and there are several examples where offensive autocompletions hurt organizations or individuals. For instance in 2010, a French appeals court ordered Google to remove the word “arnaque”, which translates roughly as “scam”, from appearing in Google's autocompletions for the CNFDI, the Centre National Privé de Formation a Distance [17]. Google's defense argued that its tool is based on an algorithm applied to actual search queries: It was users that searched for “cnfdi arnaque” that caused the algorithm to select offensive suggestions. The court however ruled that Google is responsible for the suggestions that it generates, and that Google should remove misinformation that is based on user-generated input of its search engine. Google lost a similar lawsuit in Italy, where queries for an unnamed plaintiff's name were presented with autocomplete suggestions including “truffatore” (“con man”) and “truffa” (“fraud”) [18]. In another similar lawsuit, Germany's former first lady Bettina Wulff sued Google for defamation when queries for her name completed with terms like “escort” and “prostitute” [15]. In yet another lawsuit in Japan, Google was ordered to disable autocomplete results for an unidentified man who could not find a job because a search for his name linked him with crimes he was not involved with [5].

Google has since updated its autocompletion results by filtering offensive completions for person names, no matter who the person is [29]. But controversies over query autocompletions remain. A study by Baker and Potts [2] highlights that autocompletions produce suggested terms which can be viewed as racist, sexist or homophobic.
Baker and Potts analyzed the results of over 2,500 query prefixes and concluded that completion algorithms inadvertently help to perpetuate negative stereotypes of certain identity groups. A study by Ray and Ayalon [23] suggests that Google plays an important role in the spread of age and gender stereotypes via its autocomplete algorithm. And despite Google's intention to filter autocompletions for person names, there is increasing evidence that autocompletions play an important role in spreading fake news and propaganda. Query suggestions actively direct users to fake content on the web, even when they are not looking for it [22]. Examples include completions like “Did the holocaust happen”, which if selected, returned as its top result a link to the neo-Nazi site stormfront.org [7]. Bad publicity will usually persuade Google to remove such autocompletions. In 2016, Google announced it removed “are Jews evil” from its autocompletions, but many similar offensive completions were still suggested two years later [14]. Removing such completions is important, because they led people to search for offensive results who otherwise would not have. Stephens-Davidowitz [26] showed that in the 12 months following Google's removal of “are Jews evil”, approximately 10% fewer such questions were asked compared to the 12 months before the removal.

The examples show that query autocompletions can be harmful if they are based on searches by previous users. Harmful completions are suggested when ordinary users seek to expose or confirm rumors and conspiracy theories. Furthermore, there are indications that harmful query suggestions increasingly result from computational propaganda, i.e., organizations use bots to game search engines and social networks [25]. It is not hard to manipulate search autocompletions, as shown by Wang et al. [27], who revealed hundreds of thousands of manipulated terms that are promoted through major search engines.

A destructive feedback loop
Misinformation and morally unacceptable query completions are not only introduced by the searches of previous users, they are also mutually reinforced by the search engine and its users. When a query autocompletion algorithm suggests morally unacceptable queries, users are likely to select those, even if the users are only confused or stunned by the suggestion. But how does the search engine ever learn it was wrong? It might not ever. As soon as the system determines that some queries should be recommended, those queries are selected more often by users, which in turn makes the queries end up in the training data that the search engine uses to train its future query autocompletion algorithms. Such a destructive feedback loop is one of the features of a
Weapon of Math Destruction, a term coined by O'Neil [20] to describe harmful statistical models.

O'Neil sums up three elements of a Weapon of Math Destruction: Damage, Opacity, and Scale. Indeed, the damage caused by query autocompletion algorithms is extensively discussed in the previous section. Query autocompletion algorithms are opaque because they are based on the proprietary, previous searches known only by the search engine. If run by a search engine that has a big market share, the query completion algorithm also scales to a large number of users. Query autocompletions of a search engine with a majority market share in a country might substantially alter the opinion of the country's citizens; for instance, a substantial number of people might start to doubt whether the holocaust really happened.
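The destructive loop described above can be made concrete with a toy simulation. The query strings, probabilities, and function below are purely illustrative assumptions, not measurements from any real search engine: the most frequent logged query is always suggested, users accept the suggestion with some probability, and every selection is logged again.

```python
import random

def simulate_feedback(counts, steps=1000, p_accept=0.8, seed=7):
    """Toy feedback loop: the top logged query is always suggested,
    accepted with probability p_accept, and every selection is logged
    again, feeding the next round of suggestions."""
    rng = random.Random(seed)
    counts = dict(counts)
    for _ in range(steps):
        suggested = max(counts, key=counts.get)    # top suggestion
        if rng.random() < p_accept:
            counts[suggested] += 1                 # suggestion reinforces itself
        else:
            counts[rng.choice(list(counts))] += 1  # user types something else
    return counts

# two completions that start out almost equally popular
final = simulate_feedback({"benign query": 60, "harmful query": 50})
print(final)  # the early leader ends up dominating the log
```

Because every accepted suggestion is fed back into the counts, whichever completion leads early on collects almost all subsequent traffic; if the harmful query had started with the small lead instead, it would dominate the log just as easily.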
Structure of the paper
This paper is structured as follows. In the next section, we describe a simple but powerful approach that trains query autocompletions using the content that is indexed by the search engine, by extracting anchor texts from a large web crawl. The section after that compares these content-based query autocompletions to collaborative query autocompletions based on query logs. The final section concludes the paper.
CONTENT-BASED AUTOCOMPLETIONS
It is instructive to view a query autocompletion algorithm as a recommender system, that is, the search engine recommends queries based on some input. Recommender systems are usually classified into two categories based on how recommendations are made [1]:
1. Collaborative recommendations, and
2. Content-based recommendations.
Collaborative query autocompletions are based on similarities between users: “People that typed this prefix often searched for: . . . ”. Content-based query autocompletions are based on similarities with the content: “Web pages that contain this prefix are often about: . . . ”.

Until now, we only discussed collaborative query autocompletion algorithms. What would a content-based query autocompletion algorithm look like? Bhatia et al. [6] proposed a system that generates autocompletions by using all N-grams of order 1, 2 and 3 (that is, single words, word pairs, and word triples) from the documents. They tested their content-based autocompletions on newspaper data and on data from ubuntuforums.org. Instead of N-gram models from all text, Kraft and Zien [13] built models for query reformulation solely from the anchor texts, the clickable texts from hyperlinks in web pages. Interestingly, Dang and Croft [10] argue that anchor text can be an effective substitute for query logs. They studied the use of anchor texts for a range of query reformulation techniques, including query-based stemming and query reformulation, treating the anchor texts as if they were a query log.

Inspired by the research of Bhatia et al. [6], Kraft and Zien [13], and Dang and Croft [10], we obtain query autocompletions from the anchor texts of web pages, and test how well these autocompletions predict full queries from a large query log of a web search engine, given a query prefix.

COLLABORATIVE VS. CONTENT-BASED AUTOCOMPLETIONS
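As a concrete illustration, the content-based approach and the mean reciprocal rank (MRR) measure used in the comparison below can be sketched in a few lines of Python. This is only a toy sketch under our own naming, not the actual searsiasuggest implementation; the regular expressions mirror the cleaning rules stated in this section's evaluation, and the small threshold and example anchors are illustrative (a real system indexes millions of anchor texts, here with a threshold of 15, and would use a prefix tree instead of a linear scan).

```python
import re
from collections import Counter

def clean(anchor):
    """Normalize one anchor text: drop bracketed text and split on
    '|', '-' or ';' followed by a space."""
    anchor = re.sub(r"[(\[{][^)\]}]*[)\]}]", "", anchor)  # remove (...), [...], {...}
    return [p.strip().lower() for p in re.split(r"[|;-] ", anchor) if p.strip()]

def build_suggestions(anchor_texts, min_count=2):
    """Count cleaned anchor strings; keep those occurring often enough
    (the paper keeps suggestions that occur at least 15 times)."""
    counts = Counter(s for a in anchor_texts for s in clean(a))
    return Counter({s: n for s, n in counts.items() if n >= min_count})

def complete(counts, prefix, k=5):
    """Rank completions of a query prefix by anchor text frequency."""
    matches = [(s, n) for s, n in counts.items() if s.startswith(prefix.lower())]
    return [s for s, _ in sorted(matches, key=lambda m: -m[1])[:k]]

def mean_reciprocal_rank(suggestion_lists, intended_queries):
    """MRR: average of 1/rank of the intended query, 0 when absent."""
    total = 0.0
    for suggested, intended in zip(suggestion_lists, intended_queries):
        for rank, s in enumerate(suggested, start=1):
            if s == intended:
                total += 1.0 / rank
                break
    return total / len(intended_queries)

anchors = ["Open Search (symposium)", "open search", "open search - home",
           "open source", "open source", "open access"]
counts = build_suggestions(anchors)
print(complete(counts, "open s"))  # ['open search', 'open source']
```

With this sketch, evaluating against a query log amounts to calling complete() on each logged prefix and passing the resulting lists, together with the queries the users actually submitted, to mean_reciprocal_rank().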
Table 1 (columns: Prefix, MRR, Returned)

Anchor texts containing a punctuation mark (‘|’, ‘-’ or ‘;’) followed by a space were split into multiple strings. Text in braces ‘()’, ‘{}’, ‘[]’ was removed from the strings. We processed the anchor texts by retaining only suggestions that occur at least 15 times. This resulted in 46 million unique suggestions. Performance of the ClueWeb09 anchor text suggestions is presented in Table 2.

Table 2: Quality of autocompletions using anchor texts (columns: Prefix, MRR, Returned)

The software is available as open source at https://github.com/searsia/searsiasuggest.

CONCLUSION
Query autocompletions based on anchor text from web pages perform remarkably well. For queries of more than one word, they outperform autocompletions that are based on over two months of query log data. Simply extracting all anchor texts is really only a first attempt to get well-performing autocompletions from web content. Ideas to improve suggestions are: using linguistic knowledge to get suggestions from all web page text (for instance using the Stanford CoreNLP tools [16]), using web knowledge like PageRank scores and spam scores to improve the quality of suggestions, and reranking suggestions by their “query-ness” using machine learning.

Future work should follow a user-centered evaluation, using ethical instruments of analysis, to better measure the usefulness of autocompletions. This includes measuring if the system can suggest a query that is better than the user's intended query, measuring the actual amount of misinformation in autocompletions (links can also be manipulated, using so-called Google bombing), as well as their timeliness (updating from hyperlinks might be slower than updating from queries).

PageRank scores and spam scores are also available for ClueWeb09 [9].
Acknowledgments
I am grateful to the Vietsch Foundation and NLnet Foundation for funding the work presented in this paper, which was done at Searsia (http://searsia.org).

REFERENCES

[1] G. Adomavicius and A. Tuzhilin. Toward the next generation of recommender systems: a survey of the state-of-the-art and possible extensions. IEEE Transactions on Knowledge and Data Engineering, 17(6):734–749, 2005.
[2] P. Baker and A. Potts. Why do white people have thin lips? Google and the perpetuation of stereotypes via auto-complete search forms. Critical Discourse Studies, 10(2), 2013.
[3] Z. Bar-Yossef and N. Kraus. Context-sensitive query auto-completion. In Proceedings of the ACM SIGIR Conference on Research and Development in Information Retrieval, 2011.
[4] H. Bast and I. Weber. Type less, find more: fast autocompletion search with a succinct index. In Proceedings of the ACM SIGIR Conference on Research and Development in Information Retrieval, 2006.
[5] BBC. Google ordered to change autocomplete function in Japan. BBC News.
[6] S. Bhatia, D. Majumdar, and P. Mitra. Query suggestions in the absence of query logs. In Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 795–804, 2011.
[7] C. Cadwalladr. Google is not ‘just’ a platform. It frames, shapes and distorts how we see the world. The Guardian, 11 December 2016. https://theguardian.com/commentisfree/2016/dec/11/google-frames-shapes-and-distorts-how-we-see-world
[8] F. Cai, S. Liang, and M. de Rijke. Prefix-adaptive and time-sensitive personalized query auto completion. IEEE Transactions on Knowledge and Data Engineering, 28(9):2452–2466, 2016.
[9] J. Callan and M. Hoy. The ClueWeb09 data set. Carnegie Mellon University, 2009. http://boston.lti.cs.cmu.edu/Data/clueweb09/
[10] V. Dang and W.B. Croft. Query reformulation using anchor text. In Proceedings of the Third ACM International Conference on Web Search and Data Mining (WSDM), 2010.
[11] D. Hiemstra and C. Hauff. MapReduce for information retrieval evaluation: “Let's quickly test this on 12 TB of data”. In Lecture Notes in Computer Science 6360, 2010.
[12] M. Jakobsson. Autocompletion in full text transaction entry: a method for humanized input. In Proceedings of the ACM SIGCHI Conference on Human Factors in Computing Systems, pages 327–332, 1986.
[13] R. Kraft and J. Zien. Mining anchor text for query refinement. In Proceedings of the 13th International World Wide Web Conference (WWW), pages 666–674, 2004.
[14] I. Lapowsky. Google autocomplete still makes vile suggestions. Wired.
[15] Germany's former first lady sues Google for defamation over autocomplete suggestions. TechCrunch, 7 September 2012. https://techcrunch.com/2012/09/07/germanys-former-first-lady-sues-google-for-defamation-over-autocomplete-suggestions/
[16] C.D. Manning, M. Surdeanu, J. Bauer, J. Finkel, S.J. Bethard, and D. McClosky. The Stanford CoreNLP natural language processing toolkit. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations, pages 55–60, 2014.
[17] M. McGee. Google loses French lawsuit over Google Suggest. Search Engine Land, 6 January 2010. https://searchengineland.com/google-loses-french-lawsuit-over-google-suggest-32994
[18] D. Meyer. Google loses autocomplete defamation case in Italy. ZDNet.
[19] B. Mitra and N. Craswell. Query auto-completion for rare prefixes. In Proceedings of the 24th ACM International Conference on Information and Knowledge Management (CIKM), pages 1755–1758, 2015.
[20] C. O'Neil. Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy. Crown, 2016.
[21] G. Pass, A. Chowdhury, and C. Torgeson. A picture of search. In Proceedings of the 1st International Conference on Scalable Information Systems (InfoScale), pages 1–7, 2006.
[22] H. Roberts. How Google's ‘autocomplete’ search results spread fake news around the web. Business Insider.
[23] Ray and Ayalon. The Gerontologist, 2019.
[24] M. Shokouhi and K. Radinsky. Time-sensitive query auto-completion. In Proceedings of the ACM SIGIR Conference on Research and Development in Information Retrieval, pages 601–610, 2012.
[25] S. Shorey and P.N. Howard. Automation, big data, and politics: a research review. International Journal of Communication, 10:5032–5055, 2016.
[26] S. Stephens-Davidowitz. Tech firms have a duty to face down antisemitism. The Guardian, 11 January 2019. https://theguardian.com/commentisfree/2019/jan/11/uk-antisemitic-google-searches-tech-companies
[27] P. Wang, X. Mi, X. Liao, X. Wang, K. Yuan, F. Qian, and R.A. Beyah. Game of missuggestions: semantic analysis of search-autocomplete manipulations. In Proceedings of the 25th Annual Network and Distributed System Security Symposium (NDSS), 2018.
[28] S. Whiting and J.M. Jose. Recent and robust query auto-completion. In Proceedings of the World Wide Web Conference (WWW), pages 971–982, 2014.
[29] T. Yehoshua. Google search autocomplete.