Truncation-Free Matching System for Display Advertising at Alibaba

Abstract

The matching module plays a critical role in display advertising systems. Without a query from the user, it is challenging for the system to match user traffic and ads suitably. The system packs a group of users with common properties, such as the same gender or similar shopping interests, into a crowd. Here the term crowd can be viewed as a tag over users. Advertisers then bid for different crowds and deliver their ads to those targeted users. The matching module in most industrial display advertising systems follows a two-stage paradigm. When receiving a user request, the matching system (i) finds the crowds that the user belongs to; (ii) retrieves all ads that have targeted those crowds. However, in applications such as display advertising at Alibaba, with very large volumes of crowds and ads, both stages of matching have to truncate the long-tailed parts for online serving under limited latency. That is to say, not all ads have the chance to participate in online matching. This leads to sub-optimal results for both advertising performance and platform revenue. In this paper, we study the truncation problem and propose a Truncation-Free Matching System (TFMS). The basic idea is to decouple the matching computation from the online pipeline. Instead of executing the two-stage matching when a user visits, TFMS uses near-line truncation-free matching to pre-calculate and store the top valuable ads for each user. The online pipeline then just needs to fetch the pre-stored ads as matching results. In this way, we can escape the online system's latency and computation cost limitations and leverage flexible computation resources to finish the user-ad matching. TFMS has been deployed in our production system since 2019, bringing (i) more than 50% improvement of impressions for advertisers who encountered truncation before, and (ii) a 9.4% Revenue Per Mille gain, which is significant enough for the business.


Truncation-Free Matching System for Display Advertising at Alibaba

Jin Li∗, Jie Liu∗, Shangzhou Li, Yao Xu, Ran Cao, Qi Li, Biye Jiang, Guan Wang, Han Zhu, Kun Gai, Xiaoqiang Zhu

Alibaba Group
Beijing, China
{echo.lj,jierui.lj}@alibaba-inc.com
{shangzhou.lsz,zaifeng.xy,caoran.cr,luyuan.lq,biye.jby}@alibaba-inc.com
{shangfeng.wg,zhuhan.zh,jingshi.gk,xiaoqiang.zxq}@alibaba-inc.com

ABSTRACT

The matching module plays a critical role in display advertising systems. Different from sponsored search, where user intentions can be captured naturally through the query, display advertising has no explicit information about user intentions. Thus, it is challenging for display advertising systems to match user traffic and ads suitably w.r.t. both user experience and advertising performance.

From the advertiser’s view, the system packs a group of users with common properties, such as the same gender or similar shopping interests, into a crowd. Here the term crowd can be viewed as a tag over the users in the same crowd. Advertisers then bid for different crowds and deliver their ads to those targeted users.

From the advertising system’s view, things are a little different. So far as we know, the matching module in most industrial display advertising systems follows a two-stage paradigm. When receiving a user visit request, the matching system (i) finds the crowds that the user belongs to; (ii) retrieves all ads that have targeted those crowds. However, in real-world applications, such as the display advertising at Alibaba, with the volume of crowds reaching up to tens of millions and the volume of ads reaching up to millions, both stages of matching have to truncate the long-tailed user-crowd or crowd-ad pairs for online serving, under limited latency and computation cost requirements. That is to say, not all advertisers that bid for a given user have the chance to participate in the online matching process. This results in sub-optimal advertising performance for advertisers. Besides, it also brings a loss of revenue to the advertising platform.

In this paper, we carefully study the truncation problem and propose a Truncation-Free Matching System (TFMS). The basic idea of TFMS is to decouple the matching computation from the online processing pipeline. Instead of executing the two-stage matching when a user visits, TFMS uses a near-line truncation-free matching module to pre-calculate and store the top valuable ads for each user. The online pipeline then just needs to fetch the pre-stored candidate ads as the result of matching. In this way, near-line matching can escape the online system’s latency and computation cost limitations and leverage flexible computation resources to finish the user-ad matching process. Moreover, we can employ arbitrary advanced models to conduct the top-n candidate selection in the near-line matching system over the whole candidate ad set, bringing superior performance compared with the original roughly truncated online matching system.

Since 2019, TFMS has been deployed in our production display advertising system, bringing (i) more than 50% improvement of impressions for advertisers who encountered truncation before, and (ii) a 9.4% RPM (Revenue Per Mille) gain for the advertising system, which is significant enough for the business.

∗ Both authors contributed equally to this work.

CCS CONCEPTS

• Information systems → Computational advertising; Display advertising.

KEYWORDS

display advertising, truncation-free matching system

Online display advertising has played an important role in the marketplace over the last several decades [4, 5]. Advertisers bid for online traffic according to their marketing demands. For example, a clothing shop bids through the advertising system for users who have shopping needs for clothes when these users visit Alibaba’s e-commerce site. Different from search advertising (also known as sponsored search), where user intentions can be captured naturally through the query, display advertising has no explicit information about user intentions. Thus, it is challenging for display advertising systems to match user traffic and ads suitably w.r.t. both user experience and advertising efficacy.

Practically, display advertising systems usually provide tools for advertisers to select their target audiences. This is called targeting in online advertising terms. Taking the display advertising system at Alibaba as an example, there are different kinds of targeting for advertisers to select the audiences they want to bid for. We list several typical ones as follows:

• Retargeting: to target users who have recently viewed, clicked or even added to cart the advertiser’s products (or similar products).
• Keywords targeting: to target users whose interests are expressed by keywords, such as ‘travel’, ‘swim’, etc.
• Demographic targeting: to target users according to their demographic information, such as age, gender, etc.
• Automatic targeting: to target users automatically by the system, which searches for suitable users for advertisers with advanced models learned from large-scale historical user behavior data. For example, the TDM [13, 15, 16] system has served as an important automatic targeting tool for the display advertising system at Alibaba.

With the help of these targeting tools, advertisers can select target audiences and bid for them with differentiated bidding prices. When a user visits the e-commerce site, the display advertising system receives a traffic request. The system then usually processes the request in three steps: (i) Matching step: checks all the involved ads that bid for this user and selects the top-$n$ suitable candidate ads. An active user can usually be bid for by tens of thousands of ads, so only a part of the ads can be chosen and sent to the next step, considering the ads’ revenue and relevance in a rough manner. (ii) Ranking step: precisely evaluates each ad’s revenue with advanced models [9, 11, 12] and ranks the ads. (iii) Auction step: decides which ads win the auction and calculates their respective costs [6, 14]. It is worth mentioning that the matching step mainly has two functions:

• The matching system should ensure that advertisers can reach each target audience they bid for;
• The matching system should be responsible for the advertising platform’s revenue and user experience.

Figure 1: Illustration of the two-stage matching process in the display advertising system. Candidate ads are selected by executing the ‘user -> crowd’ and ‘crowd -> ad’ stages. Truncation strategies are used in both stages to reduce long-tail calculation, saving online latency and computation cost.

So far as we know, the matching module of most industrial display advertising systems is designed under a two-stage paradigm. When receiving a traffic request, a.k.a. a user visit, the first stage of matching is to find the crowds that the user belongs to. The term ‘crowd’ in a display advertising system is a concept associated with advertisers’ targeting. A ‘crowd’ can be viewed as tag information shared by the users in the same crowd. It is the bridge that links users and ads. For example, if an ad campaign decides to bid for users aged 18 to 25 through demographic targeting, then all users aged 18 to 25 form a crowd. After finding the crowds, the second stage of matching is to retrieve all the ads that have targeted those crowds. At this point, the matching step has found the valid ad candidates. Figure 1 illustrates the two-stage matching process.

Actually, the two-stage matching architecture exists widely in industrial applications apart from display advertising systems. For example, when handling a user request in a search engine, a (user query -> keywords -> documents) two-stage retrieval process is executed based on an inverted index [3]. In recommender systems, the item-based collaborative filtering algorithm follows the (user -> trigger item -> similar item) pipeline. Without loss of generality, we can summarize this kind of two-stage matching architecture into the same pattern, i.e., (user -> intermediate token -> item).

For display advertising systems in a start-up business, the volumes of both ‘crowds’ and ‘ads’ are small enough that the two-stage matching system can run perfectly. However, in most industrial cases, such as the display advertising at Alibaba, with the volume of crowds reaching up to tens of millions and the volume of ads reaching up to millions, the truncation problem of the two-stage matching system does severe harm. Truncation exists in both the (user -> crowd) and (crowd -> ad) stages of the matching system, as illustrated in Figure 1.
In short, to finish the matching step under limited latency and computation cost, real-world display advertising systems often truncate (or even skip) the long-tail user-crowd pairs and crowd-ad pairs. Thus, not all advertisers that bid for a given user have the chance to participate in the matching process. Moreover, the truncation strategy is often determined in an offline manner, usually when building the user -> crowd index and the crowd -> ad index. This means that truncated advertisers will never have the chance to participate in the matching process unless the offline-built indices are rebuilt or updated. In our system, the volume of ads linked to a given crowd cannot exceed 2,000, which is carefully tuned to meet system latency and computation cost requirements. Table 1 shows the truncation statistics in our system. Obviously, truncation results in sub-optimal advertising performance for advertisers. Besides, it also brings a loss of revenue to the advertising platform.

Table 1: Truncation statistics in the display advertising system at Alibaba

                                User-Crowd   Crowd-Ad
Percentage of truncated pairs   36%          21%

In this paper, we rethink the challenge from a system design view and propose a truncation-free solution which has already been deployed in the display advertising system at Alibaba. We observe that the key conflict lies between the system limitation (the limits on online latency and computation cost) and the long-tail volumes of user-crowd and crowd-ad pairs. Based on this insight, we propose a new architecture for the matching system towards truncation-free matching. The basic idea is to decouple the matching computation from the online processing pipeline. Instead of executing the two-stage matching when a user visits, our new design uses a near-line truncation-free matching module to pre-calculate and store the top valuable ads for each user. The online pipeline then just needs to fetch the pre-stored candidate ads as the result of matching. In this way, near-line matching computation can escape the limitations of the online system and leverage flexible computation resources to finish the user-ad matching process. Moreover, we can employ arbitrary advanced models to conduct the top-n candidate selection in the near-line matching system, bringing superior performance compared with the original rough rule-based truncation in the online matching system. We name the proposed solution the Truncation-Free Matching System (TFMS).

To the best of our knowledge, our paper is the first work that carefully studies the widely existing truncation problem in the matching system of display advertising and proposes a truncation-free solution. Since 2019, TFMS has been deployed in our production display advertising system. Advertisers that encountered truncation in the original two-stage online matching system get more than 50% improvement of impressions in TFMS. Besides, the advertising platform also benefits. For example, in our banner ads product, TFMS brings a 9.4% RPM (Revenue Per Mille) gain, which is significant enough for the business. It is worth mentioning that the proposed truncation-free near-line matching solution not only works well in the display advertising system, but is also a general framework for other applications that use truncated two-stage matching. We hope that our work can offer these applications some motivation from a new perspective.

The rest of this paper is organized as follows: Section 2 introduces some background on crowd targeting in display advertising. Sections 3 and 4 describe in detail the design of the proposed TFMS. Section 5 describes how we implement TFMS in our production advertising system. Section 6 gives experiments comparing TFMS with the traditional two-stage online matching system. Section 7 concludes our work.

In this section, we give a brief introduction to the matching module in the display advertising system of Alibaba, and in particular explain the truncation problem in detail, to help readers understand why we need to design a novel truncation-free matching system.

Like many other industrial systems, the matching module in the display advertising system of Alibaba follows a common (user -> crowd -> ad) two-stage structure to retrieve the candidate ad set. Figure 1 illustrates this process.

• (user -> crowd) stage. As described in the introduction, we provide different targeting tools for advertisers to select audiences. In online serving, we call each tool a matching channel. That is, for each user, a multi-channel approach is used to find the crowds the user belongs to. Each channel handles a specific type of crowd, like the retargeting channel, the keywords targeting channel, or the model-based automatic targeting channel. Crowds collected from each channel are then merged together.
• (crowd -> ad) stage. Given the crowds merged from each channel, ads that have targeted those crowds are retrieved from an inverted index which contains crowd -> ad pairs. Considering the computing cost of online serving, a pre-ranking module [10] is also applied to further reduce the number of user-ad pairs for the following ranking module. However, due to system performance limitations, the pre-ranking model here has to be simple enough.

Facing billions of traffic requests every day, truncation strategies are widely used in both stages to meet the latency and computation cost requirements of online serving.

In the user -> crowd stage, an active user usually belongs to thousands of effective crowds, even in a single channel. Here effective means that these crowds are targeted by advertisers. For example, a young mother may have diverse interests including milk powder shopping, yoga fitness, swimming, traveling, etc. She may be targeted by advertisers from all these businesses. If all these crowds were sent to the second stage, it would cause a serious latency problem in processing real-world traffic. Practically, we truncate the crowd volume for each channel separately, considering the crowd -> ad volume for different types of crowds. For example, a crowd of young women aged 20 to 25 from demographic targeting may link to tens of thousands of ads, while a crowd of ‘people who have added my product to cart’ might link to only dozens of ads. In our real system, the truncation threshold for crowd volume is usually less than 100. Here truncation is conducted in each channel by its own truncation model, which can be viewed as a set of rule-based statistical scores. Obviously, this is rough and has no connection with the ads that have targeted the truncated crowds.

In the crowd -> ad stage, as described above, a popular crowd may be targeted by tens of thousands of advertisers. Retrieving all these long-tail candidate ads from such a crowd is not acceptable, as the latency cost would exceed the limit for online serving. To overcome this, the number of ads for a single crowd is capped by an offline pre-truncation strategy. Again, truncation is conducted by rule-based approaches, such as the average CTR of each ad in the last 7 days. In our system, the volume of ads linked to a given crowd cannot exceed 2,000, which is carefully tuned to meet the latency and computation cost requirements.

Table 2: Simulated performance of the two stages when applying the truncation-free strategy directly in our real system

Truncation-Free Stage   Latency   Simulated Revenue
(user -> crowd)         +26%      +5.5%
(crowd -> ad)           +17%      +2.5%
both stages             +51%      +5.6%

To get a further intuitive understanding, we run online simulations in our real system by deploying the truncation-free strategy without considering the latency and computation cost. These simulations are performed using logged data. Table 2 gives the results. Here we only calculate the impact from the advertising platform’s view. Obviously, without truncation the two-stage matching theoretically yields higher revenue but at an unacceptable latency, as in our real system an increase of latency is unacceptable.

Now the harm from the truncation problem is clear. We summarize it formally:

• Truncation causes advertisers that have bid for selected target audiences to lose the chance to participate in the matching process. This is unfair to them, since even if these advertisers further raise the bid price, nothing changes.
• Truncation causes the advertising platform to lose the chance to precisely evaluate each candidate ad, resulting in sub-optimal platform revenue and user experience.
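As a rough illustration of the offline pre-truncation described above, the following sketch builds a crowd -> ad inverted index and caps each crowd's posting list by a rule-based score. The function name `build_truncated_index`, the `avg_ctr_7d` score table and the toy data are hypothetical; only the idea of a per-crowd cap (2,000 in the real system) ranked by a statistic such as the 7-day average CTR comes from the text.

```python
from collections import defaultdict

def build_truncated_index(targeting, avg_ctr_7d, cap):
    """Build a crowd -> ad inverted index, keeping at most `cap` ads per crowd.

    targeting:  iterable of (ad_id, crowd_id) pairs chosen by advertisers.
    avg_ctr_7d: dict ad_id -> rule-based score (assumed: 7-day average CTR).
    Returns (index, dropped); `dropped` lists the truncated (ad, crowd) pairs.
    """
    full = defaultdict(list)
    for ad, crowd in targeting:
        full[crowd].append(ad)

    index, dropped = {}, []
    for crowd, ads in full.items():
        # Rank by the rule-based score only; the bid price plays no role here.
        ranked = sorted(ads, key=lambda a: avg_ctr_7d.get(a, 0.0), reverse=True)
        index[crowd] = ranked[:cap]
        dropped.extend((a, crowd) for a in ranked[cap:])
    return index, dropped

# Toy data (hypothetical): three ads target a popular crowd, one a small crowd.
targeting = [("ad1", "women_20_25"), ("ad2", "women_20_25"),
             ("ad3", "women_20_25"), ("ad3", "added_to_cart")]
avg_ctr_7d = {"ad1": 0.031, "ad2": 0.012, "ad3": 0.020}
index, dropped = build_truncated_index(targeting, avg_ctr_7d, cap=2)
```

With `cap=2`, `ad2` is dropped from the popular crowd whatever its bid is, and it cannot re-enter until the index is rebuilt, which is the unfairness summarized above.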

To tackle the challenges caused by truncation in the two-stage online matching system discussed above, in this section we review the matching problem in display advertising in detail and derive a truncation-free solution from it. It is worth mentioning that the approaches discussed here can also be extended to other applications using the same two-stage matching architecture.

Given a user $u$, let $\mathcal{C}(u)$ be the set of all valid crowds that the user belongs to:

$$\mathcal{C}(u) = \{\, c \mid u \text{ belongs to } c \,\}. \tag{1}$$

Here $c$ denotes a crowd. Given $c$, let $\mathcal{A}(c)$ be the set of all valid ads that have selected $c$ as the targeting crowd:

$$\mathcal{A}(c) = \{\, a \mid \text{crowd } c \text{ is targeted by } a \,\}. \tag{2}$$

Then we can define the total valid candidate set $\mathcal{O}(u)$ for user $u$ as

$$\mathcal{O}(u) = \{\, (a, c) \mid c \in \mathcal{C}(u) \wedge a \in \mathcal{A}(c) \,\}. \tag{3}$$

We treat each element in $\mathcal{O}(u)$ as an ad-crowd pair because an advertiser may have different bid prices on different target crowds even for the same ad, i.e., $bid(a, c_1) \neq bid(a, c_2)$ for two crowds $c_1 \neq c_2$.

Now, the objective of matching in display advertising can be viewed as an optimization problem w.r.t. a matching system $f$:

$$
\begin{aligned}
\underset{f}{\text{maximize}} \quad & \sum_{u} R(f(u)) \\
\text{subject to} \quad & f(u) \subset \mathcal{O}(u) \\
& L(f(u)) \le l \\
& card(f(u)) = n,
\end{aligned}
\tag{4}
$$

where $f(u)$ is the matching result of user $u$ given matching system $f$, $R(\cdot)$ is a metric function that measures the reward in terms of both the advertisers’ performance (as well as fairness) and the advertising platform’s revenue, $L(\cdot)$ measures the system performance of $f$, such as latency, computation cost, etc., and $card(\cdot)$ measures the volume of the matching result $f(u)$. The goal of designing a matching system $f$ is to select a set of candidate ads $f(u)$ of size $n$ from all valid candidates $\mathcal{O}(u)$ to maximize the reward $R$, while satisfying the online system performance requirement $l$. It is trivial that $\sum_{u} R(f(u))$ reaches the maximal value when $f(u)$ equals $\mathcal{O}(u)$. That is, matching system $f$ returns all the valid $(a, c)$ pairs. In this case, both the advertisers and the advertising platform get the maximal satisfaction.

Practically, it is not straightforward to measure the satisfaction from the advertiser side in the reward function $R(\cdot)$. But it is easy to prove that when the advertising platform’s revenue reaches its maximal value, the advertisers’ performance also reaches its maximal point [7].
Thus, without loss of generality, in the rest of this paper we simplify the reward function $R(\cdot)$ to be calculated just from the advertising platform side.

Given user $u$, the reward $R(f(u))$ can be substituted by the averaged value across the set $f(u)$:

$$R(f(u)) = \frac{\sum_{(a_i, c_i) \in f(u)} r_u(a_i, c_i)}{n}, \tag{5}$$

where $r_u$ is some value measurement (e.g., eCPM in a display advertising system) for a given ad-crowd pair $(a, c)$ for user $u$.

Optimal matching system. Based on the above definition, if we ignore the system performance requirement, i.e., $L(f(u)) \le l$, the optimal solution of matching system $f$ would be to select the set of top-$n$ $(a, c)$ pairs from $\mathcal{O}(u)$ under the $r_u$ measure:

$$f_{opt}(u) = \underset{(a, c) \in \mathcal{O}(u)}{\text{argTop-n}} \; r_u(a, c). \tag{6}$$

Note that the optimal solution should also take into consideration advertisers’ dynamic operations during the whole advertising period, since $r_u$ is a real-time value measurement.

In real-world business, without truncation, the volumes of the valid ad sets $\mathcal{O}(u)$ are usually larger than what the online serving system can afford. As introduced in Table 4, the full set of $\mathcal{O}(u)$ is larger than the truncated one on average. Thus, traditional truncated two-stage matching usually uses truncation strategies to shrink $\mathcal{O}(u)$ to a subset of proper size. We summarize the solution of the truncated two-stage online matching system and denote it as $\tilde{f}$:

$$
\begin{aligned}
\tilde{\mathcal{C}}(u) &= \underset{c \in \mathcal{C}(u)}{\text{argTop-m}} \; s_u(c), & \tilde{\mathcal{C}}(u) &\subset \mathcal{C}(u) \\
\tilde{\mathcal{A}}(c) &= \underset{a \in \mathcal{A}(c)}{\text{argTop-k}} \; s_u(a), & \tilde{\mathcal{A}}(c) &\subset \mathcal{A}(c) \\
\tilde{\mathcal{O}}(u) &= \{\, (a, c) \mid c \in \tilde{\mathcal{C}}(u) \wedge a \in \tilde{\mathcal{A}}(c) \,\}, & \tilde{\mathcal{O}}(u) &\subset \mathcal{O}(u) \\
\tilde{f}(u) &= \underset{(a, c) \in \tilde{\mathcal{O}}(u)}{\text{argTop-n}} \; r_u(a, c)
\end{aligned}
\tag{7}
$$

where $s_u(c)$ and $s_u(a)$ denote score functions for crowd $c$ and ad $a$ respectively, i.e., rule-based statistics in our real system; $m$ and $k$ are the truncation numbers for the (user -> crowd) and (crowd -> ad) stages. Not surprisingly, we have:

$$R(\tilde{f}(u)) < R(f_{opt}(u)), \tag{8}$$

that is, the truncated two-stage online matching system yields a non-optimal solution. Besides, the $(a, c)$ pairs truncated by $\tilde{f}$, i.e., the pairs in the set $\mathcal{O}(u) \setminus \tilde{\mathcal{O}}(u)$, lose the chance to bid for user $u$. This gives rise to unfairness and brings bad bidding experiences for advertisers, which also harms both the short-term and long-term revenue of the advertising platform.
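To make the gap between the optimal and the truncated selection concrete, the following is a minimal sketch of Eqs. (3), (5), (6) and (7) on toy data. All identifiers and numbers are made up for illustration; the rule-based scores `s_crowd` and `s_ad` are deliberately unrelated to the value measure `r_u`, as is the case for the rule-based truncation described in the text.

```python
import heapq

def candidate_set(user_crowds, crowd_ads):
    """O(u) from Eq. (3): all (ad, crowd) pairs reachable from the user's crowds."""
    return {(a, c) for c in user_crowds for a in crowd_ads.get(c, [])}

def top_n(pairs, r_u, n):
    """argTop-n of the pairs under the value measure r_u, as in Eq. (6)."""
    return heapq.nlargest(n, pairs, key=lambda p: r_u[p])

def reward(selected, r_u, n):
    """Averaged reward from Eq. (5)."""
    return sum(r_u[p] for p in selected) / n

def truncated_match(user_crowds, crowd_ads, s_crowd, s_ad, r_u, m, k, n):
    """Eq. (7): keep top-m crowds and top-k ads per crowd before the top-n step."""
    crowds = heapq.nlargest(m, user_crowds, key=lambda c: s_crowd[c])
    kept = {c: heapq.nlargest(k, crowd_ads.get(c, []), key=lambda a: s_ad[a])
            for c in crowds}
    return top_n(candidate_set(crowds, kept), r_u, n)

# Toy data (hypothetical) for a single user u.
user_crowds = ["c1", "c2", "c3"]
crowd_ads = {"c1": ["a1", "a2"], "c2": ["a3"], "c3": ["a4", "a5"]}
s_crowd = {"c1": 0.9, "c2": 0.5, "c3": 0.1}            # rule-based crowd scores
s_ad = {"a1": 0.9, "a2": 0.2, "a3": 0.5, "a4": 0.8, "a5": 0.1}
r_u = {("a1", "c1"): 1.0, ("a2", "c1"): 4.0, ("a3", "c2"): 2.0,
       ("a4", "c3"): 5.0, ("a5", "c3"): 0.5}           # e.g. eCPM per pair

n, m, k = 2, 2, 1
opt = top_n(candidate_set(user_crowds, crowd_ads), r_u, n)
trunc = truncated_match(user_crowds, crowd_ads, s_crowd, s_ad, r_u, m, k, n)
reward_opt = reward(opt, r_u, n)
reward_trunc = reward(trunc, r_u, n)
```

In this toy case the two most valuable pairs, `("a4", "c3")` and `("a2", "c1")`, are both removed before the final top-$n$ step, one by the crowd cut and one by the per-crowd ad cut, so the truncated reward falls below the optimal one.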

Figure 2: Comparison between the traditional two-stage online matching system and our proposed TFMS. Different from the two-stage online matching process, which is performed only in the online system, TFMS decouples the matching process into a near-line calculation part and an online fetch part, which makes truncation-free matching achievable.

The brute-force calculation over all valid candidate ads in the online serving manner is a great challenge under both latency and computation cost limitations. In the past years, we have tried our best to optimize the system performance to allow more ads into the online serving system. However, with the growth of our advertising business, more advertisers enter our system, making the truncation problem more serious.

Rethinking the problem, we find that the truncation is actually caused by the long-tail distribution of user -> crowd pairs or crowd -> ad pairs. Covering these long-tail calculations fully in an online serving manner seems very difficult, especially under a strict latency limitation. On the other hand, the matching process is executed repeatedly for a given user with multiple visits to the advertising system, that is, computing cost is wasted in the fully online serving manner. For example, in our system each user visits . times on average every day. This motivates us to design a new matching system, which we name the Truncation-Free Matching System (TFMS).

Near-line matching design. The key idea of TFMS is to decouple the matching process from online serving and move it into a near-line system. We use an additional near-line matching module to generate the candidate top $(a, c)$ pairs for each user asynchronously, and the online process just needs to fetch the generated candidates. Since the asynchronous near-line calculation has no latency limit, a truncation-free traversal over $\mathcal{O}(u)$ is feasible. Besides, considering that (i) the full set of $\mathcal{O}(u)$ is larger than the truncated one on average; (ii) we can save computation in the decoupled manner, as for each user we only need to execute the matching process once; (iii) more flexible resources can be used in the near-line manner, e.g., the utilization of servers is often low when the traffic is low, such as in the early morning, it is affordable for TFMS to fully evaluate the full set of $\mathcal{O}(u)$ and select the top-$n$ valuable ads. That is, TFMS is a reasonable design of the optimal matching system. We summarize TFMS by the following matching system $f$:

$$f(u) = \text{OnlineFetch}\left[ \text{Near-Line}\left( \underset{(a, c) \in \mathcal{O}(u)}{\text{argTop-n}} \; r_u(a, c) \right) \right]. \tag{9}$$

In this section, we discuss in detail the design of our proposed TFMS solution.
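The decoupling expressed by Eq. (9) can be sketched as two functions that only share a per-user store. This is a minimal sketch under stated assumptions, not the production design: `TopNStore`, `near_line_match`, `online_fetch` and the toy data are hypothetical names, and an in-memory dict stands in for the real storage.

```python
import heapq

class TopNStore:
    """Stand-in for the per-user top-n storage (a plain dict in this sketch)."""

    def __init__(self):
        self._table = {}

    def put(self, user, pairs):
        self._table[user] = list(pairs)

    def get(self, user):
        return self._table.get(user, [])

def near_line_match(user, user_crowds, crowd_ads, r_u, n, store):
    """Asynchronous part of Eq. (9): scan all of O(u) and store the top-n pairs."""
    pairs = ((a, c) for c in user_crowds[user] for a in crowd_ads.get(c, []))
    store.put(user, heapq.nlargest(n, pairs, key=lambda p: r_u(user, *p)))

def online_fetch(user, store):
    """Online part of Eq. (9): a single look-up, with no matching at request time."""
    return store.get(user)

# Toy data (hypothetical).
user_crowds = {"u1": ["c1", "c2"]}
crowd_ads = {"c1": ["a1", "a2"], "c2": ["a2", "a3"]}
ecpm = {("a1", "c1"): 1.0, ("a2", "c1"): 3.0, ("a2", "c2"): 2.5, ("a3", "c2"): 0.5}

def r_u(user, ad, crowd):
    # Value measure for an (ad, crowd) pair; user-independent in this toy example.
    return ecpm[(ad, crowd)]

store = TopNStore()
near_line_match("u1", user_crowds, crowd_ads, r_u, n=2, store=store)
result = online_fetch("u1", store)
```

The cost of `near_line_match` grows with the size of $\mathcal{O}(u)$, but it runs off the request path, while `online_fetch` stays a constant-time look-up however many pairs were evaluated.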

Unlike the traditional truncated two-stage matching system, in which most computation is carried out in online serving on real-time user traffic, TFMS adopts an asynchronous near-line solution, as illustrated in Figure 2. TFMS also applies a two-stage process to get the valid candidate ad set, i.e., the (user -> crowd -> ad) process. Both of these stages are truncation-free, with no latency limitation. Note that in TFMS, arbitrary advanced models can be used to evaluate each candidate ad, which greatly enhances the ability of top-$n$ selection. Then, the generated top-$n$ $(a, c)$ pairs for each user are cached for online serving when the user visits. It is worth mentioning that, due to the time difference between online fetching and near-line caching, some of the cached top-$n$ $(a, c)$ pairs may no longer be valid. For example, the ad campaign’s budget may have run out, or the campaign may even have been canceled. Thus, a validation process in TFMS’s online module is used to filter out those invalid cached results.

The core part of TFMS is the maintenance of the top-$n$ valuable candidate ads for each user at any time. More specifically, we hope that $f$ in Eq. (9) can be as close to $f_{opt}$ in Eq. (6) as possible. It is clear that keeping $\mathcal{O}(u)$ and $r_u(a, c)$ consistent between the online and near-line systems of TFMS is the critical point.

Figure 3: The updating process for top-$n$ ads. Our framework contains a full update, which calculates a top-$n$ ads snapshot for all users, and a delta update process, which updates the top-$n$ ads based on the old top-$n$ ads and the delta ads.

For $\mathcal{O}(u)$, there are many factors that can give rise to near-line and online inconsistency. For example, a user’s newly updated behavior may change the crowds that the user belongs to; an advertiser’s operation on ad campaigns can influence both the (user -> crowd) and (crowd -> ad) results. For the measure function $r_u(a, c)$, how to keep the consistency depends on the specific value measurement in different advertising businesses. Taking the measure function with eCPM (effective cost per mille) in a CPC (cost per click) advertising system as an example, the predicted CTR and the ad’s bidding price are the key factors in keeping $r_u(a, c)$ consistent.

Based on these observations, TFMS designs a framework that has two update pipelines to keep consistency and maintain the top-$n$ valuable candidate ads: (i) a full update pipeline, which runs only once every day as an initialization; (ii) a delta update pipeline, which runs incrementally every 5 minutes, as illustrated in Figure 3.

• Full Update. We use the daily full initialization as the base results of $f(u)$. To perform the full initialization, we acquire the truncation-free crowd set $\mathcal{C}(u)$ for all users, then match the truncation-free ad set $\mathcal{A}(c)$ for every crowd $c$ in $\mathcal{C}(u)$ to get $\mathcal{O}(u)$. Then the user-wise top-$n$ valuable ads are ranked and selected by the measure function $r_u(a, c)$ for all users.
• Delta Update. Delta update is necessary since the daily initialization is not able to provide top valuable ads with strong timeliness. Hence, a delta updating mechanism is designed to receive both users’ and advertisers’ actions as input to update the user-wise top ads list. It is worth mentioning that the delta update process triggered by users’ and advertisers’ actions only considers the incremental part of the $(a, c)$ pairs that occurred in each delta time interval, such as 5 minutes in our system. Section 5.4 gives some implementation details about the design of delta update in practice.

Note that, theoretically, only the version of TFMS with real-time delta update is an optimal matching system. However, according to our practice, a 5-minute delta update is approximately optimal, and it in turn saves much computation cost, since running in a mini-batch manner rather than a single-request manner increases the resource utilization of the near-line system.

To support the two kinds of update above, TFMS further designs two key components, near-line matching and near-line ranking.

The near-line matching component plays an important role in TFMS by providing ad candidates in both the daily full initialization and the delta update of the top valuable ads list. Like the traditional truncated two-stage online matching system, the near-line matching component contains a user-crowd service and a crowd-ad service to conduct (user -> crowd) and (crowd -> ad) retrieval to get $\mathcal{O}(u)$. As illustrated in Figure 3, the ads management platform, which exists in both traditional online matching and TFMS’s near-line matching module, handles operations from advertisers to ensure accurate results of $\mathcal{A}(c)$ in real time. The near-line matching component has two main differences compared with the traditional online matching module. Firstly, benefiting from the near-line design, in which there is no latency limitation, the user-crowd service and the crowd-ad service both provide truncation-free retrieval operations.
Secondly, an additional crowd-user service is added to TFMS's matching component to support user-wise delta update. The crowd-user service can be viewed as an inverted version of the user-crowd service, providing a (crowd -> user) look-up operation. For example, when an advertiser changes the targeting crowds of an ad, TFMS's delta update mechanism ensures that each user belonging to the affected crowds becomes aware of the advertiser's operation and that those users' top-𝑛 ads lists are updated accordingly. In this process, the crowd-user service is necessary to quickly locate the relevant users.

With the candidate ads generated by the near-line matching component, a ranking service including click-through rate or conversion rate prediction is designed to select the final top-𝑛 candidate ads. Again, without latency limitation, the near-line ranking component can employ arbitrarily advanced models [8] to conduct the top-𝑛 candidate selection. This ranking process is far more accurate than that in the truncated two-stage online matching system, which consists of rule-based truncation models and a simple pre-ranking model constrained by strict online serving performance requirements. This brings superior performance for TFMS.
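The interplay between the three look-up services and the top-𝑛 selection can be illustrated with a minimal Python sketch. It is not the production implementation: the function names, the toy data and the `score` callable (standing in for 𝑟_𝑢(𝑎, 𝑐)) are hypothetical, keeping the best score per ad across crowds is an assumption, and the delta step simply recomputes 𝑓(𝑢) for affected users instead of merging only the incremental (𝑎, 𝑐) pairs.

```python
import heapq
from collections import defaultdict


def build_indexes(user_crowd_pairs, crowd_ad_pairs):
    """Build C(u), its inverted (crowd -> user) view, and A(c) from raw pairs."""
    user_to_crowds = defaultdict(set)  # C(u), the user-crowd service
    crowd_to_users = defaultdict(set)  # crowd-user service, same source data
    crowd_to_ads = defaultdict(set)    # A(c), the crowd-ad service
    for user, crowd in user_crowd_pairs:
        user_to_crowds[user].add(crowd)
        crowd_to_users[crowd].add(user)
    for crowd, ad in crowd_ad_pairs:
        crowd_to_ads[crowd].add(ad)
    return user_to_crowds, crowd_to_users, crowd_to_ads


def top_n_ads(user, user_to_crowds, crowd_to_ads, score, n):
    """f(u): the n highest-scoring ads over the untruncated candidate set O(u)."""
    best = {}  # ad -> best score over all crowds through which it matches
    for crowd in user_to_crowds.get(user, ()):
        for ad in crowd_to_ads.get(crowd, ()):
            s = score(user, ad, crowd)
            if ad not in best or s > best[ad]:
                best[ad] = s
    # Ties are broken by ad id so that the result is deterministic.
    return heapq.nlargest(n, best, key=lambda ad: (best[ad], ad))


def users_affected_by(changed_crowds, crowd_to_users):
    """Use the (crowd -> user) index to find users touched by an advertiser action."""
    affected = set()
    for crowd in changed_crowds:
        affected |= crowd_to_users.get(crowd, set())
    return affected


# Toy data (hypothetical): two users, two crowds, three ads.
bids = {("a1", "c1"): 1.0, ("a2", "c2"): 3.0, ("a3", "c2"): 2.0}
score = lambda user, ad, crowd: bids[(ad, crowd)]
user_to_crowds, crowd_to_users, crowd_to_ads = build_indexes(
    [("u1", "c1"), ("u1", "c2"), ("u2", "c2")],
    [("c1", "a1"), ("c2", "a2"), ("c2", "a3")],
)

# Daily full initialization: compute f(u) for every user.
top_ads = {
    u: top_n_ads(u, user_to_crowds, crowd_to_ads, score, n=2)
    for u in user_to_crowds
}
initial = dict(top_ads)

# Delta update: the advertiser of a1 additionally targets crowd c2.
crowd_to_ads["c2"].add("a1")
bids[("a1", "c2")] = 5.0
for user in users_affected_by({"c2"}, crowd_to_users):
    top_ads[user] = top_n_ads(user, user_to_crowds, crowd_to_ads, score, n=2)
```

In this toy run both users belong to crowd c2, so the single advertiser action refreshes both lists, which is the look-up the crowd-user service exists to make cheap.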

In Section 4 we introduced the ideal design of TFMS in detail. However, real-world applications still pose several implementation challenges. Here we share our practice in implementing TFMS in our production display advertising system.

As mentioned earlier, our display advertising system provides different targeting tools for advertisers. Among these targeting tools, automatic targeting is one of the most popular among advertisers. In automatic targeting, our platform searches with advanced models to find high-quality users for advertisers, which means these ads can be valid candidates for every user. If we considered automatic targeting in TFMS, the size of O(𝑢) could reach the level of the full ad set, i.e., millions in our system, as almost all advertisers have adopted automatic targeting tools. On the other hand, automatic targeting can directly generate top-𝑛 candidate ads from the whole ad corpus for each advertiser, without using the two-stage matching process. In our system, full-corpus retrieval models such as ANN [2] and TDM [13, 15, 16] are developed along another direction, which does not suffer from the truncation problem. Thus, we implement TFMS with all targeting tools except automatic targeting.

For each user, the top-𝑛 valuable candidate ads 𝑓(𝑢) need to be stored. In our practice, 𝑛 cannot be too small. We set it to 5000 in our real system, considering both resource cost and business performance. Besides, we also need to maintain the user-crowd storage C(𝑢) and the crowd-ad storage A(𝑐). Further, we need to fetch and update these storages very often. Hence, these storages should be both write- and read-friendly. In practice, C(𝑢) and 𝑓(𝑢) are implemented using in-memory storage such as a key-value table [1], and A(𝑐) is implemented as an inverted index. It is worth mentioning that the C(𝑢) storage is used in both the crowd-user and user-crowd services, with different key-value settings built from the same source data.

Hundreds of millions of users visit our system every day, which makes the full update for all users a great challenge.
Clearly, it is impossible for us to calculate 𝑓(𝑢) for all users at the same time, as this would be risky for TFMS and would put an impulse load on the system. Besides, not all monthly active users visit our system every day, which means computation for inactive users is wasteful. To solve this, we implement the full update as follows: (i) firstly, we only calculate 𝑓(𝑢) for users that have had actions in recent days to reduce the computation cost, which still covers 98% of daily requests in our system; (ii) secondly, we process active users as streams, sharing the same stream processing structure used for updating, and generate 𝑓(𝑢) with a parallel strategy. That is, the full update is executed as several parallel streaming pipelines. The parallelism can be adjusted flexibly according to the resources that TFMS can use.

For delta update, the incremental part caused by advertisers' actions suffers a huge amplification effect: as the volume of a crowd can reach tens of millions, a single update action on an ad can result in millions of update messages. The computation cost needed for this kind of update is also huge.

We considered two types of solutions to this challenge. One is to trade off real-time performance against computing resources: we use a window function mechanism to aggregate updates for the same user within 5-10 minutes, which significantly decreases the QPS (queries per second) of the update process. The other, approximate, way is to discard certain advertiser actions from the user's view, so that only active users are affected by advertiser actions. For users who have actions today, we follow the full update process to recompute 𝑓(𝑢) after the user's action. In this way, advertiser actions that affect today's active users are picked up automatically. In practice, we implement the first strategy.
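The window-based aggregation can be sketched as follows. This is a minimal illustration under assumptions of our own: update messages are modeled as `(timestamp_seconds, user_id)` tuples, the window is a fixed (tumbling) 5-minute window, and the function name `aggregate_by_window` is hypothetical. The text only states that updates for the same user are aggregated within 5-10 minutes.

```python
from collections import defaultdict


def aggregate_by_window(messages, window_seconds=300):
    """Collapse update messages into at most one refresh per user per window.

    messages: iterable of (timestamp_seconds, user_id) tuples.
    Returns {window_start: sorted list of users to refresh in that window}.
    """
    windows = defaultdict(set)
    for timestamp, user_id in messages:
        # Assign the message to the fixed window that contains its timestamp.
        window_start = int(timestamp // window_seconds) * window_seconds
        windows[window_start].add(user_id)
    return {start: sorted(users) for start, users in sorted(windows.items())}


# Six update messages caused by advertiser actions (toy data).
messages = [(0, "u1"), (10, "u1"), (20, "u2"), (299, "u1"), (300, "u1"), (610, "u2")]
batches = aggregate_by_window(messages)
refreshes = sum(len(users) for users in batches.values())
print(batches)
print(f"{len(messages)} messages -> {refreshes} user refreshes")
```

Repeated messages for the same user inside one window collapse into a single recomputation of 𝑓(𝑢), at the price of a staleness bounded by the window length.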
In this section, we give offline and online results from our investigation and practice of TFMS for display advertising at Alibaba. Through the experimental results, we hope to answer three questions:

RQ1: Is truncation a severe problem in traditional two-stage matching?

RQ2: Weighing advertising performance against the additional computation cost, is TFMS a cost-effective solution to the truncation problem?

RQ3: What are the actual gains for the advertising system after adopting TFMS?

Table 3: Truncation statistics for different targeting types

Targeting Type          Truncation Percentage
                        Ad      User
Retargeting             43%     60%
Keywords Targeting      9%      85%
Demographic Targeting   36%     90%

In Table 1, we have already given some data about the percentage of truncated (user -> crowd) and (crowd -> ad) pairs in our advertising system: on average, 36% of (user -> crowd) pairs and 21% of (crowd -> ad) pairs are truncated. Looking at other dimensions, Table 3 gives the online truncation statistics w.r.t. different targeting types. As the bridge between advertisers and users, the three typical types of targeting all suffer from severe truncation, which makes it harder for advertisers to effectively deliver their ads to the target users. To further quantify the influence of truncation on advertising performance, we conduct offline simulation experiments.

The simulation is performed on a randomly sampled set of display advertising traffic from the Item Ads scenario at Alibaba in a single day, which contains nearly one million real users. We replay this user traffic with the same settings as the online system in an offline simulation environment using real-world data. Three kinds of truncation-free strategies are compared with the base truncated two-stage matching: truncation-free matching in the (user -> crowd) stage, truncation-free matching in the (crowd -> ad) stage, and truncation-free matching in both stages.

Before giving the results, we first introduce the metrics that we use to estimate performance. Revenue Per Mille, or RPM, is a commonly used metric in online advertising. It is the average advertising earnings per 1,000 ad impressions. In our simulation, the RPM is estimated by the predicted click-through rate (PCTR) times the advertiser's cost per click (PPC). We also evaluate the response time and the volume of user-crowd and user-ad pairs calculated in the matching system for each strategy.

Table 4: Simulation results for truncation-free matching in online systems using real-world user and ad data

Truncation-Free Stage   Time   RPM     PPC     PCTR    User-crowd pairs   User-ad pairs
(user -> crowd)         +26%   +5.2%   +4.4%   +0.7%   +57%               +69%
(crowd -> ad)           +17%   +2.5%   +1.8%   +0.6%   -                  +112%
both stages             +51%   +5.6%   +4.5%   +1.1%   +57%               +259%

Simulation results are given in Table 4. All three truncation-free strategies yield better RPM. For truncation-free matching in both stages, RPM increases by 5.6% compared with traditional truncated two-stage matching, which is significant enough in our application. Besides, the number of valid user-ad pairs increases by 259%, which means that many advertisers gain additional chances to bid for the user traffic they want.
However, from the simulation, we also find that truncation-free matching in both stages increases the processing time of the matching stage by 51%, which is unacceptable for an online system.

Using near-line matching calculation, TFMS achieves truncation-free matching while adding little online latency and computation cost. However, there are other ways to implement truncation-free matching, such as increasing the degree of parallelism of online matching along with more computation resources. To verify whether the proposed TFMS is a cost-effective solution, we conduct a quantitative comparison based on the aforementioned offline simulation experiments; the results are listed in Table 5.

In the comparison, we use the number of user-ad pairs that the system needs to process to represent the computation cost of different solutions. Regarding the traditional truncated two-stage matching's computation cost as , TFMS's computation cost is no more than . , taking both the full update and the delta update into account (it is hard for us to give an accurate computation cost for delta update; however, in our real system, delta update takes less computation cost than full update). Correspondingly, the online truncation-free solution (denoted as 'Online Parallelization' in the table) takes a computation cost of . , which is obviously larger than that of TFMS. Note that the full update computation cost of TFMS is . , since each daily active user triggers . advertising requests on average but needs only one full update in TFMS. With this strategy, TFMS is much more cost-effective than directly performing online truncation-free matching.

Table 5: Truncation-free cost estimation based on offline simulation

Method   (O(𝑢)) ∗ 𝑟𝑒𝑞𝑢𝑒𝑠𝑡   (O(𝑢)) ∗ 𝑟𝑒𝑞𝑢𝑒𝑠𝑡   (O(𝑢)) ∗ 𝑢𝑠𝑒𝑟

TFMS has been fully deployed in the display advertising system at Alibaba since 2019, bringing great value for both the platform and advertisers. In our application, TFMS is deployed in two main display advertising scenarios, Banner Ads and Item Ads. The early deployment of TFMS was performed incrementally, i.e., keeping the traditional truncated two-stage matching pipeline while adding the TFMS pipeline alongside it, to keep advertising performance smooth. The fully substitutional deployment of TFMS is being carried out step by step on schedule. All targeting types except automatic targeting are built in TFMS.

Table 6: Performance on platform revenues in two ad scenarios in which TFMS is implemented

Scenario     Time   RPM     CTR     PPC
Banner Ads   +1%    +9.4%   -0.4%   +9.9%
Item Ads     +2%    +5.1%   -0.9%   +5.9%

From both the advertising platform's view and the advertiser's view, the deployment of TFMS brings additional value; the practical online results are listed in Table 6 and Table 7. Firstly, we can observe that TFMS brings 9.4% and 5.1% RPM increases for Banner Ads and Item Ads respectively, which is a significant improvement considering the volume of advertising revenue at Alibaba. Secondly, as important tools for advertisers to select target audiences, the main targeting types win more impressions after adopting truncation-free matching. For example, the total number of impressions for retargeting, keywords targeting and demographic targeting in Banner Ads increases by 51% in relative terms, which brings a more flexible bidding experience for advertisers.

Table 7: Performance on advertiser winning impressions in different targeting types

Targeting Type          Winning Impressions
                        Banner Ads   Item Ads
Retargeting             +12%         +26%
Keywords Targeting      +23%         +39%
Demographic Targeting   +112%        +136%
All types               +51%         +39%
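Because RPM is estimated as click-through rate times cost per click (per 1,000 impressions), relative changes in the two factors should compose multiplicatively into the relative RPM change. The short Python check below applies this to the rounded percentages reported in Table 4 and Table 6. It is only a consistency sketch: the helper name `relative_rpm_change` is hypothetical, and small gaps are expected since the published figures are rounded.

```python
def relative_rpm_change(ctr_change, ppc_change):
    """Relative RPM change implied by relative CTR and PPC changes,
    assuming RPM is proportional to CTR * PPC."""
    return (1.0 + ctr_change) * (1.0 + ppc_change) - 1.0


# name: (reported RPM change, CTR or PCTR change, PPC change), as fractions.
reported = {
    "Table 4, (user -> crowd)": (0.052, 0.007, 0.044),
    "Table 4, (crowd -> ad)": (0.025, 0.006, 0.018),
    "Table 4, both stages": (0.056, 0.011, 0.045),
    "Table 6, Banner Ads": (0.094, -0.004, 0.099),
    "Table 6, Item Ads": (0.051, -0.009, 0.059),
}

gaps = {}
for name, (rpm, ctr, ppc) in reported.items():
    implied = relative_rpm_change(ctr, ppc)
    gaps[name] = abs(implied - rpm)
    print(f"{name}: reported {rpm:+.1%}, implied {implied:+.1%}")
```

For every row the implied value lies within 0.2 percentage points of the reported RPM change, which is compatible with rounding in the tables.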

In this paper, we proposed a novel matching design, TFMS, to handle the truncation problem of the matching module in display advertising at Alibaba. Online A/B tests show the effectiveness of TFMS in both platform revenue and advertiser experience. TFMS not only gives a way to solve the truncation problem but, more importantly, opens a new possibility to explore matching methods without online latency limitations. It is commonly believed that in online industrial systems the matching module cannot be too complicated due to limited computing power. We believe that TFMS shows a possible way to use more advanced models in ad retrieval, which is significant for many matching systems in display advertising.

REFERENCES
[1] [n.d.]. Tair, A distributed key-value storage system developed by Alibaba Group. https://github.com/alibaba/tair.
[2] Paul Covington, Jay Adams, and Emre Sargin. 2016. Deep neural networks for youtube recommendations. In Proceedings of the 10th ACM conference on recommender systems. 191–198.
[3] Doug Cutting and Jan Pedersen. 1989. Optimization for dynamic inverted index maintenance. In Proceedings of the 13th annual international ACM SIGIR conference on Research and development in information retrieval. 405–411.
[4] David S Evans. 2009. The online advertising industry: Economics, evolution, and privacy. Journal of economic perspectives 23, 3 (2009), 37–60.
[5] Avi Goldfarb and Catherine Tucker. 2011. Online display advertising: Targeting and obtrusiveness. Marketing Science 30, 3 (2011), 389–404.
[6] Junqi Jin, Chengru Song, Han Li, Kun Gai, Jun Wang, and Weinan Zhang. 2018. Real-time bidding with multi-agent reinforcement learning in display advertising. In Proceedings of the 27th ACM International Conference on Information and Knowledge Management. 2193–2201.
[7] Vijay Krishna. 2009. Auction theory. Academic press.
[8] Qi Pi, Guorui Zhou, Yujing Zhang, Zhe Wang, Lejian Ren, Ying Fan, Xiaoqiang Zhu, and Kun Gai. 2020. Search-Based User Interest Modeling with Lifelong Sequential Behavior Data for Click-Through Rate Prediction. Association for Computing Machinery, New York, NY, USA, 2685–2692. https://doi.org/10.1145/3340531.3412744
[9] Hongwei Wang, Fuzheng Zhang, Xing Xie, and Minyi Guo. 2018. DKN: Deep knowledge-aware network for news recommendation. In Proceedings of the 2018 world wide web conference. 1835–1844.
[10] Zhe Wang, Liqin Zhao, Biye Jiang, Guorui Zhou, Xiaoqiang Zhu, and Kun Gai. 2020. COLD: Towards the Next Generation of Pre-Ranking System. arXiv:2007.16122 [cs.IR]
[11] Guorui Zhou, Na Mou, Ying Fan, Qi Pi, Weijie Bian, Chang Zhou, Xiaoqiang Zhu, and Kun Gai. 2019. Deep interest evolution network for click-through rate prediction. In Proceedings of the AAAI conference on artificial intelligence, Vol. 33. 5941–5948.
[12] Guorui Zhou, Xiaoqiang Zhu, Chenru Song, Ying Fan, Han Zhu, Xiao Ma, Yanghui Yan, Junqi Jin, Han Li, and Kun Gai. 2018. Deep interest network for click-through rate prediction. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 1059–1068.
[13] Han Zhu, Daqing Chang, Ziru Xu, Pengye Zhang, Xiang Li, Jie He, Han Li, Jian Xu, and Kun Gai. 2019. Joint optimization of tree-based index and deep model for recommender systems. In Advances in Neural Information Processing Systems. 3971–3980.
[14] Han Zhu, Junqi Jin, Chang Tan, Fei Pan, Yifan Zeng, Han Li, and Kun Gai. 2017. Optimized cost per click in taobao display advertising. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2191–2200.
[15] Han Zhu, Xiang Li, Pengye Zhang, Guozheng Li, Jie He, Han Li, and Kun Gai. 2018. Learning tree-based deep model for recommender systems. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 1079–1088.
[16] Jingwei Zhuo, Ziru Xu, Wei Dai, Han Zhu, Han Li, Jian Xu, and Kun Gai. 2020. Learning Optimal Tree Models under Beam Search. In