A Deep Prediction Network for Understanding Advertiser Intent and Satisfaction
Liyi Guo, Rui Lu, Haoqi Zhang, Junqi Jin, Zhenzhe Zheng, Fan Wu, Jin Li, Haiyang Xu, Han Li, Wenkai Lu, Jian Xu, Kun Gai
Shanghai Jiao Tong University; Alibaba Group; Tsinghua University
{liyiguo1995, zhanghaoqi39, zhengzhenzhe}@sjtu.edu.cn
Abstract
For e-commerce platforms such as Taobao and Amazon, advertisers play an important role in the entire digital ecosystem: their behaviors explicitly influence users' browsing and shopping experience; more importantly, advertisers' expenditure on advertising constitutes a primary source of platform revenue. Therefore, providing better services for advertisers is essential for the long-term prosperity of e-commerce platforms. To achieve this goal, the ad platform needs an in-depth understanding of advertisers in terms of both their marketing intents and their satisfaction with the advertising performance, based on which further optimization can be carried out to serve advertisers in the right direction. In this paper, we propose a novel Deep Satisfaction Prediction Network (DSPN), which models advertiser intent and satisfaction simultaneously. It employs a two-stage network structure in which the advertiser intent vector and satisfaction are jointly learned from features of the advertiser's action information and advertising performance indicators. Experiments on an Alibaba advertisement dataset and online evaluations show that our proposed DSPN outperforms state-of-the-art baselines and has stable performance in terms of AUC in the online environment. Further analyses show that DSPN not only predicts advertiser satisfaction accurately but also learns an explainable advertiser intent, revealing opportunities to further optimize advertising performance.
CCS Concepts
• Information systems → Online advertising; • Applied computing → Electronic commerce.

This work was supported in part by National Key R&D Program of China No. 2019YFB2102200, in part by Alibaba Group through the Alibaba Innovation Research Program, in part by China NSF grants No. 61902248, 61972252, 61972254, 61672348, and 61672353, in part by the Joint Scientific Research Foundation of the State Education Ministry No. 6141A02033702, and in part by the Open Project Program of the State Key Laboratory of Mathematical Engineering and Advanced Computing No. 2018A09. The opinions, findings, conclusions, and recommendations expressed in this paper are those of the authors and do not necessarily reflect the views of the funding agencies or the government. *Z. Zheng is the corresponding author.
CIKM '20, October 19–23, 2020, Virtual Event, Ireland. © 2020 Association for Computing Machinery. ACM ISBN 978-1-4503-6859-9/20/10. https://doi.org/10.1145/3340531.3412681
Figure 1: A representative interaction process between an advertiser and the ad platform: the advertiser initializes an ad unit (e.g., demographic tags and bids), the platform runs ad auctions and generates a performance report (e.g., PV, ROI), and the advertiser adjusts the ad unit based on the report.
Keywords
E-commerce; Display Advertisement; Advertiser Intent Identification; Advertiser Satisfaction Prediction
ACM Reference Format:
Liyi Guo, Rui Lu, Haoqi Zhang, Junqi Jin, Zhenzhe Zheng, Fan Wu, Jin Li, Haiyang Xu, Han Li, Wenkai Lu, Jian Xu, and Kun Gai. 2020. A Deep Prediction Network for Understanding Advertiser Intent and Satisfaction. In Proceedings of the 29th ACM International Conference on Information and Knowledge Management (CIKM '20), October 19–23, 2020, Virtual Event, Ireland. ACM, New York, NY, USA, 8 pages. https://doi.org/10.1145/3340531.3412681
Introduction

As online shopping grows rapidly in popularity, e-commerce platforms such as Taobao and Amazon have become a primary place for users to search for and purchase products [7, 8, 10]. With a huge number of potential users on these platforms, an increasingly large number of sellers rely on e-commerce platforms to advertise their products. Although advertising expenses constitute a major revenue source for e-commerce platforms, little attention has been given to investigating advertisers' behavior, intent, or satisfaction in either academia or industry. Previous studies mainly focused on modeling and understanding the behaviors of users, such as the widely studied Click-Through Rate (CTR) prediction [4, 6, 9, 11, 30, 31] and user intent or satisfaction estimation in diverse contexts [3, 12, 17, 18, 25, 27]. As an ad platform can be modeled as a two-sided market between users and advertisers [8], ignoring the actions and feedback from advertisers would significantly reduce the efficiency of user-advertiser matching and deteriorate the performance of ad systems in the long term. From a recently launched survey of advertisers on the Taobao display ad platform, we observe that around half of the new advertisers leave the platform after ten days because their marketing goals are not well satisfied. Therefore, in this work, we initiate a new research direction: advertiser intent understanding and advertiser satisfaction prediction, which can be leveraged to further improve the performance of ad systems.

Advertisers achieve their marketing goals by taking different actions on their ad campaigns, resulting in frequent and rich interactions with the ad platform. We illustrate one representative interaction process between an advertiser and the ad platform in Figure 1. The advertiser launches an ad campaign for a specific product by initializing an ad unit: adding demographic tags of targeted users and setting basic bid prices.
The ad unit participates in a series of ad auctions over a period of time, and statistical indicators of the ad performance, such as CTR, CVR (Conversion Rate), and ROI (Return on Investment), are recorded in a report. After observing the performance report, the advertiser takes several adjusting actions, such as adding/deleting tags or modifying bid prices, to further improve the ad performance under the guideline of her marketing goal. The ad auction mechanisms react to the advertiser's adjustments and generate new ad performance. We observe that if an advertiser is satisfied with the updated ad performance, in most cases she will continue to invest in advertising; otherwise, she will eventually close the ad campaign and leave the platform. Thus, the long-term revenue of ad platforms largely depends on whether advertisers are satisfied with the performance of their ad campaigns. It is therefore highly necessary to provide algorithms for ad platforms to predict advertiser satisfaction.

During an advertising campaign, heterogeneous advertisers have quite different preferences over the performance indicators, demonstrating various marketing intents. Although our case study is in a display advertising system with cost per click (CPC), advertisers still have marketing intents beyond click numbers. For example, for newly launched products, advertisers tend to reach as many users as possible in a given time period, in which case PV (Page View) and click number often act as the key factors; advertisers with the goal of maximizing transaction revenue pay more attention to indicators such as Paynum (Pay Number) and Payamt (Pay Amount). Figure 2 illustrates examples of these two kinds of advertisers realizing their different intents through a series of actions.
The first advertiser would like to maximize user impressions, while the second advertiser attempts to maximize revenue. To provide better personalized ad services and optimize ad system performance in the correct directions, the ad platform needs a deep understanding of advertisers' marketing intents.

In this paper, we propose a novel advertiser-side Deep Satisfaction Prediction Network (DSPN), which jointly models advertiser intent and predicts advertiser satisfaction. Specifically, DSPN is a supervised learning framework that employs a two-stage hierarchical structure: through feature transformation, an attention mechanism, and recurrent neural networks, the first stage leverages
both the advertiser's action information and various ad performance indicators to learn the latent intent, which is then fed into the second stage; the second stage explicitly models the relation between the advertiser intent vector and satisfaction in a principled way. Based on this architecture, DSPN automatically learns the connections among raw data, advertiser intent, and metrics related to advertiser satisfaction in e-commerce.

Figure 2: Advertisers realize their various intents through increasing bid prices ((a) maximizing impressions; (b) maximizing revenue). To understand advertisers, we need more detailed modeling to identify their intents.

Our contributions in this work can be summarized as follows:
(1) To the best of our knowledge, we are the first to consider advertiser-side intent identification and satisfaction prediction. We formally define advertiser intent as a multi-dimensional weight vector over ad performance indicators, which can be derived from a mathematical optimization problem. We evaluate advertiser satisfaction with a new metric based on the change of investment in ad campaigns.
(2) We point out the challenges in directly applying user-side models to advertiser-side problems, since they do not model intent in a unified continuous space. We thus propose the Deep Satisfaction Prediction Network (DSPN), which jointly models advertiser intent and satisfaction. By employing an action fusion layer and recurrent network structures, DSPN effectively extracts information critical for understanding advertiser intent and satisfaction from the diverse data features underlying the interaction between advertisers and the ad platform.
(3) We conduct extensive experiments on an Alibaba advertisement dataset. Results verify the effectiveness of DSPN in both satisfaction prediction and intent vector generation.
The generated intent vector not only exhibits a clear interpretation of advertiser behavior but also reveals potential optimization objectives to further improve advertiser satisfaction and ad system performance. Moreover, DSPN has been deployed in Alibaba's online advertiser churn prediction system and serves daily business requirements.
Related Work

In the traditional information retrieval literature, there is extensive work on predicting user satisfaction with a search query under different user intents [3, 17, 26, 27]. In [26], the authors considered the keywords submitted in an ad auction as a direct expression of intent and used them to determine the relevance of the ad and the user query. Recently, researchers initiated the direction of user intent modeling and user satisfaction prediction in recommendation systems [12, 18, 23, 25]. In [18], the authors identified different kinds of user intents in a music recommendation system through interviews, surveys, and quantitative approaches, and proposed a multi-level hierarchical model to predict user satisfaction based on their intents. In [25], the authors classified user intents in product search into three categories: target product searching, decision making, and new product exploration. They observed that different intents usually imply different interaction patterns, which can be further used to predict user satisfaction.

Advertiser intent modeling differs from these existing works in two main aspects. First, advertiser intent is more difficult to model than user intent. User intents in search, recommendation systems, and e-commerce platforms are relatively explicit, accompanied by different behavior patterns, and can be classified into coarse discrete categories through surveys or interviews. According to our preliminary investigation, most advertisers' intents are a mixture of different optimization objectives, including PV, CTR, CVR, etc. Thus, instead of classifying advertiser intents into pre-defined discrete categories, we need to quantify advertiser intents in a unified continuous space. Furthermore, advertisers do not have appropriate channels to explicitly express their preferences over different performance indicators on most ad platforms.
Second, advertiser actions are more complicated and dynamic than user behavior. Methods for identifying user intents are usually based on fine-grained interaction signals, such as mouse clicks and dwell time, following the observation that a user's behavior pattern is highly related to her instant intent. However, advertisers' operations exhibit more complicated characteristics than user actions: different advertisers have different advertising strategies, resulting in various action patterns, and the action patterns of the same advertiser can also change during an ad campaign due to underlying evolving intents. Therefore, we cannot rely solely on fine-grained interaction signals to capture advertiser intent and satisfaction.
On e-commerce platforms, click-through rate (CTR) indicates a user's interest in and satisfaction with recommended items. Recently, a number of works have designed CTR prediction models via deep learning approaches [4, 6, 9, 19, 22, 29–31]. Most of these models are based on an embedding and multilayer perceptron structure [31]. To further improve model expressiveness, Wide & Deep [4] and DeepFM [11] used different modules to learn low-order and high-order features. PNN [19] proposed a product layer to capture interactive patterns between inter-field categories. DIN [31] introduced the attention mechanism to CTR prediction to capture users' diverse interests. DIEN [30] further improved DIN by designing an additional network layer to capture the sequential interest evolution process.

There are three main differences between advertiser satisfaction prediction and CTR prediction. First, we do not have a clear label of whether the advertiser is satisfied with the ad performance, whereas in CTR prediction we can easily obtain the label by checking whether the user clicks the ad. Second, advertiser satisfaction is more complex than a user click action: we not only need to consider advertiser satisfaction but also explore the hidden intent, which is the underlying factor determining satisfaction, while most CTR prediction models do not explicitly learn user intent. In this work, we also identify advertiser intent during satisfaction prediction. Finally, CTR prediction models aim at building a correlation between a user and an item. However, an advertiser comes to the ad platform to improve ad performance or product sales volume rather than for a specific item; thus, we focus on building a correlation between an advertiser and her ad performance.
In this section, we first describe the interaction between advertisers and the Taobao display ad platform. We then introduce the definitions of advertiser intent and satisfaction based on our observations from practical ad systems.

In Figure 1, we show an example of the interaction between advertisers and the Taobao display ad platform, in which advertisers can start or close an ad campaign at an appropriate time and take various actions during the campaign to achieve their marketing goals. To initialize an ad campaign for an ad unit, the advertiser needs to choose a set of demographic tags and the corresponding basic bid prices to effectively and efficiently target potential users. The advertiser can also select preferred ad positions (e.g., ad slots on the "Home Page" or "After Shopping Page") and a premium rate to improve the performance of ad exposure: for a preferred ad position, the advertiser declares a higher bid, i.e., basic bid × premium rate. The advertiser participates in ad auctions when targeted users visit the ad positions. If the ad unit wins an auction, the corresponding ad is shown to the user, and a certain price is charged to the advertiser once the user clicks the ad, following the cost per click (CPC) paradigm. Various auctions are conducted for the ad unit during the campaign, and the accumulated Key Performance Indicators (KPIs), such as CTR, Cost, and ROI, are recorded in a report. After a certain period (e.g., one day or one week), the advertiser checks the KPI report and conducts a series of actions to adjust the campaign when some performance indicators do not match her expectations. The potential operations include adding or deleting certain tags/positions and changing bid prices or premium rates.
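The bid setup described above can be sketched in a few lines. This is a toy illustration only: the dictionary layout, tag names, and numbers are invented, not the platform's actual data model; the one fact taken from the text is that the effective bid for a preferred position is basic bid × premium rate.

```python
# Toy sketch of an ad unit and its effective bid (all values invented).
def effective_bid(basic_bid, premium_rate=1.0):
    """Effective bid = basic bid x premium rate (rate 1.0 for non-preferred positions)."""
    return basic_bid * premium_rate

unit = {
    "tags": {"female", "age_25_34"},      # demographic targeting tags (assumed)
    "basic_bid": 2.0,                     # CPC bid in currency units (assumed)
    "premium": {"Home Page": 1.5},        # premium rate per preferred position (assumed)
}

bid_home = effective_bid(unit["basic_bid"], unit["premium"]["Home Page"])  # preferred slot
bid_other = effective_bid(unit["basic_bid"])                               # ordinary slot
```

Under CPC, this effective bid only determines auction ranking; the advertiser is charged when the shown ad is clicked.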
The ad unit with the adjusted features enters the ad auctions for competition and obtains new marketing performance.

Our goal in this work is to formally define metrics to measure advertiser marketing intent and satisfaction, build a connection between them by extracting useful information from the above interactions, and identify advertiser intent and predict advertiser satisfaction based on the proposed metrics and connection.

Advertisers use the Taobao ad platform with different marketing intents. However, existing work in the literature usually assumed that advertisers simply optimize a single objective, such as CTR or CVR, and a concrete formulation of advertisers' practical and various intents has not been well established. Our main focus in this subsection is to characterize advertiser intent based on a survey and an intuitive observation from a mathematical optimization model.

To fully understand advertisers' marketing goals, we conducted a survey with questions about the major reasons for using the Taobao display ad platform and the preferred performance indicators. We collected feedback from 791 advertisers and show the results in Table 1.
Table 1: Summary of advertiser survey.

Marketing Intents                    Ratio    Indicators    Ratio
Target potential users actively      68%      ROI, CVR      86%
Leverage non-advertisement traffic   52%      CTR, PPC      56%
Cost-effective advertising           46%      CART, CLT     55%

We note that the ratios in each column of Table 1 do not need to sum to one, indicating that advertisers may have multiple intents and care about different performance indicators, such as ROI, CVR, CTR, CART (Add-to-Cart number), and CLT (Collection number), simultaneously. As a result, advertiser intent cannot be simply characterized by a single indicator as in existing work. Its formulation should reflect the various concerns of the advertiser and contain the major indicators of the ad campaign. We introduce an advertiser-specific weight vector w = [w_1, w_2, ..., w_n]^T to define the advertiser's intent by assigning each indicator an importance weight, which represents the advertiser's preference over different indicators. One simple interpretation of the weight vector is that the advertiser may optimize her actions, such as adjusting a bid or premium rate, by solving a multi-objective optimization problem. We can also derive the weight vector by solving a maximization problem for one specific indicator with additional constraints on the others, for example:

    max_{bid} pv    s.t.  click ≥ α,  cost ≤ β,  ppc ≤ γ,        (1)

where the performance indicators pv, click, cost, and ppc are functions of bid. By employing Lagrange multipliers, we can derive the following Lagrangian function for the above optimization problem:

    L(bid, ŵ) = pv + w_1(click − α) + w_2(β − cost) + w_3(γ − ppc) = ŵ^T I + b,        (2)

where ŵ = [1, w_1, −w_2, −w_3]^T, I = [pv, click, cost, ppc]^T, and b = −w_1 α + w_2 β + w_3 γ. The Lagrange multipliers have a nice interpretation in economics [24]: we can regard them as prices for violating the indicator constraints. Therefore, we consider the Lagrangian function L(bid, ŵ) as the total benefit of declaring the bid, and the vector [ŵ^T, b]^T as the weight vector w of the advertiser's intent. However, there are several obstacles to optimizing (2) explicitly, due to the information asymmetry between the ad platform and the advertiser: the ad platform has no direct knowledge of the constraints in (1), e.g., the constraint parameters α, β, γ.
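To make Eq. (2) concrete, the following sketch evaluates the "total benefit" ŵ^T I + b for one hypothetical advertiser. All numbers (the multipliers w_1, w_2, w_3, the constraint parameters α, β, γ, and the report values) are invented for illustration; as the text notes, the platform cannot actually observe them.

```python
# Numeric illustration of Eq. (2): L(bid, w_hat) = w_hat^T I + b.
import numpy as np

w1, w2, w3 = 0.5, 0.2, 0.1              # Lagrange multipliers (assumed)
alpha, beta, gamma = 200.0, 500.0, 2.0  # click/cost/ppc constraint parameters (assumed)

I = np.array([5000.0, 250.0, 450.0, 1.8])  # report values [pv, click, cost, ppc] (assumed)
w_hat = np.array([1.0, w1, -w2, -w3])      # w_hat = [1, w1, -w2, -w3]
b = -w1 * alpha + w2 * beta + w3 * gamma   # constant term of the Lagrangian

# Equals pv + w1*(click - alpha) + w2*(beta - cost) + w3*(gamma - ppc).
benefit = w_hat @ I + b
```

The intent vector is then [ŵ^T, b]^T; larger weights mean the advertiser "pays" more attention to violating that indicator's constraint.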
This information asymmetry may arise because advertisers do not want to reveal such information due to privacy policies and business secrets, or because advertisers have no feasible channels to express their optimization objectives on the current ad platform.

To resolve the lack of information about the optimization parameters, we make a critical observation about the interaction between the ad platform and advertisers: intents motivate advertisers to take different actions after checking the performance report, so their intents can be estimated from their historical action traces, such as deleting or adding tags, together with the performance reports. From the advertiser survey, we also observe that an advertiser's marketing intent is usually consistent over a certain period. Thus, we can use the advertiser's historical information from an observation period to predict her current intent using the deep learning model designed in the next section.

It is quite challenging to determine whether advertisers are satisfied with the performance of their ad units, compared with the user side, where we have direct and clear signals, such as clicks or purchases, to infer satisfaction with recommended items. One potential metric for advertiser satisfaction is the Lagrangian function in (2), which is a weighted average of all advertising performance indicators. However, due to the above-mentioned information asymmetry, we cannot calculate this metric exactly.

From empirical observation, we find that various indicators are highly related to advertiser satisfaction, which can be exploited to define an appropriate metric that infers the level of advertiser satisfaction via indirect signals.
Churn label is a classical metric for measuring a customer's satisfaction with subscribed services; e.g., various works [2, 16, 20, 28] define a churned customer in different contexts based on the following signals: (1) not placing an order for over a year on the ASOS platform [2]; (2) not logging in for over 30 consecutive days on the TikTok platform [16]; (3) no activity for over a week on Snapchat [28]. Similarly, if an advertiser is unsatisfied with the performance of an ad unit, she will eventually stop investing in the ad campaign. Thus we can adopt investment-related indicators, such as take rate (the ratio of cost to GMV) or simply the cost level, to infer the advertiser's satisfaction to some extent. From the above discussion, we regard an advertiser as unsatisfied with the performance of an ad unit during a period [l − l₀, l) if the cost in the period [l, l + l₁) is less than or equal to ϵ, a small amount of money close to 0; otherwise, the advertiser is satisfied with (or at least not so unsatisfied with) the ad unit. We usually do not set ϵ to 0 because an advertiser may leave before her budget runs out. Due to performance delay on the ad platform, we use the cost level in a later period to evaluate the satisfaction level in a previous period. With this definition, we have a direct and easily calculated signal for the satisfaction label of the data.

In this section, we propose a novel deep network architecture, the Deep Satisfaction Prediction Network (DSPN), for advertiser intent learning and satisfaction prediction. As shown in Figure 3, DSPN is a supervised learning framework with two stages: the intent learning stage and the satisfaction prediction stage. The learnable parameters are all deployed in the intent learning stage, which extracts information from the advertiser's historical data to learn the intent vector w, which is forwarded to the second stage.
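Before detailing the stages, the cost-based satisfaction labeling rule defined above can be sketched as follows. The helper name and example cost series are assumptions for illustration, not the paper's production pipeline; the rule itself (label satisfied iff the cost in [l, l + l₁) exceeds a small threshold ϵ) comes from the text.

```python
# Sketch of the satisfaction labeling rule: an ad unit observed over
# [l - l0, l) is labeled "satisfied" iff its cost over [l, l + l1) exceeds eps.
def satisfaction_label(daily_cost, l, l0=10, l1=10, eps=10.0):
    """daily_cost: per-day costs indexed by day; returns 1 (satisfied) or 0."""
    future_cost = sum(daily_cost[l:l + l1])  # spend in the follow-up period
    return 1 if future_cost > eps else 0

# A unit that keeps spending after day l is labeled satisfied (1);
# one whose spend collapses toward 0 is labeled unsatisfied (0).
costs = [50.0] * 10 + [40.0] * 10   # keeps investing after the observation period
churn = [50.0] * 10 + [0.5] * 10    # effectively stops investing
```

Using a strictly positive ϵ instead of 0 matches the text's point that advertisers may leave before their budget runs out.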
The satisfaction prediction stage combines the intent vector w and the advertiser's historical performance reports to calculate whether the advertiser is satisfied with the performance of her ad units.

We first investigate which types of features are related to advertiser intent and satisfaction. As heterogeneous advertisers may have various intents and satisfaction levels for different ad units, the prediction model needs to involve the ID features of advertisers and ad units. The specific values of the KPIs in the report are clearly highly related to the advertiser's satisfaction level, so we also consider the performance indicators when training the prediction model. Advertisers' behavior patterns also have a certain relation with their satisfaction over the ad performance. For example, counting the number of actions an advertiser takes on a specific ad unit during an observation period of length 10, we find that the average number of modification actions for positive-label (satisfied) samples is 11.…

We transform our sparse features into low-dimensional dense features through embedding. Categorical features, e.g., ID features, are converted into dense real-valued embedding vectors m. Features with both categorical and numerical information are represented as e = v ⊙ m, where the embedding vector m encodes the categorical information, v is the normalized numerical value, and ⊙ is the element-wise product operator. Specifically, for a performance indicator feature, v is the value of the corresponding KPI; for an ultimate action feature, v is the value of the bid price/premium rate at the end of the day; for a sequential action feature, v is the value of the bid price/premium rate after the action minus the value before the action.

After the embedding layer, features of the same type, such as ID features, are concatenated into a vector. Fully-connected layers are used to reshape the vector when necessary.
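The e = v ⊙ m representation above can be sketched in a few lines. The table size and embedding dimension are illustrative assumptions; in DSPN the table would be a trainable parameter rather than a fixed random matrix.

```python
# Minimal sketch of the feature embedding: categorical features map to a
# dense vector m; features that also carry a numerical value use e = v * m.
import numpy as np

rng = np.random.default_rng(0)
embedding_table = rng.normal(size=(1000, 8))   # 1000 ids, dim 8 (assumed sizes)

def embed(feature_id, value=None):
    m = embedding_table[feature_id]            # categorical part m
    return m if value is None else value * m   # e = v (element-wise) m

ad_id_vec = embed(42)             # pure ID feature: just m
bid_vec = embed(7, value=0.35)    # e.g. a normalized bid-price feature: v * m
```

For a sequential action feature, v would be the bid/premium-rate delta caused by the action, as described above.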
For sequential action features, we further divide them into two groups: tag features and ad position features. Each group contains a sequence of action features within the day, and we concatenate the embedding vectors of these actions chronologically to form a list E = [e_1, e_2, ..., e_{n_a}] ∈ R^{n_e × n_a}, where e_i is the embedding vector of the i-th sequential action with dimension n_e, and n_a is the number of actions in this group within a day. The embedding layer is trained together with the rest of the model. We use sum/average pooling to transform multiple embedding vectors into a fixed-length vector when necessary.

In the intent learning stage, we employ multilayer bidirectional RNNs on top to learn the advertiser intent from l₀ sequential days of information, and a fusion layer at the bottom to capture important patterns in each day's sequential actions.

Action Fusion Layer.
An advertiser's sequential actions, such as "add a tag with price x" or "delete a tag", contain rich information about her strategies and preferences over various KPIs. We use a fusion layer to summarize the advertiser's sequential actions within each day. Different approaches, such as sum pooling, average pooling, or an attention mechanism [1], can be used in the fusion layer. Since advertisers may try different actions within a day before figuring out the right ones, we employ the attention mechanism to identify important actions within the action sequence, taking the advertiser's current situation into account. In detail, the attention mechanism is formulated as:

    V' = softmax(V Q^T / √n_a) · V,        (3)

where V ∈ R^{n_a × n_e} is the sequential action matrix of one day with V = E^T, and n_a and n_e are the number of actions in each day and the embedding dimension, respectively. The matrix Q ∈ R^{n_a × n_e} is a summary of the advertiser's ID features, daily performance report, and daily ultimate action information on the same day, reshaped to the same size as V; one can obtain Q through an MLP, an RNN, or other structures. With the attention mechanism, we learn an advertiser-specific representation of the action pattern. The matrix V' is passed to the next level together with the other feature representations after pooling.

RNN Layer.
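The action-fusion attention of Eq. (3) can be sketched as follows. The shapes (5 actions, embedding dimension 8) and the random V and Q are illustrative assumptions; in the model, Q would be produced from the advertiser's ID, report, and ultimate-action features.

```python
# Sketch of Eq. (3): V' = softmax(V Q^T / sqrt(n_a)) V, where V stacks one
# day's action embeddings (n_a x n_e) and Q is a same-shaped context summary.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def action_fusion(V, Q):
    n_a = V.shape[0]
    scores = V @ Q.T / np.sqrt(n_a)           # (n_a, n_a) action-vs-context scores
    return softmax(scores, axis=-1) @ V       # attention-weighted mix of actions

rng = np.random.default_rng(0)
n_a, n_e = 5, 8                               # 5 actions in the day, dim 8 (assumed)
V = rng.normal(size=(n_a, n_e))
Q = rng.normal(size=(n_a, n_e))
fused = action_fusion(V, Q)                   # same shape as V
```

Each output row is a convex combination of the day's action embeddings, so actions that match the advertiser's context summary dominate the fused representation.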
Recurrent neural networks model sequential data explicitly [15], which inspires us to infer advertiser intent from historical information. Imagine the procedure in which an advertiser reviews her recent ad performance and actions, and adjusts her actions to obtain updated ad performance when her marketing intent is not well fulfilled; there is thus an underlying relation between advertiser actions and advertiser intent. We employ an RNN to learn a dynamic representation of the advertiser's daily information related to her intent. To balance efficiency and performance, we employ the GRU [5], which overcomes the vanishing gradient problem and is faster than the LSTM [13]. The GRU is formulated as follows:

    r_t = σ(W_r e_t + U_r h_{t−1} + b_r)        (4)
    z_t = σ(W_z e_t + U_z h_{t−1} + b_z)        (5)
    h_t = tanh(W_h e_t + U_h (r_t ⊙ s_{t−1}) + b_h)        (6)
    s_t = z_t ⊙ s_{t−1} + (1 − z_t) ⊙ h_t        (7)

where e_t is the representation of the t-th day's report and action information from the previous level, s_t is the t-th hidden state, σ is the sigmoid function, and ⊙ is the element-wise product operator. Because advertisers may review their historical information back and forth, we would like the representation of each day to summarize not only the preceding days but also the following days. Hence, we use two layers of bidirectional recurrent networks (Bi-GRU) [21]. For each Bi-GRU layer, the hidden state of the t-th day is the concatenation of the hidden states of the forward and backward GRUs: h_t = [→h_t, ←h_t]. We obtain the representation of w by adding the final forward and backward hidden states: w = →h_{l−1} + ←h_{l−1}.
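A single GRU step of Eqs. (4)–(7) can be sketched as below. The weight shapes, the small random initialization, and the loop over ten daily inputs are illustrative assumptions (DSPN additionally stacks two bidirectional layers of such cells).

```python
# Sketch of one GRU step: reset gate r, update gate z, candidate h, state s.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(e_t, s_prev, params):
    Wr, Ur, br, Wz, Uz, bz, Wh, Uh, bh = params
    r = sigmoid(Wr @ e_t + Ur @ s_prev + br)        # reset gate, Eq. (4)
    z = sigmoid(Wz @ e_t + Uz @ s_prev + bz)        # update gate, Eq. (5)
    h = np.tanh(Wh @ e_t + Uh @ (r * s_prev) + bh)  # candidate state, Eq. (6)
    return z * s_prev + (1 - z) * h                 # new state s_t, Eq. (7)

rng = np.random.default_rng(0)
d_in, d_h = 8, 16                                   # input/hidden dims (assumed)
params = tuple(rng.normal(scale=0.1, size=shape) for shape in
               [(d_h, d_in), (d_h, d_h), (d_h,)] * 3)

s = np.zeros(d_h)
for e_t in rng.normal(size=(10, d_in)):             # 10 daily representations
    s = gru_step(e_t, s, params)
```

Because s_t is an element-wise convex combination of the previous state and a tanh candidate, every coordinate of the state stays in (−1, 1).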
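The satisfaction prediction stage, described next, has no learnable parameters: it combines the learned intent vector w with the daily report vectors I_i via Eq. (8) and is trained end-to-end with the cross-entropy loss of Eq. (9). A minimal sketch, with illustrative shapes and random inputs:

```python
# Sketch of the satisfaction prediction stage: p(x) is the average of
# sigmoid(w^T I_i) over the observation days; training uses binary cross-entropy.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def satisfaction_prob(w, reports):
    """w: intent vector (n,); reports: (days, n) daily indicator vectors I_i."""
    return sigmoid(reports @ w).mean()    # Eq. (8)

def bce_loss(p, y, eps=1e-12):
    """Eq. (9) for a single sample; eps guards log(0)."""
    return -(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

rng = np.random.default_rng(0)
w = rng.normal(size=16)                   # intent vector (dim 16, as in the model)
reports = rng.normal(size=(10, 16))       # 10 days of performance indicators (assumed)
p = satisfaction_prob(w, reports)
loss = bce_loss(p, y=1)
```

Keeping all parameters in the first stage means the gradient of this loss flows entirely into the intent-learning network, which is what forces w to be predictive of satisfaction.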
Figure 3: Deep Satisfaction Prediction Network. The left part is the intent learning stage, which computes the intent vector w; the right part is the satisfaction prediction stage, which predicts advertiser satisfaction based on the ad unit's performance reports and the advertiser's intent vector.

In the satisfaction prediction stage, we build a connection between the advertiser intent vector and the advertiser satisfaction level. According to Section 3.1, there is a positive correlation between w^T I_i and advertiser satisfaction, where I_i represents the performance report on day i. Following this intuition, we define the probability that the advertiser is satisfied with the ad performance during the period [l − l₀, l) as:

    p(x) = (1 / l₀) Σ_{i = l − l₀}^{l − 1} σ(w^T I_i),        (8)

where σ is the sigmoid function and w is the advertiser intent vector learned in the previous stage. The loss function of the prediction model is defined as:

    L = −(1/N) Σ_{(x, y) ∈ D} [ y log p(x) + (1 − y) log(1 − p(x)) ],        (9)

where D is the training set with size N, x is the model input, and y is the label.

In this section, we evaluate DSPN in detail and summarize our experiments as follows: (1) We verify the effectiveness of DSPN in intent identification and satisfaction prediction, and discuss the impact of different data features through offline experiments. (2) Analyses of the intent vector w show that w can well capture advertiser intent in practice. (3) Online evaluation shows that our model performs well for different advertiser satisfaction modeling tasks and has stable AUC performance of around 0.… ± 0.….

The dataset we used is collected from the Alibaba display ad system. It contains about 1 million ad units from around 1.…×… advertisers, and each ad unit forms a unique sample. The ad units we selected are those with cost larger than 10 units in the period [l − l₀, l).
We randomly split the dataset into a training set and a test set at a 9:1 ratio. Each data sample contains an ad unit's information from the observation period [l − l₁, l) with length l₁ = 10, and its satisfaction label is assigned using the definition in Section 3.2 with the parameters l₂ = 10 and ϵ = 10. We set these parameters based on our business experience and data analysis. Specifically, when l₂ is too small, the label may not reflect the real satisfaction of advertisers; when l₂ is too large, the observation period would include many already-churned ad units, and downstream tasks would have to wait a long time (at least l₂ days) for the predicted satisfaction results. The parameter ϵ is a small amount of money that filters out noise introduced by delayed ad performance and enables the model to issue early churn warnings before the budget runs out. The main features in the dataset are summarized as follows:

ID Features.
ID features contain basic profile and identity information about the ad unit, product category, and advertiser, and provide personalized information for model training.
Report Features.
Report features are the daily performance indicators of an ad unit within the observation period. There are 15 indicators in total, including Cost, CTR, CVR, and ROI.
Action Features.
Action features contain the ultimate action features and the sequential action features of the advertiser within the observation period. The detailed data format for action features is described in Table 2.

We train DSPN with mini-batch gradient descent (batch size 32), using the Adam optimizer [14]. The hidden-state dimension of the first Bi-GRU layer is 18, and that of the second Bi-GRU layer is 16, which equals the dimension of w. Categorical features have embedding dimensions ranging from 3 to 18, fine-tuned to achieve good results. The source code is available online at https://github.com/liyiguo95/DSPN.

The problem of advertiser satisfaction prediction is new to both the academic and industry communities, and to the best of our knowledge there are no directly related models in the literature. Since prediction models for user-side problems are widespread and mature, we carefully select several widely used models from user click-through rate (CTR) prediction, slightly revise them to fit the advertiser-satisfaction scenario, and use them as our baselines.

Embedding & MLP Model:
The Embedding & MLP model is the basis of most subsequently developed deep networks [4, 6, 11, 19, 31] for CTR modeling. It contains three parts: an embedding layer, a pooling and concatenation layer, and a multilayer perceptron.
Wide & Deep [4]:
The Wide & Deep model has proven effective in recommendation systems. It consists of two parts: (1) the wide part, a linear model that handles manually designed features; and (2) the deep part, which employs the Embedding & MLP model to learn nonlinear relations among features automatically.
PNN [19]:
PNN can be viewed as an improved version of the Embedding & MLP model: it explicitly introduces a product layer after the embedding layer to capture high-order feature interactions.
DeepFM [11]:
DeepFM uses factorization machines as the "wide" part and a multilayer perceptron as the "deep" part. The two parts are trained jointly and share the same input.
DIN [31]:
Based on the Embedding & MLP model, DIN uses an attention mechanism to activate related user behaviors. In our experiments, we implement DIN to learn the inner relationship between advertiser profiles and daily features.
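As a concrete reference for the baseline family above, a minimal Embedding & MLP forward pass might look like the following sketch. The field names, vocabulary sizes, and layer widths are illustrative assumptions, not the paper's actual schema:

```python
import numpy as np

rng = np.random.default_rng(1)

def relu(x):
    return np.maximum(x, 0.0)

# Toy embedding tables for two categorical fields (names, vocabulary
# sizes, and widths are illustrative, not the paper's actual schema).
emb_unit = rng.normal(size=(1000, 8))  # ad-unit id -> 8-dim embedding
emb_cat = rng.normal(size=(50, 4))     # category id -> 4-dim embedding

def embedding_mlp_forward(unit_id, cat_id, behavior_ids, W1, b1, W2, b2):
    # Embedding layer: look up each ID; pooling layer: average the
    # multi-valued behavior field; concat layer: join everything;
    # MLP: two layers ending in a sigmoid satisfaction probability.
    pooled = emb_cat[behavior_ids].mean(axis=0)
    x = np.concatenate([emb_unit[unit_id], emb_cat[cat_id], pooled])
    h = relu(W1 @ x + b1)
    logit = W2 @ h + b2
    return 1.0 / (1.0 + np.exp(-logit))

W1, b1 = rng.normal(size=(32, 16)), np.zeros(32)
W2, b2 = rng.normal(size=32), 0.0
p = embedding_mlp_forward(3, 7, [1, 2, 9], W1, b1, W2, b2)
assert 0.0 < p < 1.0
```

The pooling step is what lets a variable-length field (here, a list of behavior IDs) feed a fixed-width MLP, which is the shared skeleton the stronger baselines then refine.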
Comparative Results.
Since the above-mentioned models have different structures, when tuning each baseline we use sum pooling, average pooling, or concatenation to integrate the different features, as appropriate for the specific model structure. The metrics used are Area Under the ROC Curve (AUC) and Accuracy (ACC). From Table 3, we observe that DSPN outperforms the other models on advertiser satisfaction prediction, achieving the best AUC and ACC among all compared models. We conclude that the attention mechanism makes DSPN more robust on tasks involving sequential action information.
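AUC, the ranking metric reported in Table 3, can be computed with the standard rank-sum (Mann-Whitney) formulation. The minimal sketch below ignores tied scores, which production implementations handle:

```python
import numpy as np

def auc_score(y_true, scores):
    # AUC via the rank-sum formulation: the probability that a random
    # positive sample is scored above a random negative one.
    # Note: ties in `scores` are not handled in this sketch.
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    order = scores.argsort()
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)
    n_pos = y_true.sum()
    n_neg = len(y_true) - n_pos
    return (ranks[y_true == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

assert auc_score([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]) == 0.75
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect separation of satisfied from unsatisfied samples, which is why differences of a few points in Table 3 are meaningful.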
Impact of Data Features.
To better understand the roles that different data features play in advertiser satisfaction prediction, we conduct two types of tasks with DSPN. The first predicts advertiser satisfaction without a selected data feature, while the second uses only the selected data feature. The results in Table 4 show that each data feature contributes positively to DSPN's performance. The ID feature contributes the least, since it only provides basic information about an entity; we nevertheless keep ID features so that information can be shared within the same entity. For example, ad units belonging to certain categories may sell well at particular times or in particular regions, and such ad units can then share performance indicators for satisfaction prediction. Both the daily report features and the action features play important roles in satisfaction prediction. This observation shows that advertiser satisfaction mainly depends on daily ad performance, which confirms the necessity of optimizing ad results under the guidance of advertiser intent.
In this subsection, we show the validity and effectiveness of the weight vector w learned in the first stage for representing various advertiser intents. We also show the necessity of considering advertiser intents when predicting their satisfaction.

Effectiveness of weight vector w in profiling intents. We perform K-Means clustering on the weight vectors w in the test set and investigate whether w can capture advertisers' various intents. We use Euclidean distance as the clustering metric, apply PCA to reduce the dimension of w from 16 to 2, and visualize the sample distribution of the different clusters in Figure 4. We find that the obtained cluster centers match the typical advertisers seen in practical ad systems well.

Cluster 1: Active ad units.
Advertisers in this cluster are high-quality customers of the ad platform. Most of them are satisfied with the ad units in this cluster, because these ad units show stable, good performance across the overall indicators.
Cluster 2: Maximizing Impression.
Advertisers in this cluster care more about PV and click numbers, while also watching cost. This indicates that they would like to maximize impressions or clicks in a cost-effective way.
Cluster 3: Maximizing Revenue.
Advertisers in this cluster are insensitive to cost and focus more on ROI and CVR. These ad units usually target loyal customers or members of the advertiser's Taobao shop.
Cluster 4: Tail ad units.
Advertisers in this cluster usually do not spend much money in total. Performance indicators such as clicks and pay amount have a positive impact on their satisfaction, while cost has a negative impact.
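The clustering analysis above (K-Means with Euclidean distance, PCA from 16 to 2 dimensions) can be sketched end to end. The data below is synthetic, standing in for the real intent vectors:

```python
import numpy as np

rng = np.random.default_rng(2)

def pca_2d(X):
    # Project rows of X onto their top-2 principal components via SVD.
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:2].T

def kmeans(X, k, iters=50):
    # Plain Lloyd's algorithm with Euclidean distance.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels, centers

# Synthetic 16-dim "intent vectors": two well-separated groups.
W = np.vstack([rng.normal(0.0, 0.1, size=(100, 16)),
               rng.normal(3.0, 0.1, size=(100, 16))])
labels, centers = kmeans(W, k=2)
W2d = pca_2d(W)  # 2-D coordinates for a Figure-4-style scatter plot
maj_a = np.bincount(labels[:100]).argmax()
maj_b = np.bincount(labels[100:]).argmax()
assert W2d.shape == (200, 2) and maj_a != maj_b
```

Plotting `W2d` colored by `labels` reproduces the kind of per-cluster scatter shown in Figure 4, and the cluster `centers` are the vectors interpreted as typical intents.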
Importance of advertiser intent in satisfaction prediction.
We design three experiments to examine how different intents influence the prediction of advertiser satisfaction, and show the results in Table 5. The first column of Table 5 reports the accuracy of predicting each advertiser's satisfaction label from her report data I and the w of the cluster center she belongs to. The results show that these cluster centers predict well the satisfaction of advertisers in the same cluster, i.e., advertisers with similar marketing intents. In the second experiment, we use each cluster center to predict the satisfaction labels of advertisers in the other clusters. The second column of Table 5 shows that the prediction accuracy is far worse than within the same cluster, indicating that advertiser intents differ across clusters. In the third experiment, we randomly select 5000 samples from each cluster. For each sample, we find its closest sample in the other clusters in terms of Euclidean distance, and compare the labels of the two samples to see whether advertisers with similar reports share the same satisfaction label. The results in the last column of Table 5 show that even with similar advertising reports, advertisers' satisfaction can differ due to differences in intent. The only exception is Cluster 1, whose equal ratio reaches 95.78%.

The learned intents can directly serve downstream optimization. For example, we can use w₁·pCTR + w₂·pCVR + w₃·pCost as the ranking index in tag recommendation, where pCTR, pCVR and pCost are the predicted indicators and wᵢ comes from the intent vector learned from advertisers. A weighted average over indicators can also be used as the objective of real-time bidding algorithms.

Our model can be deployed in different business scenarios related to advertiser satisfaction, such as whether an advertiser will churn, whether the cost or take rate of an ad unit or advertiser will increase in the following week, or the advertiser's attitude towards an adjusted performance report. We have deployed our model as a key component of our online advertiser churn prediction system, serving downstream optimization tasks such as coupon distribution for unsatisfied advertisers, for more than four months. Online, we use DSPN to predict an advertiser's satisfaction over all of her ad campaigns, which differs slightly from our offline setting, where DSPN predicts the satisfaction over a single ad unit. To bridge this difference, we train the DSPN model with the performance report and action features of an advertiser instead of an ad unit.

Table 2: Action features of Alibaba dataset.
Feature Type        Target Type   Data Format                                 Data Explanation
Ultimate Action     Tag           (tag_type, bid_price)                       bid for tags at the end of the day
                    Ad Position   (position_type, premium_rate)               premium rate for ad positions at the end of the day
Sequential Action   Tag           (tag_type, price, time)                     add a tag with a bid
                                  (tag_type, old_price, new_price, time)      change the bid for a tag
                                  (tag_type, current_price, time)             delete a chosen tag
                    Ad Position   (position_type, rate, time)                 add an ad position with a premium rate
                                  (position_type, old_rate, new_rate, time)   change the premium rate for an ad position
                                  (position_type, current_rate, time)         delete a chosen ad position

Table 3: Comparisons of different models.
Model                                     AUC      ACC
MLP                                       0.8501   0.7975
Wide & Deep                               0.8507   0.7968
PNN                                       0.8519   0.7989
DeepFM                                    0.8538   0.8005
DIN                                       0.8562   0.7951
DSPN without GRU                          0.9001   0.8414
DSPN with multilayer GRU                  0.9415   0.8917
DSPN with multilayer bidirectional GRU    -        -

Table 4: Impact of data features.
Task          AUC      ACC      Task        AUC      ACC
w/o ID        0.9428   0.8924   w/ ID       0.8241   0.7669
w/o Action    0.9410   0.8903   w/ Action   0.9300   0.8853
w/o Report    -        -        w/ Report   -        -

Table 5: Satisfaction prediction across clusters.
             ACC In-cluster   ACC Other-clusters   Equal Ratio
Cluster 1    0.9962           0.6286               0.9578
Cluster 2    0.7745           0.7071               0.8152
Cluster 3    0.9657           0.6384               0.8048
Cluster 4    0.8702           0.1850               0.5070
All          0.8734           0.5669               0.7712

Figure 4: Distribution visualization of intent vector w. We use PCA to reduce the dimensions of w from 16 to 2, and visualize the sample distribution of different clusters after normalization. (a) Positive samples. (b) Negative samples.
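The third experiment's nearest-neighbour comparison (the "Equal Ratio" column of Table 5) can be sketched as follows, with tiny synthetic clusters standing in for the real 5000-sample draws:

```python
import numpy as np

def equal_ratio(samples_a, labels_a, samples_b, labels_b):
    # For each sample of one cluster, find its Euclidean nearest
    # neighbour among the other clusters' samples and check whether
    # the two satisfaction labels agree; return the agreement rate.
    agree = 0
    for x, y in zip(samples_a, labels_a):
        j = np.linalg.norm(samples_b - x, axis=1).argmin()
        agree += int(labels_b[j] == y)
    return agree / len(samples_a)

# Tiny synthetic check: identical reports with opposite labels across
# clusters drive the equal ratio to 0; matching labels drive it to 1.
A = np.array([[1.0, 0.0], [0.0, 1.0]])
B = np.array([[1.0, 0.0], [0.0, 1.0]])
assert equal_ratio(A, [1, 1], B, [0, 0]) == 0.0
assert equal_ratio(A, [1, 0], B, [1, 0]) == 1.0
```

A low equal ratio, as in Table 5, means that near-identical performance reports received different satisfaction labels across clusters, which is exactly the evidence that intent, not the report alone, drives satisfaction.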
The average AUC of DSPN for the advertiser churn prediction task remains stable over a one-month window. The model achieves a high precision and a recall of 51.96% in predicting loyal advertisers who have consumption in the most recent 10 days but will have no consumption in the next 10 days, largely outperforming the existing system's main strategy, a rule-based method. In our most recent optimization of ad performance, the logging and recharging rates of 8500 advertisers increased relative to 8500 homogeneous advertisers without optimization.
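The churn-prediction quality figures above are precision and recall numbers; for reference, they can be computed from binary predictions as follows (the toy labels are illustrative, not the production data):

```python
def precision_recall(y_true, y_pred):
    # Precision and recall for the positive (will-churn) class.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Toy example: 3 true churners; the model flags 2 of them plus 1 false alarm.
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0]
p, r = precision_recall(y_true, y_pred)
assert abs(p - 2 / 3) < 1e-12 and abs(r - 2 / 3) < 1e-12
```

For a coupon-distribution use case, precision controls how much budget is wasted on advertisers who would have stayed anyway, while recall controls how many true churners are reached in time.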
Conclusion
In this paper, we have investigated the problem of advertiser intent learning and satisfaction prediction. Based on advertiser surveys, empirical observations, and mathematical analyses, we used a weight vector over advertising performance indicators to model advertiser intent. Considering that satisfied advertisers continue to invest in ad campaigns, we proposed a metric based on the change of cost to evaluate advertiser satisfaction. We then designed a deep network, DSPN, to identify advertiser intent and predict advertiser satisfaction, using the profile information of advertisers and ad units, performance indicators, and sequential actions. Experimental results on an Alibaba advertisement dataset have shown the superiority of DSPN over the baseline models in terms of AUC and accuracy, and the effectiveness of using the weight vector to interpret advertiser intent. DSPN has been deployed in Alibaba's online advertiser churn prediction system and has helped prevent unsatisfied advertisers from leaving the ad platform.
References
[1] Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2014. Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473 (2014).
[2] Benjamin Paul Chamberlain, Angelo Cardoso, CH Bryan Liu, Roberto Pagliari, and Marc Peter Deisenroth. 2017. Customer lifetime value prediction using embeddings. In KDD. ACM, 1753–1762.
[3] Olivier Chapelle, Shihao Ji, Ciya Liao, Emre Velipasaoglu, Larry Lai, and Su-Lin Wu. 2011. Intent-based diversification of web search results: metrics and algorithms. Information Retrieval 14, 6 (2011), 572–592.
[4] Heng-Tze Cheng, Levent Koc, Jeremiah Harmsen, Tal Shaked, Tushar Chandra, Hrishi Aradhye, Glen Anderson, Greg Corrado, Wei Chai, Mustafa Ispir, et al. 2016. Wide & deep learning for recommender systems. In DLRS. ACM, 7–10.
[5] Junyoung Chung, Caglar Gulcehre, KyungHyun Cho, and Yoshua Bengio. 2014. Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv preprint arXiv:1412.3555 (2014).
[6] Paul Covington, Jay Adams, and Emre Sargin. 2016. Deep neural networks for YouTube recommendations. In RecSys. ACM, 191–198.
[7] David S Evans. 2008. The economics of the online advertising industry. Review of Network Economics 7, 3 (2008).
[8] David S Evans. 2009. The online advertising industry: Economics, evolution, and privacy. Journal of Economic Perspectives 23, 3 (2009), 37–60.
[9] Kun Gai, Xiaoqiang Zhu, Han Li, Kai Liu, and Zhe Wang. 2017. Learning piece-wise linear models from large scale data for ad click prediction. arXiv preprint arXiv:1704.05194 (2017).
[10] Avi Goldfarb and Catherine Tucker. 2011. Online display advertising: Targeting and obtrusiveness. Marketing Science 30, 3 (2011), 389–404.
[11] Huifeng Guo, Ruiming Tang, Yunming Ye, Zhenguo Li, and Xiuqiang He. 2017. DeepFM: a factorization-machine based neural network for CTR prediction. In IJCAI. 1725–1731.
[12] Long Guo, Lifeng Hua, Rongfei Jia, Binqiang Zhao, Xiaobo Wang, and Bin Cui. 2019. Buying or browsing?: Predicting real-time purchasing intent using attention-based deep network with multiple behavior. In KDD. ACM, 1984–1992.
[13] Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long short-term memory. Neural Computation 9, 8 (1997), 1735–1780.
[14] Diederik P Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014).
[15] Zachary C Lipton, John Berkowitz, and Charles Elkan. 2015. A critical review of recurrent neural networks for sequence learning. arXiv preprint arXiv:1506.00019 (2015).
[16] Yunfei Lu, Linyun Yu, Peng Cui, Chengxi Zang, Renzhe Xu, Yihao Liu, Lei Li, and Wenwu Zhu. 2019. Uncovering the co-driven mechanism of social and content links in user churn phenomena. In KDD. ACM, 3093–3101.
[17] Rishabh Mehrotra, Ahmed Hassan Awadallah, Milad Shokouhi, Emine Yilmaz, Imed Zitouni, Ahmed El Kholy, and Madian Khabsa. 2017. Deep sequential models for task satisfaction prediction. In CIKM. ACM, 737–746.
[18] Rishabh Mehrotra, Mounia Lalmas, Doug Kenney, Thomas Lim-Meng, and Golli Hashemian. 2019. Jointly leveraging intent and interaction signals to predict user satisfaction with slate recommendations. In WWW. 1256–1267.
[19] Yanru Qu, Han Cai, Kan Ren, Weinan Zhang, Yong Yu, Ying Wen, and Jun Wang. 2016. Product-based neural networks for user response prediction. In ICDM. IEEE, 1149–1154.
[20] Saharon Rosset, Einat Neumann, Uri Eick, Nurit Vatnik, and Yizhak Idan. 2002. Customer lifetime value modeling and its use for customer retention planning. In KDD. ACM, 332–340.
[21] Mike Schuster and Kuldip K Paliwal. 1997. Bidirectional recurrent neural networks. IEEE Transactions on Signal Processing 45, 11 (1997), 2673–2681.
[22] Ying Shan, T Ryan Hoens, Jian Jiao, Haijing Wang, Dong Yu, and JC Mao. 2016. Deep crossing: Web-scale modeling without manually crafted combinatorial features. In KDD. ACM, 255–262.
[23] Humphrey Sheil, Omer Rana, and Ronan Reilly. 2018. Predicting purchasing intent: Automatic feature learning using recurrent neural networks. arXiv preprint arXiv:1807.08207 (2018).
[24] Eugene Silberberg. 1972. Duality and the many consumer's surpluses. The American Economic Review 62, 5 (1972), 942–952.
[25] Ning Su, Jiyin He, Yiqun Liu, Min Zhang, and Shaoping Ma. 2018. User intent, behaviour, and perceived satisfaction in product search. In WSDM. ACM, 547–555.
[26] Bhanu C Vattikonda, Santhosh Kodipaka, Hongyan Zhou, Vacha Dave, Saikat Guha, and Alex C Snoeren. 2015. Interpreting advertiser intent in sponsored search. In KDD. ACM, 2177–2185.
[27] Hongning Wang, Yang Song, Ming-Wei Chang, Xiaodong He, Ahmed Hassan, and Ryen W White. 2014. Modeling action-level satisfaction for search task satisfaction prediction. In SIGIR. ACM, 123–132.
[28] Carl Yang, Xiaolin Shi, Luo Jie, and Jiawei Han. 2018. I know you'll be back: Interpretable new user clustering and churn prediction on a mobile social application. In KDD. ACM, 914–922.
[29] Shuangfei Zhai, Keng-hao Chang, Ruofei Zhang, and Zhongfei Mark Zhang. 2016. DeepIntent: Learning attentions for online advertising with recurrent neural networks. In KDD. ACM, 1295–1304.
[30] Guorui Zhou, Na Mou, Ying Fan, Qi Pi, Weijie Bian, Chang Zhou, Xiaoqiang Zhu, and Kun Gai. 2019. Deep interest evolution network for click-through rate prediction. In AAAI. 5941–5948.
[31] Guorui Zhou, Xiaoqiang Zhu, Chenru Song, Ying Fan, Han Zhu, Xiao Ma, Yanghui Yan, Junqi Jin, Han Li, and Kun Gai. 2018. Deep interest network for click-through rate prediction. In KDD. ACM.