Sequential Click Prediction for Sponsored Search with Recurrent Neural Networks
Yuyu Zhang* (Institute of Computing Technology, Chinese Academy of Sciences) [email protected]
Hanjun Dai* (Fudan University) [email protected]
Chang Xu* (Nankai University) [email protected]
Jun Feng (Tsinghua University) [email protected]
Taifeng Wang (Microsoft Research) [email protected]
Jiang Bian (Microsoft Research) [email protected]
Bin Wang (Institute of Computing Technology, Chinese Academy of Sciences) [email protected]
Tie-Yan Liu (Microsoft Research) [email protected]
Abstract
Click prediction is one of the fundamental problems in sponsored search. Most existing studies apply machine learning approaches to predict the click of each ad impression independently. However, as observed in real-world sponsored search systems, a user's behavior on ads depends strongly on how she behaved in the past, especially in terms of what queries she submitted, what ads she clicked or ignored, and how long she spent on the landing pages of clicked ads. Inspired by these observations, we introduce a novel framework based on Recurrent Neural Networks (RNN). Compared to traditional methods, this framework directly models the dependency on the user's sequential behaviors in the click prediction process through the recurrent structure of the RNN. Large-scale evaluations on the click-through logs of a commercial search engine demonstrate that our approach significantly improves click prediction accuracy compared to sequence-independent approaches.
Introduction
Sponsored search has been a major business model for modern commercial Web search engines. Along with organic search results, it presents users with sponsored search results, i.e., advertisements (ads) targeted to the search query. Sponsored search accounts for the overwhelming majority of income for the three major search engines: Google, Yahoo and Bing. In the US search market alone, it generates over 20 billion dollars per year, and this amount keeps rising (source: eMarketer, June 2013). Under the common cost-per-click (CPC) model for sponsored search, advertisers are only charged once their advertisements are clicked by users. In this mechanism, to maximize the revenue for the search engine and maintain a desirable user experience, it is crucial for search engines to estimate the click-through rate (CTR) of ads.

(* This work was performed when the first three authors were visiting Microsoft Research Asia.)
Recently, click prediction has received much attention from both industry and academia (Fain and Pedersen 2006; Jansen and Mullen 2008). State-of-the-art sponsored search systems typically employ machine learning approaches to predict the click probability using features extracted from: 1) historical CTR of ad impressions with respect to different elements, e.g., the CTR of the query, ad, user, and their combinations (Graepel et al. 2010; Richardson, Dominowska, and Ragno 2007); 2) semantic relevance between query and ad (Radlinski et al. 2008; Shaparenko, Çetin, and Iyer 2009; Hillard et al. 2011). However, most previous works take a single ad impression as the input instance to predict the click probability, without considering the dependency between different ad impressions. Some recent work (Xu, Manavoglu, and Cantu-Paz 2010; Xiong et al. 2012) pointed out that more accurate click prediction can be achieved by modeling the spatial relationship between ad slots in the same query session. Inspired by this, we conduct further data analysis to study other types of dependency between users' behaviors in sponsored search, through which we find that users' behaviors also exhibit explicit temporal dependency. For example, if a user clicks an ad, arrives at the ad landing page, but closes it very quickly, the click probability of her next view of this ad becomes fairly low; moreover, if a user has previously submitted a query on booking flights, she is more likely to click ads about flight booking. These findings motivate us to advance the state of the art of click prediction by modeling this important temporal dependency in the click prediction process. Although some kinds of dependency can be modeled as features, it is still hard to identify all of them explicitly.
Thus, it is necessary to empower the model with the ability to extract and leverage various kinds of dependency automatically. In a real-world sponsored search system, every event of ad impression and click, together with the corresponding context information (e.g., user query, ad text, click dwell time, etc.), is recorded with a time stamp in the search logs. (We refer to a certain ad shown to a particular user on a specific search result page as an ad impression.) Thus, it is natural to employ time series analysis methods to model the sequential dependency between users' behaviors. Previous studies on time series analysis (Kirchgassner, Wolters, and Hassler 2012; Box, Jenkins, and Reinsel 2013) usually focused on modeling trends or periodic patterns in data series. However, the sequential dependency between ad impressions is so complex and dynamic that time series analysis approaches are not capable enough to model it effectively. On the other hand, a few recent studies leverage Recurrent Neural Networks (RNN) to model temporal dependency in data. For example, the RNN language model (Mikolov et al. 2010; Mikolov et al. 2011a; Mikolov et al. 2011b) successfully leverages long-span sequential information in massive language corpora, which results in better performance than traditional neural network language models (Bengio et al. 2006). Moreover, RNN-based handwriting recognition (Graves et al. 2009), speech recognition (Kombrink et al. 2011), and machine translation (Auli et al. 2013) systems have also led to much improvement in the corresponding tasks. Compared to traditional feedforward neural networks, RNN has demonstrated a strong capability to exploit dependencies in a sequence due to its specific recurrent network structure. In this work, we propose to leverage RNN to model sequential dependency for predicting the ad click probability. We consider each user's ad browsing history as one sequence which carries intrinsic internal dependency.
In the training process of the RNN model, the features of each ad impression are fed forward into the hidden layer, together with the previously accumulated hidden state. In this way, the dependency among impressions is embedded into the recurrent network structure. Our experiments on large-scale data from a commercial search engine reveal that such an RNN structure gives rise to a significant improvement in click prediction accuracy compared with state-of-the-art dependency-free models such as Neural Networks and Logistic Regression. The main contributions of this paper are threefold:
• We investigate the sequential dependency among a particular user's ad impressions, and identify several important sequential dependency relationships.
• We use Recurrent Neural Networks to model a user's click sequence, and successfully incorporate sequential dependency to enhance the accuracy of click prediction.
• We conduct large-scale experiments to validate the RNN model's effectiveness for modeling sequential data in sponsored search.
In the following parts of this paper, we first present our data analysis results to verify the potential dependency that might affect click prediction. Then, we propose our RNN model for the task of sequence-based click prediction. After that, we describe the experimental settings, followed by the experimental results and a further performance study. At last, we summarize the paper and discuss some future work.
Data Analysis on Sequential Dependency
To gain more understanding of why sequential information is important in click prediction, in this section we discuss the effects of sequential dependency from multiple perspectives. We collect data for analysis from the logs of a commercial sponsored search system.

Once a user clicks an ad, she enters the corresponding ad landing page and stays there for a certain period of time, which is referred to as the click dwell time. Generally, longer dwell time implies better user experience. For a particular user, all her ad impressions can be organized as a time-ordered sequence. To explore the sequential effect of click dwell time, we first pick all the "first clicks" in each user's sequence, and track whether each ad will be clicked again in its consecutively next impression. The correlation between the previous click dwell time and the current click-through rate is shown in Figure 1.

Figure 1: Correlation between last click dwell time and current click-through rate.

From this figure, we observe an obvious positive correlation, i.e., the longer a user stays on an ad's landing page, the more likely she is to click this ad again the next time. When the click dwell time is less than 20 seconds, we call such an ad click a "quick back". From the data analysis above, we observe that a "quick back" gives rise to a much lower click-through rate in the next impression, which indicates that users tend to avoid clicking an ad once they have had an unsatisfying experience. However, it is not clear how users behave if they experienced a "quick back" click a long time ago (e.g., half a month). To explore this, we collect all the "quick back" clicks and calculate the click-through rate after different time intervals, i.e., the time elapsed since the "quick back" click. Figure 2 shows the result, where the time interval is binned by half-days.
As conjectured, with increasing elapsed time, the overall click-through rate grows significantly and then stays steady at a certain level, which implies that users tend to gradually forget the unsatisfying experience as time passes. (In this paper, we report relative click-through rates to preserve proprietary information.)

Figure 2: Click-through rate right after a "quick back" click on the same ad.

Besides the studies on the sequential effect of dwell time, we further analyze such effects from the aspect of the query. In sponsored search systems, ads are automatically selected by matching against the user-submitted query. Queries also form a time-ordered sequence, bound to that of the ad impressions. After categorizing the queries into topics based on our proprietary query taxonomy, we calculate users' click-through rate when they submit each query topic for the first time, and compare it with their future CTR on subsequent queries of the same topic. The analysis results, shown in Figure 3, illustrate that if a user has submitted a query belonging to a certain query topic, she becomes more likely to click ads under the same topic.

Figure 3: Click-through rate on users' first and subsequent submissions of a certain type of query.

All the analysis results above indicate that a user's previous behaviors in sponsored search may have a strong but quite dynamic impact on her subsequent behaviors. As long as we can identify such sequential dependency between user behaviors, we can design features accordingly to enhance click prediction. However, since it is a big challenge to enumerate such dependencies in the data manually, it is necessary to let the model learn this kind of dependency by itself. To this end, we propose to leverage a widely used framework, the Recurrent Neural Network, as our model, as it naturally embeds dependencies in the sequence.
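The dwell-time analysis above can be sketched in code. This is an illustrative reconstruction, not the authors' pipeline: the record fields (`user`, `ad`, `clicked`, `dwell`) and the 10-second bucketing are assumptions; only the idea of conditioning the next impression's CTR on the previous click's dwell time on the same ad comes from the text.

```python
from collections import defaultdict

QUICK_BACK_SECONDS = 20  # dwell times below this are "quick back" clicks, as in the text

def ctr_by_prev_dwell(events, bucket_seconds=10):
    """Bucket each impression by the dwell time of the user's previous click
    on the same ad, and compute the click-through rate per bucket.

    `events` is a time-ordered list of dicts with illustrative keys:
    user, ad, clicked (bool), dwell (seconds, present for clicked ads).
    """
    last_dwell = {}                       # (user, ad) -> dwell time of last click
    stats = defaultdict(lambda: [0, 0])   # bucket -> [clicks, impressions]
    for e in events:
        key = (e["user"], e["ad"])
        if key in last_dwell:
            bucket = int(last_dwell[key] // bucket_seconds) * bucket_seconds
            stats[bucket][1] += 1
            stats[bucket][0] += int(e["clicked"])
        if e["clicked"]:
            last_dwell[key] = e["dwell"]
    return {b: clicks / n for b, (clicks, n) in stats.items() if n > 0}
```

Plotting the returned dictionary (bucketed previous dwell time against CTR) would reproduce the shape of the analysis behind Figure 1.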
The Proposed Framework
Model
The architecture of a recurrent neural network is shown in Figure 4. It consists of an input layer i, an output unit, a hidden layer h, as well as the inner weight matrices. Here we use t ∈ N to represent the time stamp; for example, h(t) denotes the hidden state at time t. Specifically, the recurrent connections R between h(t−1) and h(t) propagate sequential signals. The input layer consists of a vector i(t) that represents the features of the current user behavior, and the vector h(t−1) holds the values of the hidden layer computed in the previous step.

The activation values of the hidden and output layers are computed as

h(t) = f(i(t) U^T + h(t−1) R^T),
y(t) = σ(h(t) V^T),

where f(z) = (1 − e^{−2z}) / (1 + e^{−2z}) is the tanh function we use for non-linear activation, and σ(z) = 1 / (1 + e^{−z}) is the sigmoid function for predicting the click probability.

Figure 4: RNN training process with the BPTT algorithm. The unfolding step is set to 3 in this figure.

The hidden layer can be considered as an internal memory which records dynamic sequential states. The recurrent structure is able to capture a long-span history context of user behaviors, which makes RNN applicable to tasks of sequential prediction. In our framework, i(t) represents the features correlated to the user's current behavior and h(t) represents the sequential information of the user's previous behaviors. Thus, our prediction depends not only on the current input features, but also on the historical sequential information.

Feature Construction
In our study, we take ad impressions as the instances for both model training and testing. Based on the rich impression-centric information, we construct input features that carry crucial information for accurate CTR prediction of a given impression. These features can be grouped into several general categories: 1) Ad features consist of information about the ad ID, the ad display position, and the ad text's relevance to the query. 2) User features include the user ID and semantic information related to the query submitted by the user. 3) Sequential features include the time interval since the last impression, the dwell time on the landing page of the last click event, and whether the current impression is the head of a sequence, i.e., whether it is the first impression for this user.

With all these diverse types of features, each ad impression can be described in a large and complex feature space. In this paper, all of the click prediction models follow this input feature setting, so that we can fairly compare their capabilities of predicting whether an impression will be clicked by the user.
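Given one such feature vector per impression, the forward computation defined in the model section can be sketched with numpy as follows. The matrix shapes follow h(t) = f(i(t) U^T + h(t−1) R^T) and y(t) = σ(h(t) V^T); the feature dimension and the random weights are illustrative (the paper reports a hidden layer of 13 units in its experiments).

```python
import numpy as np

rng = np.random.default_rng(0)

n_features, n_hidden = 8, 13  # feature size is an assumption; 13 hidden units as in the paper

U = rng.normal(scale=0.1, size=(n_hidden, n_features))  # input -> hidden weights
R = rng.normal(scale=0.1, size=(n_hidden, n_hidden))    # recurrent hidden -> hidden weights
V = rng.normal(scale=0.1, size=(1, n_hidden))           # hidden -> output weights

def step(i_t, h_prev):
    """One recurrent step: h(t) = tanh(i(t) U^T + h(t-1) R^T),
    y(t) = sigmoid(h(t) V^T)."""
    h_t = np.tanh(i_t @ U.T + h_prev @ R.T)
    z = (h_t @ V.T).item()            # scalar logit
    y_t = 1.0 / (1.0 + np.exp(-z))    # predicted click probability
    return h_t, y_t

# Feed one user's impression sequence in time order, carrying the hidden state.
h = np.zeros(n_hidden)
for i_t in rng.normal(size=(5, n_features)):  # five fake impressions
    h, y = step(i_t, h)
```

Because the hidden state is threaded through the loop, the prediction for each impression depends on the whole prefix of the user's sequence, not just the current feature vector.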
Data Organization
To effectively harness the sequential information for modeling temporal dependencies, the training process requires a large quantity of data. Luckily, the data in computational advertising is big enough to make RNN training tractable. To obtain the data for sequential prediction, we re-organize the input features along the user dimension (i.e., we reorder each user's historical behaviors according to the timeline). In particular, for the ads in the same search session, we rank them by their natural display order: first the mainline and then the sidebar.
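A minimal sketch of this re-organization, assuming illustrative record fields (`user`, `timestamp`, `slot`, `position`); the paper's actual log schema is not specified:

```python
from itertools import groupby

# Within one search session, mainline ads come before sidebar ads,
# matching the natural display order described above.
SLOT_ORDER = {"mainline": 0, "sidebar": 1}

def build_user_sequences(impressions):
    """Group raw impression records per user and order them by time,
    breaking ties within a session by slot (mainline first) and position."""
    key = lambda r: (r["user"], r["timestamp"], SLOT_ORDER[r["slot"]], r["position"])
    ordered = sorted(impressions, key=key)
    return {user: list(group)
            for user, group in groupby(ordered, key=lambda r: r["user"])}
```

Each returned per-user list is one training (or testing) sequence for the recurrent model.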
Learning
Loss Function
In our work, the loss function is defined as an averaged cross entropy, which aims at maximizing the likelihood of correct prediction:

L = (1/M) Σ_{i=1}^{M} (−y_i log(p_i) − (1 − y_i) log(1 − p_i)),

where M is the number of training samples, the i-th sample is labeled with y_i ∈ {0, 1}, and p_i is the predicted click probability of the given ad impression.

Learning Algorithm (BPTT)
RNN can be trained in the same way as a normal feedforward network using the backpropagation algorithm. In that case, the state of the hidden layer from the previous time step is simply regarded as an additional input. With only one hidden layer, the network tries to optimize the prediction of the next sample given the previous sample and the previous hidden state. However, no effort is directly devoted to longer context information, which may hurt the performance of RNN. A simple extension of the training algorithm is to unfold the network and backpropagate errors even further. This is called the Back Propagation Through Time (BPTT) algorithm. BPTT was proposed in (Rumelhart, Hinton, and Williams 2002), and has been used in the practical application of RNN language models (Mikolov 2012).

We illustrate the overall training pipeline that applies BPTT to the RNN-based click prediction model in Figure 4. Such an unfolded RNN can be viewed as a deep neural network with T hidden layers in which the recurrent weight matrices are shared and identical. In this approach, the hidden layer can exploit the information of the most recent inputs and put more importance on the latest input, which is essentially coherent with the sequential dependency. In the following, we use T to denote the number of unfolding steps in the BPTT algorithm.

The network is trained by Stochastic Gradient Descent (SGD). The gradient at the output layer is computed as

e_o(t) = y(t) − l(t),

where y(t) is the predicted click probability, and l(t) is the binary true label according to whether the ad is clicked or not. The weights V between the hidden layer h(t) and the output unit y(t) are updated as

V(t+1) = V(t) − α · e_o(t) · h(t),

where α is the learning rate. Then, the gradients of the errors are propagated from the output layer to the hidden layer as

e_h(t) = e_o(t) V ∗ (1⃗ − h(t) ∗ h(t)),

where ∗ represents the element-wise product, and 1⃗ is a vector with all elements equal to one. Errors are also recursively propagated from the hidden layer h(t−τ) to the hidden layer of the previous step h(t−τ−1), that is,

e_h(t−τ−1) = e_h(t−τ) R ∗ (1⃗ − h(t−τ−1) ∗ h(t−τ−1)),

where τ ∈ [0, T). The weight matrix U and the recurrent weights R are then updated as

U(t+1) = U(t) − α Σ_{z=0}^{T−1} e_h(t−z)^T i(t−z),
R(t+1) = R(t) − α Σ_{z=0}^{T−1} e_h(t−z)^T h(t−z−1).

Note that, in our practical experiments, we add bias terms and an L2 penalty on the weights to the model; the gradients can still be computed easily with a slight modification of the equations above.

Figure 5: RNN testing process with sequential input samples. The hidden state of the previous test sample is used as input, together with the current sample features.
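The update equations above can be collected into one SGD step. The sketch below assumes a single sigmoid output unit and omits the bias terms and L2 penalty mentioned in the text; the caller is assumed to supply the stored inputs and hidden states of the last T unfolded steps.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bptt_update(inputs, h_states, label, U, R, V, alpha=0.1):
    """One truncated-BPTT SGD update following the equations above.

    inputs:   list [i(t-T+1), ..., i(t)] of input vectors
    h_states: list [h(t-T), ..., h(t)] of hidden vectors (one element longer)
    label:    binary click label l(t)
    U, R, V:  weight matrices, updated in place; T = len(inputs).
    """
    T = len(inputs)
    h_t = h_states[-1]
    y_t = sigmoid((h_t @ V.T).item())
    e_o = y_t - label                              # e_o(t) = y(t) - l(t)
    e_h = (e_o * V[0]) * (1.0 - h_t * h_t)         # e_h(t) = e_o(t) V * (1 - h*h)
    V -= alpha * e_o * h_t[None, :]                # V(t+1) = V(t) - a e_o(t) h(t)
    dU, dR = np.zeros_like(U), np.zeros_like(R)
    for z in range(T):                             # accumulate over unfolded steps
        h_prev = h_states[-2 - z]                  # h(t-z-1)
        dU += np.outer(e_h, inputs[-1 - z])        # e_h(t-z)^T i(t-z)
        dR += np.outer(e_h, h_prev)                # e_h(t-z)^T h(t-z-1)
        if z < T - 1:                              # propagate error one step back
            e_h = (e_h @ R) * (1.0 - h_prev * h_prev)
    U -= alpha * dU
    R -= alpha * dR
    return y_t
```

In a full training loop, the forward pass would recompute `h_states` after each update; here the function only packages the backward equations.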
Inference
In contrast to traditional neural networks, RNN has a recurrent layer to store the previous hidden state. In the inference phase, we still need to store the hidden state of the previous test sample, and feed it forward with the recurrent weights R. Figure 5 illustrates the testing process. The test data is also organized as ordered user behavior sequences. We feed forward the current sample features, together with the hidden state of the previous sample, to get the current hidden state. Then we make the prediction and replace the stored hidden state with the current values. Here we only record the hidden state of the last test sample, no matter how many unfolding steps are used in the BPTT training process.

Experiments
This section first describes the settings of our experiments, and then reports the experimental results.
Data Setting
To validate whether our proposed RNN model can really help enhance click prediction accuracy, we conduct a series of experiments based on the click-through logs of a commercial search engine. In particular, we collect half a month of logs, from November 9th to November 22nd, 2013, as our experimental dataset. We randomly sample a set of search engine users (fully anonymized) and collect their corresponding events from the original whole traffic. In total, we collect over 7 million ad impressions in this period of time. After that, we use the first week's data to train the click prediction models, and apply those models to the second week's data for testing. Detailed statistics of the dataset can be found in Table 1.

Table 1: Statistics of the dataset for training and testing click prediction models.

           Ad Impressions    Ads        Users
Training   3,740,980         1,363,687  235,215
Testing    3,741,500         1,379,581  235,215

Table 2: Overall performance of different models in terms of AUC and RIG.

Model   AUC      RIG
LR      87.48%   22.30%
NN      88.51%   23.76%
RNN
Evaluation Metrics
In our work, multiple models are applied to predict the click probability for the ad impressions in the testing dataset. We use the recorded user actions, i.e., click or non-click, in the logs as the true labels. To measure the overall performance of each model, we follow the common practice of previous click prediction research in sponsored search and employ the Area Under the ROC Curve (AUC) and the Relative Information Gain (RIG) as the evaluation metrics (Graepel et al. 2010; Xiong et al. 2012; Wang et al. 2013).
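For concreteness, the two metrics can be computed as below. The pairwise form of AUC is standard; for RIG the paper does not spell out a formula, so this sketch assumes the common definition as the relative reduction in cross entropy against a background model that always predicts the empirical CTR.

```python
import math

def auc(labels, scores):
    """AUC via the pairwise-ranking view: the probability that a random
    clicked impression is scored higher than a random non-clicked one
    (ties count as 1/2)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def cross_entropy(labels, probs):
    """Averaged cross entropy, matching the training loss defined above."""
    return -sum(l * math.log(p) + (1 - l) * math.log(1 - p)
                for l, p in zip(labels, probs)) / len(labels)

def rig(labels, probs):
    """Relative Information Gain (assumed definition):
    1 - CE(model) / CE(background), where the background model always
    predicts the empirical CTR of the dataset."""
    ctr = sum(labels) / len(labels)
    return 1.0 - cross_entropy(labels, probs) / cross_entropy(labels, [ctr] * len(labels))
```

Under this definition a model that merely reproduces the average CTR scores a RIG of zero, and better-calibrated, more discriminative predictions score higher.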
Compared Methods
In order to investigate model effectiveness, we compare the performance of our RNN model with other classical click prediction models, including Logistic Regression (LR) and Neural Networks (NN), with the identical feature set described above. We set LR and NN as the baseline models for the following reasons: 1) Quite a few previous studies (Richardson, Dominowska, and Ragno 2007; McMahan et al. 2013; Wang et al. 2013) have demonstrated that they are state-of-the-art models for click prediction in sponsored search. 2) LR and NN models ignore the sequential dependency in the data, while our RNN-based framework is able to model such information. Through the comparison with them, we can see whether RNN successfully leverages dependencies in the data sequence to improve the accuracy of click prediction.
Experimental Results
Overall Performance
For a fair model comparison, we carefully select the parameters of each model with cross validation, and ensure that every model achieves its best performance. To be more specific, the parameters for grid search include: the coefficient of the L2 penalty, the number of training epochs, the hidden layer size for the RNN and NN models, and the number of unfolding steps for RNN. The best settings are as follows: the number of training epochs is 3, the hidden layer size is 13, and the number of unfolding steps is 3 (more details will be provided later).

Table 2 reports the overall AUC and RIG of all three methods on the test dataset. It demonstrates that our proposed RNN model significantly improves the accuracy of click prediction compared with the baseline approaches, with clear relative gains over both LR and NN in terms of RIG as well as AUC. In a real sponsored search system, such an improvement in click prediction accuracy leads to a significant revenue increment. The overall performance above shows the effectiveness of our RNN model, which clearly transcends sequence-independent models. Next, we conduct a detailed analysis of how sequential information helps obtain more accurate click prediction.
Performance on Specific Ad Positions
It is well known that the click-through rate varies a lot across ad positions, which is often referred to as position bias. To further check the performance of the models at specific positions, we separately analyze the performance of the RNN model and the two baseline algorithms at different ad positions: top first, mainline and sidebar. Figure 6 shows the evaluation results. In Figure 6(a), RNN outperforms NN and LR in terms of AUC on all positions. In Figure 6(b), in terms of RIG, RNN achieves an impressive relative gain over NN and LR, especially on the mainline positions, where RNN clearly beats NN. According to our statistics on daily traffic data, most of the revenue comes from mainline ad clicks, where RNN achieves significantly better performance. As for the sidebar positions, the ads shown there are easily ignored by users, so clicks, i.e., positive instances, are very rare. This may drastically hurt the performance of the LR model. Nevertheless, the RNN model still performs the best, which indicates that even in such rare-click cases, sequential information can still help.
Effect of Recurrent State for Inference
We have shown the inference method of RNN in Figure 5. To further verify the importance of utilizing historical sequences in the inference phase, we remove the recurrent part of the RNN model after training, and feed the testing samples forward as in a normal NN, which means that we simply ignore the sequential dependencies in the testing phase. The resulting AUC is 88.25% and the RIG is 18.95%. Compared to Table 2, we observe a severe drop in the performance of the RNN model, which shows that the sequential information is indeed embedded in the recurrent structure and contributes significantly to the prediction accuracy.

Figure 6: Performance on specific ad positions: (a) AUC; (b) RIG.
Performance with Long vs. Short History
In this part, we conduct experiments to check the model performance with different lengths of history. We first collect all available user sequences whose length is larger than a threshold T. In these sequences, the first T samples of each sequence are fed into the model to serve as the "accumulation period". Then, we continue feeding and testing samples on the rest of each sequence, and calculate the AUC and RIG on all those remaining parts. In this setting, the user sequences selected with a larger threshold T have a longer history to feed as the "accumulation period". By doing so, we aim to verify whether our RNN model can maintain more robust sequential information over longer sequences. Figure 7 shows the results. Just as expected, our RNN model performs the best in all settings. Moreover, when the "accumulation period" gets longer, our RNN model tends to achieve an even larger relative gain compared to the baseline models. This result validates the capability of RNN to capture and accumulate sequential information, especially for long sequences, and thereby further improve the accuracy of click prediction.

Effect of RNN Unfolding Step
As described in the framework section, the unfolding structure plays an important role in the RNN training process with the BPTT algorithm. Since unfolding can directly incorporate several previous samples, and the depth of such explicit sequence modeling is determined by the number of unfolding steps, it is necessary to delve into the effect of various RNN unfolding steps. This further analysis helps us better understand the properties of the RNN model.

Figure 7: Performance with different history length T.

According to our experimental results, the prediction accuracy surges with increasing unfolding steps at the beginning. The best AUC and RIG are achieved when unfolding 3 steps, after which the performance drops. By checking the error terms during the BPTT process, we discover that the backpropagated error vanishes after 3 steps of unfolding, which explains why a larger unfolding step is detrimental. With these observations, intuitively, our RNN-based framework can model sequential dependency in two ways: short-span dependency, by explicitly learning from the current input and several of its preceding inputs through unfolding; and long-span dependency, by implicitly learning from all previous inputs, accumulated and embedded in the weights of the recurrent part. Meanwhile, in sponsored search, a user's behavior is also affected both by very recent events as an explicit factor and by the long-run history as an implicit (background) factor. This reveals the intrinsic reason why RNN works so well for sequential click prediction.

Conclusion and Future Work
In this paper, we propose a novel framework for click prediction based on Recurrent Neural Networks. Different from traditional click prediction models, our method leverages the temporal dependency in a user's behavior sequence through the recurrent structure. A series of experiments show that our method outperforms state-of-the-art click prediction models in various settings. In the future, we will continue this direction in several aspects: 1) The sequence is currently built on the user level. We will study different kinds of sequence building methods, e.g., by (user, ad) pair, (user, query) pair, advertiser, or even merging all users at the level of the whole system. 2) We are going to deduce the meaning of the dependency learnt by RNN via a deep understanding of the RNN structure. This may help us better utilize the properties of the recurrent part. 3) Recently, some research (Hermans and Schrauwen 2013) has been done on Deep Recurrent Neural Networks (DRNN) with good results. We plan to study whether a "deep" structure can also help in click prediction, together with the "recurrent" structure.

References

[Auli et al. 2013] Auli, M.; Galley, M.; Quirk, C.; and Zweig, G. 2013. Joint language and translation modeling with recurrent neural networks. In EMNLP.
[Bengio et al. 2006] Bengio, Y.; Schwenk, H.; Senécal, J.-S.; Morin, F.; and Gauvain, J.-L. 2006. Neural probabilistic language models. In Innovations in Machine Learning. Springer. 137–186.
[Box, Jenkins, and Reinsel 2013] Box, G. E.; Jenkins, G. M.; and Reinsel, G. C. 2013. Time Series Analysis: Forecasting and Control. Wiley.
[Fain and Pedersen 2006] Fain, D. C., and Pedersen, J. O. 2006. Sponsored search: A brief history. Bulletin of the American Society for Information Science and Technology.
[Graepel et al. 2010] Graepel, T.; Candela, J. Q.; Borchert, T.; and Herbrich, R. 2010. Web-scale Bayesian click-through rate prediction for sponsored search advertising in Microsoft's Bing search engine. In Proceedings of the 27th International Conference on Machine Learning (ICML-10), 13–20.
[Graves et al. 2009] Graves, A.; Liwicki, M.; Fernández, S.; Bertolami, R.; Bunke, H.; and Schmidhuber, J. 2009. A novel connectionist system for unconstrained handwriting recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence.
[Hermans and Schrauwen 2013] Hermans, M., and Schrauwen, B. 2013. Training and analysing deep recurrent neural networks. In NIPS, 190–198.
[Hillard et al. 2011] Hillard, D.; Manavoglu, E.; Raghavan, H.; Leggetter, C.; Cantú-Paz, E.; and Iyer, R. 2011. The sum of its parts: reducing sparsity in click estimation with query segments. Information Retrieval.
[Jansen and Mullen 2008] Jansen, B. J., and Mullen, T. 2008. Sponsored search: an overview of the concept, history, and technology. International Journal of Electronic Business.
[Kirchgassner, Wolters, and Hassler 2012] Kirchgassner, G.; Wolters, J.; and Hassler, U. 2012. Introduction to Modern Time Series Analysis. Springer.
[Kombrink et al. 2011] Kombrink, S.; Mikolov, T.; Karafiát, M.; and Burget, L. 2011. Recurrent neural network based language modeling in meeting recognition. In INTERSPEECH, 2877–2880.
[McMahan et al. 2013] McMahan, H. B.; Holt, G.; Sculley, D.; Young, M.; Ebner, D.; Grady, J.; Nie, L.; Phillips, T.; Davydov, E.; Golovin, D.; Chikkerur, S.; Liu, D.; Wattenberg, M.; Hrafnkelsson, A. M.; Boulos, T.; and Kubica, J. 2013. Ad click prediction: a view from the trenches. In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD).
[Mikolov et al. 2010] Mikolov, T.; Karafiát, M.; Burget, L.; Cernocký, J.; and Khudanpur, S. 2010. Recurrent neural network based language model. In INTERSPEECH, 1045–1048.
[Mikolov et al. 2011a] Mikolov, T.; Kombrink, S.; Burget, L.; Cernocky, J.; and Khudanpur, S. 2011a. Extensions of recurrent neural network language model. In Acoustics, Speech and Signal Processing (ICASSP), 2011 IEEE International Conference on, 5528–5531. IEEE.
[Mikolov et al. 2011b] Mikolov, T.; Kombrink, S.; Deoras, A.; Burget, L.; and Černocký, J. 2011b. RNNLM - recurrent neural network language modeling toolkit. In Proc. IEEE Workshop on Automatic Speech Recognition and Understanding, 16.
[Mikolov 2012] Mikolov, T. 2012. Statistical Language Models Based on Neural Networks. Ph.D. Dissertation, Brno University of Technology.
[Radlinski et al. 2008] Radlinski, F.; Broder, A.; Ciccolo, P.; Gabrilovich, E.; Josifovski, V.; and Riedel, L. 2008. Optimizing relevance and revenue in ad search: a query substitution approach. In Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 403–410. ACM.
[Richardson, Dominowska, and Ragno 2007] Richardson, M.; Dominowska, E.; and Ragno, R. 2007. Predicting clicks: estimating the click-through rate for new ads. In Proceedings of the 16th International Conference on World Wide Web, 521–530. ACM.
[Rumelhart, Hinton, and Williams 2002] Rumelhart, D. E.; Hinton, G. E.; and Williams, R. J. 2002. Learning representations by back-propagating errors. Cognitive Modeling.
[Shaparenko, Çetin, and Iyer 2009] Shaparenko, B.; Çetin, Ö.; and Iyer, R. 2009. Data-driven text features for sponsored search click prediction. In Proceedings of the Third International Workshop on Data Mining and Audience Intelligence for Advertising, 46–54. ACM.
[Wang et al. 2013] Wang, T.; Bian, J.; Liu, S.; Zhang, Y.; and Liu, T.-Y. 2013. Exploring consumer psychology for click prediction in sponsored search. In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM.
[Xiong et al. 2012] Xiong, C.; Wang, T.; Ding, W.; Shen, Y.; and Liu, T.-Y. 2012. Relational click prediction for sponsored search. In Proceedings of the Fifth ACM International Conference on Web Search and Data Mining, 493–502. ACM.
[Xu, Manavoglu, and Cantu-Paz 2010] Xu, W.; Manavoglu, E.; and Cantu-Paz, E. 2010. Temporal click model for sponsored search. In SIGIR. ACM.