Macro-optimization of email recommendation response rates harnessing individual activity levels and group affinity trends
Mohammed Korayem, Khalifeh Aljadda, and Trey Grainger
CareerBuilder, GA. mohammed.korayem, khalifeh.aljadda, [email protected]
Abstract—Recommendation emails are among the best ways to re-engage with customers after they have left a website. While on-site recommendation systems focus on finding the most relevant items for a user at the moment (right item), email recommendations add two critical additional dimensions: who to send recommendations to (right person) and when to send them (right time). It is critical that a recommendation email system not send too many emails to too many users in too short of a time-window, as users may unsubscribe from future emails or become desensitized and ignore future emails if they receive too many. Also, email service providers may mark such emails as spam if too many of their users are contacted in a short time-window. Optimizing email recommendation systems such that they can yield a maximum response rate for a minimum number of email sends is thus critical for the long-term performance of such a system. In this paper, we present a novel recommendation email system that not only generates recommendations, but which also leverages a combination of individual user activity data, as well as the behavior of the group to which they belong, in order to determine each user's likelihood to respond to any given set of recommendations within a given time period. In doing this, we have effectively created a meta-recommendation system which recommends sets of recommendations in order to optimize the aggregate response rate of the entire system. The proposed technique has been applied successfully within CareerBuilder's job recommendation email system to generate a 50% increase in total conversions while also decreasing sent emails by 72%.
I. INTRODUCTION
Recommender systems are widely deployed across many industries for diverse use cases such as e-commerce, advertising, media distribution, and job boards. Recommender systems automate the process of discovering the interests of a user and subsequently suggesting what should be relevant to his/her needs [1], [2]. Many companies depend on recommender systems to help drive their revenue, like Netflix, Amazon, CareerBuilder, etc. For example, Netflix, a movie rental and video streaming web site, offered a prize (known as the Netflix prize) of 1 million dollars in 2006 for any recommendation algorithm that could beat their recommender system, named Cinematch [3]. Netflix, like many other websites, depends heavily on recommendations in order to keep their customers interested in their service. Recommendations can take place while a user is browsing a website, or even asynchronously while the user is not actively online. In the former case, recommendations are focused on selecting similar items based on the other items with which the user has previously interacted. In the latter case, recommendation emails are often sent on a regular basis (i.e. nightly, weekly, or monthly) in order to recapture offline users' attention and have them return to the website to reengage. These offline recommendation emails are more complicated than real-time recommendations due to the fact that they have to deal with two additional dimensions: 1) the right users to target among all users, and 2) the right time to send the recommendations to those users.
Looking at the recommendation email process across a three-dimensional space of right person (who), right item (what), and right time (when) gives us a model to optimize the effectiveness of the system at generating quality recommendations that successfully re-engage offline users. Whereas a real-time recommendation system provides online recommendations to users while they are browsing a website or otherwise interacting with a system, e-mail recommendations must be much more carefully optimized to ensure they are only sent to users when the user will appreciate and benefit from them. There is a fundamental supply and demand problem here: once a user has left a website, they then become your supply of potential future customers to reengage, but every time you attempt to email them, you risk having them unsubscribe if they are for any reason unhappy with the email. Sending them a timely and relevant recommendation email is a good way to get them to return and reengage, but if you send too many emails to a user then you may annoy them and lose them as a future customer forever. You wouldn't, for example, want to send a recommendation to every user of your system multiple times per day, as you may quickly run out of users to send to once they all unsubscribe from your service. Instead, it is important to drive as many successful conversions as possible with as few emails as possible, such that both you and your customers maximize the impact of your interactions. In this paper we describe a novel system to address the three dimensions of a recommendation email system in order to maximize the aggregate response rate through sending the right content to the right people at the right time. This system has been deployed in production as part of CareerBuilder's recommendation email system, significantly increasing email response rates (by 50%) while simultaneously reducing (by 72%) the number of sent e-mails necessary to achieve those improved response rates.

II. RELATED WORK
The main task of a recommendation system is to provide users with relevant content suggestions. It works by collecting the preferences of users for a set of items and then ranking other items for each user based on how interested the system predicts a user will be to see those other items [4]. There are two major types of recommendation systems: collaborative filtering [5] and content-based recommendations (often referred to as content-based filtering) [4], [6], [7]. Content-based recommendation systems recommend items for a user based on the similarity of features between a user and the items being recommended. For example, a job posting may contain features such as a job title, skills, salary, and location, and a job seeker will similarly have a desired job title, list of skills, salary, and location. Because a content-based recommendation system is just performing a similarity calculation on features, a content-based recommendation system can actually match between any two sets of entities (i.e. item to item, user to item, user to user, etc.) with a shared feature space, as it relies on no past interactions with the items by users in order to make the recommendations. Collaborative filtering [8]–[11], on the other hand, is based on the concept that users with a shared interest in some items will also have a shared interest in other items. For example, a user who applies to a software engineering job is likely to apply to other jobs related to software engineering, whereas a user who applies to a registered nurse job is likely to apply to other jobs related to nursing. Thus if a new user applies to a registered nurse job, there is a good chance that if we look at the other people who applied to that job and recommend the other jobs those people applied to, our new user may be interested in those other jobs, as well.
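The content-based matching described above is, at its core, a similarity calculation over a shared feature space. As a minimal sketch (the feature encoding, weighting, and example data below are illustrative assumptions, not a description of any production implementation), one could flatten each entity's fields into a bag of features and rank jobs by cosine similarity to a job seeker's profile:

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two sparse feature vectors (as dicts)."""
    dot = sum(a[f] * b[f] for f in a.keys() & b.keys())
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def features(entity):
    """Flatten an entity's fields into a bag-of-features vector."""
    bag = {}
    for field, values in entity.items():
        for v in (values if isinstance(values, list) else [values]):
            bag[f"{field}={v}"] = 1.0
    return bag

seeker = {"title": "software engineer", "skills": ["java", "hadoop"], "location": "atlanta"}
jobs = {
    "j1": {"title": "software engineer", "skills": ["java", "spark"], "location": "atlanta"},
    "j2": {"title": "registered nurse", "skills": ["acute care"], "location": "atlanta"},
}
# Rank job ids by similarity to the seeker's profile, most similar first.
ranked = sorted(jobs, key=lambda j: cosine(features(seeker), features(jobs[j])), reverse=True)
```

Because the match is purely feature-based, the same function works item-to-item or user-to-user, as the text notes; no interaction history is required.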
Collaborative filtering can be performed using different approaches like factorization-based methods [12], graph methods [13], genetic algorithms [14], and case-based reasoning [15]. Hybrid approaches also exist, which can combine both collaborative filtering and content-based recommendations into a unified recommendation algorithm [16]–[19]. While the majority of published research on recommendation systems focuses on some form of content-based, collaborative filtering-based, or hybrid algorithm for matching users with the best items ("what" to match), the additional dimensions of "when" to match and even "who" to match to maximize the response rate of the overall system are far less studied. The most related prior research to ours is [20], which tries to find the best time to send a job recommendation to a user in order to optimize the odds of that user acting upon the recommendation. Their system is focused on the when component of the recommendation system, while ours combines the three components of the recommendation together (who, what, and when), with a particular emphasis on the who and what being decided relative to a more prescriptive when dimension.

III. METHODS
Our methodology aims to address the three dimensions of recommendation email relevancy: who to send to, what to send, and when to send. We ultimately choose one of these dimensions, when to send, as our fixed dimension from which we will pivot, choosing to calculate who to send to and what to send relative to each time window in which we choose to send a batch of recommendations. We address these three dimensions by utilizing both individual user behavior, as well as historical group behavior from other users within the same classification, in order to figure out who to send to and what to send for each time period. The individual user behavioral data predominantly dictates who to send to relative to when the recommendations will be sent, with the goal being to maximize response rate. The group behavioral data primarily determines what to send to a particular user from a list of candidate recommendation lists in order to maximize response rate. This fusion of personal and group behavioral data provides us with a better understanding of what to send, when to send, and whom to target in order to maximize the aggregate response rate across the entire batch of sent recommendation emails.
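One plausible shape for this fusion is sketched below. This is an illustrative assumption, not the paper's stated combination rule: it simply gates each send on the per-user Activity Score (Section III-A) and picks, for each user, the candidate recommendation list whose category his/her group responds to best (Section III-B). All names and the cutoff value are hypothetical.

```python
def select_sends(users, candidate_lists, activity_score, transition_prob,
                 min_activity=0.5):
    """For one send window, decide who receives an email and which list they get.

    users: list of (user_id, category) pairs.
    candidate_lists: dict mapping a user category to the categories for which
        a candidate recommendation list exists.
    activity_score: callable user_id -> AS(u) in [0, 1] (the "who"/"when" signal).
    transition_prob: callable (cat_a, cat_b) -> group affinity (the "what" signal).
    min_activity: hypothetical cutoff; less-active users are simply not emailed.
    """
    sends = []
    for user_id, cat in users:
        if activity_score(user_id) < min_activity:  # not recently active: skip
            continue
        # choose the candidate list whose category this user's group favors most
        best = max(candidate_lists.get(cat, []),
                   key=lambda other: transition_prob(cat, other),
                   default=None)
        if best is not None:
            sends.append((user_id, best))
    return sends

# Toy example: an active Java developer gets the "software engineer" list;
# an inactive one is excluded from the batch altogether.
users = [("u1", "java developer"), ("u2", "java developer")]
candidates = {"java developer": ["software engineer", "registered nurse"]}
sends = select_sends(
    users, candidates,
    activity_score=lambda u: {"u1": 0.9, "u2": 0.1}[u],
    transition_prob=lambda a, b: {("java developer", "software engineer"): 0.6,
                                  ("java developer", "registered nurse"): 0.01}[(a, b)])
```

The point of the sketch is the division of labor: the activity score controls who enters the batch at all, while the group affinity controls what each included user is shown.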
A. Response Likelihood
One of the most important goals for any recommendation email system should be to achieve a high response rate. Sending a high volume of emails with a low response rate will likely lead to several problems. First, sending too many emails to end users may overwhelm or annoy them, hurting the sender's reputation and likely resulting in the user unsubscribing from future emails or possibly breaking ties completely with the sender. Second, sending emails that are not sufficiently interesting to the end user will result in reputational harm for the sender, since the sender will be perceived as having a low-quality platform that is not worth the end user's time. Third, if too many emails are sent by the system to a particular email service provider, that email service provider may determine that the large volume of emails are spam, and they may blacklist the sender such that future emails to any recipients are blocked. Fourth, sending many emails without a good response rate is a waste of resources, since sending recommendation emails requires servers, queues, databases, and bandwidth to store, transmit, and track all of the emails. With the risks of losing customers, losing the right to continue contacting customers, having a sender's reputation damaged, having all future email communications blocked to all users, and wasting resources sending ineffective emails, it is clearly important that a recommendation email system optimize how it sends emails to maximize impact while minimizing emails sent. In order to address these issues, recommendation emails should be sent only to the users who are most likely to respond. The immediate challenge, therefore, becomes how to predict those users with high response likelihood for a set of recommendations. In our system we utilize each user's recent behavioral data with the hypothesis that active users were more recently interested and therefore are more likely to respond in general.
Following this logic, if a user was active in the last 24 hours, his response likelihood will be higher than that of someone who was active 7 days ago, while users with a last activity within 7 days are more likely to respond than others who were active 20 days ago. For CareerBuilder we tracked three kinds of behaviors to calculate recent activity levels, namely, searching for a job, applying to a job, and updating a resume. We use the most recent of these activities to calculate what we call the Activity Score, which is the basis for our identification of users who are, in general, most likely to respond to recommendation emails. Assume a user u was active on a date d_a^u while today's date is d_t. Also, assume that we only consider users who were active more recently than a given date d_o. We can calculate the Activity Score AS(u) as follows:

AS(u) = 1 − (d_t − d_a^u) / (d_t − d_o)

The only parameter that needs to be chosen here is d_o. For CareerBuilder's use case, we found that 90 days before the current date is a good cut-off threshold, as users tend to be much more responsive to recommendation emails within their first 90 days, but a response becomes much less likely beyond 90 days.
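The Activity Score formula above can be sketched directly in code. A minimal sketch (the function names are our own; the 90-day window follows the text):

```python
from datetime import date

def activity_score(last_active: date, today: date, window_days: int = 90):
    """AS(u) = 1 - (d_t - d_a^u) / (d_t - d_o), with d_o = today - window_days.

    Returns a score in [0, 1] for users active within the window, or None for
    users whose last activity predates the cutoff (they are not emailed).
    """
    days_since = (today - last_active).days
    if days_since > window_days:
        return None  # older than d_o: excluded from consideration
    return 1 - days_since / window_days

# A user active yesterday scores near 1; one active exactly 90 days ago scores 0.
today = date(2016, 9, 1)
print(activity_score(date(2016, 8, 31), today))  # ≈ 0.989
print(activity_score(date(2016, 6, 3), today))   # 0.0
```

The linear decay means recency translates directly into send priority: the score only ranks users, so any monotone decay would serve the same role.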
B. Group Trends

In addition to looking at a user's Activity Score to predict the general likelihood of that user responding to a recommendation email, it is also important to consider that not all recommendation emails sent to the user are equally likely to receive a response. Because we often have limited behavioral data for any particular user regarding the various data classifications in which the user may be interested, we instead rely on historical group behavioral trends to learn affinities between the group of users within each classification and their likelihood of responding to a recommendation within any other classification. The underlying theory here is that when users are classified into categories based on their common features, they tend to share similar interests with the other users in the same category. We built our second module of the proposed system upon that hypothesis, utilizing the group behavior within each category to predict the response rate of new users within the same category when shown items from any other category. The importance of this module is that it expands the selection pool of items beyond just those that fall under the same category as the targeted users. Without some notion of how related different categories are, it can be risky to recommend items not within the same category as the user. For example, in the recruitment domain, users who are classified into the "Java Developer" category would probably not have a high response rate to recommended jobs from the "Registered Nurse" category, while they might respond very favorably to recommendations from the "Software Engineer" category, and reasonably well to recommendations from the "Hadoop Developer" category. The recommendation engine should thus be able to understand these category affinities when predicting response rates. To understand the probabilistic model which we implemented to represent these category affinities and to predict
the interest of a user based on his group's behavior, let us first understand the notation we use to describe the model:

1) u = 1, 2, ..., U is the index of the user.
2) a, b = 1, 2, ..., C is the index of the category.
3) u_a = 1, 2, ..., U_a is the index of the user in category a.
4) i_a = 1, 2, ..., I_a is the index of the item in category a.
5) t = 1, 2, ..., T is the index of transition a → b, where a user u_a interacts with item i_b.
6) O_t = {x_{t,1}, ..., x_{t,i}} is the set of all transitions of type t, where x_{t,i} is an instance i of transition t.
7) r_{a,b} is the trend of users in group a towards items of group b.
8) S_{a,b} = {x_{a,b,1}, ..., x_{a,b,n}} is the set of all items of category b seen by users of category a.

Fig. 1. Group Trend. In this example, users of category a interact with items of category b.

In figure 1 we show an example of a group trend. The example shows the transition trend of users u_a towards items i_b, where x_{t,1} is an instance of that transition and O_t = {x_{t,1}, x_{t,2}, x_{t,3}, x_{t,4}}. We calculate the group trend as a transition probability from the users' category to another category based on the number of interactions between those users and items from the other category:

P(r_{a,b} | t, S_{a,b}) = |O_t| / |S_{a,b}|

The probability score represents the likelihood that users from category a would accept and interact with recommendations including items from category b. We build a transition graph modeling the probability score between different categories, and this transition graph is then utilized to select a list of recommendations corresponding with the highest probability of interest, as shown in figure 2.
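The transition probabilities and the resulting trends graph can be sketched as follows. This is a simplified assumption about the bookkeeping (cross-category interaction counts and shown-item counts taken from a log; names are our own), but the ratio it computes is exactly |O_t| / |S_{a,b}|:

```python
from collections import defaultdict

def build_trends_graph(interactions, impressions):
    """Estimate P(r_ab) = |O_t| / |S_ab| for every observed category pair.

    interactions: iterable of (user_category, item_category) pairs, one per
        interaction a user had with an item (the transitions in O_t).
    impressions: dict mapping (user_category, item_category) to the number of
        category-b items seen by category-a users (|S_ab|).
    Returns a nested dict: graph[a][b] = transition probability.
    """
    counts = defaultdict(int)
    for a, b in interactions:
        counts[(a, b)] += 1
    graph = defaultdict(dict)
    for (a, b), shown in impressions.items():
        if shown:
            graph[a][b] = counts[(a, b)] / shown
    return graph

# Toy log mirroring the paper's example: Java developers respond strongly to
# "software engineer" items and barely at all to "registered nurse" items.
interactions = ([("java developer", "software engineer")] * 30
                + [("java developer", "registered nurse")] * 1)
impressions = {("java developer", "software engineer"): 50,
               ("java developer", "registered nurse"): 40}
graph = build_trends_graph(interactions, impressions)
```

Selecting recommendations for a category-a user then reduces to a lookup of the highest-probability outgoing edges in `graph[a]`.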
IV. EXPERIMENT AND RESULTS

To test the proposed system, we applied it within the recommendation email system at CareerBuilder, which is one of the largest job boards in the world. This system has millions of job postings, more than 60 million actively-searchable resumes, over one billion searchable documents, and more than a million searches per hour [21]–[23]. The recommendation
engine selects jobs of interest to job seekers and then sends those jobs via recommendation emails to job seekers. Hence, these are user-item recommendations, where the users are job seekers and the items are jobs. The previous recommendation email system at CareerBuilder would restrict sent emails to the target user's category, which was not performing well because the system was overly restrictive and was unable to consider alternate, related categories that users within the initial category may also find interesting. Our methodology (shown in figure 3) has been applied in order to improve the quality and performance of CareerBuilder's recommendation emails, so it was important to measure how the new system is performing compared to the old one. Our measurement was based on the open to send ratio and the number of job applications created based on the recommendation emails.

Fig. 2. Trends Graph, where we store all the possible transitions between different categories based on the calculated probability score.

Fig. 3. System Architecture. We leverage Apache Spark to determine each user's likelihood to respond (Activity Score) to any job, as well as to analyze the group behavior of other users within each classification to determine how they typically respond to jobs within other classifications, combining these factors to determine each user's overall likelihood to respond to any recommendation made by the hybrid user-item recommendation system.

TABLE I. Results for the proposed system against the baseline (control) system. The proposed system improves Total Apps, OSR, CTR, and AOR while reducing the total number of sent emails.

                                                          Baseline   Proposed System
Total Apps                                                10,000     15,000
Total Sent                                                500,000    150,000
OSR: Open to Send Ratio (Opens / Emails Sent)             29%        40%
CTR: Click Through Rate (Clicks / Emails Sent)            8%         32%
AOR: Apps to Open Ratio (Applications / Emails Opened)    6%         25%

We define an indicator function as

O(e_{u_i}) = 1 if e_{u_i} is opened, 0 if e_{u_i} is not opened

Then we calculate the open to send ratio as:

OSR = (Σ_{i=1}^{n} O(e_{u_i})) / n

where OSR is the open to send ratio, e_{u_i} is the recommendation email sent to the user u_i, and n is the total number of emails which were sent. This score represents the relevancy of the emailed job recommendations, given the hypothesis that a user will not open a recommendation email if the job in that email is not of interest to that user. While the OSR is a good initial indicator of relevancy, we should note that the user is only exposed to limited information about the job being recommended (the job title) when reviewing the subject of the email.
As a result, we capture another intermediate metric called the CTR (click-through ratio). For the CTR, we first define another indicator function as

C(e_{u_i}) = 1 if e_{u_i} is clicked from a link in the email, 0 if e_{u_i} is not clicked from a link in the email

Then we calculate the click-through ratio as:

CTR = (Σ_{i=1}^{n} C(e_{u_i})) / n

Both the OSR and the CTR provide valuable information about users' perceptions about the relevance of the recommendations they are receiving. When comparing our baseline/control algorithm versus the proposed system, these metrics also show us useful information about drop-off at each step at which the user interacts with the recommendation. Our end goal, however, is to actually convert a user's interest in the job to an application for the job. To measure this, we need one additional metric: the application to open ratio (AOR). For the AOR, we defined one more indicator function:

A(App_{e_{u_i}}) = 1 if App_{e_{u_i}} is a resulting job application, 0 if App_{e_{u_i}} is not a resulting job application

Then we calculate the application to open ratio (AOR) as:

AOR = (Σ_{i=1}^{n} A(App_{e_{u_i}})) / m

where App_{e_{u_i}} is a job application created based on the recommendation email e_{u_i}, and m is the total number of emails where O(e_{u_i}) = 1, i.e. opened emails. While OSR represents the open rate of the sent emails, AOR represents the conversion of a recommendation e-mail into a job application, so AOR is the most important factor in our case. Table I shows the significant improvement in OSR and AOR delivered by the new system over the old one. We can essentially view the email recommendation responses as a funnel, where all users who are sent recommendations are the starting volume, fewer users open the email (measured by the OSR metric), even fewer users click on a recommendation (measured by the CTR metric), and even fewer users apply to the job (measured by the AOR metric). In this funnel, we note that the improvement of the proposed system over the baseline/control system compounds at each step in the funnel. For example, we note that the OSR increases from 29% to 40%, meaning that more users are reading the subject of the email (which lists the title of the job being recommended) and identifying it as a potentially good match based upon that limited information. Then, once a user actually views the additional information about the job in the email, his/her interest in clicking on the job (CTR) to continue engaging with it improves even further, from the baseline of 8% CTR all the way to 32% CTR. Finally, for each of the opened emails, we also see an improvement in actual application rate (AOR) from 6% to 25%, meaning that drop-off has decreased at every stage in the funnel and that users are collectively finding the recommendations from the new system more relevant and timely for their interests.
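The three funnel metrics can be sketched as a single pass over an email event log. The record field names below are illustrative assumptions; the ratios match the definitions above (note that AOR divides by opens, not sends):

```python
def funnel_metrics(emails):
    """Compute OSR, CTR, and AOR from per-email event records.

    Each record is a dict with boolean fields: opened, clicked, applied.
    OSR = opens / sent, CTR = clicks / sent, AOR = applications / opens.
    """
    n = len(emails)
    opens = sum(e["opened"] for e in emails)
    clicks = sum(e["clicked"] for e in emails)
    apps = sum(e["applied"] for e in emails)
    return {
        "OSR": opens / n if n else 0.0,
        "CTR": clicks / n if n else 0.0,
        "AOR": apps / opens if opens else 0.0,  # conversion among opened emails
    }

# Toy log: 10 emails sent, 4 opened, 1 clicked, 1 applied.
log = (
    [{"opened": True, "clicked": True, "applied": True}]
    + [{"opened": True, "clicked": False, "applied": False}] * 3
    + [{"opened": False, "clicked": False, "applied": False}] * 6
)
print(funnel_metrics(log))  # {'OSR': 0.4, 'CTR': 0.1, 'AOR': 0.25}
```

Because each metric conditions on the previous funnel stage, improvements compound multiplicatively: total applications per send equal OSR × AOR.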
V. CONCLUSION
In this paper we presented a novel approach to improving the quality and response rate of recommendation emails. The proposed system utilizes personal behavioral data to calculate a per-user activity score to determine which users are most likely to respond at the present time based upon the most recent activity and type of activity they have exhibited. The new system additionally utilizes historical group behavior to build a transition graph which represents the probabilities that a typical user within any specific category would be likely to respond to recommendations for items from any other category. By leveraging both the group transition graph (likelihood of a typical user within a category to respond to recommendations within any particular category) and the personal activity score (likelihood of a specific user to respond to any recommendation), the proposed model is able to optimize the aggregate choice of which recommendations should be sent to which users for the given time period. The proposed model has been applied successfully within CareerBuilder's job recommendation email system to increase total conversions by 50% while simultaneously decreasing emails sent by 72%.
REFERENCES

[1] Joseph A. Konstan. Introduction to recommender systems: Algorithms and evaluation. ACM Transactions on Information Systems (TOIS), 22(1):1–4, 2004.
[2] Badrul Sarwar, George Karypis, Joseph Konstan, and John Riedl. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web, pages 285–295. ACM, 2001.
[3] James Bennett and Stan Lanning. The Netflix prize. In Proceedings of KDD Cup and Workshop, volume 2007, page 35, 2007.
[4] Jesús Bobadilla, Fernando Ortega, Antonio Hernando, and Abraham Gutiérrez. Recommender systems survey. Knowledge-Based Systems, 46:109–132, 2013.
[5] Jonathan L. Herlocker, Joseph A. Konstan, Al Borchers, and John Riedl. An algorithmic framework for performing collaborative filtering. In Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 230–237. ACM, 1999.
[6] Michael J. Pazzani and Daniel Billsus. Content-based recommendation systems. In The Adaptive Web, pages 325–341. Springer, 2007.
[7] Prem Melville, Raymond J. Mooney, and Ramadass Nagarajan. Content-boosted collaborative filtering for improved recommendations. In AAAI/IAAI, pages 187–192, 2002.
[8] Jonathan L. Herlocker, Joseph A. Konstan, and John Riedl. Explaining collaborative filtering recommendations. In Proceedings of the 2000 ACM Conference on Computer Supported Cooperative Work, pages 241–250. ACM, 2000.
[9] Gediminas Adomavicius and Alexander Tuzhilin. Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. IEEE Transactions on Knowledge and Data Engineering, 17(6):734–749, June 2005.
[10] Nikhil Rao, Hsiang-Fu Yu, Pradeep K. Ravikumar, and Inderjit S. Dhillon. Collaborative filtering with graph information: Consistency and scalable methods. In C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama, and R. Garnett, editors, Advances in Neural Information Processing Systems 28, pages 2107–2115. Curran Associates, Inc., 2015.
[11] Xiaoyuan Su and Taghi M. Khoshgoftaar. A survey of collaborative filtering techniques. Advances in Artificial Intelligence, 2009:4:2–4:2, January 2009.
[12] Xinyue Liu, Charu Aggarwal, Yu-Feng Li, Xiangnan Kong, Xinyuan Sun, and Saket Sathe. Kernelized matrix factorization for collaborative filtering. In SIAM Conference on Data Mining, pages 399–416, 2016.
[13] Charu C. Aggarwal, Joel L. Wolf, Kun-Lung Wu, and Philip S. Yu. Horting hatches an egg: A new graph-theoretic approach to collaborative filtering. In Proceedings of the Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 201–212. ACM, 1999.
[14] Jesús Bobadilla, Fernando Ortega, Antonio Hernando, and Javier Alcalá. Improving collaborative filtering recommender system results and performance using genetic algorithms. Knowledge-Based Systems, 24(8):1310–1316, 2011.
[15] Conor Hayes, Pádraig Cunningham, and Barry Smyth. A case-based reasoning view of automated collaborative filtering. In Case-Based Reasoning Research and Development, pages 234–248. Springer, 2001.
[16] Luis M. De Campos, Juan M. Fernández-Luna, Juan F. Huete, and Miguel A. Rueda-Morales. Combining content-based and collaborative recommendations: A hybrid approach based on Bayesian networks. International Journal of Approximate Reasoning, 51(7):785–799, 2010.
[17] Robin Burke. Hybrid recommender systems: Survey and experiments. User Modeling and User-Adapted Interaction, 12(4):331–370, 2002.
[18] George Lekakos and Petros Caravelas. A hybrid approach for movie recommendation. Multimedia Tools and Applications, 36(1-2):55–70, 2008.
[19] Lina Yao, Quan Z. Sheng, Aviv Segev, and Jian Yu. Recommending web services via combining collaborative filtering with content-based features. In Web Services (ICWS), 2013 IEEE 20th International Conference on, pages 42–49. IEEE, 2013.
[20] Jian Wang, Yi Zhang, Christian Posse, and Anmol Bhasin. Is it time for a career switch? In Proceedings of the 22nd International Conference on World Wide Web, pages 1377–1388. International World Wide Web Conferences Steering Committee, 2013.
[21] Khalifeh AlJadda, Mohammed Korayem, Camilo Ortiz, Trey Grainger, John A. Miller, and William S. York. PGMHD: A scalable probabilistic graphical model for massive hierarchical data problems. In Big Data (Big Data), 2014 IEEE International Conference on, pages 55–60. IEEE, 2014.
[22] Khalifeh AlJadda, Mohammed Korayem, Trey Grainger, and Chris Russell. Crowd sourced query augmentation through semantic discovery of domain-specific jargon. In , pages 808–815. IEEE, 2014.
[23] Mohammed Korayem, Camilo Ortiz, Khalifeh AlJadda, and Trey Grainger. Query sense disambiguation leveraging large scale user behavioral data. In