Pixie: A System for Recommending 3+ Billion Items to 200+ Million Users in Real-Time
Chantat Eksombatchai, Pranav Jindal, Jerry Zitao Liu, Yuchen Liu, Rahul Sharma, Charles Sugnet, Mark Ulrich, Jure Leskovec
Pinterest
{pong,pranavjindal,zitaoliu,yuchen,rsharma,sugnet,mu,jure}@pinterest.com
ABSTRACT
User experience in modern content discovery applications critically depends on high-quality personalized recommendations. However, building systems that provide such recommendations presents a major challenge due to a massive pool of items, a large number of users, and requirements for recommendations to be responsive to user actions and generated on demand in real-time. Here we present Pixie, a scalable graph-based real-time recommender system that we developed and deployed at Pinterest. Given a set of user-specific pins as a query, Pixie selects in real-time from billions of possible pins those that are most related to the query. To generate recommendations, we develop the Pixie Random Walk algorithm that utilizes the Pinterest object graph of 3 billion nodes and 17 billion edges. Experiments show that recommendations provided by Pixie lead to up to 50% higher user engagement when compared to the previous Hadoop-based production system. Furthermore, we develop a graph pruning strategy that leads to an additional 58% improvement in recommendations. Last, we discuss system aspects of Pixie, where a single server executes 1,200 recommendation requests per second with 60 millisecond latency. Today, systems backed by Pixie contribute to more than 80% of all user engagement on Pinterest.
1 INTRODUCTION

Pinterest is a visual catalog with several billion pins, which are visual bookmarks containing a description, a link, and an image or a video. A major problem faced at Pinterest is to provide personalized, engaging, and timely recommendations from a pool of 3+ billion items to 200+ million monthly active users.

Recommendation at Pinterest is a problem with a scale beyond the classical recommendation problems studied in the literature. Pinterest has a catalog of several billion pins that the recommender system can choose from, whereas classical recommender systems consider catalogs that only contain millions of items (movies [5, 18], videos [4, 7, 10], friends to follow [2, 14, 15]). In contrast, Pinterest recommends from a catalog of billions of items, which makes the recommendation problem much more challenging.

A second important challenge is posed by the fact that recommendations have to be calculated on demand and in real-time. This real-time requirement (i.e., sub-100-millisecond latency per recommendation request) is crucial for two reasons: (1) Users prefer recommendations responsive to their behavior, so recommendations have to be computed on demand and in real-time so that the system can instantaneously react to changes in user behavior and intent; (2) The real-time requirement also brings a drastic change in the design of the entire system. For example, even if recommendations took only one second to compute, such times are too long for the user to wait. In turn, this would mean that recommendations for all users would have to be precomputed and materialized on a daily schedule. Moreover, the total number of registered users is usually much larger than the number of daily active users, so a lot of time and resources would be wasted updating recommendations for inactive users.
Present work: Pixie.
Here we present Pixie, a scalable real-time graph-based recommendation system deployed at Pinterest. Currently, pins recommended by Pixie represent more than 80% of all user engagement at Pinterest. In A/B tests, recommendations provided by Pixie increase per-pin engagement by up to 50% compared to the previous Pinterest recommendation systems.

Users at Pinterest view pins and curate them into collections called boards. This way a single pin can be saved by thousands of users into tens of thousands of different boards. For example, the same recipe pin could be saved by different users to several different boards such as "recipes", "quick to cook", "vegetarian", or "summer recipes." This manual curation mechanism provides a great source for recommendations, because curation captures the multi-faceted relationships between objects. With hundreds of millions of users manually categorizing/classifying pins into boards we obtain an object graph of multi-faceted relationships between pins. Thus, we can think of Pinterest as a giant human-curated bipartite graph of 7 billion pins and boards, and over 100 billion edges. (There are over 100 billion pins across all boards at Pinterest, represented by the 100 billion edges in our graph. The number of unique pins is smaller because the same pin is usually saved to many boards.)

We use this bipartite graph of pins and boards to generate recommendations. As a user interacts with pins at Pinterest, our method uses these pins to create a query set Q of pins, each with its own weight. The query set is user-specific and changes dynamically; it can contain the most recently interacted pins as well as pins from long ago. Given the query Q, we then generate recommendations using the Pixie Random Walk algorithm. Because the walk visits boards as well as pins, both types of objects can be recommended to the user. Furthermore, the algorithm is fast, scalable, and runs in constant time that is independent of the input graph size.

Our novel Pixie Random Walk algorithm includes the following innovations, which are critical for providing high-quality recommendations: (1) We bias the Pixie Random Walk in a user-specific way, for example, based on the topic and language of the user; (2) We allow for multiple query pins with different importance weights, which allows us to capture the entire context of the user's previous behavior; (3) Our method combines results from multiple independent random walks in such a way that it rewards recommendations that are related to multiple query pins. In combination with (2) this leads to more relevant recommendations; (4) Our Pixie Random Walk uses a special convergence criterion which allows for early stopping and is crucial for achieving real-time performance and throughput; Last, (5) our Pixie algorithm allows for recommending both pins as well as boards, which helps solve the cold-start problem. To recommend fresh new pins, Pixie first recommends boards (rather than pins) and then serves the new pins saved to those boards. In addition, we also develop a graph pruning strategy that further increases the quality of recommendations by 58%. This pruning also lowers the size of the graph by a factor of six, which further improves the runtime performance of Pixie.

Our Pixie algorithm has several important advantages. The algorithm offers the flexibility to dynamically bias the walk. For example, in Pixie we bias the walk to prefer to recommend content local to the language of the user, which boosts user engagement.
Pixie allows for computing recommendations based on multiple query pins. Furthermore, we can vary the walk length to make trade-offs between broader and narrower recommendations. For areas of Pinterest intended to provide unexpected, exploratory recommendations, Pixie can walk farther in the graph for more diverse recommendations. On the other hand, to generate narrowly focused and topical recommendations, Pixie can use shorter walks. The Pixie Random Walk also has several advantages over traditional random walks (or over simply counting the number of common neighbors): in classical random walks, low-degree nodes with fewer edges contribute less signal. This is undesirable because smaller boards (lower-degree nodes) tend to be more topically focused and are more likely to produce highly relevant recommendations. In the Pixie Random Walk we solve this by boosting the impact of smaller boards. And last, the Pixie Random Walk is efficient and runs in constant time that is independent of the input graph size.

Deployment of Pixie is facilitated by the availability of large-RAM machines. In particular, we use a cluster of Amazon AWS r3.8xlarge machines with 244 GB RAM. We fit the pruned Pinterest graph of 3 billion nodes and 17 billion edges into about 120 GB of main memory. This gives several important benefits: (1) The random walk does not have to cross machines, which brings huge performance benefits; (2) The system can answer queries in real-time and multiple walks can be executed on the graph in parallel; And, (3) Pixie can be scaled and parallelized by simply adding more machines to the cluster.

Overall, Pixie takes less than 60 milliseconds (99-percentile latency) to produce recommendations. Today, a single Pixie server can serve about 1,200 recommendation requests per second, and the overall cluster is serving nearly 100,000 recommendation requests per second. Pixie is written in C++ and is built on top of the Stanford Network Analysis Platform (SNAP) [20].

The remainder of this paper is structured around the primary contributions of our work. After a brief discussion of the related work in Section 2, we explain the Pixie Random Walk algorithm in Section 3.1. We discuss graph pruning in Section 3.2 and the system implementation in Section 3.3. Section 4 evaluates the system as well as the recommendations. Section 5 discusses Pixie's impact and use cases at Pinterest. Finally, we conclude in Section 6.

2 RELATED WORK

Recommender systems are a large and well-investigated research field. Here we break the related work into several lines and focus on large-scale industrial recommender systems.
Web-scale recommender systems.
Several web-scale production systems have been described in the past [1, 7, 21]. However, unlike Pixie, these systems are not real-time and their recommendations are precomputed. In practice, response times below 100 milliseconds are considered real-time because such systems can then be incorporated into the real-time serving pipeline. For example, if providing recommendations took just 1 second, the user would have to wait too long for the recommendations to be generated. In such cases recommendations would have to be precomputed (say, once a day) and then served out of a key-value store. However, old recommendations are stale and not engaging. The real-time requirement is thus crucial because it allows the recommender to instantaneously react to changes in user behavior and intent. Responding to users in real-time allows for highly engaging and relevant recommendations. Our experiments show that reacting to a user's intent in real-time leads to 30-50% higher engagement than waiting days or hours for recommendations to refresh.

Other examples of real-time recommendation systems include news recommendations [9, 26]. However, such systems recommend only the latest content. The major difference here is in scale: Pinterest's catalog contains 1,000 times more items than traditional recommender systems can handle.
Random-walk-based approaches.
Many algorithms use random walks to harness the graph structure for recommendations [2, 3, 28]. Perhaps the closest to our work is the "who to follow" system at Twitter [14, 15], which places the entire follow graph in the memory of a single machine and runs a personalized SALSA algorithm [19]. These types of Monte Carlo approaches measure the importance of one node relative to another, and recommendations can be made according to these scores [13]. In contrast, we develop a novel random walk that is faster and provides better performance.
Traditional collaborative filtering approaches.
More generally, our approach here is related to collaborative filtering (CF), which makes recommendations by exploiting the interaction graph between users and items and matching users that have similar item preferences. CF relies on factorizing user-item interaction matrices to generate latent factors representing users and items [16, 17, 25, 30]. However, the time and space complexity of factorization-based CF algorithms scales (at least) linearly with the number of nodes in the input user-item graph, making it challenging to apply these algorithms to problems containing billions of items and hundreds of millions of users. In contrast, our random-walk-based recommendation algorithm runs in constant time, independent of the graph/dataset size.
Content-based methods.
In purely content-based recommender systems, representations for items are computed solely based on their content features [24]. Many state-of-the-art web-scale recommendation systems are content-based, often using deep neural networks [6, 8, 11, 29]. While these algorithms can scale to large datasets because the dimension of the parameter space only depends on the dimension of the feature space, these approaches have not leveraged information from the graph structure, which (as our experiments show) is essential for Pinterest.

3 PROPOSED METHOD
Pinterest is a platform where users interact with pins. Users can save relevant pins to boards of their choice. These boards are collections of similar pins. For example, a user can create a board of recipes and collect pins related to food items in it. Pinterest can be seen as a collection of boards created by its users, where each board is a set of curated pins and each pin can be saved to hundreds of thousands of different boards.

More formally, Pinterest can be organized as an undirected bipartite graph G = (P, B, E). Here, P denotes the set of pins and B denotes the set of boards. The set P ∪ B is the set of nodes of G. There is an edge e ∈ E between a pin p ∈ P and a board b ∈ B if a user saved p to b. We use E(p) to denote the board nodes connected to pin p (and E(b) for the pins connected to b). We assume that G is connected, which is also the case in practice. (We note there are alternative ways to define graph G. For example, one could also connect users to the boards they own. Here we present this simplified graph, but all our algorithms generalize to more complex graphs as well.)

On input, Pixie receives a weighted set of query pins Q = {(q, w_q)}, where q is a query pin and w_q is its importance in the query set. The query set Q is user-specific and is generated dynamically after every action of the user: the most recently interacted pins have high weights while pins from long ago have low weights. Given the query Q, Pixie then generates recommendations by simulating a novel version of a biased random walk with restarts.
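As a concrete reading of this notation, the short C++ fragment below (C++ being the language Pixie is written in) fixes hypothetical types for nodes and queries; the names are ours, chosen for illustration only.

    #include <cstdint>
    #include <utility>
    #include <vector>

    // Pins and boards share one ID space for the nodes of G = (P, B, E).
    using NodeId = std::uint32_t;

    // A query Q = {(q, w_q)}: pins paired with their importance weights.
    using Query = std::vector<std::pair<NodeId, float>>;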
3.1 Pixie Random Walk

To ease the explanation of Pixie we first explain the basic random walk and then discuss how to extend it into the novel random walk algorithm used by Pixie. All the innovations on top of the basic random walk are essential for Pixie to achieve its full performance.
Basic Random Walk.
Consider the simple case where the user-specific query Q contains a single pin q. Given an input query pin q, one can simulate many short random walks on G, each starting from q, and record the visit count of each candidate pin p, i.e., the number of times the random walk visited pin p. The more often a pin is visited, the more related it is to the query pin q.

The basic random walk procedure BasicRandomWalk is described in Algorithm 1 [28]. Each random walk produces a sequence of steps. Each step is composed of three operations. First, given the current pin p (initialized to q) we select an edge e from E(p) that connects p with a board b. Then, we select a pin p′ by sampling an edge e′ from E(b) that connects b and p′. And third, the current pin is updated to p′ and the step repeats.

Walk lengths are determined by parameter α. The total number of steps across all such short random walks determines the time complexity of this procedure, and we denote this sum by N. Finally, we maintain a counter V that maps candidate pins to their visit counts. To obtain the recommended pins, we extract the top visited pins from the returned counter and return them as the query response. The time taken by this procedure is constant and independent of the graph size (determined by parameter N).

Having described the basic random walk procedure, we now generalize and extend it to develop the Pixie Random Walk. The Pixie Random Walk algorithm is comprised of Algorithms 2 and 3 and includes the following improvements over the basic random walk: (1) Biasing the random walk towards user-specific pins; (2) Multiple query pins, each with a different weight; (3) A multi-hit booster that boosts pins related to multiple query pins; (4) Early stopping that minimizes the number of steps of the random walk while maintaining the quality of the results.
Algorithm 1 Basic Random Walk; q is the query pin; E denotes the edges of graph G; α determines the length of walks; N is the total number of steps of the walk; V stores pin visit counts.

    BasicRandomWalk(q: Query pin, E: Set of edges, α: Real, N: Int)
     1: totSteps = 0, V = ∅
     2: repeat
     3:   currPin = q
     4:   currSteps = SampleWalkLength(α)
     5:   for i = [1 : currSteps] do
     6:     currBoard = E(currPin)[rand()]
     7:     currPin = E(currBoard)[randNeighbor()]
     8:     V[currPin]++
     9:   totSteps += currSteps
    10: until totSteps ≥ N
    11: return V

(1) Biasing the Pixie Random Walk. It is important to bias the random walk in a user-specific way. This way, even for the same query set Q, recommendations will be personalized and will differ from user to user. For example, the Pinterest graph contains pins and boards in different languages and on different topics, and from the user engagement point of view it is important that users receive recommendations in their language and on the topics they are interested in.

We solve the problem of biasing the random walk by changing the random edge selection to be biased based on user features. The random walk then prefers to traverse edges that are more relevant to that user. One can think of these edges as having higher weight/importance than the rest of the edges in the graph. This way we bias the random walk in a user-specific way towards a particular part of the graph and let it focus on a particular subset of pins. In practice, this modification turns out to be very important as it improves the personalization, quality, and topicality of recommendations, which then leads to higher user engagement.

The Pixie algorithm takes as input a set of user features U (Algorithm 2). Notice that between different Pixie calls for different users and queries we can bias the edge selection dynamically based on user and edge features, which increases the flexibility of Pixie recommendations. In particular, PixieRandomWalk selects edges with PersonalizedNeighbor(E, U) to prefer edges important to user U. This allows us to prefer edges that match the user's features/preferences, such as language or topic. Conceptually, this allows us to bias the walk in a user-specific way but with minimal storage as well as computational overhead. Essentially, one can think of this method as using a different graph for each user where edge weights are tailored to that user (but without the need to store a different graph for each of the 200+ million users). In practice, for performance reasons, we currently limit the weights to only take values from a discrete set of possible values. We further avoid overhead by storing edges for similar languages and topics consecutively in memory, so that PersonalizedNeighbor(E, U) is a subrange operator.
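To make the walk loop concrete, here is a minimal C++ sketch of Algorithm 1. It is our illustration rather than Pinterest's code: walk lengths are drawn from a geometric distribution as one plausible reading of SampleWalkLength(α), the adjacency layout anticipates the edgeVec structure of Section 3.3, and the user-specific subrange of PersonalizedNeighbor is stubbed out as the full adjacency range.

    #include <cstdint>
    #include <random>
    #include <unordered_map>
    #include <vector>

    using NodeId = std::uint32_t;

    // Adjacency layout in the style of Section 3.3: the neighbors of node v
    // live in edgeVec[offset[v] .. offset[v+1]).
    struct Graph {
        std::vector<std::uint64_t> offset;  // size = numNodes + 1
        std::vector<NodeId> edgeVec;
    };

    // Stub for PersonalizedNeighbor(E, U): in Pixie the admissible edges for
    // a user form a contiguous subrange of v's adjacency list (e.g., edges in
    // the user's language); here we sample uniformly from the full range.
    NodeId personalizedNeighbor(const Graph& g, NodeId v, std::mt19937& rng) {
        std::uint64_t lo = g.offset[v], hi = g.offset[v + 1];  // assumes hi > lo
        return g.edgeVec[lo + rng() % (hi - lo)];
    }

    // Algorithm 1 with the stubbed neighbor selection. alpha parameterizes
    // the walk-length distribution; N caps the total number of steps.
    std::unordered_map<NodeId, int>
    basicRandomWalk(const Graph& g, NodeId q, double alpha, long N) {
        std::mt19937 rng{std::random_device{}()};
        std::geometric_distribution<int> walkLen(alpha);
        std::unordered_map<NodeId, int> V;  // pin -> visit count
        long totSteps = 0;
        while (totSteps < N) {
            NodeId currPin = q;
            int currSteps = walkLen(rng) + 1;
            for (int i = 0; i < currSteps; ++i) {
                NodeId currBoard = personalizedNeighbor(g, currPin, rng);
                currPin = personalizedNeighbor(g, currBoard, rng);
                ++V[currPin];
            }
            totSteps += currSteps;
        }
        return V;
    }

In the production system the counter V is the open-addressing hash table of Section 3.3 rather than std::unordered_map; the map keeps this sketch self-contained.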
(2) Multiple Query Pins with Weights.
To holistically model the user it is important to make recommendations based on the entire historical context of a given user. We achieve this by performing queries based on multiple pins rather than just one pin. Each pin q in the query set Q is assigned a different weight w_q. Weights are based on the time since the user interacted with a pin and the type of interaction. To produce recommendations for a set Q of query pins we proceed as follows. We run the Pixie Random Walk (Algorithm 2) from each query pin q ∈ Q and maintain a separate counter V_q[p] of visits to pin p for each query pin q. Last, we combine the visit counts by applying a novel formula, which we describe below.

An important insight here is that the number of steps required to obtain meaningful visit counts depends on the query pin's degree. Recommending from a high-degree query pin that occurs in many boards requires many more steps than from a pin with a small degree. Hence, we scale the number of steps allocated to each query pin to be proportional to its degree. However, the challenge remains that if we assign the number of steps in linear proportion to the degree, then we can end up allocating not even a single step to pins with low degrees.

We achieve our goal of step distribution by allocating the number of steps based on a function that increases sub-linearly with the query pin degree, scaling the per-pin weights w_q by a scaling factor s_q. We construct the following scaling factor for each pin:

    s_q = |E(q)| · (C − log |E(q)|)    (1)

where s_q is the scaling factor for a query pin q ∈ Q, |E(q)| is the degree of q, and C = max_{p∈P} |E(p)| is the maximum pin degree. This function, by design, does not give disproportionately high weights to popular pins. We then allocate the number of steps as follows:

    N_q = w_q · N · s_q / Σ_{r∈Q} s_r    (2)

where N_q is the total number of steps assigned to the random walks that start from query pin q. This distribution gives us the desired property that more steps are allocated to starting pins with high degrees, while pins with low degrees also receive a sufficient number of steps. We implement this in line 2 of Algorithm 3.

(3) Multi-hit Booster. Another innovation of the Pixie algorithm is that, for queries with a set Q of query pins, we prefer recommendations that are related to multiple query pins in Q. Intuitively, the more query pins a candidate pin is related to, the more relevant it is to the entire query. In other words, candidates with high visit counts from multiple query pins are more relevant to the query than, for example, candidates having an equally high total visit count that comes from a single query pin.

The insight here is that we let Pixie boost the scores of candidate pins that are visited from multiple query pins. We achieve this by aggregating the visit counts V_q[p] of a given pin p in a novel way. Rather than simply summing the visit counts V_q[p] for a given pin p over all the query pins q ∈ Q, we transform them and this way reward pins that get visited multiple times from multiple different query pins q:

    V[p] = (Σ_{q∈Q} √(V_q[p]))²    (3)

where V[p] is the combined visit count for pin p. Note that when a candidate pin p is visited by walks from only a single query pin q, the count is unchanged. However, if the candidate pin is visited from multiple query pins, the count is boosted. Subsequently, when the top visited pins are selected from the counter V, the proportion of "multi-hit" pins is higher. We implement this in line 5 of Algorithm 3.
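The following C++ fragment sketches how Eqs. (1)-(3) compose; the function and variable names are hypothetical, introduced here only to show the arithmetic.

    #include <cmath>
    #include <cstddef>
    #include <cstdint>
    #include <unordered_map>
    #include <vector>

    using NodeId = std::uint32_t;

    // Eq. (1): sub-linear scaling factor s_q = |E(q)| * (C - log|E(q)|),
    // where maxDeg is C, the maximum pin degree in the graph.
    double scalingFactor(double deg, double maxDeg) {
        return deg * (maxDeg - std::log(deg));
    }

    // Eq. (2): split the global step budget N across query pins in
    // proportion to w_q * s_q.
    std::vector<long> allocateSteps(const std::vector<double>& s,
                                    const std::vector<double>& w, long N) {
        double total = 0.0;
        for (double v : s) total += v;
        std::vector<long> Nq(s.size());
        for (std::size_t i = 0; i < s.size(); ++i)
            Nq[i] = static_cast<long>(w[i] * static_cast<double>(N) * s[i] / total);
        return Nq;
    }

    // Eq. (3): combine per-query visit counters; summing square roots and
    // then squaring boosts pins visited from several different query pins.
    std::unordered_map<NodeId, double> combineCounts(
        const std::vector<std::unordered_map<NodeId, int>>& perQuery) {
        std::unordered_map<NodeId, double> score;
        for (const auto& Vq : perQuery)
            for (const auto& kv : Vq)
                score[kv.first] += std::sqrt(static_cast<double>(kv.second));
        for (auto& kv : score) kv.second *= kv.second;
        return score;
    }

As a worked instance of the boost: a pin visited 4 times from each of two query pins scores (√4 + √4)² = 16, while a pin visited 8 times from a single query pin scores (√8)² = 8, even though both have the same total visit count.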
Algorithm 2 Pixie Random Walk algorithm with early stopping.

    PixieRandomWalk(q: Query pin, E: Set of edges, U: User personalization
                    features, α: Real, N: Int, n_p: Int, n_v: Int)
     1: totSteps = 0, V = ∅
     2: nHighVisited = 0
     3: repeat
     4:   currPin = q
     5:   currSteps = SampleWalkLength(α)
     6:   for i = [1 : currSteps] do
     7:     currBoard = E(currPin)[PersonalizedNeighbor(E, U)]
     8:     currPin = E(currBoard)[PersonalizedNeighbor(E, U)]
     9:     V[currPin]++
    10:     if V[currPin] == n_v then
    11:       nHighVisited++
    12:   totSteps += currSteps
    13: until totSteps ≥ N or nHighVisited > n_p
    14: return V

Algorithm 3 Pixie recommendations for multiple pins.

    PixieRandomWalkMultiple(Q: Query pins, W: Set of weights for query pins,
                            E: Set of edges, U: User personalization features,
                            α: Real, N: Int)
     1: for all q ∈ Q do
     2:   N_q per Eq. (2)
     3:   V_q = PixieRandomWalk(q, E, U, α, N_q)
     4: for all p ∈ G do
     5:   V[p] = (Σ_{q∈Q} √(V_q[p]))²
     6: return V

(4) Early Stopping. The procedures described so far would run random walks for a given fixed number of steps N_q. However, since the Pixie runtime critically depends on the number of steps, we want to run walks for the smallest possible number of steps. Here we show that we can substantially reduce the runtime by adapting the number of steps N_q to the query q rather than having a fixed N_q for all query pins.

Our solution is to terminate the walks once the set of top candidates becomes stable, i.e., does not change much with more steps. Since Pixie recommends thousands of pins, this monitoring, if implemented naively, can be more expensive than the random walk itself. However, our solution elegantly overcomes this with two integers, n_p and n_v. We terminate the walks when at least n_p candidate pins have been visited at least n_v times. This monitoring is easy and efficient to implement because we only need a counter to keep track of the number of candidate pins that have been visited at least n_v times (lines 10-11 of Algorithm 2). We later show in Section 4.2 that early stopping produces almost the same results as a long random walk but in about half the number of steps, which speeds up the algorithm by a factor of two.

3.2 Graph Pruning

Another important innovation of our method is graph cleaning and pruning. Graph pruning improves recommendation quality and also decreases the size of the graph so that it fits on a smaller, cheaper machine with better cache performance for serving.

The original Pinterest graph has 7 billion nodes and over 100 billion edges. However, not all boards on Pinterest are topically focused. Large diverse boards diffuse the walk in too many directions, which then leads to low recommendation performance. Similarly, many pins are mis-categorized into wrong boards. The graph pruning procedure cleans the graph and makes it more topically focused. As a side benefit, graph pruning also leads to a much smaller graph that fits into the main memory of a single machine. Not having to distribute the graph across multiple machines leads to huge performance benefits because the random walk does not have to "jump" across machines.

We approach the problem of graph pruning as follows.
First, we quantify the content diversity of each board by computing the entropy of its topic distribution. We run LDA topic models on each pin description to obtain probabilistic topic vectors. We then use the topic vectors of the latest pins added to a board as input signals to compute the board's entropy. Boards with large entropy are removed from the graph along with all their edges.

Another challenge is that real-world graphs have skewed, heavy-tailed degree distributions. In the case of Pinterest this means that some pins are extremely popular and have been saved to millions of boards. For such nodes, the random walk needs to run for many steps because it gets diffused among a large number of network neighbors. We address this problem by systematically discarding edges of high-degree pins. However, rather than discarding edges randomly, we discard edges where a pin is mis-categorized and does not belong topically in a board. We use the same topic vectors as above, calculate the similarity of a pin to a board as the cosine similarity of their topic vectors, and then only keep the edges with the highest cosine similarity. The extent of pruning is determined by a pruning factor δ: we update the degree of each pin p to |E(p)|^δ and discard the edges that connect p to boards whose topic vectors have low cosine similarity with the topic vector of p.

After pruning, the graph contains 1 billion boards, 2 billion pins, and 17 billion edges. Surprisingly, we find that pruning is beneficial in two respects: (1) it decreases the size of the graph (and the memory footprint) by a factor of six; (2) it also leads to 58% more relevant recommendations (further details are described in Section 4.3).

3.3 Implementation

Here we discuss the implementation details of Pixie. To meet the real-time requirements Pixie relies on efficient data structures, which we discuss first. Then we discuss how the graph of pins and boards is generated. Finally, we briefly discuss the servers that respond to Pixie queries constructed by various applications at Pinterest. Pixie is written in C++ and is built on top of the Stanford Network Analysis Platform (SNAP) [20].
Graph Data Structure.
The random walk procedure described in Algorithm 2 spends most of its time in the inner loop (lines 6-11). Therefore, for this procedure to be effective, we need efficient implementations of graphs and counters. We describe these next.

Consider lines 7 and 8 of Algorithm 2. For these operations to be efficient we need to quickly sample a board connected to a pin and a pin connected to a board. We next design a data structure that performs these operations in constant time.

We develop a custom, highly optimized data structure where we assign each node of G, i.e., every pin p ∈ P and board b ∈ B, a unique ID between 1 and |P ∪ B|. The graph is implemented as a variant of the standard adjacency list. Each node is associated with a list of its neighbors. Allocating each such list dynamically is slow and causes memory fragmentation, so we use the object pool pattern and concatenate all of the adjacency lists together into one contiguous array edgeVec.

The i-th entry offset_i is the offset where node i's neighbors are stored in the associated edgeVec. Note that the number of node i's neighbors is given by offset_{i+1} − offset_i. To sample a neighbor of a node with ID i whose neighbors are stored in an edgeVec F, we read the ID stored at

    F[offset_i + (rand() % (offset_{i+1} − offset_i))]    (4)

Thus, the accesses on lines 7 and 8 of Algorithm 2 can be performed efficiently in constant time.
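A minimal sketch of this layout and of the constant-time sampling of Eq. (4), under the structure just described (member names are ours):

    #include <cstdint>
    #include <vector>

    using NodeId = std::uint32_t;

    // All adjacency lists concatenated into one contiguous array (object
    // pool pattern). Node i's neighbors occupy edgeVec[offset[i] ..
    // offset[i+1]), so offset carries one extra trailing entry.
    struct CompactGraph {
        std::vector<std::uint64_t> offset;  // size = numNodes + 1
        std::vector<NodeId> edgeVec;

        std::uint64_t degree(NodeId i) const {
            return offset[i + 1] - offset[i];
        }

        // Eq. (4): uniform neighbor sample in O(1); r stands in for rand().
        NodeId randomNeighbor(NodeId i, std::uint64_t r) const {
            return edgeVec[offset[i] + (r % degree(i))];
        }
    };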
Visit Counter.

After sampling a pin p, PixieRandomWalk increments the visit count, the number of times p has been visited (line 9 of Algorithm 2). We develop an open-addressing hash table V with linear probing that implements this operation efficiently. First, we allocate a fixed-size array where each element is a key-value pair. For Pixie, the keys are pin IDs in G and the values are visit counts. When incrementing the visit count of a pin ID k, we first use a hash function (described below) to index into this array. If k matches the key stored at that index, we update the value. Otherwise, we continue probing the following indices until we either find a free element or an element whose key is k (i.e., linear probing). In the former case, we assign the key k and value 1 to the free element. Our primary motivation for using linear probing is to maintain good cache locality while resolving collisions. For this procedure to be efficient, the hash function needs to be fast. We use a very lightweight multiplicative hash function (i.e., multiply the key by a fixed prime number modulo the array size). Empirically, we have observed that key insertions in this counter perform comparably to random array accesses. After PixieRandomWalkMultiple terminates, the array is sorted in descending order of values and the pin IDs with the top visit counts are returned as recommendations.

Using arrays to implement hash tables can have problems; for example, if the array fills up, it needs to be resized. In the context of Pixie, the number of steps N provides an upper bound on the number of keys the hash table needs to support, as the number of pins with non-zero visit counts can never exceed the number of steps. We conservatively allocate an array of size N when initializing the hash table to avoid resizing.
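A sketch of the counter as described, with a stand-in multiplier and sentinel that are our placeholder choices:

    #include <cstddef>
    #include <cstdint>
    #include <vector>

    // Fixed-capacity open-addressing counter mapping pin ID -> visit count.
    // Sized to the step budget N up front, since at most N distinct pins can
    // be visited, so it never needs to resize.
    class VisitCounter {
    public:
        explicit VisitCounter(std::size_t capacity)
            : slots_(capacity, Slot{kEmpty, 0}) {}

        void increment(std::uint32_t pinId) {
            // Multiplicative hash: multiply by a fixed constant, mod size.
            std::size_t idx =
                static_cast<std::size_t>(pinId * kMul) % slots_.size();
            for (;;) {
                Slot& s = slots_[idx];
                if (s.key == pinId) { ++s.count; return; }  // existing key
                if (s.key == kEmpty) { s.key = pinId; s.count = 1; return; }
                idx = (idx + 1) % slots_.size();            // linear probe
            }
        }

    private:
        struct Slot { std::uint32_t key; std::uint32_t count; };
        // Sentinel for an empty slot; assumes no real pin uses this ID.
        static constexpr std::uint32_t kEmpty = 0xFFFFFFFFu;
        // Stand-in multiplier for the paper's "fixed prime number".
        static constexpr std::uint64_t kMul = 2654435761u;
        std::vector<Slot> slots_;
    };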
Graph Generation and Pruning.

The graph generation first runs a Hadoop pipeline, followed by a graph compiler. The Hadoop pipeline contains a series of MapReduce jobs that go through the data at Pinterest and retrieve all the boards and the pins belonging to them. The pipeline outputs a raw text file that contains the edges between boards and pins, and uploads it to a global storage. The graph compiler runs on a single terabyte-scale RAM machine that polls for new raw graphs. Once a new raw graph file is available, the graph compiler downloads and parses the raw data into memory, prunes the graph, and then persists it to disk in a binary format. These binaries can be shared easily between machines. This graph generation process runs once per day.

Loading the graph binaries from disk into shared memory takes about 10 minutes. The process is efficient because the load is a sequential read from disk. We use Linux HugePages for this shared memory to increase the size of each virtual memory page from 4 KB to 2 MB, decreasing the number of page table entries needed by a factor of 512. Too many page table entries are especially problematic on virtual machines; enabling HugePages allowed Pixie on virtual machines to serve twice as many requests at half the runtime.
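The paper does not say how the huge pages are requested; one common Linux mechanism, shown below purely as an assumption-laden illustration (not Pixie's actual loader), is an anonymous mmap with MAP_HUGETLB, which requires huge pages to have been reserved beforehand (e.g., via vm.nr_hugepages).

    #include <cstddef>
    #include <cstdio>
    #include <sys/mman.h>

    // Back a large read-mostly region with 2 MB huge pages on Linux.
    void* allocHugeShared(std::size_t bytes) {
        void* p = mmap(nullptr, bytes, PROT_READ | PROT_WRITE,
                       MAP_SHARED | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
        if (p == MAP_FAILED) {
            perror("mmap(MAP_HUGETLB)");  // a real loader would fall back
            return nullptr;               // to ordinary 4 KB pages
        }
        return p;
    }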
Pixie Server.
On startup, each Pixie server loads the graph from disk into memory. Each server has an IO thread pool and a worker thread pool. The IO threads serialize and deserialize queries and responses, and hand off sets of pins to worker threads. Each worker thread has its own counter that collects visit counts. To avoid synchronization costs among the workers, each query is served by a single worker. The server also has a background thread that periodically checks for the availability of new graphs. When available, the latest graph is downloaded to disk. The server restarts once a day and loads the latest available graph into memory.
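The following sketch shows the shape of such a worker pool; it is our reconstruction of the design just described, not the production server. The key property is that each query is handled end-to-end by one worker, so the per-worker visit counter is never shared and the walk takes no locks (shutdown handling is omitted for brevity).

    #include <condition_variable>
    #include <cstddef>
    #include <functional>
    #include <mutex>
    #include <queue>
    #include <thread>
    #include <vector>

    class WorkerPool {
    public:
        explicit WorkerPool(std::size_t n) {
            for (std::size_t i = 0; i < n; ++i)
                workers_.emplace_back([this] { run(); });
        }
        // Called by IO threads after deserializing a query.
        void submit(std::function<void()> query) {
            { std::lock_guard<std::mutex> g(m_); tasks_.push(std::move(query)); }
            cv_.notify_one();
        }
    private:
        void run() {
            for (;;) {
                std::function<void()> task;
                {
                    std::unique_lock<std::mutex> lk(m_);
                    cv_.wait(lk, [this] { return !tasks_.empty(); });
                    task = std::move(tasks_.front());
                    tasks_.pop();
                }
                task();  // e.g., run the walk against this worker's counter
            }
        }
        std::mutex m_;
        std::condition_variable cv_;
        std::queue<std::function<void()>> tasks_;
        std::vector<std::thread> workers_;
    };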
4 EXPERIMENTS

In this section we evaluate Pixie and empirically validate its performance. We quantify the quality of Pixie recommendations, the runtime performance of the Pixie algorithm, and the effect of graph pruning on recommendations.
4.1 Quality of Recommendations

The goal of Pixie is to produce highly engaging recommendations. We quantify the quality of recommendations in two ways: (1) given a user, we aim to predict which pin they will engage with next; (2) we perform A/B experiments where we directly measure the lift in user engagement due to recommendations made by Pixie.

For comparison we use two state-of-the-art deep-learning content-based recommender methods that use the visual and textual features of a given pin to produce recommendations. Each pin is associated with both an image and a set of textual annotations defined by the users. We use these content features to create embeddings of pins. To generate recommendations for a given query pin q we then apply nearest-neighbor methods to find the most similar pins.
The visual embeddings are the 6-th fully connected layer of a classification network using the VGG-16 architecture [12, 27]. The textual annotation embeddings are trained using the Word2Vec model [23], where the context of an annotation consists of the other annotations associated with the same pin. For generating recommendations based on visual embeddings we use the Hamming distance, while for generating recommendations based on textual embeddings we use the cosine distance.
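For reference, these two distances are sketched below; packing the visual embeddings into bit vectors is our assumption about how Hamming distance is applied to the fc6 features.

    #include <cmath>
    #include <cstddef>
    #include <cstdint>
    #include <vector>

    // Hamming distance over binarized embeddings packed into 64-bit words.
    int hammingDistance(const std::vector<std::uint64_t>& a,
                        const std::vector<std::uint64_t>& b) {
        int d = 0;
        for (std::size_t i = 0; i < a.size(); ++i)
            d += __builtin_popcountll(a[i] ^ b[i]);  // GCC/Clang builtin
        return d;
    }

    // Cosine distance over dense textual embeddings.
    double cosineDistance(const std::vector<float>& a,
                          const std::vector<float>& b) {
        double dot = 0, na = 0, nb = 0;
        for (std::size_t i = 0; i < a.size(); ++i) {
            dot += a[i] * b[i];
            na  += a[i] * a[i];
            nb  += b[i] * b[i];
        }
        return 1.0 - dot / (std::sqrt(na) * std::sqrt(nb));
    }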
Ranking the most related pin.
We quantify the success of recommendations with the following prediction task: given a user who is examining a query pin q, we aim to predict which of all the other pins is most related to the query pin. Here we rely on user activity and say that pin x is most related to q if the user, while looking at q, saved the related pin x. We formulate this as a ranking task where, given a query pin q, we aim to rank all other 2 billion pins with the goal of ranking the saved pin x as high as possible. We measure the performance by hit rate, which we define as the fraction of times the saved pin was ranked among the top-K results.

    Method                       Top 10   Top 100   Top 1000
    Content-based (textual)        1.0%      2.2%       4.8%
    Content-based (visual)         1.1%      2.4%       4.5%
    Content-based (combined)       2.1%      4.6%      10.5%
    Pixie (graph-based)            6.3%     23.1%      52.2%

    Table 1: Given a query pin, predict which pin will be re-pinned.
    Performance is quantified by the fraction of times the correct pin
    was ranked among the top K.

Table 1 gives the hit rate at K = 10, 100, and 1000. Pixie gives much better recommendation performance and predicts which pin is most related to the query pin (and thus saved by the user) most accurately for all values of K. Note that due to the large-scale nature of our recommendation problem we do not compare to other baselines (such as collaborative filtering or matrix factorization methods) since no ready-to-use methods exist.

Results of A/B experiments.
The ultimate test of the quality of Pixie recommendations is the lift in user engagement in a controlled A/B experiment where a random set of users experiences Pixie recommendations, while other users experience recommendations given by the old Hadoop-based production system, which precomputes recommendations and is not real-time. Any difference in engagement between these two groups can be attributed to the increased quality of Pixie recommendations. We measure engagement by quantifying the fraction of pins that a user engages with by clicking, liking, or saving them.

    Experiment                                      Lift
    Homefeed, per pin engagement                    +48%
    Related pins, per pin engagement                +13%
    Board recommendations, per pin engagement       +26%
    Localization, pins in user local language       +48-75%
    Explore tab, per pin engagement                 +20%

    Table 2: Summary of A/B experiments across different Pinterest user
    surfaces. Lift in engagement of Pixie vs. current production systems.

Table 2 summarizes the lifts in engagement of pins recommended by Pixie in controlled A/B experiments, with observed increases between 13% and 48%.
4.2 Runtime and Stability of Pixie

Next we evaluate the runtime performance of the Pixie algorithm.
Pixie Runtime.
We evaluate how the number of steps and the size of the query set Q affect the runtime of Pixie. For the experiment we sample 20,000 queries of size 1 and compute the average runtime for each N.

    Figure 1: (a) Runtime of PixieRandomWalk against the number of steps
    and (b) against the size of the query set.

Figure 1(a) shows that the runtime increases linearly with the number of steps. Moreover, the runtime is below 50 milliseconds for random walks with fewer than 200,000 steps.

To evaluate runtime as a function of the query size, we randomly sample 20,000 queries of each query size. We keep the number of steps constant and compute the average runtime for queries of identical size. Figure 1(b) shows that the runtime increases slowly with the query size. This increase is primarily due to cache misses. During random walks from a query pin, the cache gets warmed with the neighborhood around the pin. When the walks are started from a different query pin, the cache becomes cold and needs to be warmed again. With longer queries, the cache becomes cold more often, and the runtime slightly increases.
Variance of Top Results.
The recommendations produced by a randomized procedure such as Pixie are not deterministic: each time Pixie is run, the visit counts depend on the sampled random numbers. However, we desire stability of recommendations, i.e., the set of recommended pins should not change much across multiple runs. If we could run random walks for enough steps for the walk to converge, the recommendations would become stable. However, Pixie has a tight runtime requirement and we cannot run random walks for billions of steps. We therefore study how the stability of the set of top visited pins varies with the number of steps, with the aim of balancing the conflicting requirements of low runtime and high stability.

We randomly sample 20,000 queries of size 1 and then run each query 100 times. We then examine the top 1,000 results of each of the 100 responses and count the number of pins which appear in at least K% of the runs (K = 50, 60, . . . , 100). Figure 2 shows how the number of such pins varies with the number of steps.

    Figure 2: The variance of results against number of steps.

Evaluation of the Biased Walk.

Here we evaluate the efficacy of biasing the Pixie Random Walk in a user-specific way. This way, even for the same query set, recommendations will be more personalized and will differ from user to user.

To illustrate the effectiveness of the biasing procedure we consider the following experiment, where the goal is to provide recommendations in a given target language. We start the random walk at a pin in some language and then aim to bias the Pixie Random Walk to visit pins in the target language.

We report the results in Table 3. We consider three target languages: Japanese, Spanish, and Slovak. For each language, we show the percentage of target-language pins in the responses produced by simple random walks (Algorithm 1) and by the Pixie Random Walk (Algorithm 2). We consider two scenarios: when the query pins are in English (column 2) and when they are in the target language (column 3). For queries originating from a different language, target-language recommendations provide a much better and more engaging user experience. We observe that the Pixie Random Walk significantly boosts the target-language content in the query responses.

                        En → Japanese    Japanese → Japanese
    BasicRandomWalk         16.35%              52.…%
    PixieRandomWalk          ….33%             100.00%

                        En → Spanish     Spanish → Spanish
    BasicRandomWalk         41.94%              74.…%
    PixieRandomWalk          ….51%             100.00%

                        En → Slovak      Slovak → Slovak
    BasicRandomWalk          2.13%              16.…%
    PixieRandomWalk          ….55%             100.00%

    Table 3: Comparison of the proportion of target-language content
    produced by BasicRandomWalk and PixieRandomWalk. The second column
    shows the percentage of candidates in the target language when the
    query pin is in English, and the third column shows the percentage
    when the query pin itself is in the target language.
Early Stopping.
The Pixie algorithm terminates random walks after n_p pins reach a visit count of n_v. Thus n_v acts as a minimum threshold of visit counts that the pins in the recommendation set have to meet, and n_p denotes the minimum number of pins that must reach this threshold before we stop the walks. Lower values of n_p and n_v lead to lower running time but potentially also unstable recommendation results.

Here we study how the runtime and stability are affected by n_p and n_v in order to set these parameters appropriately. We randomly sample 20,000 queries of size 1 and consider the top 1,000 recommendations. As a gold-standard set of recommendations we also run Pixie with a fixed, very large number of steps.

    Figure 3: (a) Early stopping performance against n_v with n_p = 2,000.
    (b) Early stopping performance against n_p with n_v = 4.

First, we set n_p = 2,000 and vary n_v, with results shown in Figure 3(a). We observe that lower values of n_v lead to much faster run times (Figure 3(a), top). For n_v = 6, the run time reduces to half. With high values of n_v, we observe that the recommendations produced by Pixie have a high overlap with the gold-standard set. For example, for n_v = 8, 900 (out of 1,000) results are common to both recommendation sets, while the running time is improved by a factor of two. Second, we fix n_v = 4 and vary n_p, as shown in Figure 3(b). The trends are similar. Overall, we observe that by choosing the parameters appropriately, e.g., n_p = 2,000 and n_v = 4, early stopping roughly halves the runtime while keeping the results stable.
4.3 Effects of Graph Pruning

We evaluate the effect of graph pruning on the quality of Pixie recommendations, the memory usage, and the running time. We evaluate the quality on the link prediction task that we describe next. When a user adds a pin to a board they create a new edge in the graph. We use Pixie to predict the pins that will be saved to a board after a timestamp t, denoted by X, by querying the pins that already exist on the board before t, denoted by Q. Pixie succeeds if the response R is identical to X. As is standard, we are interested in two quantities: recall and precision. Recall measures the percentage of pins in X that R contains, and precision measures the percentage of pins in R that are included in X. We then use the F1 score of precision and recall, F1 = 2 · precision · recall / (precision + recall), as a measure of the quality of the results.

In this evaluation, we first select 100,000 boards at random. Then we take the latest 20 pins in each board before time t. Each such sample constitutes a Pixie query Q. We select the top hundred visited pins as the recommendation set R. Finally, we compute the F1 score using the pins added to the boards after time t as X.

We examine the effects of pruning by first removing the most diverse 10% of boards as described in Section 3.2. We then examine the effect of pruning pin degree by varying the pruning factor δ. Recall that δ = 1 leaves pin degrees unchanged while lower values of δ prune the graph more. Figure 4 shows that the number of edges in the pruned graph decreases monotonically with δ.

    Figure 4: F1 scores for link prediction and number of edges for
    different graph pruning factors.

The F1 score changes as we keep pruning the graph. When δ becomes too low, even relevant edges are pruned and the quality of the recommendations deteriorates. However, we also observe that graph pruning significantly improves the quality of recommendations. Figure 4 shows that when δ = 0.91, the F1 score peaks at 58% above the F1 score of the unpruned graph, while the graph contains only 20% of the original number of edges. This means that graph pruning actually improves the quality of recommendations by 58%, while also reducing the graph to a sixth of its size.

Finally, we show how graph pruning affects the memory usage and the runtime of Pixie in Figure 5. As the size of the graph decreases, both the memory usage and the Pixie Random Walk runtime decrease significantly.
    Figure 5: The memory usage and Pixie runtime against different
    pruned graphs.
5 PIXIE AT PINTEREST

There are many Pinterest applications that use Pixie to generate relevant and timely recommendations. We discuss some of them below.
Homefeed.

When a user loads Pinterest, they view a grid of pins that they might find relevant to save to their boards. This grid is called the user's Homefeed. Using the real-time recommendations of Pixie, we are able to create an engaging and responsive Homefeed. Every time the user takes an action on a pin (such as clicking, liking, or saving it), we create a query, send it to a Pixie server, and refresh the recommendations. More precisely, to obtain a Pixie query, we collect all pins on which the user performed an action and assign a weight to each pin. A single pin's initial weight depends on the action type and decays with a half-life of λ (a short sketch of this decay appears at the end of this subsection). The user's pins are then collected into a single Pixie query. The pin recommendations in the Pixie response are ranked and added to the user's Homefeed. In an A/B experiment, switching one of the offline Hadoop-based sources of pins to the Pixie system improved saves per pin by 50%.

Related Pins.

When a user clicks on a pin at Pinterest, we show Related Pins below it. For example, if a user clicks on a pin containing a trumpet, they can browse other pins containing trumpets. Pixie is the primary source of candidates for Related Pins, though the full system features several additional layers [22]. The queries arising in this application contain a single pin: the one the user is viewing. This application requires the recommended pins to be very similar to the original query pin. For example, we would prefer that the Related Pins of a trumpet contain other trumpets and not some other musical instrument. One general observation has been that as the length of the random walk increases, Pixie visits pins that are increasingly diverse. Therefore, for Related Pins we hypothesized that users would prefer shorter walks compared to Homefeed. Indeed, online A/B tests showed that simply decreasing the walk length led to a significant lift in engagement: the number of Related Pins saved per day increased by 3%.

Similarly, we use Pixie to generate pins related to a board. When users view their own board, Pinterest suggests other pins they can save to this board. The Pixie query here consists of the last ten pins added to the board. This application helps users grow their boards and build better collections.
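As referenced in the Homefeed discussion above, here is a minimal sketch of half-life weight decay for query pins; the function name, parameters, and time units are our assumptions for illustration.

    #include <cmath>

    // A pin's weight starts at a value set by the action type and halves
    // every halfLifeLambda time units since the action.
    double queryPinWeight(double initialWeight, double timeSinceAction,
                          double halfLifeLambda) {
        return initialWeight * std::exp2(-timeSinceAction / halfLifeLambda);
    }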
Picked For You.

Another important recommendation problem at Pinterest is "Picked For You" (PFY) boards. In the original Pinterest ecosystem, users manually followed other users' boards so that new pins added to those boards appeared in the follower's stream, similar to other following-based systems. This manual approach requires a lot of work from users to find and maintain the list of boards interesting to them, and frequently users would not follow enough boards to get a good feed of new content, or would not update their follows as their tastes and interests changed, leading to reduced engagement with pins.

Here Pixie recommends boards and then delivers the most recent new pins from those boards to a user's Homefeed. This approach has the benefit of providing additional diversity and a natural distribution of cold-start, new, and trending content on the site. Using Pixie for board recommendations has allowed us to deprecate the old offline systems, while the A/B experiment also shows that saves per pin improved by 26%.

These are only a few examples of applications that use Pixie; others include email and personalized articles. Over half of all the pins that users save each day on Pinterest come from systems backed by Pixie.
6 CONCLUSION

In this paper we presented Pixie, a flexible graph-based real-time recommender system that we built and deployed at Pinterest. Each server in the Pixie fleet holds the entire bipartite graph of over a billion pins and boards, and supports 1,200 queries per second with a 99-percentile latency of 60 milliseconds.

We have implemented and deployed the novel Pixie Random Walk algorithm. Our offline experiments have empirically demonstrated the high quality of its recommendations as well as the robustness and efficiency of the algorithm. Furthermore, online A/B experiments have been used to launch Pixie on multiple Pinterest surfaces, most notably the Homefeed and Related Pins products, so that now over half of all pins saved on Pinterest each day come from systems backed by Pixie. We have also found Pixie useful for performing other tasks, such as label propagation, on the order of minutes instead of the days required by distributed systems like Hadoop.

Thanks to the high performance, scalability, and generic nature of the Pixie architecture and algorithms, we anticipate a bright future with new algorithms and graphs featuring novel node types and edge definitions for even more applications at Pinterest.
Acknowledgments.
Allan Blair, Jeremy Carroll, Collins Chung, Yixue Li, David Liu, Jenny Liu, Peter Lofgren, Kevin Ma, and Stephanie Rogers are among the many who helped make Pixie a success!

REFERENCES

[1] D. Agarwal, B. Chen, Q. He, Z. Hua, G. Lebanon, Y. Ma, P. Shivaswamy, H. Tseng, J. Yang, and L. Zhang. Personalizing LinkedIn feed. In KDD, pages 1651–1660, 2015.
[2] L. Backstrom and J. Leskovec. Supervised random walks: predicting and recommending links in social networks. In WSDM, pages 635–644, 2011.
[3] B. Bahmani, A. Chowdhury, and A. Goel. Fast incremental and personalized PageRank. PVLDB, 4(3):173–184, 2010.
[4] S. Baluja, R. Seth, D. Sivakumar, Y. Jing, J. Yagnik, S. Kumar, D. Ravichandran, and M. Aly. Video suggestion and discovery for YouTube: taking random walks through the view graph. In WWW, pages 895–904, 2008.
[5] J. Bennett and S. Lanning. The Netflix prize. In KDD Cup and Workshop in conjunction with KDD, 2007.
[6] H. Cheng, L. Koc, J. Harmsen, T. Shaked, T. Chandra, H. Aradhye, G. Anderson, G. Corrado, W. Chai, M. Ispir, et al. Wide & deep learning for recommender systems. In DLRS Workshop, pages 7–10, 2016.
[7] P. Covington, J. Adams, and E. Sargin. Deep neural networks for YouTube recommendations. In RecSys, pages 191–198, 2016.
[8] P. Covington, J. Adams, and E. Sargin. Deep neural networks for YouTube recommendations. In RecSys, pages 191–198, 2016.
[9] A. Das, M. Datar, A. Garg, and S. Rajaram. Google News personalization: scalable online collaborative filtering. In WWW, pages 271–280, 2007.
[10] J. Davidson, B. Liebald, J. Liu, P. Nandy, T. V. Vleet, U. Gargi, S. Gupta, Y. He, M. Lambert, B. Livingston, and D. Sampath. The YouTube video recommendation system. In RecSys, pages 293–296, 2010.
[11] A. V. den Oord, S. Dieleman, and B. Schrauwen. Deep content-based music recommendation. In NIPS, pages 2643–2651, 2013.
[12] J. Donahue, Y. Jia, O. Vinyals, J. Hoffman, N. Zhang, E. Tzeng, and T. Darrell. DeCAF: A deep convolutional activation feature for generic visual recognition. In ICML, pages 647–655, 2014.
[13] D. Fogaras, B. Rácz, K. Csalogány, and T. Sarlós. Towards scaling fully personalized PageRank: Algorithms, lower bounds, and experiments. Internet Mathematics, 2(3):333–358, 2005.
[14] A. Goel, P. Gupta, J. Sirois, D. Wang, A. Sharma, and S. Gurumurthy. The who-to-follow system at Twitter: Strategy, algorithms, and revenue impact. Interfaces, 45(1):98–107, 2015.
[15] P. Gupta, A. Goel, J. J. Lin, A. Sharma, D. Wang, and R. Zadeh. WTF: the who to follow service at Twitter. In WWW, pages 505–514, 2013.
[16] M. Kabiljo and A. Ilic. Recommending items to more than a billion people.
[17] J. A. Konstan, B. N. Miller, D. Maltz, J. L. Herlocker, L. R. Gordon, and J. Riedl. GroupLens: applying collaborative filtering to Usenet news. Communications of the ACM, 40(3):77–87, 1997.
[18] Y. Koren, R. M. Bell, and C. Volinsky. Matrix factorization techniques for recommender systems. IEEE Computer, 42(8):30–37, 2009.
[19] R. Lempel and S. Moran. SALSA: the stochastic approach for link-structure analysis. ACM Trans. Inf. Syst., 19(2):131–160, 2001.
[20] J. Leskovec and R. Sosic. SNAP: A general-purpose network analysis and graph-mining library. ACM TIST, 8(1):1:1–1:20, 2016.
[21] G. Linden, B. Smith, and J. York. Amazon.com recommendations: Item-to-item collaborative filtering. IEEE Internet Computing, 7(1):76–80, 2003.
[22] D. Liu, S. Rogers, R. Shiau, D. Kislyuk, K. Ma, Z. Zhong, J. Liu, and Y. Jing. Related Pins at Pinterest: The evolution of a real-world recommender system. In WWW, 2017.
[23] T. Mikolov, I. Sutskever, K. Chen, G. Corrado, and J. Dean. Distributed representations of words and phrases and their compositionality. In NIPS, pages 3111–3119, 2013.
[24] M. J. Pazzani and D. Billsus. Content-based recommendation systems. In The Adaptive Web, pages 325–341. Springer, 2007.
[25] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Item-based collaborative filtering recommendation algorithms. In WWW, pages 285–295, 2001.
[26] A. Sharma, J. Jiang, P. Bommannavar, B. Larson, and J. J. Lin. GraphJet: Real-time content recommendations at Twitter. PVLDB, 9(13):1281–1292, 2016.
[27] K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
[28] H. Tong, C. Faloutsos, and J. Pan. Fast random walk with restart and its applications. In ICDM, pages 613–622, 2006.
[29] L. Zheng, V. Noroozi, and P. S. Yu. Joint deep modeling of users and items using reviews for recommendation. In WSDM, pages 425–434, 2017.
[30] Y. Zhuang, W. Chin, Y. Juan, and C. Lin. A fast parallel SGD for matrix factorization in shared memory systems. In RecSys, pages 249–256, 2013.