Freudian and Newtonian Recurrent Cell for Sequential Recommendation
Hoyeop Lee, Jinbae Im, Chang Ouk Kim, and Sehee Chung
Knowledge AI Lab., NCSOFT Co., South Korea
Department of Industrial Engineering, Yonsei University, South Korea
* e-mail: [email protected]

Abstract
A sequential recommender system aims to recommend attractive items to users based on behaviour patterns. The predominant sequential recommendation models are based on natural language processing models, such as the gated recurrent unit, that embed items in some defined space and grasp the user's long-term and short-term preferences based on the item embeddings. However, these approaches lack fundamental insight into how such models are related to the user's inherent decision-making process. To provide this insight, we propose a novel recurrent cell, namely FaNC, from Freudian and Newtonian perspectives. FaNC divides the user's state into conscious and unconscious states, and the user's decision process is modelled by Freud's two principles: the pleasure principle and the reality principle. To model the pleasure principle, i.e., the user's free-floating instinct, we place the user's unconscious state and item embeddings in the same latent space and subject them to Newton's law of gravitation. Moreover, to recommend items to users, we model the reality principle, i.e., balancing the conscious and unconscious states, via a gating function. Based on extensive experiments on various benchmark datasets, this paper provides insight into the characteristics of the proposed model. FaNC initiates a new direction of sequential recommendation at the convergence of psychoanalysis and recommender systems.
With the explosion of information on the Internet, recommender systems [1, 2, 3] have become necessary for both users and service providers of web and mobile applications, e.g., web search, e-commerce, online music and video streaming, and social network services. Most predominant recommender systems view estimating a sequence of a user's preferences from the machine learning perspective and thereby use natural language processing (NLP) models, such as the recurrent neural network [4], attention mechanism [5], and their variants [6, 7], to capture the user's preference dynamics and to predict subsequent items [8, 9, 10, 11]. Such sequential models have shown remarkable performance in various recommendation domains, such as movies [12], music [13], news articles [14], and e-commerce [15], by encoding items in a latent space and modelling the dynamics of the user's short-term and long-term preferences in that space based on the encoded items. However, such approaches lack fundamental insight into how the models are related to the user's inherent decision-making process.

This paper proposes a novel sequential recommendation model that provides insights from two perspectives: Freudian [16] and Newtonian [17]. From the Freudian perspective, which claims that a human's decision-making process is derived not only from the conscious but also from the unconscious [18, 19], we expect to enhance the recommendation model by reflecting the user's conscious and unconscious. Freud, the founder of psychoanalysis, interpreted the conscious and unconscious decision-making process through two competing principles, the pleasure principle and the reality principle [20]. The pleasure principle encourages immediate gratification of unconscious instinct, but the instinct is interrupted by the reality principle, which represents mankind's conscious being rational.
In this paper, the conscious state represents the user's thoughts directly affected by interactions, such as consumption, and we suppose that this state exists in a latent space. Moreover, the unconscious state represents the free-floating user's instinct underlying the conscious state, and this paper assumes that it lies in another latent space based on the dissociations between explicit (i.e., conscious) and implicit (i.e., unconscious) memory [21, 22]. To model the two competing principles for sequential recommendation, this paper regards buying a product and the subsequent experience with the product as an external stimulus, as shown in Fig. 1. Since the product and subsequent experience change our conscious perception of the product family, the external stimulus shifts the conscious state. Afterwards, as per psychological findings that the conscious selectively affects the unconscious, e.g., during sleep [23], we let the conscious state discriminatively impact the unconscious state. Thus, the external stimulus indirectly makes the unconscious state nervous. In this paper, the pleasure principle lets the unconscious state be free-floating by reducing the energy level of the unconscious (i.e., psychic energy [24]), and the state becomes calm. Consequently, the reality principle allows decision making to be appropriate for selecting a subsequent item by balancing the conscious and unconscious states. In other words, the customer's next purchase is influenced by the shifted and free-floating states, which means 'buying behaviour' and 'conscious and unconscious states' affect each other. Note that when explaining the buying decision process, we use the term 'appropriate' rather than 'rational' because not all users make rational consumption decisions, e.g., impulse buying [25].

The Freudian perspective's main challenge is modelling the fluidity of the unconscious state, and we approach this task from the Newtonian perspective. To do so, we remind the reader of Newton's law of universal gravitation: the attraction between two particles in Euclidean space is directly proportional to their masses and inversely proportional to the square of the distance between them [17]. As a user is involuntarily attracted to an item, we presume that the user's unconscious state and items are the particles of Newton's law. Specifically, we place the unconscious state and items in the same latent space, and we let the items be fixed in the latent space, similarly to stars on the celestial surface in traditional astrophysics. By contrast, our model enables the unconscious state to float through gravitational accelerations between the unconscious state and items during the item-consumption time intervals. Thus, the unconscious state has gravitational potential energy, and the energy might increase (i.e., nervous) when external stimuli are given. Since the potential energy decreases (i.e., calm) as the unconscious state flows according to gravity, this paper views the floating unconscious state via Newton's law as Freud's pleasure principle.

Figure 1: Illustration of the dynamics of the conscious and unconscious states and the consequent decision-making process. As Freud's theory states, the decision-making process is derived from the conscious and unconscious, so we divide a user's state into conscious and unconscious states. The conscious and unconscious states are modelled via the pleasure principle and reality principle. Before the two principles are applied, the purchased item (i.e., external stimulus) shifts the conscious and unconscious states. Afterwards, the pleasure principle enables the unconscious state to seek gratification by reducing its energy level. Specifically, we place the unconscious state and items in the same latent space so that there is gravitational potential energy. In other words, the unconscious state and items follow Newton's law of universal gravitation, which implies that an item attracts the user unconsciously. We regard the floating of the unconscious state according to gravity as the pleasure principle because the energy decreases as the unconscious state flows through gravity. Subsequently, the reality principle enables the final decision (i.e., selecting the next item) by balancing the two shifted and free-floating states.
A sequential recommender system aims to recommend attractive items to a user based on the user's behaviour in terms of recently consumed items. This statement can be mathematically formulated as follows. Let $\mathcal{I} = \{i_1, \ldots, i_N\}$ be a set of items. In contrast to previous studies that denote a behaviour sequence as $S = [s_1, \ldots, s_L]$, we define the behaviour sequence with $L$ recent items as

$$S = [(s_1, t_1), \ldots, (s_L, t_L)], \tag{1}$$

where $s_l \in \mathcal{I}\ \forall l \in \{1, \ldots, L\}$ represents the $l$-th interacted item index and $t_l$ is the interaction time with the item. Because the previous definition cannot consider time information, we modify the definition for our study, where the unconscious state is free-floating during the time interval. The interaction behaviour can be purchasing, writing a review, or browsing. Notably, our definition of the behaviour sequence includes the previous definition because the latter can be regarded as a special case of equation (1) with $t_l = l\ \forall l \in \{1, \ldots, L\}$ (i.e., equal time intervals). Given the behaviour sequence $S$ and the estimated next interaction time $t_{L+1}$, the sequential recommendation model $\mathcal{F}$ aims to recommend the next item $s_{L+1}$, and this process can be formalised as

$$s_{L+1} = \mathcal{F}(S, t_{L+1}). \tag{2}$$

Thus, the recommendation model expects the target user to interact with item $s_{L+1}$ at time $t_{L+1}$. Our recommendation model allows us to provide a what-if analysis of consumption time. In other words, we can determine what item a user would have consumed if the user revisited in a week rather than a month. The answer is valuable to service providers because it helps them determine the timing of promotions. Although research on forecasting the next interaction time $t_{L+1}$ [26, 27] is worth conducting, we consider it out of the scope of this paper.

This paper proposes a Freudian and Newtonian recurrent cell, namely,
FaNC (pronounced "fancy"), as illustrated in Fig. 2. Generally, a sequential recommendation model $\mathcal{F}$ is composed of three layers: item embedding, sequence modelling, and recommendation layers [28]. The item embedding layer encodes items into a low-dimensional real-valued latent space to measure the similarity between items. Based on the embedded items, the sequence modelling layer captures the dynamics of the user's interests, which affect the decision-making process. The recommendation layer represents the user's decision-making process and predicts which item to consume in accordance with the captured interest. FaNC is a novel sequence modelling layer that models the user's conscious and unconscious sequential behaviour by dividing the user's hidden state into conscious and unconscious states. The sequential dynamics of the conscious state are modelled by a conventional recurrent layer. To model the flow of the unconscious state via Newton's law of gravitation (i.e., the pleasure principle), we employ a neural ordinary differential equation (ODE) solver.
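As a rough sketch, the three-layer structure described above can be expressed as follows. All names, dimensionalities, and the toy update rules are illustrative assumptions, not the authors' implementation: the real model replaces the placeholder conscious update with a GRU, the unconscious drift with a gravitational neural ODE, and the averaging with learned gates.

```python
import numpy as np

class FaNCSketch:
    """Toy three-layer pipeline: embedding -> sequence modelling -> recommendation."""

    def __init__(self, n_items, d_u, seed=0):
        rng = np.random.default_rng(seed)
        self.E = rng.normal(size=(n_items, d_u))   # item embedding layer (shared table)

    def step(self, c, u, s, dt):
        """Sequence modelling layer: consume item s after interval dt (placeholders)."""
        e = self.E[s]                              # embed the consumed item
        c = np.tanh(c + e)                         # conscious update (stands in for a GRU)
        u = u + dt * (e - u)                       # unconscious drift (stands in for the ODE)
        d = 0.5 * (c + u)                          # decision state balancing both (toy gate)
        return c, u, d

    def recommend(self, d):
        """Recommendation layer: nearest item to the decision-making state."""
        return int(np.argmin(np.linalg.norm(self.E - d, axis=1)))

model = FaNCSketch(n_items=6, d_u=4)
c, u = np.zeros(4), np.zeros(4)
for s, dt in [(0, 1.0), (3, 0.5)]:                 # behaviour sequence of (item, interval) pairs
    c, u, d = model.step(c, u, s, dt)
next_item = model.recommend(d)
```

For simplicity this sketch gives the conscious and unconscious states the same dimensionality $d_u$; in the paper the conscious state has its own dimensionality $d_c$ and is mapped into item space by a linear layer.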
FaNC produces a decision-making state by balancing the conscious and unconscious states (i.e., the reality principle), and it includes the user's interests. Thus, the subsequent recommendation layer makes a recommendation based on the decision-making state.

Figure 2: Architecture of the Freudian and Newtonian recurrent cell, namely, FaNC. To follow Freud's (iceberg) theory, we place the trapezoid cell in the shape of an iceberg between the blue sky and the blue ocean, which represent the areas of the conscious and unconscious, respectively. The cell receives the preceding conscious and unconscious states and a consumed item. Since the conscious state changes when an event occurs, i.e., item consumption, we design the process through the recurrent layer, which generates the following conscious state. FaNC modifies the previous unconscious state to the following unconscious state via the linear & gating layer and neural ordinary differential equation (ODE) solver. The former implies that the conscious state selectively affects the unconscious state, and the latter is used to model the pleasure principle. Specifically, we employ Newton's law of universal gravitation and model it via the neural ODE solver to model the pleasure principle. Then, FaNC generates a decision-making state by balancing the following conscious and unconscious states via another linear & gating layer that mimics the reality principle. Finally, our model provides a recommendation result based on the decision-making state.
As in NLP research [29, 30], recent studies in sequential recommendation convert an item index into a low-dimensional vector, called an item embedding vector. Traditional recommendation models often use categorical metadata of items, such as genre and manufacturer, and convert the item index into a concatenation of multiple one-hot vectors in which only the item's metadata value is one and the remaining values are zero [31]. These vectors are very sparse, and the similarity between items is difficult to measure. Recent works [8, 11, 28, 32] employ an embedding layer to resolve these limitations, and this study also uses an item embedding layer. The item embedding layer $f_\epsilon : \mathbb{N} \to \mathbb{R}^{d_u}$ projects the item index onto a $d_u$-dimensional real-valued dense vector as

$$e_l = f_\epsilon(s_l), \quad \forall l, \tag{3}$$

where $e_l$ represents the item embedding vector for item $s_l$. We assume that the item embedding vector is in the same space as the unconscious state so that we can apply Newton's law, which is why we use $u$ as a subscript of the dimensionality.

FaNC operates in the same way as a recurrent cell by receiving a consumed item and the previous hidden state and producing a new hidden state. Since we divide the user's hidden state into conscious and unconscious states, our cell takes the consumed item embedding $e_l$ and the previous conscious and unconscious states, denoted as $c_{l-1}$ and $u_{l-1}$, respectively, and outputs the new conscious and unconscious states, $c_l$ and $u_l$. Furthermore, FaNC generates a decision-making state $d_l$ by balancing the conscious and unconscious states. We introduce the detailed procedure of FaNC as follows.
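The contrast between the sparse one-hot encodings of traditional models and the dense embedding of equation (3) can be illustrated as follows; the table size and random values are toy assumptions.

```python
import numpy as np

n_items, d_u = 4, 3
rng = np.random.default_rng(1)
E = rng.normal(size=(n_items, d_u))  # learned embedding table: row l holds e_l = f_eps(s_l)

def embed(s):
    """Dense embedding lookup, e_l = f_eps(s_l)."""
    return E[s]

def one_hot(s):
    """Sparse categorical encoding used by traditional models."""
    v = np.zeros(n_items)
    v[s] = 1.0
    return v

# Distinct one-hot vectors are orthogonal, so their inner-product similarity is
# always zero, whereas dense embeddings give graded (generally nonzero) similarities.
orthogonal = float(one_hot(0) @ one_hot(1))   # always 0.0
dense_sim = float(embed(0) @ embed(1))        # some real number
```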
FaNC models the dynamics of the user's conscious state $c_l \in \mathbb{R}^{d_c}$ with a recurrent layer that updates the conscious state only when the user consumes an item, where $d_c$ is the dimensionality of the conscious state. Specifically, the recurrent layer has the same architecture as the gated recurrent unit (GRU) [4] because the GRU outperforms other recurrent cells for sequential recommendation [8]. The GRU is commonly represented as

$$c_l = \mathrm{GRU}(c_{l-1}, e_l). \tag{4}$$

Additionally, we can expand the GRU function into the following equations (5)–(8). The GRU linearly interpolates the previous conscious state $c_{l-1}$ and the candidate conscious state $\hat{c}_l$ as

$$c_l = (1 - z_l) c_{l-1} + z_l \hat{c}_l, \tag{5}$$

where the update gate $z_l$ is given by

$$z_l = \sigma(W_z e_l + U_z c_{l-1}), \tag{6}$$

where $\sigma$ is an activation function, such as a logistic sigmoid function. The candidate conscious state $\hat{c}_l$ is computed as follows:

$$\hat{c}_l = \tanh(W_c e_l + U_c (g_l \odot c_{l-1})), \tag{7}$$

where $\odot$ represents the element-wise product and the reset gate $g_l$ is calculated as

$$g_l = \sigma(W_g e_l + U_g c_{l-1}). \tag{8}$$

Modelling the dynamics of the unconscious state

We suppose that the free-floating dynamics of the unconscious state follow Freud's pleasure principle and model the dynamics via Newton's law of universal gravitation. Since we presume the item embedding vectors and the unconscious state are in the same latent space, the unconscious state has gravitational potential energy. This paper regards reducing the potential energy as imitating the pleasure principle. To do so, we let the unconscious state $u_l$ float according to the gravitational acceleration between the unconscious state and the whole set of items $\mathcal{I}$ during the item-consumption time interval $t_{l-1}$ to $t_l$.
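The conscious-state update of equations (4)–(8) can be transcribed directly in NumPy; the weight shapes and random initialisation below are illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(c_prev, e, W_z, U_z, W_g, U_g, W_c, U_c):
    """One conscious-state update c_l = GRU(c_{l-1}, e_l), equations (5)-(8)."""
    z = sigmoid(W_z @ e + U_z @ c_prev)              # update gate, eq. (6)
    g = sigmoid(W_g @ e + U_g @ c_prev)              # reset gate, eq. (8)
    c_hat = np.tanh(W_c @ e + U_c @ (g * c_prev))    # candidate state, eq. (7)
    return (1.0 - z) * c_prev + z * c_hat            # linear interpolation, eq. (5)

d_c, d_e = 4, 3
rng = np.random.default_rng(0)
W_z, W_g, W_c = (rng.normal(size=(d_c, d_e)) for _ in range(3))
U_z, U_g, U_c = (rng.normal(size=(d_c, d_c)) for _ in range(3))
c1 = gru_step(np.zeros(d_c), rng.normal(size=d_e), W_z, U_z, W_g, U_g, W_c, U_c)
```

Starting from a zero state, the new state is $z_l \odot \hat{c}_l$, so every component stays inside $(-1, 1)$.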
Based on physics research [33, 34], the $d_u$-dimensional gravitational force $F$ applied to the unconscious state $u$ by an item $i$ at time $t$ is calculated as

$$F(u, i, t) = \frac{G m_u m_i}{\|r(u, i, t)\|^{d_u}}\, r(u, i, t), \tag{9}$$

where $r(u, i, t)$ is the displacement from $u$ to $i$, $G$ is the gravitational constant, $m_u$ and $m_i$ are the masses of the unconscious state and item $i$, and $\|\cdot\|$ denotes the norm of a vector. The mass $m_i$ represents the attractiveness of item $i$. Based on equation (9), we calculate the net force, which is the sum of the forces of all items and determines the net acceleration, as

$$\underbrace{\sum_{i \in \mathcal{I}} F(u, i, t)}_{\text{net force}} = m_u \underbrace{\sum_{i \in \mathcal{I}} \frac{G m_i}{\|r(u, i, t)\|^{d_u}}\, r(u, i, t)}_{\text{net acceleration}} = m_u\, a(u, t), \tag{10}$$

where $a(u, t)$ is the net acceleration of the unconscious state $u$ at time $t$. This paper does not address the mass of the unconscious $m_u$ because it does not affect the position (i.e., the unconscious state $u$). Since only the net acceleration influences the position, we can write the relation between position and net acceleration as the following second-order ODE:

$$\ddot{u}(t) = a(u, t). \tag{11}$$

Equation (11) can be reduced to an equivalent system of coupled first-order ODEs, $\dot{u}(t) = v(u, t)$ and $\dot{v}(u, t) = a(u, t)$:

$$\begin{bmatrix} u_l \\ v_l \end{bmatrix} = \begin{bmatrix} u_{l-1} \\ v_{l-1} \end{bmatrix} + \int_{t_{l-1}}^{t_l} \begin{bmatrix} v(u, t) \\ a(u, t) \end{bmatrix} \mathrm{d}t \tag{12}$$

$$= \begin{bmatrix} u_{l-1} \\ v_{l-1} \end{bmatrix} + \int_{0}^{t_l - t_{l-1}} \begin{bmatrix} v(u, t' + t_{l-1}) \\ a(u, t' + t_{l-1}) \end{bmatrix} \mathrm{d}t', \tag{13}$$

where $v_l$ is the unconscious state's velocity at time $t_l$. Furthermore, we call the combined vector of $u_l$ and $v_l$ the extended unconscious state, i.e., $h_l = [u_l, v_l]$. In addition, this paper sets the initial position and velocity as zero vectors. Additionally, $a(u, t)$ changes as the unconscious state $u$ varies at every moment because equation (10) holds. We substitute $t'$ for $t - t_{l-1}$ in equation (12) and obtain equation (13), which implies that time does not matter when the position and velocity are the same in the gravitational field. To solve the above ODEs, we have to determine $G$ and $m_i$ in equation (10). We set the gravitational constant $G$ to one in the recommender system for ease of calculation such that we need to estimate only the parameters $m_i$.

From the machine learning perspective, to estimate the parameters of the unconscious state under the ODE, this paper employs a neural ODE [35], which is a continuous deep learning model. The neural ODE considers the ODE solver as a black-box function and conducts backpropagation using the adjoint method [36]. The black-box function receives an initial extended unconscious state $h_{l-1}$, the gravitational acceleration function $a(u, t)$, start time $t_{l-1}$, stop time $t_l$, and the parameters $m_i$ and $e_i$ for all $i$, and produces a floated extended unconscious state $h_l$, as follows:

$$h_l = \mathrm{ODESolver}(h_{l-1}, a(u, t), t_{l-1}, t_l, \{m_i, e_i \mid i \in \mathcal{I}\}). \tag{14}$$

However, ODEs have a critical limitation in that their trajectories cannot cross, which yields an important implication: modelling the unconscious state based only on ODEs indicates that most users have isolated preferences. Therefore, we should devise a means of shifting (escaping) the unconscious state to address this limitation. This paper resolves the limitation by shifting the unconscious state, connecting the conscious state to the unconscious state via the linear layer $f_\theta$ and gating function $f_\gamma$ as

$$h'_{l-1} = \gamma_l \odot f_\theta(c_l) + (1 - \gamma_l) \odot h_{l-1}, \tag{15}$$

where the connection gate $\gamma_l$ is given by

$$\gamma_l = f_\gamma(c_l, h_{l-1}) = \sigma(W_\gamma c_l + U_\gamma h_{l-1}). \tag{16}$$
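A minimal numerical sketch of the free-floating dynamics in equations (10)–(13): the unconscious state moves under the net gravitational acceleration of the fixed item embeddings. A plain Euler integrator stands in for the paper's neural ODE solver, $G = 1$ as in the text, and the item positions, masses, and step count are illustrative assumptions; a small constant regularises the denominator to avoid division by zero.

```python
import numpy as np

def net_acceleration(u, E, m):
    """a(u, t) = sum_i m_i * r / ||r||^{d_u}, with r the displacement from u to item i (G = 1)."""
    d_u = len(u)
    R = E - u                                        # displacements to every item
    norms = np.linalg.norm(R, axis=1) ** d_u + 1e-9  # ||r||^{d_u}, regularised
    return (m[:, None] * R / norms[:, None]).sum(axis=0)

def free_float(u, v, E, m, t0, t1, n_steps=200):
    """Euler integration of du/dt = v, dv/dt = a(u, t) over [t0, t1] (eqs. (12)-(13))."""
    dt = (t1 - t0) / n_steps
    for _ in range(n_steps):
        a = net_acceleration(u, E, m)
        u = u + dt * v
        v = v + dt * a
    return u, v

E = np.array([[1.0, 0.0], [-1.0, 0.0]])   # two fixed items, symmetric about the y-axis
m = np.array([1.0, 1.0])                   # item masses (attractiveness)
u, v = free_float(np.array([0.0, 0.5]), np.zeros(2), E, m, t0=0.0, t1=1.0)
# By symmetry the horizontal pulls cancel, so the state stays on the y-axis
# while being drawn down towards the items.
```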
The unconscious state is instantaneously shifted when the conscious state is updated; thus, the conscious has an immediate impact on the unconscious. Therefore, equation (14) should be modified as

$$h_l = \mathrm{ODESolver}(h'_{l-1}, a(u, t), t_{l-1}, t_l, \{m_i, e_i \mid i \in \mathcal{I}\}). \tag{17}$$

Notably, we shift not only the position but also its velocity. This process can be regarded as an asteroid crashing into a planet, which changes the planet's velocity and position.

The proposed model generates a decision-making state $d_l$ at time $t_l$ by balancing the conscious state $c_l$ and unconscious state $u_l$ via the linear layer $f_\phi$ and gating function $f_\delta$ as

$$d_l = \delta_l \odot f_\phi(c_l) + (1 - \delta_l) \odot u_l, \tag{18}$$

where the decision-making gate $\delta_l$ is given by

$$\delta_l = f_\delta(c_l, u_l) = \sigma(W_\delta c_l + U_\delta u_l). \tag{19}$$

This process mimics Freud's reality principle: $\delta_l$ determines the importance of the conscious and unconscious states. When $\delta_l$ is large, the conscious state is more influential than the unconscious state; in other words, the user is relatively rational. On the contrary, when $\delta_l$ is small, the unconscious state is essential for estimating consumption.

The proposed model recommends the items nearest to the decision-making state at time $t_l$ as

$$s_l = \arg\min_{i \in \mathcal{I}} \|d_l - e_i\|. \tag{20}$$

Sharing the item embedding vectors (i.e., $e_l$) between the item embedding layer and the nearest-item finder can reduce overfitting. To train our model by backpropagation, we modify equation (20) to be differentiable, as follows:

$$p_{li} = \frac{e^{-\|d_l - e_i\|^2}}{\sum_j e^{-\|d_l - e_j\|^2}}, \tag{21}$$

where $p_{li}$ is the probability of item $i$ being recommended at time $t_l$. This equation is a form of the softmax function, which receives the negative squared Euclidean distance between the decision-making state and the item embedding.

We train the proposed model in an end-to-end manner.
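The reality-principle gate and the distance-based softmax of equations (18)–(21) can be sketched as follows; all weights are random illustrative placeholders, and $f_\phi$ is taken to be a plain linear map.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def decide_and_score(c, u, E, W_phi, W_d, U_d):
    """Decision-making state (eqs. (18)-(19)) and recommendation probabilities (eq. (21))."""
    delta = sigmoid(W_d @ c + U_d @ u)           # decision-making gate delta_l, eq. (19)
    d = delta * (W_phi @ c) + (1.0 - delta) * u  # balance conscious and unconscious, eq. (18)
    dist2 = np.sum((E - d) ** 2, axis=1)         # squared distances to item embeddings
    p = np.exp(-dist2)
    return d, p / p.sum()                        # softmax over negative squared distance

d_c, d_u, n_items = 3, 2, 5
rng = np.random.default_rng(2)
E = rng.normal(size=(n_items, d_u))
d, p = decide_and_score(rng.normal(size=d_c), rng.normal(size=d_u), E,
                        rng.normal(size=(d_u, d_c)),
                        rng.normal(size=(d_u, d_c)),
                        rng.normal(size=(d_u, d_u)))
```

Because the softmax is a monotone transform of distance, the most probable item under equation (21) is always the nearest item selected by equation (20).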
To learn the model parameters described in equations (3) to (14), we use a cross-entropy loss function as the objective to be minimised:

$$\mathcal{L} = -\sum_{b=1}^{B} \sum_{l, i} y_{bli} \log p_{li}, \tag{22}$$

where $B$ is the mini-batch size and $y_{bli}$ is a binary indicator of whether item $i$ was interacted with at the $l$-th order in the $b$-th behaviour sequence. Let $S_b$ be the $b$-th behaviour sequence and $s_{bl}$ and $t_{bl}$ be the interacted item index and interaction time in the sequence, respectively. Then, $y_{bli}$ has a value of one only when $i = s_{bl}$ and is zero otherwise. We obtain $p_{li}$ by passing $s_l$ through the embedding layer, FaNC, and the recommendation layer in consecutive order and train the whole model by backpropagating the loss. This paper uses the Adam optimiser [37] with a gradual warm-up learning rate scheduler [38] on the mini-batch (see Supplementary Section I for details on the mini-batching strategy).
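For one-hot indicators $y_{bli}$, the objective in equation (22) reduces to summing the negative log-probabilities of the consumed items; the toy batch below is an illustrative assumption.

```python
import numpy as np

def cross_entropy(p, y, eps=1e-12):
    """L = -sum_{b,l,i} y_{bli} log p_{li}; here p_{li} is shared across the batch."""
    return float(-np.sum(y * np.log(p + eps)))

# B = 2 sequences, L = 2 positions, N = 3 items.
p = np.array([[0.7, 0.2, 0.1],         # p_{li}: predicted distribution at each step l
              [0.1, 0.8, 0.1]])
y = np.array([[[1, 0, 0], [0, 1, 0]],  # sequence b=1 consumed item 1, then item 2
              [[1, 0, 0], [0, 0, 1]]]) # sequence b=2 consumed item 1, then item 3
loss = cross_entropy(p, y)
# Only the log-probabilities of the actually consumed items contribute.
```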
Results
Fig. 3 shows the recommendation performance of FaNC and four baselines (see Supplementary Section II.1 for details on the baselines) on six real-world benchmark datasets (see Supplementary Section II.2 for details on the datasets). From an economic perspective [39, 40], we can classify the datasets into three product groups: the search product group, the experience product group, and the mixed product group. The search product group is a set of goods whose characteristics are clearly evaluated before purchase, and the experience product group is a collection of goods whose characteristics are difficult to assess in advance but can be ascertained after consumption. Some goods in the mixed product group have characteristics of the search product group, and the remaining goods belong to the experience product group. The 'clothing, shoes, & jewelry' and 'patio, lawn, & garden' datasets belong to the search product group, and the 'beauty' dataset belongs to the experience product group. The remaining datasets belong to the mixed product group. Our results demonstrate that FaNC is effective on the search product group, whereas the proposed model is less effective on the experience and mixed product groups. These results may derive from modelling the unconscious state, and we further investigate the inherent mechanism of the proposed model.

Furthermore, we observed that the models' performance varies depending on the product group. For the experience product group, SASRec recorded the highest performance, and ours showed the lowest performance. By contrast, the opposite result is observed for the search product group. One plausible explanation is that this pattern occurred due to the differences in statistics between product groups (Supplementary Table S1). As the datasets belonging to the search product group generally contain a large number of items, SASRec may have difficulty learning the representations of many items. However, FaNC achieved better results since it reflects humans' inductive biases.
Fig. 4 shows how the pleasure and reality principles work in the trained proposed model. From the figure, we observed two characteristics of FaNC. First, the reality principle tends to suppress the unconscious state as the state floats according to the pleasure principle. For all datasets, the proposed model tends to place more importance on the conscious state as the unconscious state free-floats, even though users' unconscious states did not flow in the same direction. Thus, users may be rational over the long term. In other words, the longer a user spends contemplating before buying a product, the more rational the user becomes. Second, this paper also found that FaNC behaves similarly to actual consumers. The marginal distributions of importance (i.e., the vertical distributions) of the search product groups are upper-skewed, and those of the other groups are lower-skewed or uniform. Thus, our model increased the importance of the conscious state when providing recommendations on search products compared to other products. These results are in accordance with consumer behaviour, which depends more on the conscious than the unconscious when buying search products (i.e., a simple situation) compared to experience products (i.e., a complicated situation) [41, 42]. The latter characteristic may also be responsible for the different results for the search product group and the other groups. Notably, we did not intend for our model to have these characteristics; we merely included the free-floating unconscious state in the model.

Figure 3: Performance of FaNC on real-world benchmark datasets. We evaluated FaNC on six datasets of three different groups – search product group, experience product group, and mixed product group – in terms of recall (top) and nDCG (bottom). All values shown in the figures are the averages of five replicates.
Since this paper modified the definition of the sequential recommendation model as in equation (2), FaNC can provide a what-if analysis of consumption time. We illustrate two examples of a what-if analysis of consumption time on the movie dataset in Fig. 5. The first user in Fig. 5a had seen gloomy movies before actually consuming the item Wizard of Oz. This example shows that FaNC fails to successfully recommend an item when users' preferences change dramatically: our model recommended dark types of movies that are similar but slightly different. The second user in Fig. 5b alternated between romantic and dark movies, and FaNC recommended a crime movie and an action-comedy movie. The results imply that time might be an important factor because the user's unconscious evolves. Therefore, the effectiveness of recommendation can be maximised if the service provider considers the time, such as the release date [43, 44] and advertising time [45], to maximise profit.

Figure 4: Scatter plots with marginal histograms of user behaviour via the pleasure principle and reality principle on the movie (a), baby (b), toys & games (c), clothing, shoes & jewelry (d), patio, lawn & garden (e), and beauty (f) datasets. The x- and y-axes represent the displacement of the unconscious state $u_l$ during the free-floating process (i.e., pleasure principle) and the average importance of the conscious state $\bar{\delta}$ when making a decision (i.e., reality principle), respectively. The importance ranges from zero to one: the conscious state is more influential than the unconscious state when the value is large.
Even though the proposed model reflects the users' unconscious state, FaNC does not follow Freud's unconscious in a complete sense. Freud believed that the unconscious determines individual behaviour.

Figure 5: Illustration of a what-if analysis of consumption time for two sample users on the movie dataset. The top-left five movies indicate the recently consumed items, and the top-right movie is the actually consumed item. The bottom-right two movies are the recommended movies at the corresponding time. (a) is the what-if analysis if the next item was consumed later than it actually was, and (b) is the what-if analysis if the next item was consumed earlier than it actually was. This figure shows the genre of each movie below the title. [Panel contents: (a) recent horror and sci-fi films; actual consumption: Wizard of Oz (Adventure, Children); F(S, 0.6 days later): Django Unchained (Action, Drama); F(S, 4 days later): Inception (Action, Crime). (b) alternating romance and dark films; actual consumption: Shaolin Soccer (Action, Comedy); F(S, 2 days later): Taken 2 (Action, Crime); F(S, a week later): Kick-Ass (Action, Comedy).]

Our modelling lets users' unconscious states lie in the same latent space as the item embeddings, which means that
FaNC models users' common behaviour patterns. By contrast, Jung, a psychoanalyst as famous as Freud, claimed that the unconscious can be divided into a collective unconscious and a personal unconscious [46]; thus, FaNC can be regarded as reflecting the collective unconscious, which does not develop individually but is inherited because we are human. To model a single user's individual behaviour pattern from the Freudian perspective, we must consider the personal unconscious, which develops individually. We expect meta-learning methods [47, 48, 49] to be candidates for modelling the personal unconscious because they can produce a personalised item embedding space or decision-making process based on consumed items [50, 51]. Considering the personal unconscious via meta-learning may improve the performance not only on the search product group but also on the experience product group.

Another interesting direction to improve our model exists. We assumed that only the user's unconscious state is floating and that the item embeddings are fixed in the latent space. These assumptions were made because the $N$-body problem, i.e., predicting the individual motions of many particles interacting with each other gravitationally, is intractable in physics [52], and placing items in the latent space made our model tractable. We believe that it is possible to make the $N$-body problem in the recommender system tractable by using advanced machine learning methods in the future.

This paper proposed a recurrent cell, namely,
FaNC, for sequential recommendation from the Freudian and Newtonian perspectives. From the Freudian perspective, we divided a user's state into conscious and unconscious states and modelled the user's conscious and unconscious decision-making process by means of the pleasure and reality principles. To model the pleasure principle, i.e., the user's free-floating unconscious, this paper followed the Newtonian perspective. We modelled the reality principle via a gating function that balances the conscious and unconscious states. The proposed model outperformed baseline methods on datasets belonging to the search product group and produced comparable results on the other datasets. FaNC opens a new direction of sequential recommendation at the convergence of psychoanalysis and recommender systems that enables us to understand users' decision-making processes.
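As a schematic recap only, the reality-principle gate summarised above can be pictured as a learned convex blend of the two states. The parameterisation, names, and shapes below are our assumptions for illustration, not the paper's exact formulation (which uses different dimensions for the conscious and unconscious states):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical sketch: an elementwise gate g in (0, 1) blends the conscious
# and unconscious states, echoing the reality principle's balancing role.
def reality_gate(conscious, unconscious, w_c, w_u, b):
    out = []
    for c_i, u_i, wc_i, wu_i, b_i in zip(conscious, unconscious, w_c, w_u, b):
        g = sigmoid(wc_i * c_i + wu_i * u_i + b_i)   # gate value in (0, 1)
        out.append(g * c_i + (1.0 - g) * u_i)        # convex combination
    return out

conscious = [0.2, -1.0, 0.5]
unconscious = [1.0, 0.3, -0.4]
blended = reality_gate(conscious, unconscious,
                       [0.5, 0.5, 0.5], [0.5, 0.5, 0.5], [0.0, 0.0, 0.0])
```

Because the gate is a convex combination, each output component lies between the corresponding conscious and unconscious components.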
Data availability
The movie (MovieLens 20M) dataset is publicly available at https://grouplens.org/datasets/movielens/ and the other datasets are publicly available at https://jmcauley.ucsd.edu/data/amazon/ .

References

[1] Sarwar, B., Karypis, G., Konstan, J. & Riedl, J. Item-based collaborative filtering recommendation algorithms. In
Proceedings of the 10th International Conference on World Wide Web, 285–295 (2001).
[2] Linden, G., Smith, B. & York, J. Amazon.com recommendations: Item-to-item collaborative filtering. IEEE Internet Computing, 76–80 (2003).
[3] Koren, Y., Bell, R. & Volinsky, C. Matrix factorization techniques for recommender systems. Computer, 30–37 (2009).
[4] Cho, K. et al. Learning phrase representations using RNN encoder–decoder for statistical machine translation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, 1724–1734 (2014).
[5] Bahdanau, D., Cho, K. & Bengio, Y. Neural machine translation by jointly learning to align and translate. In Proceedings of the 3rd International Conference on Learning Representations (2015).
[6] Vaswani, A. et al. Attention is all you need. In Advances in Neural Information Processing Systems, 5998–6008 (2017).
[7] Devlin, J., Chang, M.-W., Lee, K. & Toutanova, K. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (2019).
[8] Hidasi, B., Karatzoglou, A., Baltrunas, L. & Tikk, D. Session-based recommendations with recurrent neural networks. In Proceedings of the 4th International Conference on Learning Representations (2016).
[9] Twardowski, B. Modelling contextual information in session-aware recommender systems with neural networks. In Proceedings of the 10th ACM Conference on Recommender Systems, 273–276 (2016).
[10] He, R., Kang, W.-C. & McAuley, J. Translation-based recommendation. In Proceedings of the 11th ACM Conference on Recommender Systems, 161–169 (2017).
[11] Kang, W.-C. & McAuley, J. Self-attentive sequential recommendation. In , 197–206 (2018).
[12] Harper, F. M. & Konstan, J. A. The movielens datasets: History and context.
ACM Transactions on Interactive Intelligent Systems, 1–19 (2015).
[13] Van den Oord, A., Dieleman, S. & Schrauwen, B. Deep content-based music recommendation. In Advances in Neural Information Processing Systems, 2643–2651 (2013).
[14] Li, L., Chu, W., Langford, J. & Schapire, R. E. A contextual-bandit approach to personalized news article recommendation. In Proceedings of the 19th International Conference on World Wide Web, 661–670 (2010).
[15] McAuley, J., Targett, C., Shi, Q. & Van Den Hengel, A. Image-based recommendations on styles and substitutes. In Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, 43–52 (2015).
[16] Freud, S. The Interpretation of Dreams (Modern Library, New York, 1900).
[17] Newton, I. The Principia: Mathematical Principles of Natural Philosophy (University of California Press, 1999).
[18] Dijksterhuis, A. Think different: The merits of unconscious thought in preference development and decision making. Journal of Personality and Social Psychology, 586 (2004).
[19] Newell, B. R. & Shanks, D. R. Unconscious influences on decision making: A critical review. Behavioral and Brain Sciences, 1–19 (2014).
[20] Freud, S. Formulations on the two principles of mental functioning. In The Standard Edition of the Complete Psychological Works of Sigmund Freud, Volume XII (1911-1913): The Case of Schreber, Papers on Technique and Other Works, 213–226 (1958).
[21] Poldrack, R. A. et al. Interactive memory systems in the human brain. Nature, 546–550 (2001).
[22] Rugg, M. D. et al. Dissociation of the neural correlates of implicit and explicit memory. Nature, 595–598 (1998).
[23] Saletin, J. M., Goldstein, A. N. & Walker, M. P. The role of sleep in directed forgetting and remembering of human memories. Cerebral Cortex, 2534–2541 (2011).
[24] Freud, S. Beyond the pleasure principle. In The Standard Edition of the Complete Psychological Works of Sigmund Freud, Volume XVIII (1920-1922): Beyond the Pleasure Principle, Group Psychology and Other Works, 1–64 (1955).
[25] Rook, D. W. The buying impulse. Journal of Consumer Research, 189–199 (1987).
[26] Wang, J. & Zhang, Y. Opportunity model for e-commerce recommendation: Right product; right time. In Proceedings of the 36th International ACM SIGIR Conference on Research and Development in Information Retrieval, 303–312 (2013).
[27] Bhagat, R., Muralidharan, S., Lobzhanidze, A. & Vishwanath, S. Buy it again: Modeling repeat purchase recommendations. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 62–70 (2018).
[28] Jang, S., Lee, H., Cho, H. & Sehee, C. Cities: Contextual inference of tail-item embeddings for sequential recommendation. In (2020).
[29] Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S. & Dean, J. Distributed representations of words and phrases and their compositionality. In
Advances in Neural Information Processing Systems, 3111–3119 (2013).
[30] Pennington, J., Socher, R. & Manning, C. D. GloVe: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, 1532–1543 (2014).
[31] Debnath, S., Ganguly, N. & Mitra, P. Feature weighting in content based recommendation system using social network analysis. In Proceedings of the 17th International Conference on World Wide Web, 1041–1042 (2008).
[32] Sun, F. et al. BERT4Rec: Sequential recommendation with bidirectional encoder representations from transformer. In Proceedings of the 28th ACM International Conference on Information and Knowledge Management, 1441–1450 (2019).
[33] Elmer, J. A. & Olenick, R. P. Gravitational law in n dimensions (1982).
[34] Wilkins, D. Gravitational fields and the cosmological constant in multidimensional Newtonian universes. American Journal of Physics, 726–731 (1986).
[35] Chen, R. T., Rubanova, Y., Bettencourt, J. & Duvenaud, D. K. Neural ordinary differential equations. In Advances in Neural Information Processing Systems, 6571–6583 (2018).
[36] Pontryagin, L. S. Mathematical Theory of Optimal Processes (Routledge, 2018).
[37] Kingma, D. P. & Ba, J. Adam: A method for stochastic optimization. In Proceedings of the 3rd International Conference on Learning Representations (2015).
[38] Goyal, P. et al. Accurate, large minibatch SGD: Training ImageNet in 1 hour. arXiv preprint arXiv:1706.02677 (2017).
[39] Nelson, P. Information and consumer behavior. Journal of Political Economy, 311–329 (1970).
[40] Franke, G. R., Huhmann, B. A. & Mothersbaugh, D. L. Information content and consumer readership of print ads: A comparison of search and experience products. Journal of the Academy of Marketing Science, 20–31 (2004).
[41] Dijksterhuis, A., Bos, M. W., Nordgren, L. F. & Van Baaren, R. B. On making the right choice: The deliberation-without-attention effect. Science, 1005–1007 (2006).
[42] Gao, J., Zhang, C., Wang, K. & Ba, S. Understanding online purchase decision making: The effects of unconscious thought, information quality, and information quantity. Decision Support Systems, 772–781 (2012).
[43] Chiou, L. The timing of movie releases: Evidence from the home video industry. International Journal of Industrial Organization, 1059–1073 (2008).
[44] Lee, K., Lee, H. & Kim, C. O. Pricing and timing strategies for new product using agent-based simulation of behavioural consumers. Journal of Artificial Societies and Social Simulation, 1 (2014).
[45] Feichtinger, G., Hartl, R. F. & Sethi, S. P. Dynamic optimal control models in advertising: Recent developments. Management Science, 195–226 (1994).
[46] Jung, C. G. The structure of the unconscious. Collected Works (1916).
[47] Vinyals, O., Blundell, C., Lillicrap, T., Wierstra, D. et al. Matching networks for one shot learning. In
Advances in Neural Information Processing Systems, 3630–3638 (2016).
[48] Snell, J., Swersky, K. & Zemel, R. Prototypical networks for few-shot learning. In Advances in Neural Information Processing Systems, 4077–4087 (2017).
[49] Finn, C., Abbeel, P. & Levine, S. Model-agnostic meta-learning for fast adaptation of deep networks. In Proceedings of the 34th International Conference on Machine Learning, 1126–1135 (2017).
[50] Pan, F., Li, S., Ao, X., Tang, P. & He, Q. Warm up cold-start advertisements: Improving CTR predictions via learning to learn ID embeddings. In Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, 695–704 (2019).
[51] Lee, H., Im, J., Jang, S., Cho, H. & Chung, S. MeLU: Meta-learned user preference estimator for cold-start recommendation. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 1073–1082 (2019).
[52] Hemsendorf, M. & Merritt, D. Instability of the gravitational n-body problem in the large-n limit. The Astrophysical Journal, 606 (2002).

Supplementary Materials

I Mini-batching strategy for
FaNC
To allow mini-batching under the ODE, we utilise the trajectory tracking of the ODE solver to obtain the next states of multiple users who have different time intervals, as shown in Figure S1. The neural ODE has a drawback in that it is difficult to train the model in a mini-batch-wise manner because the ODE solver typically flows all instances (i.e., unconscious states) in the mini-batch over the same amount of time. Since users' item-consumption patterns are diverse, we must handle mini-batches with varied time intervals (e.g., the intervals differ across the sequences in the figure). Therefore, we group items with the same sequence order within the mini-batch and calculate the mini-batch states sequentially according to that order. Because equation (13) holds, only the time intervals between the (l − 1)-th and l-th orders are required to calculate the l-th mini-batch states. To handle different time intervals in a mini-batch, we let all unconscious states float freely during the maximum time interval (which we call time padding) and track the floating paths. Then, we assign the state at the corresponding time interval along the path to each instance's next state. This approach is one of the simplest ways to apply mini-batching, and the time padding is similar to the zero padding used for mini-batches in NLP models [1, 2].
Figure S1: Illustration of the mini-batch trick. To enable the mini-batch strategy for FaNC during training, we trace the floating paths to obtain the next states of users with different time intervals. After tracing, the model assigns the unconscious state at the corresponding time interval along the path to the user's next unconscious state.
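The time-padding trick can be sketched as follows. This is a minimal illustration with our own names: a placeholder right-hand side `f` and a fixed-step Euler loop stand in for the paper's gravitational drift and Runge-Kutta-based solver.

```python
# Minimal sketch of time padding: flow every state for the *maximum* time
# interval, record the trajectory, then read each user's next state off the
# trajectory at that user's own interval.
def f(state):
    return [-0.5 * x for x in state]   # placeholder ODE right-hand side

def flow_with_time_padding(states, dts, n_steps=100):
    t_max = max(dts)                   # time padding = longest interval
    h = t_max / n_steps
    traj = [states]
    x = states
    for _ in range(n_steps):           # fixed-step Euler, for illustration
        x = [[xi + h * di for xi, di in zip(row, f(row))] for row in x]
        traj.append(x)
    # pick, per user, the trajectory point closest to its own interval
    return [traj[round(dt / h)][b] for b, dt in enumerate(dts)]

batch = [[1.0, 2.0], [3.0, -1.0], [0.5, 0.5]]   # toy unconscious states
dts = [0.4, 1.0, 0.7]                           # differing time intervals
next_states = flow_with_time_padding(batch, dts)
```

Every state is integrated over the same padded horizon, so the batch can be solved in one call; only the read-out index differs per user, mirroring how zero padding is discarded in NLP mini-batches.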
II Experimental details
II.1 Comparison methods
We compared the proposed model with four baseline models. Two baselines were classic models, and the others were NLP-based models for sequential recommendation. Specifically, we employed the following models:

• Classic recommendation models:
– POP: This model, the simplest baseline, ranks items in descending order of popularity, calculated from the number of user-item interactions in the training set.
– Factorised Markov chain (FMC): Following the first-order Markov assumption, this model ranks items according to the transition probability given the item in the last action, estimated from the training set.

• NLP-based sequential recommendation models:
– GRU4Rec [3]: This model projects items into the embedding space and uses a GRU to capture user behaviour dynamics based on the embedded items and to recommend items.
– SASRec [4]: This model uses left-to-right unidirectional self-attention layers (i.e., Transformer [5]) to capture user behaviour dynamics based on the embedded items and recommends items in a similar way to GRU4Rec.

We did not consider collaborative-filtering-based models because they are unsuitable for predicting items under the typical sequential recommendation setting.
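The two classic baselines can be sketched in a few lines (a minimal illustration; the function names and the toy training sequences are ours):

```python
from collections import Counter, defaultdict

# POP: rank items by interaction count in the training set.
def fit_pop(train_seqs):
    counts = Counter(item for seq in train_seqs for item in seq)
    return [item for item, _ in counts.most_common()]

# FMC: estimate first-order transition counts from consecutive pairs.
def fit_fmc(train_seqs):
    trans = defaultdict(Counter)
    for seq in train_seqs:
        for prev, nxt in zip(seq, seq[1:]):
            trans[prev][nxt] += 1
    return trans

def rank_fmc(trans, last_item):
    # rank candidates by estimated transition probability P(next | last)
    return [item for item, _ in trans[last_item].most_common()]

train = [["a", "b", "c"], ["a", "b", "d"], ["b", "c"]]
pop_ranking = fit_pop(train)                 # "b" is the most frequent item
fmc_ranking = rank_fmc(fit_fmc(train), "a")  # "b" always follows "a" here
```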
II.2 Dataset
We considered six public benchmark datasets from real-world applications: MovieLens 20M (movie for short) [6], Amazon baby, Amazon beauty, Amazon clothing, shoes, & jewelry, Amazon patio, lawn & garden, and Amazon toys & games [7]. The MovieLens 20M dataset is one of the most widely used datasets for evaluating recommendation models. The Amazon datasets are corpora of product reviews crawled from the online shopping platform
Amazon.com.

According to previous studies on information economics [8, 9], the majority of products in the 'clothing, shoes, & jewelry' and 'patio, lawn & garden' datasets are search products. Some of the products in the 'baby' and 'toys & games' datasets are search products, and the rest are experience products. Moreover, most of the products in the 'beauty' and 'movie' datasets are experience products. However, 'movie' also has characteristics of search products. As over-the-top services such as Netflix emerge, users are selecting films based on their thumbnails [10]. Thus, the thumbnail can provide partial information about the characteristics of a movie to users and can be used by users to ascertain movies. Hence, the 'movie' dataset may show a tendency similar to the consumption of mixed search and experience products such as 'baby' and 'toys & games'. These characteristics yield different results because consumers have different consumption patterns [11, 12].

This paper converted all datasets into the sequential recommendation setting, as in previous work [4, 13]. We selected only behaviour sequences whose item interactions have distinct timestamps. Although previous research [4, 13] used timestamps to determine the sequence order of item interactions, it did not specify how the order was decided when items share the same timestamp. Since behaviour sequences with identical timestamps drive the recommendation model to learn the alphabetic or numerical order of item identifiers rather than the user's sequential behaviour pattern, we removed these sequences.

Table S1: Statistics of datasets.

Dataset | Number of items | Number of training sequences | Average time interval
Movie | 27,278 | 502 | 1.27 weeks
Baby | 7,050 | 928 | 4.86 months
Toys & Games | 11,924 | 1,487 | 5.21 months
Clothing, Shoes, & Jewelry | 23,033 | 3,103 | 4.38 months
Patio, Lawn & Garden | 101,902 | 5,672 | 7.39 months
Beauty | 12,101 | 1,883 | 4.57 months
Moreover, we set the time unit of the ODE to a week for the 'movie' dataset and to three months for the other datasets. To resolve the memory issue of the ODE, this paper limited the maximum time interval to 1.5 units of time, i.e., the time padding was set to 1.5 time units. Then, we divided the data into training, validation, and test sets based on the sequence's identification. We randomly selected eighty percent of the sequences for training, ten percent for validation, and the rest for testing. Notably, we used all items except the actually consumed item as negative samples. Therefore, we did not conduct negative sampling of items for the prediction, as in previous papers [4, 14], because that approach may cause the performance to be overestimated compared to the actual recommendation environment, where sampling is difficult to apply. Table S1 shows the statistics of the datasets after selecting the sequences.
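The sequence filtering and 80/10/10 split described above can be sketched as follows (a hypothetical helper with our own names; sequences are lists of (item, timestamp) pairs):

```python
import random

def preprocess(sequences, seed=0):
    # drop behaviour sequences containing duplicate timestamps, since they
    # would expose item-identifier order rather than behaviour order
    kept = [s for s in sequences if len({t for _, t in s}) == len(s)]
    rng = random.Random(seed)
    rng.shuffle(kept)
    n = len(kept)
    n_train, n_val = int(0.8 * n), int(0.1 * n)   # 80/10/10 split
    train = kept[:n_train]
    val = kept[n_train:n_train + n_val]
    test = kept[n_train + n_val:]
    return train, val, test

seqs = [[("i1", 1), ("i2", 2)], [("i3", 5), ("i4", 5)]] + \
       [[("a", t), ("b", t + 1)] for t in range(8)]
train, val, test = preprocess(seqs)   # the duplicate-timestamp sequence is dropped
```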
II.3 Evaluation measures
The performance indicators used in this study were the recall at k (Recall@k) and the normalised discounted cumulative gain at k (nDCG@k). The recall at k is defined as

Recall@$k$ = $\begin{cases} 1, & \text{if the actually consumed item is in the top-}k\text{ list}, \\ 0, & \text{otherwise}. \end{cases}$ (S1)

Additionally, we used the nDCG at k to measure the quality of the ranking. This metric has a large value when a recommended item is ranked at the top and a low value otherwise. The nDCG at k is defined as

nDCG@$k$ = $\dfrac{\mathrm{DCG}_k}{\mathrm{IDCG}_k}$, (S2)

$\mathrm{DCG}_k = \sum_{l=1}^{L} \dfrac{2^{R_l} - 1}{\log_2(1 + l)}$, (S3)

where $R_l$ and $\mathrm{IDCG}_k$ are the relevance of the l-th ranked item and the best possible (i.e., ideal) $\mathrm{DCG}_k$, respectively. This paper reports the average value of the evaluation measures over the L items in all sequences after five replicates.

II.4 Implementation details
For
FaNC, the embedding sizes of the item and the unconscious state were both 16, and the dimension of the conscious state was 8. We used a Runge-Kutta-method-based [15] ODE solver. The mini-batch size was set to 4 because the ODE solver requires substantial memory allocation. The sequence length L was set to 10 for the movie dataset and 5 for the five other datasets. The learning rate was optimised by grid search for each dataset, and it was set to 0.0001 for all datasets. We implemented the proposed model with the torchdyn library [16] written in PyTorch [17]. For GRU4Rec and SASRec, we set the embedding size of items to 16 and that of the hidden state to 8 to make the capacities of the models similar. For all models, we set the maximum number of epochs to 100 and applied an early stopping strategy [18] when no improvement in validation loss was observed for ten consecutive epochs.

References

[1] Cho, K. et al.
Learning phrase representations using RNN encoder–decoder for statistical machine translation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, 1724–1734 (2014).
[2] Hu, B., Lu, Z., Li, H. & Chen, Q. Convolutional neural network architectures for matching natural language sentences. In Advances in Neural Information Processing Systems, 2042–2050 (2014).
[3] Hidasi, B., Karatzoglou, A., Baltrunas, L. & Tikk, D. Session-based recommendations with recurrent neural networks. In Proceedings of the 4th International Conference on Learning Representations (2016).
[4] Kang, W.-C. & McAuley, J. Self-attentive sequential recommendation. In , 197–206 (2018).
[5] Vaswani, A. et al. Attention is all you need. In Advances in Neural Information Processing Systems, 5998–6008 (2017).
[6] Harper, F. M. & Konstan, J. A. The movielens datasets: History and context. ACM Transactions on Interactive Intelligent Systems, 1–19 (2015).
[7] McAuley, J., Targett, C., Shi, Q. & Van Den Hengel, A. Image-based recommendations on styles and substitutes. In Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, 43–52 (2015).
[8] Franke, G. R., Huhmann, B. A. & Mothersbaugh, D. L. Information content and consumer readership of print ads: A comparison of search and experience products. Journal of the Academy of Marketing Science, 20–31 (2004).
[9] Bae, S. & Lee, T. Product type and consumers' perception of online consumer reviews. Electronic Markets, 255–266 (2011).
[10] Amat, F., Chandrashekar, A., Jebara, T. & Basilico, J. Artwork personalization at netflix. In Proceedings of the 12th ACM Conference on Recommender Systems, 487–488 (2018).
[11] Luan, J., Yao, Z., Zhao, F. & Liu, H. Search product and experience product online reviews: An eye-tracking study on consumers' review search behavior. Computers in Human Behavior, 420–430 (2016).
[12] Yang, J. & Mai, E. S. Experiential goods with network externalities effects: An empirical study of online rating system. Journal of Business Research, 1050–1057 (2010).
[13] Rendle, S., Freudenthaler, C., Gantner, Z. & Schmidt-Thieme, L. BPR: Bayesian personalized ranking from implicit feedback. In Proceedings of the 25th Conference on Uncertainty in Artificial Intelligence, 452–461 (2009).
[14] He, X. et al. Neural collaborative filtering. In Proceedings of the 26th International Conference on World Wide Web, 173–182 (2017).
[15] Butcher, J. C. The Numerical Analysis of Ordinary Differential Equations: Runge-Kutta and General Linear Methods (1987).
[16] Poli, M., Massaroli, S., Yamashita, A., Asama, H. & Park, J. Torchdyn: A neural differential equations library. arXiv preprint arXiv:2009.09346 (2020).
[17] Paszke, A. et al. PyTorch: An imperative style, high-performance deep learning library. In Advances in Neural Information Processing Systems, 8026–8037 (2019).
[18] Yao, Y., Rosasco, L. & Caponnetto, A. On early stopping in gradient descent learning. Constructive Approximation (2007).