Predicting Sparse Clients' Actions with CPOPT-Net in the Banking Environment
Jeremy Charlier, Radu State, and Jean Hilger
University of Luxembourg, L-1855 Luxembourg, Luxembourg
{name.surname}@uni.lu
BCEE, Avenue de la liberte, L-1930 Luxembourg, Luxembourg
[email protected]
Abstract.
The digital revolution of the banking system, together with evolving European regulations, has pushed the major banking actors to innovate through new uses of their clients' digital information. Given highly sparse client activities, we propose CPOPT-Net, an algorithm that combines the CP canonical tensor decomposition, a multidimensional matrix decomposition that factorizes a tensor as the sum of rank-one tensors, and neural networks. CPOPT-Net efficiently removes sparse information with a gradient-based resolution while relying on neural networks for time series predictions. Our experiments show that CPOPT-Net is capable of accurate predictions of the clients' actions in the context of personalized recommendation. CPOPT-Net is the first algorithm to use a non-linear conjugate gradient tensor resolution with neural networks to propose predictions of financial activities on a public data set.
Keywords:
Tensor Decomposition · Personalized Recommendation · Neural Networks.
The modern banking environment is experiencing its own digital revolution. Strong regulatory directives are now applicable, especially in Europe with the Revised Payment Services Directive, PSD2, and the General Data Protection Regulation, GDPR. Consequently, financial actors are now exploring the latest progress in data analytics and machine learning to leverage their clients' information in the context of personalized financial recommendation and client action prediction. Recommender engines usually rely on second-order matrix factorization since its accuracy has been proved in various publications [1,2,3]. However, matrix factorizations are limited to the unique modeling of clients × products. Therefore, tensor factorizations have skyrocketed for the past few years [4,5,6]. Various tensor factorizations, or tensor decompositions, exist for different applications [7,8]. However, the CP decomposition [9,10] is the most frequently used. Two of the most popular resolution algorithms, the Alternating Least Squares (ALS) [9,10] and the non-negative ALS [11], offer a relatively simple mathematical framework explaining its success for the new generation of recommender engines [12,13,14].
Fig. 1.
In CPOPT-Net, the function $W_c$ between the original tensor $\mathcal{X}_{true}$ and the decomposed tensor $\mathcal{X}_{target}$ is minimized. Then, the latent factor vectors $a^{(1)}, a^{(2)}, a^{(3)}$ of each order are sent as input to the neural network. Following the neural network training, CPOPT-Net is able to predict the financial activities of the bank's clients.

In this paper, we use the gradient-based resolution for the CP decomposition [15] to address the prediction of clients' financial activities based on time, clients' ID and transaction type. The method, illustrated in Figure 1, reduces the sparsity of the information while a neural network performs the predictions of events. We outline three contributions of our paper:

– We use the CP decomposition for separate modeling of each order of the data set. Since one client can have several financial activities simultaneously, we include the independent modeling of clients and financial transactions.
– We build upon the non-linear conjugate gradient resolution for the CP decomposition, CPOPT [15]. We show that CPOPT applied on a financial data set leads to small numerical errors while achieving reasonable computational time.
– Finally, we combine CPOPT with neural networks, leading to CPOPT-Net. A compressed dense data set, inherited from CP, is used as an optimized input for the neural network to predict the financial activities of the clients.

The remainder of the paper is organized as follows. Section 2 describes the CP tensor decomposition with its gradient-based resolution applied to third-order financial predictions with neural networks. Then, we highlight the experimental results in Section 3 and we conclude by emphasizing pointers to future work.
In the CP tensor decomposition [9,10], the tensor $\mathcal{X} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$ is described as the sum of rank-one tensors

$$ \mathcal{X} = \sum_{r=1}^{R} a_r^{(1)} \circ a_r^{(2)} \circ a_r^{(3)} \circ \cdots \circ a_r^{(N)} \qquad (1) $$

where $a_r^{(1)}, a_r^{(2)}, a_r^{(3)}, \ldots, a_r^{(N)}$ are vectors of $\mathbb{R}^{I_1}, \mathbb{R}^{I_2}, \mathbb{R}^{I_3}, \ldots, \mathbb{R}^{I_N}$. Each vector $a_r^{(n)}$ with $n \in \{1, 2, \ldots, N\}$ refers to one order and one rank of the tensor $\mathcal{X}$. We refer to [7] for further information. We use the Nonlinear Conjugate Gradient (NCG) method proposed in [15], CPOPT, with the strong Wolfe line search, as it appears to be more stable in our case. Let $\mathcal{X}_{true}$ be a real-valued $N$-order tensor of size $I_1 \times I_2 \times \cdots \times I_N$. Given $R$, the objective is to find a factorization

$$ \mathcal{X}_{true} \approx \mathcal{X}_{target} = \sum_{r=1}^{R} a_r^{(1)} \circ \cdots \circ a_r^{(N)} \qquad (2) $$

with the factors $a_r^{(1)}, \ldots, a_r^{(N)}$ initially randomized. Therefore, we denote by $\mathcal{X}_{target}$ the target tensor composed of the factor vectors $a_r^{(1)}, \ldots, a_r^{(N)}$. The objective minimization function is denoted by $W_c(\mathcal{X}_{true}, \mathcal{X}_{target})$:

$$ W_c(\mathcal{X}_{true}, \mathcal{X}_{target}) = \min f(\mathcal{X}_{true}, \mathcal{X}_{target}) = \frac{1}{2} \left\| \mathcal{X}_{true} - \mathcal{X}_{target} \right\|^2 \qquad (3) $$

The values of the factor vectors can be stacked in a parameter vector $x$:

$$ x = \left[\, a_1^{(1)} \cdots a_R^{(1)} \;\cdots\; a_1^{(N)} \cdots a_R^{(N)} \,\right]^T \qquad (4) $$

Therefore, we can rewrite the objective function (3) as three summands:

$$ W_c(x) = W_c(\mathcal{X}_{true}, \mathcal{X}_{target}) = \frac{1}{2} \left\| \mathcal{X}_{true} \right\|^2 - \left\langle \mathcal{X}_{true}, \mathcal{X}_{target} \right\rangle + \frac{1}{2} \left\| \mathcal{X}_{target} \right\|^2 \qquad (5) $$

From (5), we deduce the gradient function of the CP decomposition involved in the minimization process with respect to the factor vectors $a_1^{(1)}, \ldots, a_R^{(N)}$. We refer to [15] for more details about the gradient computation. CPOPT-Net therefore achieves an NCG resolution of the objective function $W_c(\mathcal{X}_{true}, \mathcal{X}_{target})$. The sparse information contained in $\mathcal{X}_{true}$ is removed in the factor vectors $a^{(1)}, \ldots, a^{(N)}$ of $\mathcal{X}_{target}$. Then, the factor vectors are sent as optimized inputs to the neural network. Through training on the data set, the neural network learns a prediction function $g(\cdot)$ and is able to predict the financial activities of the bank's clients. The implementation of CPOPT-Net is summed up in Algorithm 1.
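To make the objective (3)-(5) concrete, the following is a minimal numpy sketch of the CP reconstruction, the objective $W_c$ and its gradient for a third-order tensor; the function and variable names are ours and do not come from the authors' implementation.

import numpy as np

def cp_reconstruct(A, B, C):
    """Sum of R rank-one tensors: X_hat[i, j, k] = sum_r A[i, r] * B[j, r] * C[k, r]."""
    return np.einsum('ir,jr,kr->ijk', A, B, C)

def cp_objective(X, A, B, C):
    """W_c = 0.5 * ||X_true - X_target||_F^2, as in Eq. (3)."""
    E = X - cp_reconstruct(A, B, C)   # residual tensor
    return 0.5 * np.sum(E ** 2)

def cp_gradient(X, A, B, C):
    """Partial derivatives of W_c with respect to each factor matrix."""
    E = X - cp_reconstruct(A, B, C)
    gA = -np.einsum('ijk,jr,kr->ir', E, B, C)
    gB = -np.einsum('ijk,ir,kr->jr', E, A, C)
    gC = -np.einsum('ijk,ir,jr->kr', E, A, B)
    return gA, gB, gC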
Data Availability and Experimental Setup

In 2016, the Santander bank released an anonymized public data set containing financial activities from its clients. The file contains the activities of 2.5 million clients classified into 22 transaction labels over a 16-month period between 28 January 2015 and 28 April 2016. The data set is available at
Algorithm 1:
CPOPT-Net for third order financial predictions
Data: tensor X ∈ ℝ^(I×J×K), rank R
Result: time series containing the financial activities predictions, y
/* A = a^(1), B = a^(2), C = a^(3) */
begin
    random initialization A ∈ ℝ^(I×R), B ∈ ℝ^(J×R), C ∈ ℝ^(K×R)
    x ← flatten(A, B, C) as described in (4)
    ∇W_{c,0} ← gradient of (5) at x
    s_0 ← −∇W_{c,0}
    α_0 ← argmin_α f(x + α s_0)
    x_1 ← x + α_0 s_0
    n ← 1
    repeat
        ∇W_{c,n} ← gradient of (5) at x_n
        β_n^HS ← [∇W_{c,n}^T (∇W_{c,n} − ∇W_{c,n−1})] / [s_{n−1}^T (∇W_{c,n} − ∇W_{c,n−1})]
        s_n ← −∇W_{c,n} + β_n^HS s_{n−1}
        α_n ← argmin_α f(x_n + α s_n)
        x_{n+1} ← x_n + α_n s_n
        n ← n + 1
    until maximum number of iterations or stopping criteria
    A, B, C ← unflatten(x_n)
    send A, B, C to the input of the neural network
    train the neural network to learn the prediction function g(·)
    y ← neural network prediction of the financial activities
    return y
end
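As a complement to Algorithm 1, here is a hedged sketch of the nonlinear conjugate gradient loop with the Hestenes-Stiefel coefficient and a strong Wolfe line search (via scipy.optimize.line_search), reusing the cp_objective and cp_gradient helpers sketched in the previous section; the iteration budget, tolerance and fallback step size are our assumptions, not the authors' settings.

import numpy as np
from scipy.optimize import line_search

def cpopt(X, R, n_iter=200, tol=1e-6, seed=0):
    """Fit a rank-R CP model of the third-order tensor X by nonlinear conjugate gradient."""
    I, J, K = X.shape
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((I + J + K) * R)          # flattened [A; B; C], as in Eq. (4)

    def unflatten(x):
        A = x[:I * R].reshape(I, R)
        B = x[I * R:I * R + J * R].reshape(J, R)
        C = x[I * R + J * R:].reshape(K, R)
        return A, B, C

    def f(x):                                         # objective W_c, Eq. (5)
        return cp_objective(X, *unflatten(x))

    def g(x):                                         # flattened gradient of W_c
        return np.concatenate([gi.ravel() for gi in cp_gradient(X, *unflatten(x))])

    grad = g(x)
    p = -grad                                         # initial steepest-descent direction
    for _ in range(n_iter):
        alpha = line_search(f, g, x, p)[0]            # strong Wolfe step length
        if alpha is None:
            alpha = 1e-4                              # conservative fallback step
        x = x + alpha * p
        grad_new = g(x)
        y = grad_new - grad
        beta_hs = (grad_new @ y) / (p @ y + 1e-12)    # Hestenes-Stiefel coefficient
        p = -grad_new + beta_hs * p                   # new conjugate direction
        grad = grad_new
        if np.linalg.norm(grad) < tol:
            break
    return unflatten(x)                               # dense factor matrices A, B, C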
We choose the 200 clients having the most frequent financial activities since regular activities are more interesting for the prediction modeling. All the information is gathered in the tensor $\mathcal{X}_{true}$ of size 200 × 22 × 16. We define the tensor rank equal to 25. We use the Adam solver with the default parameters $\beta_1 = 0.9$, $\beta_2 = 0.999$ for the training of the neural network (the code is available at https://github.com/dagrate/cpoptnet).
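For illustration only, one possible wiring of the setup described above: the 200 × 22 × 16 activity tensor, the rank-25 factorization obtained with the cpopt sketch above, and a Keras model compiled with the Adam defaults $\beta_1 = 0.9$, $\beta_2 = 0.999$. The layer sizes, learning rate and input/output mapping are hypothetical and are not taken from the paper.

import numpy as np
import tensorflow as tf

R = 25                                    # tensor rank used in the paper
X_true = np.zeros((200, 22, 16))          # clients x transaction labels x months
# ... fill X_true with the monthly transaction activities per client and label ...

A, B, C = cpopt(X_true, R)                # dense factor matrices from the cpopt sketch above

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu', input_shape=(R,)),  # hypothetical MLP variant
    tf.keras.layers.Dense(1),             # one predicted activity value per time step
])
model.compile(optimizer=tf.keras.optimizers.Adam(beta_1=0.9, beta_2=0.999),
              loss='mse')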
Results and Discussions on CPOPT-Net

We test CPOPT-Net using three different types of neural networks: a Multi-Layer Perceptron (MLP), a Convolutional Neural Network (CNN) and a Long Short-Term Memory (LSTM) network. Additionally, we cross-validate the performance of the neural networks with a Decision Tree (DT). The models have been trained on a one-year period from 28 January 2015 until 28 January 2016. Then, the activities for the next three months are predicted with a rolling time window of one month. First, Table 1 highlights the lower numerical error obtained with the CPOPT resolution in comparison to the ALS resolution. Then, Figure 2 shows that the LSTM models the future personal savings activities the most accurately, followed by the MLP, the DT, and finally the CNN. The CNN visibly fails to predict the savings activity accurately in comparison to the other three methods, while the LSTM achieves the most accurate predictions. We support this preliminary conclusion for Figure 2 in Table 2 by reporting four metrics: the Mean Absolute Error (MAE), the Jaccard distance, the cosine similarity and the Root Mean Square Error (RMSE).
Table 1.
Residual errors of the objective function $W_c$ between the CPOPT-Net resolution and the ALS resolution at convergence (the smaller, the better). Both methods have similar computation time.

            CPOPT-Net   CP-ALS
W_c error
Fig. 2.
Three-month prediction of the evolution of the personal savings of one client. We can observe the difference in CPOPT-Net's predictions depending on the neural network chosen (y-axis: prediction values; curves: Expected, DT, MLP, CNN, LSTM).
Table 2.
Latent prediction errors on personal savings. LSTM achieves superior performance.
Error measure    DT      MLP     CNN     LSTM
MAE              0.044   0.004   0.282
Jaccard dist.    0.053   0.027   0.348
cosine sim.      0.967   0.953   0.966
RMSE             0.047   0.031   0.354
Table 3.
Aggregated prediction errors on all transactions. LSTM achieves superior performance.
Error measure    DT      MLP     CNN     LSTM
MAE              0.029   0.027   0.272
Jaccard dist.    0.034   0.032   0.290
cosine sim.      0.827   0.909   0.880
RMSE             0.033   0.030   0.290

In Table 3, we show the aggregated metrics over all transaction predictions. In all the experiments, the LSTM network predicts the activities the most accurately, followed by the MLP, the DT and the CNN.
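For reference, minimal numpy implementations of the four error measures reported in Tables 2 and 3; since the paper does not spell out its exact Jaccard variant for real-valued series, the weighted (Ruzicka) generalization below is our assumption.

import numpy as np

def mae(y_true, y_pred):
    return np.mean(np.abs(y_true - y_pred))

def rmse(y_true, y_pred):
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

def cosine_similarity(y_true, y_pred):
    return (y_true @ y_pred) / (np.linalg.norm(y_true) * np.linalg.norm(y_pred))

def jaccard_distance(y_true, y_pred):
    # weighted Jaccard on non-negative series: 1 - sum(min) / sum(max)
    return 1.0 - np.minimum(y_true, y_pred).sum() / np.maximum(y_true, y_pred).sum()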
Building upon the CP tensor decomposition, the non-linear conjugate gradient resolution and neural networks, we propose CPOPT-Net, a predictive method for the banking industry in which the sparsity of the financial transactions is removed before performing the predictions of future clients' transactions. We conducted experiments on a public data set highlighting the prediction differences depending on the neural network involved in CPOPT-Net. Due to the recurrent nature of most of the financial transactions, we underlined that the best results were found when CPOPT-Net was used with an LSTM. Future work will concentrate on a limited memory resolution for usage on very large data sets.
Furthermore, the personal financial recommendation will be assessed on smaller time-frame discretizations, weekly or daily, with other financial transactions. It will offer a larger choice of financial product recommendations depending on the clients' mid-term and long-term interests.
References
1. Brand, M.: Fast online svd revisions for lightweight recommender systems. In: Proceedings of the 2003 SIAM International Conference on Data Mining. pp. 37–46. SIAM (2003)
2. Ghazanfar, M.A., Prugel, A.: The advantage of careful imputation sources in sparse data-environment of recommender systems: Generating improved svd-based recommendations. Informatica (1) (2013)
3. Kumar Bokde, D., Girase, S., Mukhopadhyay, D.: Role of matrix factorization model in collaborative filtering algorithm: A survey. CoRR, abs/1503.07475 (2015)
4. Lian, D., Zhang, Z., Ge, Y., Zhang, F., Yuan, N.J., Xie, X.: Regularized content-aware tensor factorization meets temporal-aware location recommendation. In: Data Mining (ICDM), 2016 IEEE 16th International Conference on. pp. 1029–1034. IEEE (2016)
5. Zhao, S., Lyu, M.R., King, I.: Aggregated temporal tensor factorization model for point-of-interest recommendation. In: International Conference on Neural Information Processing. pp. 450–458. Springer (2016)
6. Song, T., Peng, Z., Wang, S., Fu, W., Hong, X., Philip, S.Y.: Based cross-domain recommendation through joint tensor factorization. In: International Conference on Database Systems for Advanced Applications. pp. 525–540. Springer (2017)
7. Kolda, T.G., Bader, B.W.: Tensor decompositions and applications. SIAM Review (3) (2009)
8. Acar, E., Kolda, T.G., Dunlavy, D.M.: All-at-once optimization for coupled matrix and tensor factorizations. arXiv preprint arXiv:1105.3422 (2011)
9. Harshman, R.A.: Foundations of the parafac procedure: Models and conditions for an explanatory multimodal factor analysis (1970)
10. Carroll, J.D., Chang, J.J.: Analysis of individual differences in multidimensional scaling via an n-way generalization of eckart-young decomposition. Psychometrika (3) (1970)
11. Welling, M., Weber, M.: Positive tensor factorization. Pattern Recognition Letters (12), 1255–1261 (2001)
12. Ge, H., Caverlee, J., Lu, H.: Taper: A contextual tensor-based approach for personalized expert recommendation. In: Proceedings of the 10th ACM Conference on Recommender Systems. pp. 261–268. ACM (2016)
13. Almutairi, F.M., Sidiropoulos, N.D., Karypis, G.: Context-aware recommendation-based learning analytics using tensor and coupled matrix factorization. IEEE Journal of Selected Topics in Signal Processing (5), 729–741 (2017)
14. Cai, G., Gu, W.: Heterogeneous context-aware recommendation algorithm with semi-supervised tensor factorization. In: International Conference on Intelligent Data Engineering and Automated Learning. pp. 232–241. Springer (2017)
15. Acar, E., Dunlavy, D.M., Kolda, T.G.: A scalable optimization approach for fitting canonical tensor decompositions. Journal of Chemometrics 25 (2011)