Predicting Sparse Clients' Actions with CPOPT-Net in the Banking Environment
Jeremy Charlier, Radu State, and Jean Hilger
University of Luxembourg, L-1855 Luxembourg, Luxembourg
{name.surname}@uni.lu
BCEE, Avenue de la liberte, L-1930 Luxembourg, Luxembourg
[email protected]
Abstract.
The digital revolution of the banking system, together with evolving European regulations, has pushed the major banking actors to innovate through new uses of their clients' digital information. Given highly sparse client activities, we propose CPOPT-Net, an algorithm that combines the CP canonical tensor decomposition, a multidimensional matrix decomposition that factorizes a tensor as the sum of rank-one tensors, and neural networks. CPOPT-Net efficiently removes sparse information with a gradient-based resolution while relying on neural networks for time series predictions. Our experiments show that CPOPT-Net is capable of accurate predictions of the clients' actions in the context of personalized recommendation. CPOPT-Net is the first algorithm to use a non-linear conjugate gradient tensor resolution with neural networks to propose predictions of financial activities on a public data set.
Keywords:
Tensor Decomposition · Personalized Recommendation · Neural Networks.
The modern banking environment is experiencing its own digital revolution. Strong regulatory directives are now applicable, especially in Europe with the Revised Payment Services Directive, PSD2, and the General Data Protection Regulation, GDPR. Consequently, financial actors are now exploring the latest progress in data analytics and machine learning to leverage their clients' information in the context of personalized financial recommendation and client action prediction. Recommender engines usually rely on second-order matrix factorization since its accuracy has been proved in various publications [1,2,3]. However, matrix factorizations are limited to the unique modeling of clients × products. Therefore, tensor factorizations have skyrocketed for the past few years [4,5,6]. Various tensor factorizations, or tensor decompositions, exist for different applications [7,8]. However, the CP decomposition [9,10] is the most frequently used. Two of the most popular resolution algorithms, the Alternating Least Squares (ALS) [9,10] and the non-negative ALS [11], offer a relatively simple mathematical framework explaining its success for the new generation of recommender engines [12,13,14].
Fig. 1.
In CPOPT-Net, the function $W_c$ between the original tensor $\mathcal{X}_{true}$ and the decomposed tensor $\mathcal{X}_{target}$ is minimized. Then, the latent factor vectors $a^{(1)}, a^{(2)}, a^{(3)}$ of each order are sent as input to the neural network. Following the neural network training, CPOPT-Net is able to predict the financial activities of the bank's clients.

In this paper, we use the gradient-based resolution for the CP decomposition [15] to address the prediction of clients' financial activities based on time, clients' ID and transaction type. The method, illustrated in Figure 1, reduces the sparsity of the information while a neural network performs the predictions of events. We outline three contributions of our paper:

– We use the CP decomposition for separate modeling of each order of the data set. Since one client can have several financial activities simultaneously, we include the independent modeling of clients and financial transactions.
– We build upon the non-linear conjugate gradient resolution for the CP decomposition, CPOPT [15]. We show that CPOPT applied on a financial data set leads to small numerical errors while achieving reasonable computational time.
– Finally, we combine CPOPT with neural networks, leading to CPOPT-Net. A compressed dense data set, inherited from CP, is used as an optimized input for the neural network to predict the financial activities of the clients.

The remainder of the paper is organized as follows. Section 2 describes the CP tensor decomposition with its gradient-based resolution applied to third-order financial predictions with neural networks. Then, we highlight the experimental results in Section 3 and we conclude by emphasizing pointers to future work.
In the CP tensor decomposition [9,10], the tensor $\mathcal{X} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$ is described as the sum of rank-one tensors

$$ \mathcal{X} = \sum_{r=1}^{R} a_r^{(1)} \circ a_r^{(2)} \circ a_r^{(3)} \circ \cdots \circ a_r^{(N)} \qquad (1) $$

where $a_r^{(1)}, a_r^{(2)}, a_r^{(3)}, \ldots, a_r^{(N)}$ are vectors of $\mathbb{R}^{I_1}, \mathbb{R}^{I_2}, \mathbb{R}^{I_3}, \ldots, \mathbb{R}^{I_N}$. Each vector $a_r^{(n)}$ with $n \in \{1, 2, \ldots, N\}$ refers to one order and one rank of the tensor $\mathcal{X}$. We refer to [7] for further information. We use the Nonlinear Conjugate Gradient (NCG) method proposed in [15], CPOPT, with the strong Wolfe line search, as it appears to be more stable in our case. Let $\mathcal{X}_{true}$ be a real-valued $N$-order tensor of size $I_1 \times I_2 \times \cdots \times I_N$. Given $R$, the objective is to find a factorization

$$ \mathcal{X}_{true} \approx \mathcal{X}_{target} = \sum_{r=1}^{R} a_r^{(1)} \circ \cdots \circ a_r^{(N)} \qquad (2) $$

with the factors $a_r^{(1)}, \ldots, a_r^{(N)}$ initially randomized. Therefore, we denote by $\mathcal{X}_{target}$ the target tensor composed of the factor vectors $a_r^{(1)}, \ldots, a_r^{(N)}$. The objective minimization function is denoted by $W_c(\mathcal{X}_{true}, \mathcal{X}_{target})$:

$$ W_c(\mathcal{X}_{true}, \mathcal{X}_{target}) = \min f(\mathcal{X}_{true}, \mathcal{X}_{target}) = \frac{1}{2} \left\| \mathcal{X}_{true} - \mathcal{X}_{target} \right\|^2 \qquad (3) $$

The values of the factor vectors can be stacked in a parameter vector $x$:

$$ x = \left[\, a_1^{(1)} \cdots a_R^{(1)} \;\cdots\; a_1^{(N)} \cdots a_R^{(N)} \,\right]^T \qquad (4) $$

Therefore, we can rewrite the objective function (3) as three summands:

$$ W_c(x) = W_c(\mathcal{X}_{true}, \mathcal{X}_{target}) = \frac{1}{2} \left\| \mathcal{X}_{true} \right\|^2 - \left\langle \mathcal{X}_{true}, \mathcal{X}_{target} \right\rangle + \frac{1}{2} \left\| \mathcal{X}_{target} \right\|^2 \qquad (5) $$

From (5), we deduce the gradient function of the CP decomposition involved in the minimization process with respect to the factor vectors $a_1^{(1)}, \ldots, a_R^{(N)}$. We refer to [15] for more details about the gradient computation. CPOPT-Net therefore achieves an NCG resolution of the objective function $W_c(\mathcal{X}_{true}, \mathcal{X}_{target})$. The sparse information contained in $\mathcal{X}_{true}$ is removed in the factor vectors $a^{(1)}, \ldots, a^{(N)}$ of $\mathcal{X}_{target}$. Then, the factor vectors are sent as optimized inputs to the neural network. Through training on the data set, the neural network learns a prediction function $g(\cdot)$ and is able to predict the financial activities of the bank's clients. The implementation of CPOPT-Net is summed up in Algorithm 1.
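To make the objective (3)-(5) concrete, the following is a minimal numpy sketch of the CP reconstruction, the objective $W_c$ and its gradient for a third-order tensor; the function and variable names are ours and do not come from the authors' implementation.

import numpy as np

def cp_reconstruct(A, B, C):
    """Sum of R rank-one tensors: X_hat[i, j, k] = sum_r A[i, r] * B[j, r] * C[k, r]."""
    return np.einsum('ir,jr,kr->ijk', A, B, C)

def cp_objective(X, A, B, C):
    """W_c = 0.5 * ||X_true - X_target||_F^2, as in Eq. (3)."""
    E = X - cp_reconstruct(A, B, C)   # residual tensor
    return 0.5 * np.sum(E ** 2)

def cp_gradient(X, A, B, C):
    """Partial derivatives of W_c with respect to each factor matrix."""
    E = X - cp_reconstruct(A, B, C)
    gA = -np.einsum('ijk,jr,kr->ir', E, B, C)
    gB = -np.einsum('ijk,ir,kr->jr', E, A, C)
    gC = -np.einsum('ijk,ir,jr->kr', E, A, B)
    return gA, gB, gC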
Data Availability and Experimental Setup

In 2016, the Santander bank released an anonymized public data set containing financial activities from its clients. The file contains the activities of 2.5 million clients classified into 22 transaction labels over a 16-month period between 28 January 2015 and 28 April 2016. The data set is available at
Algorithm 1:
CPOPT-Net for third order financial predictions
Data: tensor X ∈ ℝ^(I×J×K), rank R
Result: time series containing the financial activities predictions, y
/* A = a^(1), B = a^(2), C = a^(3) */
begin
    random initialization A ∈ ℝ^(I×R), B ∈ ℝ^(J×R), C ∈ ℝ^(K×R)
    x ← flatten(A, B, C) as described in (4)
    ∇W_{c,0} ← gradient of (5) at x
    s_0 ← −∇W_{c,0}
    α_0 ← argmin_α f(x + α s_0)
    x_1 ← x + α_0 s_0
    n ← 1
    repeat
        ∇W_{c,n} ← gradient of (5) at x_n
        β_n^HS ← [∇W_{c,n}^T (∇W_{c,n} − ∇W_{c,n−1})] / [s_{n−1}^T (∇W_{c,n} − ∇W_{c,n−1})]
        s_n ← −∇W_{c,n} + β_n^HS s_{n−1}
        α_n ← argmin_α f(x_n + α s_n)
        x_{n+1} ← x_n + α_n s_n
        n ← n + 1
    until maximum number of iterations or stopping criteria
    A, B, C ← unflatten(x_n)
    send A, B, C to the input of the neural network
    train the neural network to learn the prediction function g(·)
    y ← neural network prediction of the financial activities
    return y
end
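As a complement to Algorithm 1, here is a hedged sketch of the nonlinear conjugate gradient loop with the Hestenes-Stiefel coefficient and a strong Wolfe line search (via scipy.optimize.line_search), reusing the cp_objective and cp_gradient helpers sketched in the previous section; the iteration budget, tolerance and fallback step size are our assumptions, not the authors' settings.

import numpy as np
from scipy.optimize import line_search

def cpopt(X, R, n_iter=200, tol=1e-6, seed=0):
    """Fit a rank-R CP model of the third-order tensor X by nonlinear conjugate gradient."""
    I, J, K = X.shape
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((I + J + K) * R)          # flattened [A; B; C], as in Eq. (4)

    def unflatten(x):
        A = x[:I * R].reshape(I, R)
        B = x[I * R:I * R + J * R].reshape(J, R)
        C = x[I * R + J * R:].reshape(K, R)
        return A, B, C

    def f(x):                                         # objective W_c, Eq. (5)
        return cp_objective(X, *unflatten(x))

    def g(x):                                         # flattened gradient of W_c
        return np.concatenate([gi.ravel() for gi in cp_gradient(X, *unflatten(x))])

    grad = g(x)
    p = -grad                                         # initial steepest-descent direction
    for _ in range(n_iter):
        alpha = line_search(f, g, x, p)[0]            # strong Wolfe step length
        if alpha is None:
            alpha = 1e-4                              # conservative fallback step
        x = x + alpha * p
        grad_new = g(x)
        y = grad_new - grad
        beta_hs = (grad_new @ y) / (p @ y + 1e-12)    # Hestenes-Stiefel coefficient
        p = -grad_new + beta_hs * p                   # new conjugate direction
        grad = grad_new
        if np.linalg.norm(grad) < tol:
            break
    return unflatten(x)                               # dense factor matrices A, B, C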
We choose the 200 clients having the most frequent financial activities since regular activities are more interesting for the prediction modeling. All the information is gathered in the tensor $\mathcal{X}_{true}$ of size 200 × 22 × 16. We define the tensor rank equal to 25. We use the Adam solver with the default parameters $\beta_1 = 0.9$, $\beta_2 = 0.999$ for the training of the neural network (the code is available at https://github.com/dagrate/cpoptnet).
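For illustration only, one possible wiring of the setup described above: the 200 × 22 × 16 activity tensor, the rank-25 factorization obtained with the cpopt sketch above, and a Keras model compiled with the Adam defaults $\beta_1 = 0.9$, $\beta_2 = 0.999$. The layer sizes, learning rate and input/output mapping are hypothetical and are not taken from the paper.

import numpy as np
import tensorflow as tf

R = 25                                    # tensor rank used in the paper
X_true = np.zeros((200, 22, 16))          # clients x transaction labels x months
# ... fill X_true with the monthly transaction activities per client and label ...

A, B, C = cpopt(X_true, R)                # dense factor matrices from the cpopt sketch above

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu', input_shape=(R,)),  # hypothetical MLP variant
    tf.keras.layers.Dense(1),             # one predicted activity value per time step
])
model.compile(optimizer=tf.keras.optimizers.Adam(beta_1=0.9, beta_2=0.999),
              loss='mse')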
Results and Discussions on CPOPT-Net

We test CPOPT-Net using three different types of neural networks: a Multi-Layer Perceptron (MLP), a Convolutional Neural Network (CNN) and a Long Short-Term Memory (LSTM) network. Additionally, we cross-validate the performance of the neural networks with a Decision Tree (DT). The models have been trained on a one-year period from 28 January 2015 until 28 January 2016. Then, the activities for the next three months are predicted with a rolling time window of one month. First, Table 1 highlights the lower numerical error obtained with the CPOPT resolution in comparison to the ALS resolution. Then, Figure 2 shows that the LSTM models the future personal savings activities the most accurately, followed by the MLP, the DT, and finally the CNN. The CNN visibly fails to predict the savings activity accurately in comparison to the other three methods, while the LSTM achieves the most accurate predictions. We support this preliminary conclusion for Figure 2 in Table 2 by reporting four metrics: the Mean Absolute Error (MAE), the Jaccard distance, the cosine similarity and the Root Mean Square Error (RMSE).
Table 1.
Residual errors of the objective function $W_c$ between the CPOPT-Net resolution and the ALS resolution at convergence (the smaller, the better). Both methods have similar computation time.

            CPOPT-Net   CP-ALS
W_c error
Fig. 2.
Three-month prediction of the evolution of the personal savings of one client. We can observe the difference in CPOPT-Net's predictions depending on the neural network chosen (y-axis: prediction values; curves: Expected, DT, MLP, CNN, LSTM).
Table 2.
Latent prediction errors on personal savings. LSTM achieves superior performance.
Error measure    DT      MLP     CNN     LSTM
MAE              0.044   0.004   0.282
Jaccard dist.    0.053   0.027   0.348
cosine sim.      0.967   0.953   0.966
RMSE             0.047   0.031   0.354
Table 3.
Aggregated prediction errors on all transactions. LSTM achieves superior performance.
Error measure    DT      MLP     CNN     LSTM
MAE              0.029   0.027   0.272
Jaccard dist.    0.034   0.032   0.290
cosine sim.      0.827   0.909   0.880
RMSE             0.033   0.030   0.290

In Table 3, we show the aggregated metrics over all transaction predictions. In all the experiments, the LSTM network predicts the activities the most accurately, followed by the MLP, the DT and the CNN.
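For reference, minimal numpy implementations of the four error measures reported in Tables 2 and 3; since the paper does not spell out its exact Jaccard variant for real-valued series, the weighted (Ruzicka) generalization below is our assumption.

import numpy as np

def mae(y_true, y_pred):
    return np.mean(np.abs(y_true - y_pred))

def rmse(y_true, y_pred):
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

def cosine_similarity(y_true, y_pred):
    return (y_true @ y_pred) / (np.linalg.norm(y_true) * np.linalg.norm(y_pred))

def jaccard_distance(y_true, y_pred):
    # weighted Jaccard on non-negative series: 1 - sum(min) / sum(max)
    return 1.0 - np.minimum(y_true, y_pred).sum() / np.maximum(y_true, y_pred).sum()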
Building upon the CP tensor decomposition, the non-linear conjugate gradient resolution and neural networks, we propose CPOPT-Net, a predictive method for the banking industry in which the sparsity of the financial transactions is removed before performing the predictions of future clients' transactions. We conducted experiments on a public data set highlighting the prediction differences depending on the neural network involved in CPOPT-Net. Due to the recurrent nature of most of the financial transactions, we underlined that the best results were found when CPOPT-Net was used with an LSTM. Future work will concentrate on a limited memory resolution for usage on very large data sets.
Furthermore, the personal financial recommendation will be assessed on smaller time-frame discretizations, weekly or daily, with other financial transactions. It will offer a larger choice of financial product recommendations depending on the clients' mid-term and long-term interests.
References
1. Brand, M.: Fast online svd revisions for lightweight recommender systems. In: Proceedings of the 2003 SIAM International Conference on Data Mining. pp. 37–46. SIAM (2003)
2. Ghazanfar, M.A., Prugel, A.: The advantage of careful imputation sources in sparse data-environment of recommender systems: Generating improved svd-based recommendations. Informatica (1) (2013)
3. Kumar Bokde, D., Girase, S., Mukhopadhyay, D.: Role of matrix factorization model in collaborative filtering algorithm: A survey. CoRR, abs/1503.07475 (2015)
4. Lian, D., Zhang, Z., Ge, Y., Zhang, F., Yuan, N.J., Xie, X.: Regularized content-aware tensor factorization meets temporal-aware location recommendation. In: Data Mining (ICDM), 2016 IEEE 16th International Conference on. pp. 1029–1034. IEEE (2016)
5. Zhao, S., Lyu, M.R., King, I.: Aggregated temporal tensor factorization model for point-of-interest recommendation. In: International Conference on Neural Information Processing. pp. 450–458. Springer (2016)
6. Song, T., Peng, Z., Wang, S., Fu, W., Hong, X., Philip, S.Y.: Based cross-domain recommendation through joint tensor factorization. In: International Conference on Database Systems for Advanced Applications. pp. 525–540. Springer (2017)
7. Kolda, T.G., Bader, B.W.: Tensor decompositions and applications. SIAM Review (3) (2009)
8. Acar, E., Kolda, T.G., Dunlavy, D.M.: All-at-once optimization for coupled matrix and tensor factorizations. arXiv preprint arXiv:1105.3422 (2011)
9. Harshman, R.A.: Foundations of the parafac procedure: Models and conditions for an explanatory multimodal factor analysis (1970)
10. Carroll, J.D., Chang, J.J.: Analysis of individual differences in multidimensional scaling via an n-way generalization of eckart-young decomposition. Psychometrika (3) (1970)
11. Welling, M., Weber, M.: Positive tensor factorization. Pattern Recognition Letters (12), 1255–1261 (2001)
12. Ge, H., Caverlee, J., Lu, H.: Taper: A contextual tensor-based approach for personalized expert recommendation. In: Proceedings of the 10th ACM Conference on Recommender Systems. pp. 261–268. ACM (2016)
13. Almutairi, F.M., Sidiropoulos, N.D., Karypis, G.: Context-aware recommendation-based learning analytics using tensor and coupled matrix factorization. IEEE Journal of Selected Topics in Signal Processing (5), 729–741 (2017)
14. Cai, G., Gu, W.: Heterogeneous context-aware recommendation algorithm with semi-supervised tensor factorization. In: International Conference on Intelligent Data Engineering and Automated Learning. pp. 232–241. Springer (2017)
15. Acar, E., Dunlavy, D.M., Kolda, T.G.: A scalable optimization approach for fitting canonical tensor decompositions. Journal of Chemometrics 25 (2011)