Restricted Boltzmann Machine Assignment Algorithm: Application to solve many-to-one matching problems on weighted bipartite graph
A Preprint
Francesco Curia
Department of Statistical Science, Sapienza University of Rome, Rome, 00185 Italy
[email protected]
May 3, 2019

Abstract
In this work an iterative algorithm based on unsupervised learning is presented, specifically on a Restricted Boltzmann Machine (RBM), to solve a perfect matching problem on a bipartite weighted graph. The weights $w_{ij}$ and the bias parameters $\theta = (a_i, b_j)$ that maximize the energy function are computed iteratively, assigning element $i$ to element $j$. An application to a real problem is presented to show the potential of this algorithm.

Keywords: Optimization · Combinatorial · Matching · Assignment Problems · Neural Networks · Unsupervised learning · Restricted Boltzmann Machine
Assignment problems fall within combinatorial optimization, and matching on a bipartite weighted graph is one of the major problems in this field. Numerous resolution methods and algorithms have been proposed in recent times and many have provided important results; among them we find, for example, constructive heuristics, meta-heuristics, approximation algorithms, hyper-heuristics, and other methods. Combinatorial optimization deals with finding the optimal solution within a finite collection of possibilities. The heart of the problem of finding solutions in combinatorial optimization lies in efficient algorithms whose computation time is polynomial in the input size. Therefore, when dealing with certain combinatorial optimization problems, one must ask how quickly the optimal solution can be found, and, if no method of this type is available, which approximate methods can be used in polynomial computational time to obtain stable solutions. Solving this kind of problem in polynomial time $O(n)$ was long the focus of research in this area until Edmonds [1] developed one of the most efficient methods. Over time other algorithms have been developed; the fastest among them are the Micali and Vazirani algorithm [2], Blum [3], and Gabow and Tarjan [4]. The first of these is an improvement on Edmonds' method, while the other algorithms use different logics, but all of them run in time $O(m\sqrt{n})$. The problem is fundamentally the following: imagine a situation in which, on the basis of characteristics observed on a given phenomenon, elements of two sets must be matched, as in one of the best-known examples, the assignment of workers to tasks.
A classical maximum-cardinality matching algorithm takes the maximum-weight edge and assigns it. In a decision support system, with a domain expert in the loop, this could be acceptable, but in a fully automatic system, such as an artificial-intelligence system that pairs elements on the basis of some characteristics, this approach would not be very reliable, as it entirely removes the user's control. Another problem related to this kind of situation is that of the features. Take as an example the classic flight-gate assignment problem in an airport: on the basis of the history we could have information about the flight, the gates, the time, the flight number, and perhaps the airline. This is little information; even the best feature engineering would lead to a machine learning model, specifically a classifier, that is very poor in information. Treating the same problem with classical optimization, as done so far, would lead to solving it with a maximum-weight perfect matching, and we would be back at the beginning.
Matching problems are among the fundamental problems in combinatorial optimization. In this work we focus on the case when the underlying graph is bipartite. We start by introducing some basic graph terminology. A graph $G = (V, E)$ consists of a set $V = A \cup B$ of vertices and a set $E$ of pairs of vertices called edges. For an edge $e = (u, v)$, we say that the endpoints of $e$ are $u$ and $v$; we also say that $e$ is incident to $u$ and $v$. A graph $G = (V, E)$ is bipartite if the vertex set $V$ can be partitioned into two sets $A$ and $B$ (the bipartition) such that no edge in $E$ has both endpoints in the same set of the bipartition. A matching $M$ is a collection of edges such that every vertex of $V$ is incident to at most one edge of $M$. If a vertex $v$ has no edge of $M$ incident to it, then $v$ is said to be exposed (or unmatched). A matching is perfect if no vertex is exposed; in other words, a matching is perfect if its cardinality is equal to $|A| = |B|$. In the literature several real-world examples have been treated, such as the assignment of children to certain schools [5], donors to patients [6], and workers to companies [7]. The weighted bipartite matching problem finds the feasible matching with the maximum total weight. This problem has been developed in several areas, such as in the work of [8] on protein structure alignment, within computer vision as documented in the work of [9], or as in the paper by [10] in which the similarity of texts is estimated. Other works have faced this problem in classification [11], [12] and [13], but not for many-to-one correspondence. The mathematical formulation can be stated as a linear program. Each edge $(i, j)$, where $i$ is in $A$ and $j$ is in $B$, has a weight $w_{ij}$.
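The definitions above can be checked mechanically. The following is a minimal sketch (the helper names and the toy graph are illustrative, not from the paper) that tests whether a set of edges is a matching and whether it is perfect:

```python
def is_matching(edges):
    """A matching: every vertex is incident to at most one edge."""
    seen = set()
    for u, v in edges:
        if u in seen or v in seen:
            return False
        seen.update((u, v))
    return True


def is_perfect_matching(edges, A, B):
    """Perfect: no vertex of A or B is exposed (unmatched)."""
    if not is_matching(edges):
        return False
    matched = {x for e in edges for x in e}
    return set(A) <= matched and set(B) <= matched


# Toy bipartite graph with |A| = |B| = 2.
A, B = ["a1", "a2"], ["b1", "b2"]
M = [("a1", "b1"), ("a2", "b2")]
print(is_perfect_matching(M, A, B))              # True
print(is_perfect_matching([("a1", "b1")], A, B)) # False: a2 and b2 exposed
```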
For each edge $(i, j)$ we have a decision variable

$$x_{ij} = \begin{cases} 1 & \text{if the edge is contained in the matching} \\ 0 & \text{otherwise} \end{cases} \quad (1)$$

with $x_{ij} \in \mathbb{Z}$ for $i \in A$, $j \in B$, and we have the following LP:

$$\max_{x_{ij}} \sum_{(i,j) \in A \times B} w_{ij} x_{ij} \quad (2)$$

$$\sum_{j \in B} x_{ij} = 1 \quad \text{for } i \in A \quad (3)$$

$$\sum_{i \in A} x_{ij} = 1 \quad \text{for } j \in B \quad (4)$$

$$0 \le x_{ij} \le 1 \quad \text{for } i \in A,\ j \in B \quad (5)$$

$$x_{ij} \in \mathbb{Z} \quad \text{for } i \in A,\ j \in B \quad (6)$$
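Since every feasible 0-1 solution of constraints (3)-(6) is a permutation matrix, the integer program can be verified on small instances by enumerating permutations. A minimal sketch with an illustrative weight matrix (not data from the paper):

```python
from itertools import permutations

# Hypothetical 3x3 weight matrix w[i][j] for edges (i, j) in A x B.
w = [[4, 1, 2],
     [2, 3, 0],
     [1, 2, 5]]
n = len(w)

# Objective (2) under constraints (3)-(4): each row i picks one column j,
# each column is used exactly once, i.e. a permutation of {0, ..., n-1}.
best = max(permutations(range(n)),
           key=lambda p: sum(w[i][p[i]] for i in range(n)))
value = sum(w[i][best[i]] for i in range(n))
print(best, value)  # (0, 1, 2) 12
```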
The problem in a weighted bipartite graph $G = (V = A \cup B, W)$ arises when we have different weights (historical data) $W = (w_{11}, w_{12}, \dots, w_{ij})$ for the set of nodes $A$ matched against the same set of nodes $B$. One of the most popular solutions is the Hungarian algorithm [16]. The assignment rule in many real cases could therefore be misleading and limiting, as well as unrealistic as a solution. Machine learning (ML) algorithms, both supervised and unsupervised, are increasingly gaining ground in applied sciences such as engineering, biology, and medicine. The matching problem in this case can be seen as a set of inputs $x_1, \dots, x_k$ (in our case the nodes of the set $A$) and a set of outputs $y_1, \dots, y_k$ (the respective nodes of the set $B$), weighted by a series of weights $w_{11}, \dots, w_{ij}$, which inevitably recalls the structure of a classic neural network. The problem is that in this case there would be a number of classes (in the case of assignment) equal to the number of inputs. Considering it as a classic machine learning problem, the difficulty would lie in the features and their engineering on the one hand, and on the other in the very large number of classes to predict (assign).

[Figure 1: Bipartite Weighted Matching]

For example, if we think about matching applicants and jobs, and we only had the name of a candidate for the job, we would have very little information to build a robust machine learning model, and even good feature engineering would not lead to much. Having other information on the candidate available, it could be extracted and used as a "weight" to build a neural network, but even in this case the constraints of a classic optimization model solved with ML techniques would not be maintained; let us say we would "force" it a little. What we want to present in this work is instead the resolution of a classical matching (assignment) problem through the application of an ML model, in this case a neural network, which as already said maintains the mathematical structure of nodes (inputs) and arcs (weights); but instead of considering the output set $B$ as classification labels (assignment), we consider an unsupervised neural network, specifically a Restricted Boltzmann Machine. The contributions of this work are mainly of two types. The first is the ability to use an unsupervised machine learning model to solve a classical optimization problem which in turn has the mathematical structure of a two-layer neural network, in our case an RBM with a visible and a hidden layer. In this case the nodes of the set $B$ become the variables of the model, and the number of times that node $i$ has been assigned to node $j$ (for example in problems that concern historical data analysis) becomes the weight $w_{ij}$, which in turn becomes the value of the $i$-th variable in the RBM model.
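For the classical one-to-one case, the Hungarian algorithm [16] is available off the shelf; the following sketch uses SciPy's `linear_sum_assignment` on an illustrative weight matrix (not data from the paper):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Illustrative weight matrix: w[i, j] = weight of assigning a_i to b_j.
w = np.array([[4, 1, 2],
              [2, 3, 0],
              [1, 2, 5]])

# Hungarian algorithm; maximize=True selects the maximum-weight matching.
row, col = linear_sum_assignment(w, maximize=True)
print(list(col), int(w[row, col].sum()))  # [0, 1, 2] 12
```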
The second is the ability to solve real problems, as we will see later in the article, in which it is necessary to carry out a matching between elements of two sets and the maximum-weight edge is not necessarily the best assignment, especially when the problem is many-to-one, like many real problems.

The Restricted Boltzmann Machine is an unsupervised neural-network-based method [14]; the algorithm learns one layer of hidden features. When the number of hidden units is smaller than the number of visible units, the hidden layer can handle nonlinear complex dependencies and structure in the data, capture deep relationships in the input data, and represent the input data more compactly. Assume there are $c$ visible units and $m$ hidden units in a Restricted Boltzmann Machine. Then $v_i$ for $i = 1, \dots, c$ indicates the state of the $i$-th visible unit, where

$$v_i = \begin{cases} 1 & \text{if the } i\text{-th term is annotated to the element} \\ 0 & \text{otherwise} \end{cases} \quad (7)$$

for $i = 1, \dots, c$, and furthermore we have

$$h_j = \begin{cases} 1 & \text{if the state of hidden unit } j \text{ is active} \\ 0 & \text{otherwise} \end{cases} \quad (8)$$

for $j = 1, \dots, m$, where $w_{ij}$ is the weight associated with the connection between $v_i$ and $h_j$; we also define the joint configuration $(v, h)$.

[Figure 2: Restricted Boltzmann Machine structure]

The energy function that captures the interaction patterns between the visible layer and the hidden layer is defined as follows:

$$E(v, h \mid \theta) = -\sum_{i=1}^{c} a_i v_i - \sum_{j=1}^{m} b_j h_j - \sum_{i=1}^{c} \sum_{j=1}^{m} v_i h_j w_{ij} \quad (9)$$

where $\theta = (w_{ij}, a_i, b_j)$ are the parameters of the model: $a_i$ and $b_j$ are biases for the visible and hidden variables, respectively, and $w_{ij}$ are the weights of the connections between visible and hidden variables.
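The energy function (9) is straightforward to evaluate; a minimal vectorized sketch (the array sizes and values are assumptions for illustration):

```python
import numpy as np

def energy(v, h, W, a, b):
    """E(v, h | theta) of Eq. (9): -a.v - b.h - v^T W h."""
    return -(a @ v) - (b @ h) - v @ W @ h

c, m = 4, 3                      # c visible units, m hidden units
rng = np.random.default_rng(0)
W = rng.normal(size=(c, m))      # weights w_ij
a, b = np.zeros(c), np.zeros(m)  # visible and hidden biases
v = np.array([1.0, 0.0, 1.0, 0.0])
h = np.array([1.0, 1.0, 0.0])
# With zero biases, only the interaction term -v^T W h contributes.
print(energy(v, h, W, a, b))
```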
The joint probability is represented by the following quantity:

$$p(v, h) = \frac{e^{-E(v,h)}}{Z} \quad (10)$$

where

$$Z = \sum_{v,h} e^{-E(v,h)}$$

is a normalization constant, and the conditional distributions over the visible and hidden units are given by sigmoid functions as follows:

$$p(v_i = 1 \mid h) = \sigma\!\left( \sum_{j=1}^{m} w_{ij} h_j + a_i \right) \quad (11)$$

$$p(h_j = 1 \mid v) = \sigma\!\left( \sum_{i=1}^{c} w_{ij} v_i + b_j \right) \quad (12)$$

where $\sigma(x) = \frac{1}{1 + e^{-x}}$. RBMs are trained to maximize the product of probabilities assigned to some training set $V$ (a matrix, each row of which is treated as a visible vector $v$):

$$\arg\max_{w_{ij}} \prod_{i=1}^{c} p(v_i) \quad (13)$$

The RBM training takes place through the Contrastive Divergence algorithm (see Hinton 2002 [15]). From (10) and (13) we can pass to the log-likelihood formulation

$$L_v = \log \sum_{h} e^{-E(v,h)} - \log \sum_{v,h} e^{-E(v,h)} \quad (14)$$

and differentiate this quantity:

$$\frac{\partial L_v}{\partial w_{ij}} = \sum_{h} p(h \mid v) \, v_i h_j - \sum_{v,h} p(v, h) \, v_i h_j \quad (15)$$

$$\frac{\partial L_v}{\partial w_{ij}} = \mathbb{E}_{p(h \mid v)}[v_i h_j] - \mathbb{E}_{p(v,h)}[v_i h_j] \quad (16)$$

In the above expression, the first quantity represents the expectation of $v_i \cdot h_j$ under the conditional probability of the hidden states given the visible states, and the second term represents the expectation of $v_i \cdot h_j$ under the joint probability of the visible and hidden states. In order to maximize (14), which involves the log of a summation, there is no analytical solution, and we use the stochastic gradient ascent technique. In order to compute the unknown values of the weights $w_{ij}$ that maximize the above likelihood, we use gradient ascent:

$$w_{ij}^{k+1} = w_{ij}^{k} + \alpha \cdot \frac{\partial L_v}{\partial w_{ij}}$$

where $\alpha \in (0, 1)$ is the learning rate.
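The conditionals (11)-(12) and the gradient-ascent update can be sketched as one step of CD-1 (Hinton 2002 [15]). Shapes, hyperparameters, and the toy data below are assumptions; the `eta * dW_prev` term is a momentum-style contribution of the previous weight variation, anticipating the penalized update discussed next:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(V, W, a, b, dW_prev, alpha=0.1, eta=0.5, rng=None):
    """One CD-1 step: positive phase from the data V, negative phase from a
    single Gibbs reconstruction; eta * dW_prev adds the previous weight
    variation (a momentum-style penalty term)."""
    rng = rng or np.random.default_rng(0)
    ph = sigmoid(V @ W + b)                        # p(h_j = 1 | v), Eq. (12)
    h = (rng.random(ph.shape) < ph).astype(float)  # sampled hidden states
    pv = sigmoid(h @ W.T + a)                      # p(v_i = 1 | h), Eq. (11)
    ph2 = sigmoid(pv @ W + b)
    n = V.shape[0]
    dW = alpha * (V.T @ ph - pv.T @ ph2) / n + eta * dW_prev
    a = a + alpha * (V - pv).mean(axis=0)
    b = b + alpha * (ph - ph2).mean(axis=0)
    return W + dW, a, b, dW

rng = np.random.default_rng(1)
V = rng.integers(0, 2, size=(8, 5)).astype(float)  # 8 binary training rows
W = 0.01 * rng.normal(size=(5, 2))                 # 5 visible, 2 hidden units
a, b, dW = np.zeros(5), np.zeros(2), np.zeros((5, 2))
for _ in range(20):
    W, a, b, dW = cd1_step(V, W, a, b, dW)
print(W.shape)  # (5, 2)
```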
To this formulation we can add a penalty term and obtain the following:

$$w_{ij}^{k+1} = w_{ij}^{k} + \alpha \cdot \frac{\partial L_v}{\partial w_{ij}} + \eta \cdot \left( w_{ij}^{k} - w_{ij}^{k-1} \right) \quad (17)$$

where $\eta > 0$ is the penalty parameter and $\eta \cdot (w_{ij}^{k} - w_{ij}^{k-1})$ measures the contribution of the weight variation at the $k$-th update step.

In this section the RBM-based algorithm is presented and its single steps are explained.

1. The algorithm takes as input the matrix $W$ of the assignment weights, where the weight $w_{ij}$ represents the number of times that the element $i$ has been assigned to the element $j$.
2. The matrix $W$ is binarized: for each element $w_{ij} > 0$, a matrix $\tilde{W}$ with 0-1 elements is created.
3. The RBM is applied, taking as input the binary matrix $\tilde{W}$. The probability product related to the visible units of the RBM is maximized as in (13). The weights $w_{ij}$ are updated according to (17) and the biases according to the RBM training rule. Once these updated values are obtained, the optimized value $\hat{p}(v_i)$ is accepted if it is greater than a threshold $\epsilon > 0$.
4. The output of the algorithm is a matrix with values 0-1 in which each row contains a single value equal to 1, corresponding to the assignment of the element $i$ to the element $j$.

The pseudocode is presented in the next section.

Now we proceed to provide the results of the application of the algorithm (see Appendix). The problem instance has 351 elements in the set $A$ and 35 in the set $B$; the goal is to assign to each element of $A$ one and only one element of $B$, so as to have a row sum equal to 1, as in (3) and in step 4 of the algorithm. The weights $w_{ij}$ are represented by the number of times the element $a_i \in A$ has been assigned to the element $b_j \in B$, based on a set of historical data relative to flight-gate assignments of a well-known international airport. The difficulty is the one discussed in the first part of the work, in which we want to obtain a robust machine learning algorithm that classifies and assigns the respective gate to each flight. Starting from the available features, the algorithm presented in this work was implemented. The computational results are very interesting in terms of computation speed and assignment quality.
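Steps 1-4 can be sketched end to end. This is a hypothetical, simplified reading of the algorithm: the count matrix, the RBM size, the number of CD-1 steps, and the per-row argmax assignment are assumptions, not the paper's exact implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def rbm_assign(counts, n_hidden=2, alpha=0.1, steps=50, seed=0):
    """Steps 1-4: take the count matrix W, binarize it, train a small RBM
    with CD-1 on its rows, then set a single 1 per row at the column with
    the highest reconstructed probability among the observed pairs."""
    rng = np.random.default_rng(seed)
    V = (counts > 0).astype(float)          # step 2: binarize to W~
    n, m = V.shape
    Wt = 0.01 * rng.normal(size=(m, n_hidden))
    a, b = np.zeros(m), np.zeros(n_hidden)
    for _ in range(steps):                  # step 3: CD-1 training
        ph = sigmoid(V @ Wt + b)
        h = (rng.random(ph.shape) < ph).astype(float)
        pv = sigmoid(h @ Wt.T + a)
        ph2 = sigmoid(pv @ Wt + b)
        Wt += alpha * (V.T @ ph - pv.T @ ph2) / n
        a += alpha * (V - pv).mean(axis=0)
        b += alpha * (ph - ph2).mean(axis=0)
    # Reconstructed probabilities, restricted to historically observed pairs.
    pv = sigmoid(sigmoid(V @ Wt + b) @ Wt.T + a) * V
    P = np.zeros_like(V)                    # step 4: one assignment per row
    P[np.arange(n), pv.argmax(axis=1)] = 1.0
    return P

counts = np.array([[5, 1, 0], [0, 4, 2], [1, 0, 6]])
P = rbm_assign(counts)
print(P.sum(axis=1))  # [1. 1. 1.]: each element of A gets exactly one of B
```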
Algorithm 1:
Restricted Boltzmann Assignment

Input:
    epsilon, threshold value
    b_j, hidden-unit bias value
    a_i, visible-unit bias value
    alpha, learning rate for visible bias updating
    beta, learning rate for hidden bias updating
    n x m matrix W = {w_ij}, number of times element i has been assigned to element j

Binarize W to get W' = {w'_ij} in {0, 1}:
    foreach i in W do
        if w_ij > 0 then w'_ij <- 1 else w'_ij <- 0

while sum_j p_ij != 1 do
    argmax_{w'_ij} prod_{i=1}^{c} p(v_i)
    Update w_ij: w_ij^{k+1} = w_ij^k + alpha * dL_v/dw_ij
    Update b_j:  b_j^{k+1}  = b_j^k  + beta  * dL_v/db_j
    Update a_i:  a_i^{k+1}  = a_i^k  + alpha * dL_v/da_i
    if p(v_i) > epsilon then p_ij <- 1 else p_ij <- 0

Output: n x m matrix P = {0, 1} such that sum_j p_ij = 1, for all i in A

This can be the starting point for more precise, fast and sophisticated algorithms that combine combinatorial optimization with machine learning on the basis of unsupervised learning, and not only on the optimization of cost functions.
References

[1] J. Edmonds. "Paths, trees and flowers". Canadian Journal of Mathematics, 17:449-467, 1965.
[2] S. Micali and V. V. Vazirani. "An $O(\sqrt{|V|} \cdot |E|)$ algorithm for finding maximum matching in general graphs". In Proceedings of the Twenty-First Annual IEEE Symposium on Foundations of Computer Science, 1980.
[3] N. Blum. "A new approach to maximum matching in general graphs". In Proc. 17th ICALP, volume 443 of Lecture Notes in Computer Science, pages 586-597. Springer-Verlag, 1990.
[4] H. N. Gabow and R. E. Tarjan. "Faster scaling algorithms for general graph matching problems". J. ACM, 38(4):815-853, 1991.
[5] Ryoji Kurata, Masahiro Goto, Atsushi Iwasaki, and Makoto Yokoo. "Controlled school choice with soft bounds and overlapping types". In AAAI Conference on Artificial Intelligence (AAAI), 2015.
[6] Dimitris Bertsimas, Vivek F. Farias, and Nikolaos Trichakis. "Fairness, efficiency, and flexibility in organ allocation for kidney transplantation". Operations Research, 61(1):73-87, 2013.
[7] John Joseph Horton. "The effects of algorithmic labor market recommendations: evidence from a field experiment". Journal of Labor Economics, 2017.
[8] E. Krissinel and K. Henrick. "Secondary-structure matching (SSM), a new tool for fast protein structure alignment in three dimensions". Acta Crystallographica Section D: Biological Crystallography, 60(12):2256-2268, 2004.
[9] Serge Belongie, Jitendra Malik, and Jan Puzicha. "Shape matching and object recognition using shape contexts". IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(4):509-522, 2002.
[10] Liang Pang, Yanyan Lan, Jiafeng Guo, Jun Xu, Shengxian Wan, and Xueqi Cheng. "Text matching as image recognition". In AAAI Conference on Artificial Intelligence (AAAI), 2016.
[11] Gediminas Adomavicius and YoungOk Kwon. "Improving aggregate recommendation diversity using ranking-based techniques". IEEE Transactions on Knowledge and Data Engineering (TKDE), 24(5):896-911, 2012.
[12] Chaofeng Sha, Xiaowei Wu, and Junyu Niu. "A framework for recommending relevant and diverse items". In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), 2016.
[13] Azin Ashkan, Branislav Kveton, Shlomo Berkovsky, and Zheng Wen. "Optimal greedy diversity for recommendation". In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), pages 1742-1748, 2015.
[14] A. Fischer and C. Igel. "An Introduction to Restricted Boltzmann Machines". In Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications, vol. 7441 of Lecture Notes in Computer Science, pp. 14-36. Springer, Berlin, Heidelberg, 2012.
[15] G. E. Hinton. "A Practical Guide to Training Restricted Boltzmann Machines". Technical Report, Department of Computer Science, University of Toronto, 2010.
[16] H. W. Kuhn. "On the origin of the Hungarian method for the assignment problem". In J. K. Lenstra, A. H. G. Rinnooy Kan, and A. Schrijver (eds.), History of Mathematical Programming, North-Holland, Amsterdam, 1991, pp. 77-81.
Appendix

[Assignment table with columns Node1, Node2; data not recoverable from the extraction.]