Hierarchical clustering with deep Q-learning
Richárd FORSTER
Eötvös University
email: [email protected]

Ágnes FÜLÖP
Eötvös University
email: [email protected]
Abstract.
The reconstruction and analysis of high energy particle physics data is just as important as the analysis of the structure in real world networks. In a previous study it was explored how hierarchical clustering algorithms can be combined with $k_t$ cluster algorithms to provide a more generic clusterization method. Building on that, this paper explores the possibilities of involving deep learning in the process of cluster computation, by applying reinforcement learning techniques. The result is a model that, by learning on a modest dataset of 10,000 nodes during a limited number of epochs, can reach 83.77% precision in predicting the appropriate clusters.
Computing Classification System 2012: I.1.4
Mathematics Subject Classification 2010:
Key words and phrases: jet, cluster algorithm, hierarchical clustering, deep Q-learning, neural network, multi-core, Keras, CNTK, Louvain

1 Introduction

Different datasets should be clusterized with specific approaches. For real world networks, hierarchical algorithms, like the Louvain method, provide an efficient way to produce the clusters. Fusing some aspects of these processes with the $k_t$ jet clustering, a more generic process can be conceived, as was studied in [25]. This solution might prove very useful for heavy ion physics, where jet physics plays an important role. The contribution of this paper is a deep learning method that uses reinforcement learning to teach an artificial neural network how to clusterize the input graphs without any external user interaction. The evaluation is provided on real world network data that conforms to the original Louvain method's properties, so a more thorough examination is possible. Looking at the results, the neural network is capable of achieving an average precision of 83.77% on the test dataset, even by running for only a small number of epochs.

2 Clustering

This section contains a brief introduction to the used hierarchical clustering algorithm and to the jet algorithms from physics.

2.1 The Louvain method
The Louvain method [4] is a multi-phase, iterative, greedy hierarchical clusterization algorithm, working on undirected, weighted graphs. The algorithm proceeds through multiple phases, and within each phase through multiple iterations, until a convergence criterion is met. Its parallelization was explored in [12], which was further evolved into a GPU based implementation, as detailed in [8]. The modularity is a monotonically increasing function, spreading across multiple iterations, giving a numerical representation of the quality of the clusters. Because the modularity is monotonically increasing, the process is guaranteed to terminate. Running on a real world dataset, termination is achieved in no more than a dozen iterations.
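Although the paper's implementation is not reproduced here, the quality function driving these iterations can be sketched as follows, assuming a dict-of-dicts adjacency representation of an undirected weighted graph without self-loops; the formula itself is stated precisely as Eq. (1) below:

    # Sketch: modularity of a partition (see Eq. (1) below).
    # graph: node -> {neighbor: weight}, each undirected edge stored in both
    # directions; community: node -> community id.
    def modularity(graph, community):
        W = sum(sum(nbrs.values()) for nbrs in graph.values()) / 2.0  # total edge weight
        deg = {u: sum(nbrs.values()) for u, nbrs in graph.items()}    # weighted degrees
        # sum over nodes of e_{i -> C(i)}: edge weight from i into its own community
        e_in = sum(w for u, nbrs in graph.items()
                   for v, w in nbrs.items() if community[u] == community[v])
        deg_c = {}
        for u, c in community.items():                                # deg_C per community
            deg_c[c] = deg_c.get(c, 0.0) + deg[u]
        return e_in / (2.0 * W) - sum((d / (2.0 * W)) ** 2 for d in deg_c.values())

    # Toy example: two triangles joined by one edge form two natural communities.
    g = {0: {1: 1, 2: 1}, 1: {0: 1, 2: 1}, 2: {0: 1, 1: 1, 3: 1},
         3: {2: 1, 4: 1, 5: 1}, 4: {3: 1, 5: 1}, 5: {3: 1, 4: 1}}
    print(modularity(g, {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}))  # about 0.357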
On a set $S = \{C_1, C_2, \ldots, C_k\}$ containing every community in a given partitioning of $V$, where $1 \leq k \leq N$ and $V$ is the set of nodes, modularity $Q$ is given by the following [14]:

$$Q = \frac{1}{2W}\sum_{i \in V} e_{i \rightarrow C(i)} - \sum_{C \in S}\left(\frac{\deg_C}{2W}\right)^2, \qquad (1)$$

where $e_{i \rightarrow C(i)}$ is the weight of the edges connecting node $i$ to the nodes of its own community $C(i)$, $\deg_C$ is the sum of the degrees of all the nodes in community $C$ and $W$ is the sum of the weight of all the edges. Modularity has multiple variants, like the ones described in [18], [2] and [3]. Yet the one defined in Eq. (1) is the most commonly used.

2.2 Jet algorithms

During the last 40 years several jet reconstruction algorithms have been developed for hadronic colliders [23][1]. The first ever jet algorithm was published by Sterman and Weinberg in the 1970's [17]. The cone algorithm plays an important role when a jet consists of a large amount of hadronic energy in a small angular region. It is based on a combination of particles with their neighbours in $\eta - \varphi$ space within a cone of radius $R = \sqrt{\Delta\varphi^2 + \Delta\eta^2}$. The sequential recombination cluster algorithms, on the other hand, combine the pairs of objects which have very close $p_t$ values. The particles merge into a new cluster through successive pair recombination. The starting point is the lowest $p_t$ particles for clustering in the $k_t$ algorithm, but in the anti-$k_t$ recombination algorithm it is the highest momentum particles. The jet clustering involves the reconstructed jet momentum of particles, which leaves the calorimeter together with values modified by the tracker system.

The Cone algorithm The Cone algorithm is one of the regularly used methods at the hadron colliders. The main steps of the iteration are the following [17]: the seed particle $i$ defines the initial direction, and it is necessary to sum up the momenta of all particles $j$ situated in a circle of radius $R$ ($\Delta R^2 = (y_i - y_j)^2 + (\varphi_i - \varphi_j)^2 < R^2$) around $i$, where $y_i$ and $\varphi_i$ are the rapidity and azimuth of particle $i$. The direction of the sum is applied as a new seed direction. The iteration procedure is repeated as long as the direction of the determined cone is stable.

It is worth noting what happens when two seed cones overlap during the iteration. Two different groups of cone algorithms are discussed. One possible solution is to select as the first seed the particle that has the greatest transverse momentum, find the corresponding stable cone, i.e. jet, and delete the particles from the event which were included in the jet. Then choose a new seed, which is the hardest of the remaining particles, and use it to search for the next jet. The procedure is repeated until no unused particle remains. This method avoids overlapping. The other possibility is the so called "overlapping" cones approach with the split-merge procedure. All the stable cones are found, which are determined by iteration from all particles. This avoids the same particle from appearing in multiple cones. The split-merge procedure can be used to consider combining pairs of cones: if more than a fraction $f$ of the transverse momentum of the softer cone derives from the harder one, the cones are merged; otherwise the common particles are assigned to the cone which is closer to them. The split-merge procedure applies to the initial list of protojets, which contains the full list of stable cones:

1. Take the protojet with the largest $p_t$ (i.e. hardest protojet), label it $a$.

2. Search for the next hardest protojet that shares particles with $a$ (i.e. overlaps), label it $b$. If no such protojet exists, delete $a$ from the list of protojets and add it to the list of final jets.
3. Determine the total $p_t$ of the particles shared between the two protojets, $p_{t,shared}$.

• If $p_{t,shared} > f$, where $f$ is a free parameter called the overlap threshold, replace protojets $a$ and $b$ with a single merged protojet.

• Otherwise the protojets are split, for example by assigning the shared particles to the protojet whose axis is closer.

4. Repeat from step 1 as long as there are protojets left (a code sketch of this loop is given below).

A procedure similar to the split-merge method is the so called split-drop procedure, where the non-shared particles which fall into the softer of two overlapping cones are dropped, i.e. deleted from the jets altogether.
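As referenced at step 4, the split-merge loop can be written out as a sketch rather than as the original implementation. Particles are assumed to be dicts with 'id', 'pt', 'y' and 'phi' keys, protojets are lists of particles, and the shared transverse momentum is compared against the fraction $f$ of the softer protojet's momentum, which is one common convention:

    import math

    def _pt(jet):
        return sum(p['pt'] for p in jet)

    def _axis(jet):
        # pt-weighted axis of a protojet in the (y, phi) plane
        pt = _pt(jet)
        return (sum(p['pt'] * p['y'] for p in jet) / pt,
                sum(p['pt'] * p['phi'] for p in jet) / pt)

    def _dist(p, axis):
        return math.hypot(p['y'] - axis[0], p['phi'] - axis[1])

    def split_merge(protojets, f=0.5):
        protojets = [list(j) for j in protojets]
        final_jets = []
        while protojets:
            protojets.sort(key=_pt, reverse=True)          # step 1: hardest protojet a
            a = protojets[0]
            ids_a = {p['id'] for p in a}
            b = next((j for j in protojets[1:]
                      if ids_a & {p['id'] for p in j}), None)  # step 2: overlap search
            if b is None:
                final_jets.append(protojets.pop(0))        # no overlap: a is a final jet
                continue
            shared = [p for p in b if p['id'] in ids_a]    # step 3: shared pt
            if sum(p['pt'] for p in shared) > f * _pt(b):
                merged = a + [p for p in b if p['id'] not in ids_a]
                protojets.remove(b)
                protojets[0] = merged                      # merge a and b
            else:
                ax_a, ax_b = _axis(a), _axis(b)
                for p in shared:                           # split: closer axis wins
                    if _dist(p, ax_a) <= _dist(p, ax_b):
                        b[:] = [q for q in b if q['id'] != p['id']]
                    else:
                        a[:] = [q for q in a if q['id'] != p['id']]
                protojets = [j for j in protojets if j]    # drop emptied protojets
            # step 4: repeat while protojets remain
        return final_jets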
They go beyond just finding jets and implicitly assign a clustering sequence to an event, which is often closely connected with approximate probabilistic pictures that one may have for parton branching. The current work focuses on the $k_t$ algorithm, whose parallelisation was studied in [10] and [24].

The $k_t$ algorithm for hadrons In the case of the proton-proton collision, variables which are invariant under longitudinal boosts are applied. These quantities were introduced by [26], and the distance measures are longitudinally invariant as the following:

$$d_{ij} = \min(p_{ti}^{2p}, p_{tj}^{2p})\frac{\Delta R_{ij}^2}{R^2}, \qquad \Delta R_{ij}^2 = (y_i - y_j)^2 + (\varphi_i - \varphi_j)^2 \qquad (2)$$

$$d_{iB} = p_{ti}^{2p}. \qquad (3)$$

In this definition the two beam jets are not distinguished. Setting $p = 1$ gives the $k_t$ algorithm, while $p = -1$ gives the "anti-$k_t$" algorithm. In the latter case the clustering starts from hard particles instead of soft particles, therefore the jets extend outwards around hard seeds. Because the algorithm depends on the energy and angle through the distance measure, the collinear branchings will be collected at the beginning of the sequence.

Hierarchical $k_t$ clustering In [25] it was studied how to do hierarchical clustering following the rules of the $k_t$ algorithm. First the list of particles has to be transformed into a graph, with the particles themselves appointed as nodes. The distance between the elements is a suitable selection for a weight on all edges between adjacent particles. But as this eventually leads to $n(n-1)/2$ links, where $n$ is the number of nodes, a better solution is to make connections between nearest neighbours and to the second nearest. If a particle's nearest "neighbour" is the beam, it will be represented as an isolated node. While the Louvain algorithm relies on modularity gain to drive the computation, the jet clustering variant doesn't need the modularity calculation, as it is known that the process will end when all particles are assigned to a jet. The result of this clustering will still be a dendrogram, where the leaves will represent the jets.
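To make the distance measures concrete, a small sketch follows: it computes $d_{ij}$ and $d_{iB}$ of Eqs. (2)-(3) for $(p_t, y, \varphi)$ tuples and builds the weighted graph of nearest and second nearest neighbours described above. The parameter p selects between $k_t$ (p = 1) and anti-$k_t$ (p = -1) behaviour; the function names are illustrative:

    # Sketch of the longitudinally invariant distance measures, Eqs. (2)-(3).
    # Particles are (pt, y, phi) tuples; R is the radius parameter.
    def d_ij(a, b, p=1, R=1.0):
        dR2 = (a[1] - b[1]) ** 2 + (a[2] - b[2]) ** 2
        return min(a[0] ** (2 * p), b[0] ** (2 * p)) * dR2 / R ** 2

    def d_iB(a, p=1):
        return a[0] ** (2 * p)

    def build_graph(particles, p=1, R=1.0):
        # Edge to the nearest and second nearest neighbour of every particle;
        # a particle closer to the beam than to any neighbour stays isolated.
        edges = {}
        for i, a in enumerate(particles):
            dists = sorted((d_ij(a, b, p, R), j)
                           for j, b in enumerate(particles) if j != i)
            if dists and dists[0][0] < d_iB(a, p):
                for d, j in dists[:2]:
                    edges[tuple(sorted((i, j)))] = d
        return edges

    print(build_graph([(10.0, 0.0, 0.0), (9.0, 0.1, 0.1), (2.0, 3.0, 3.0)]))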
3 Artificial neural networks

Since the beginning of the 1990s artificial neural network (ANN) methods have been employed widely in high energy physics for jet reconstruction and track identification [29][33]. These methods are well-known in offline and online data analysis alike. Artificial neural networks are layered networks of artificial neurons (AN), in which biological neurons are modelled. The underlying principle of operation is as follows: each AN receives signals from another AN or from the environment, gathers these and creates an output signal which is forwarded to another AN or to the environment. An ANN contains one input layer, one or more hidden layers and one output layer of ANs. Each AN in a layer is connected to the ANs in the next layer. There are also ANN configurations in which feedback connections are introduced to the previous layers.
An artificial neuron is driven by a set of input signals $(x_1, x_2, \ldots, x_n)$ from the environment or from another AN. A weight $w_i$ $(i = 1, \ldots, n)$ is assigned to each input signal. If the value of a weight is larger than zero then the signal is excited, otherwise the signal is inhibited. The AN assembles all input signals, determines a net signal and propagates an output signal.

Some features of neural systems that make them the most distinct from the properties of conventional computing:

• the associative recognition of complex structures;
• data may be non-complete, inconsistent or noisy;
• the systems can train, i.e. they are able to learn and organize themselves;
• the algorithm and hardware are parallel.

There are many types of artificial neural networks. In high energy particle physics the so-called multi-layer perceptron (MLP) is the most widespread. Here a functional mapping from input $x_k$ to output $z_k$ values is realised with a function $f_{z_k}$:

$$z_k = f_{z_k}\left(\sum_{j=1}^{m} w_{kj} f_{y_j}\left(\sum_{i=1}^{n} v_{ji} x_i\right)\right),$$

where $v_{ji}$ are the weights between the input layer and the hidden layer, and $w_{kj}$ are the weights between the hidden layer and the output layer. This type of ANN is called a feed-forward multi-layer ANN. It can be extended with a layer of functional units. In this case an activation function is implemented for the input layer. This ANN type is called a functional link ANN. Its output is similar to that of the previous ANN, except that it has an additional layer, which contains $q$ functions $h_l(x_1, \ldots, x_n)$ $(l = 1, \ldots, q)$. The weights between the input layer and the functional layer are $u_{li} = 1$ if $h_l$ depends on $x_i$, and $u_{li} = 0$ otherwise. The output of this ANN is:

$$z_k = f_{z_k}\left(\sum_{j=1}^{m} w_{kj} f_{y_j}\left(\sum_{l=1}^{q} v_{jl} h_l(x_1, \ldots, x_n)\right)\right).$$
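As an illustration (not the paper's code), the feed-forward mapping can be written with NumPy; tanh stands in for the activation functions $f_{y_j}$ and $f_{z_k}$, and the weight matrices are randomly initialized:

    import numpy as np

    def mlp_forward(x, v, w, f=np.tanh):
        # hidden activations y_j = f(sum_i v_ji * x_i),
        # outputs z_k = f(sum_j w_kj * y_j)
        y = f(x @ v)
        z = f(y @ w)
        return z

    rng = np.random.default_rng(0)
    x = rng.normal(size=4)          # n = 4 input signals
    v = rng.normal(size=(4, 8))     # input-to-hidden weights (m = 8 hidden neurons)
    w = rng.normal(size=(8, 2))     # hidden-to-output weights (2 outputs)
    print(mlp_forward(x, v, w))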
Application in high-energy physics The first application, published in 1988, discussed a recurrent ANN for track reconstruction [27]. A recurrent ANN was also used for track reconstruction in the LEP experiment [28]. An article was published about a neural network method which was applied to find an efficient mapping between certain observed hadronic kinematical variables and the quark-gluon identity. With this method it is possible to separate gluon from quark jets originating from Monte-Carlo generated $e^+e^-$ events [30]. A possible discrimination method is presented by the combination of a neural network and QCD to separate the quark and gluon jets of $e^+e^-$ annihilation [34]. The neural network clusterisation algorithm was applied for the ATLAS pixel detector to identify and split merged measurements created by multiple charged particles [31]. The neural network based cluster reconstruction algorithm can identify overlapping clusters and improves overall particle position reconstruction [32]. Artificial intelligence offers the potential to automate challenging data-processing tasks in collider physics. To establish its prospects, it was explored to what extent deep learning with convolutional neural networks can discriminate quark and gluon jets [35].

4 Q-learning

Q-learning is a model-free reinforcement learning technique [39]. The reinforcement learning problem is meant to be learning from interactions to achieve a goal. The learner and decision-maker is called the agent. The thing it interacts with is called the environment, which contains everything from the world surrounding the agent. There is a continuous interaction between them, where the agent selects an action and the environment responds by presenting new situations (states) to the agent. The environment also returns rewards, special numerical values that the agent tries to maximize over time. A full specification of an environment defines a task, which is an instance of the reinforcement learning problem. Specifically, the agent and environment interact at each of a sequence of discrete time steps
$t = 0, 1, 2, \ldots$. At each time step $t$, the agent receives the environment's state, $s_t \in S$, where $S$ is the set of possible states, and based on that it selects an action, $a_t \in A(s_t)$, where $A(s_t)$ is the set of all available actions in state $s_t$. At the next time step, as a response to the action, the agent receives a numerical reward, $r_{t+1} \in \mathbb{R}$, and finds itself in a new state, $s_{t+1}$ (Figure 1).

Figure 1: The agent-environment interaction in reinforcement learning

At every time step, the agent implements a mapping from states to probabilities of selecting the available actions. This is called the agent's policy and is denoted by $\pi_t$, where $\pi_t(s, a)$ is the probability that $a_t = a$ if $s_t = s$. Reinforcement learning methods specify how the agent changes this mapping using its experience. The agent's goal is to maximize the total amount of reward it receives over the long run.

The purpose or goal of the agent is formalized in terms of a special reward passed from the environment. At each time step, the reward is a simple number, $r_t \in \mathbb{R}$. The agent's goal is to maximize the total reward it receives. If the rewards accumulated after time step $t$ are denoted by $r_{t+1}, r_{t+2}, r_{t+3}, \ldots$, what will be maximized by the agent is the expected return $R_t$, defined as some function of the received rewards. The simplest case is the sum of the rewards: $R_t = r_{t+1} + r_{t+2} + r_{t+3} + \cdots + r_T$, where $T$ is the final time step. This approach comes naturally when the agent-environment interaction breaks into subsequences, or episodes. Each episode ends in a special terminal state, which is then reset to a standard starting state. The set of all nonterminal states is denoted by $S$, while the set including the terminal state is denoted by $S^+$.

Introducing discounting, the agent tries to maximize the sum of the discounted rewards by selecting the right actions. At time step $t$, choosing action $a_t$, the discounted return is defined by equation (4):

$$R_t = r_{t+1} + \gamma r_{t+2} + \gamma^2 r_{t+3} + \cdots = \sum_{k=0}^{\infty} \gamma^k r_{t+k+1}, \qquad (4)$$

where $\gamma$ is a parameter, $0 \leq \gamma \leq 1$, called the discount rate. It determines the present value of future rewards: a reward received at time step $t + k$ is worth only $\gamma^{k-1}$ times the immediate reward. If $\gamma < 1$, the infinite sum is still a finite value as long as the reward sequence $\{r_k\}$ is bounded. If $\gamma = 0$, the agent is concerned only with maximizing immediate rewards. If all actions influence only the immediate reward, then the agent could maximize equation (4) by separately maximizing each reward. In general, this can reduce access to future rewards and the return may get reduced. As $\gamma$ approaches 1, future rewards are weighted more strongly.

Assuming a finite set of states and reward values, and considering how a general environment responds at time $t + 1$ to the action taken at time $t$, this response may depend on everything that has happened earlier. In this case only the complete probability distribution can define the dynamics:

$$\Pr\{s_{t+1} = s', r_{t+1} = r \mid s_t, a_t, r_t, s_{t-1}, a_{t-1}, \ldots, r_1, s_0, a_0\}, \qquad (5)$$

for all $s'$, $r$, and all possible values of the past events: $s_t, a_t, r_t, \ldots, r_1, s_0, a_0$. If the state has the Markov property, the environment's response at $t + 1$ depends only on the state and action at $t$, and the dynamics can be defined by applying only equation (6):

$$\Pr\{s_{t+1} = s', r_{t+1} = r \mid s_t, a_t\}, \qquad (6)$$

for all $s'$, $r$, $s_t$, and $a_t$.
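Equation (4) translates directly into a few lines of code; a minimal sketch with an illustrative reward sequence:

    # Discounted return of Eq. (4): R_t = sum_k gamma^k * r_{t+k+1}.
    def discounted_return(rewards, gamma=0.9):
        R, discount = 0.0, 1.0
        for r in rewards:           # rewards observed after time step t
            R += discount * r
            discount *= gamma
        return R

    print(discounted_return([1.0, 0.0, 2.0], gamma=0.5))  # 1 + 0 + 0.25 * 2 = 1.5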
Consequently, a state has the Markov property and is a Markov state if and only if (6) is equal to (5) for all $s'$, $r$, and histories $s_t, a_t, r_t, \ldots, r_1, s_0, a_0$. In this case, the environment also has the Markov property.

A reinforcement learning task satisfying the Markov property is a Markov decision process, or MDP. If the state and action spaces are finite, then it is a finite MDP. This is defined by its state and action sets and by the environment's one-step dynamics. Given any state-action pair $(s, a)$, the probability of each possible next state $s'$ is

$$P_{ss'}^{a} = \Pr\{s_{t+1} = s' \mid s_t = s, a_t = a\}.$$

Having the current state and action, $s$ and $a$, with any next state $s'$, the expected value of the next reward can be computed with

$$R_{ss'}^{a} = E\{r_{t+1} \mid s_t = s, a_t = a, s_{t+1} = s'\}.$$

These quantities, $P_{ss'}^{a}$ and $R_{ss'}^{a}$, completely specify the most important aspects of the dynamics of a finite MDP.

Reinforcement learning algorithms are generally based on estimating value functions, which are functions of either states or state-action pairs. They estimate how good a given state is, or how good a given action in the present state is. How good it is depends on the future rewards that can be expected, more precisely, on the expected return. As the rewards received depend on the actions taken, the value functions are defined with respect to particular policies. A policy, $\pi$, is a mapping from each state, $s \in S$, and action, $a \in A(s)$, to the probability $\pi(s, a)$ of taking action $a$ while in state $s$. The value of a state $s$ under a policy $\pi$, denoted by $V^{\pi}(s)$, is the expected return when starting in $s$ and following $\pi$. For MDPs $V^{\pi}(s)$ is defined as

$$V^{\pi}(s) = E_{\pi}\{R_t \mid s_t = s\} = E_{\pi}\left\{\sum_{k=0}^{\infty} \gamma^k r_{t+k+1} \;\middle|\; s_t = s\right\},$$
where $E_{\pi}$ is the expected value given that the agent follows policy $\pi$. The value of the terminal state is always zero. $V^{\pi}$ is the state-value function for policy $\pi$. Similarly, the value of taking action $a$ in state $s$ under a policy $\pi$, denoted by $Q^{\pi}(s, a)$, is defined as the expected return starting from $s$, taking the action $a$, and following policy $\pi$:

$$Q^{\pi}(s, a) = E_{\pi}\{R_t \mid s_t = s, a_t = a\} = E_{\pi}\left\{\sum_{k=0}^{\infty} \gamma^k r_{t+k+1} \;\middle|\; s_t = s, a_t = a\right\}.$$

$Q^{\pi}$ is the action-value function for policy $\pi$. $V^{\pi}$ and $Q^{\pi}$ can be estimated from experience. If an agent follows policy $\pi$ and maintains an average of the actual return values for each encountered state, then the average will converge to the state's value, $V^{\pi}(s)$, as the number of times that state is encountered approaches infinity. If in a given state every action has a separate average, then these will also converge to the action values, $Q^{\pi}(s, a)$.

To solve a reinforcement learning task, a specific policy needs to be found that achieves a lot of reward over the long run. For finite MDPs, an optimal policy can be defined. Value functions define a partial ordering over policies. A policy $\pi$ is defined to be better than or equal to a policy $\pi'$ if its expected return is greater than or equal to that of $\pi'$ for all states. Formally, $\pi \geq \pi'$ if and only if $V^{\pi}(s) \geq V^{\pi'}(s)$ for all $s \in S$. At least one policy exists that is better than or equal to all other policies, and this is the optimal policy. If more than one exists, the optimal policies are denoted by $\pi^*$. The state-value function among them is the same, called the optimal state-value function, denoted by $V^*$ and defined as

$$V^*(s) = \max_{\pi} V^{\pi}(s), \quad \text{for all } s \in S.$$

The optimal action-value function is also shared, denoted by $Q^*$ and defined as

$$Q^*(s, a) = \max_{\pi} Q^{\pi}(s, a), \quad \text{for all } s \in S \text{ and } a \in A(s).$$

For the state-action pair $(s, a)$, this gives the expected return for taking action $a$ in state $s$ and thereafter following an optimal policy. Thus, $Q^*$ can be defined in terms of $V^*$ as follows:

$$Q^*(s, a) = E\{r_{t+1} + \gamma V^*(s_{t+1}) \mid s_t = s, a_t = a\}.$$

5 Deep Q-learning

Deep Q-learning (DQL) [40][41] is about using deep learning techniques on standard Q-learning (section 4). Calculating the $Q$ state-action values using deep learning can be achieved by applying the following extensions to standard reinforcement learning problems:

1. Calculate $Q$ for all possible actions in state $s_t$.

2. Make a prediction for $Q$ on the new state $s_{t+1}$ and find the action $a_{t+1} = \arg\max_{a \in A(s_{t+1})} Q(s_{t+1}, a)$ that will yield the biggest return.

3. Set the $Q$ return for the selected action to $r + \gamma Q(s_{t+1}, a_{t+1})$. For all other actions the return should remain unchanged.

4. Update the network using back-propagation and mini-batch stochastic gradient descent (a code sketch of this update loop is given below, after the environment's description).

This approach in itself leads to some additional problems. The exploration-exploitation issue is related to which action is taken in a given state. By always selecting the action that seems to maximize the discounted future reward, the agent is acting greedily and might miss other actions that can yield a higher overall reward in the long run. To be able to find the optimal policy, the agent needs to take some exploratory steps at specific time steps. This is solved by applying the $\epsilon$-greedy algorithm [39], where with a small probability $\epsilon$ a completely random action is chosen.

The other issue is the problem of local minima [42]. During training multiple states can be explored that are highly correlated, and this may make the network learn to replay the same episode. This can be solved by first storing past observations in a replay memory and taking random samples from there for the mini-batch that is used to replay the experience.

5.1 Environment

The environment provides the state that the agent will react to. In case of clustering, the environment will be the full input graph. The actual state holds the necessary information required to compute the Louvain method, packaged into a Numpy stack. These include the weights, the degrees, the number of loops and the actual community information. If a step improves the communities, a positive reward is returned, and in any other case the returned value will be $-1$. After stepping, the next state will contain the modified community information.

The agent's action space is finite and predefined, and the environment also has to reflect this. Let the cardinality of the action space be denoted for all $s \in S$ states by $|A(s)|$. For this reason, the state of the environment contains information about only $|A(s)|$ neighbors. This can lead to more nodes than how many are really connected to a given element. In this case the additional dummy nodes' values are filled with extremals, in the current implementation with negative numbers. One limitation of the actual solution is that if the number of neighbors is higher than $|A(s)|$, then only the first $|A(s)|$ neighbors will be considered, in the order in which they appear in the dataset. The first "neighbor" will be the currently evaluated node itself, so in case the clusterization does not yield any better community, the model should see that the node stays in place.

To help avoid potential overflow during the computation, the weights of the input graph are normalized to be between 0 and 1.
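As referenced at steps 1-4 above, a minimal sketch of the replay update follows, assuming a Keras model shaped as in subsection 5.3 and a replay memory of (state, action, reward, next_state, done) tuples; the names memory, batch_size and done are illustrative, not taken from the original implementation:

    import random
    import numpy as np

    def replay(model, memory, batch_size=32, gamma=0.9):
        # Random sampling from the replay memory de-correlates the mini-batch.
        batch = random.sample(memory, min(batch_size, len(memory)))
        for state, action, reward, next_state, done in batch:
            target = reward                      # terminal step: plain reward
            if not done:
                # steps 2-3: predicted best return of the next state
                target = reward + gamma * np.max(model.predict(next_state)[0])
            q_values = model.predict(state)
            q_values[0][action] = target         # only the taken action changes
            model.fit(state, q_values, epochs=1, verbose=0)  # step 4: back-propagation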
5.2 Agent

The agent acts as the decision maker, selecting the next community for a given node. It takes the state of the environment as an input and gives back the index of the neighbor that is considered to provide the best community.
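A minimal sketch of this decision step, combined with the $\epsilon$-greedy exploration described in section 5; model is the Keras network of subsection 5.3 and state is assumed to be already reshaped as shown there:

    import random
    import numpy as np

    def act(model, state, action_size, epsilon=0.1):
        # Exploratory step with probability epsilon, otherwise greedy selection.
        if random.random() < epsilon:
            return random.randrange(action_size)
        returns = model.predict(state)       # predicted returns for all neighbors
        return int(np.argmax(returns[0]))    # index of the best community candidate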
5.3 Keras

Keras [43] is a Python based high-level neural networks API, compatible with the TensorFlow, CNTK and Theano machine learning frameworks. This API encourages experimentation, as it supports rapid development of neural networks. It allows easy and fast prototyping, with a user friendly, modular and extensible structure. Both convolutional and recurrent networks can be developed, and their combinations are also possible in the same agent. Like all modern neural network APIs, it runs both on CPU and GPU for higher performance.

The core data structure is a model, which is a collection of layers. The simplest type is the Sequential model, a linear stack of layers. More complex architectures can also be achieved using the Keras functional API. The clustering agent utilizes a Sequential model:

    from keras.models import Sequential

    model = Sequential()
Stacking layers into a model is done through the add function:

    from keras.layers import Dense, Dropout

    model.add(Dense(128, input_shape=(self.state_size, self.action_size),
                    activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(128, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(128, activation='relu'))
The first layer will handle the input and has a mandatory parameter defining its size. In this case the input shape is provided as a 2-dimensional matrix, where state_size is the number of parameters stored in the state and action_size is the number of possible actions. The first parameter tells how big the output dimension will be, so in this case the input will be propagated into a 128-dimensional output. The following two layers are hidden layers (section 3) with 128 internal nodes each, with rectified linear unit (ReLU) activation. The rectifier is an activation function given by the positive part of its argument:

$$f(x) = x^+ = \max(0, x),$$

where $x$ is the input to a neuron. The rectifier was first introduced to a dynamical network in [19]. It has been demonstrated in [20] to enable better training of deeper networks, compared to the widely used activation function prior to 2011, the logistic sigmoid [44].

During training, overfitting happens when the ANN memorizes the training patterns. In this case the network is weak in generalizing on new datasets. This appears for example when an ANN is very large, namely when it has too many hidden nodes and hence too many weights that need to be optimized. The dropout applied after the hidden layers is used to prevent overfitting on the learning dataset. Dropout is a technique that makes some randomly selected neurons ignored during training. Their contribution to the activation of neurons on deeper layers is removed temporarily, and the weight updates are not applied to them. Before the output layer, the multi-dimensional output of the hidden layers is flattened:

    model.add(Flatten())

Finally, to have the output provide the returns for each available action, the last layer changes the output dimension to action_size:

    model.add(Dense(self.action_size, activation='linear'))

Once the model is set up, the learning process can be configured with the compile function:

    from keras.optimizers import Adam

    model.compile(loss='mse', optimizer=Adam(lr=self.learning_rate))

where learning_rate holds the chosen learning rate. For the loss function mean squared error is used, and the optimizer is an instance of Adam [21] with the mentioned learning rate. The discount rate for future rewards has been set to a small $\gamma$. This way the model will try to select actions that yield the maximum rewards in the short term. While maximizing the reward in the long term can eventually lead to a policy that computes the communities correctly, choosing it this small makes the model learn to select the correct neighbors faster.

To make a prediction on the current state, the predict function is used:

    model.predict(state.reshape(1, self.state_size, self.action_size))

For Keras to work on the input state, it always has to be reshaped into dimensions
(1, state_size, action_size), while the change always has to keep the same number of state elements.

6 Results

Evaluation of the proposed solution is done by processing network clustering on undirected, weighted graphs. These graphs contain real network information, instead of physics related datasets (section 2.2), as this is more suitable for the original Louvain method. Because of this, the modularity can be used as a metric to measure the quality (subsection 2.1) of the results. Additionally, the numbers of correct predictions and misses are used to describe the deep Q-learning (section 5) based method's efficiency.

Numerical evaluations are done by generating one iteration on the first level of the dendrogram, as the top level takes the most time to generate, being based on all the original input nodes. The GPU implementation of the Louvain method being used was first described in [8].
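The counting of correct predictions and misses can be sketched as below; predict_community and louvain_community are illustrative callables returning the chosen neighbor index for a node under the trained model and under the reference Louvain run, respectively:

    def evaluate(nodes, predict_community, louvain_community):
        # Compare the model's choice with the Louvain reference on one level.
        positive = negative = 0
        for node in nodes:
            if predict_community(node) == louvain_community(node):
                positive += 1
            else:
                negative += 1
        return positive, negative, positive / (positive + negative)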
The proposed model, as well as the Louvain clustering, works on undirected, weighted graphs. Such graphs can be generated from the U.S. Census 2010 and Tiger/Line 2010 shapefiles, which are freely available from [45]. They contain the following:

• the vertices are the Census Blocks;
• there is an edge between two vertices if the corresponding Census Blocks share a line segment on their border;
• each vertex has two weights:
  – Census2010 POP100, the number of people living in that Census Block;
  – the Land Area of the Census Block in square meters;
• the edge weights are the pseudo-length of the shared borderlines;
• each Census Block is identified by a point, given by longitudinal and latitudinal coordinates.

A census block is the smallest geographical unit used by the United States Census Bureau for tabulation of 100-percent data. The pseudo-length is given by $\sqrt{x^2 + y^2}$, where $x$ and $y$ are the differences in longitudes and latitudes of each line segment on the shared borderlines. The final result is multiplied by a constant factor to make the edge weights integers. For clusterization the node weights are not used.

The matrices used for evaluation contain the information related to New York, Oregon and Texas (table 1), arbitrarily selected from the SuiteSparse Matrix Collection [37]. The graph details can be found in [36].

        New York    Oregon    Texas
Edges   1,709,544   979,512   4,456,272

Table 1: Details of the selected datasets

Due to the limitations of the proposed solution described in subsection 5.1, a fixed number of neighbors is kept for each node during the computation. The model described in section 5 has been trained on the Oregon graph, taking the first 10,000 nodes in the order in which they first appear in the original dataset, running for a fixed number of epochs. The ratio of the good and bad predictions is shown in table 2.

           New York   Oregon   Texas
Negative   39,197     24,593   163,668

Table 2: Precision of the deep learning solution

The deep learning solution's precision on average is
83.77%. Specifically on the datasets it is respectively 87.4%, 85.7% and 78.2%; the average of these three values gives the 83.77% overall precision. Precision can be further increased by running the training for more epochs or by further tuning the hyperparameters.
The Louvain method assumes nothing about the input graph. The clusterization can be done without any prior information on the groups present in the network. The modularity is presented (table 3) for all test matrices, for both the Louvain algorithm and the deep Q-learning based solution.

          New York    Oregon     Texas
DQL       38,752.84   6,659.93   99,789.88
Louvain   45,339.84   7,724.99   120,745.75

Table 3: Modularity achieved by the deep Q-learning based solution and by the Louvain method
The modularities show results similar to the precision of the network: the New York graph's modularity is 14.53% lower than that of the Louvain computation ($1 - 38{,}752.84/45{,}339.84 \approx 0.1453$), while Oregon's is 13.8% lower and Texas's is 17.36% lower. This shows that, despite the loss in precision, the quality of the clusters does not degrade by more than what is lost on the precision.
7 Conclusions

In this paper a new hierarchical clustering method was proposed, based on the Louvain method and using deep learning. The detailed model was capable of achieving 83.77% precision, while being trained for only a limited number of epochs. Even with this error, the resulting modularity was on average only 15.23% lower than the Louvain method's result, matching the average of the per-dataset differences (14.53%, 13.8% and 17.36%).

The current solution can't process the whole graph: just a subset of the neighbors is considered when processing the communities. This needs to be extended further to have a fully realized deep learning clusterizer. The model still needs to be evaluated on jet related datasets, and it needs to be explored whether any changes are required for the agent to work efficiently on those types of graphs. Most of the errors come from choosing a dummy node as the best community, which implies that the way these nodes are represented should be studied further.
References

[1] A. Ali, G. Kramer, Jets and QCD: a historical review of the discovery of the quark and gluon jets and its impact on QCD, Eur. Phys. J. H 36 (2011) 245–326. [arXiv:1012.2288 [hep-ph]]
[4] V. D. Blondel, J.-L. Guillaume, R. Lambiotte, E. Lefebvre, Fast unfolding of communities in large networks, Journal of Statistical Mechanics: Theory and Experiment (2008) P10008.
[5] M. G. Bowler, Femtophysics, Pergamon Press, 1990.
[6] S. D. Ellis, D. E. Soper, Successive combination jet algorithm for hadron collisions, Phys. Rev. D 48 (1993) 3160.
[7] S. D. Ellis, J. Huston, K. Hatakeyama, P. Loch, M. Tonnesmann, Jets in hadron-hadron collisions, Prog. Part. Nucl. Phys. 60 (2008) 484. [arXiv:0712.2447 [hep-ph]]
[8] R. Forster, Louvain community detection with parallel heuristics on GPUs, 20th Jubilee IEEE International Conference on Intelligent Engineering Systems (2016), ISBN 978-1-5090-1216-9, doi: 10.1109/INES.2016.7555126.
[9] R. Forster, Á. Fülöp, Parallel $k_t$ jet clustering algorithm, Acta Univ. Sapientiae, Informatica.
[10] R. Forster, Á. Fülöp, Jet browser model accelerated by GPUs, Acta Univ. Sapientiae, Informatica.
[11] Acta Univ. Sapientiae, Informatica 5, 2 (2013) 184–211.
[12] Hao Lu, Mahantesh Halappanavar, Ananth Kalyanaraman, Parallel heuristics for scalable community detection, Parallel Computing 47 (2015) 19–37.
[13] Foundations of Quantum Chromodynamics, World Scientific Press, 1986.
[14] M. E. J. Newman, M. Girvan, Finding and evaluating community structure in networks, Phys. Rev. E 69, 2 (2004) 026113.
[15] An Introduction to Quantum Field Theory, Westview Press, 1995.
[16] D. Rohr, S. Gorbunov, A. Szostak, M. Kretz, T. Kollegger, T. Breitner, T. Alt, ALICE HLT TPC tracking of Pb-Pb events on GPUs, Journal of Physics: Conference Series.
[17] G. Sterman, S. Weinberg, Jets from quantum chromodynamics, Phys. Rev. Lett. 39 (1977) 1436.
[21] D. P. Kingma, J. Ba, Adam: a method for stochastic optimization, arXiv:1412.6980.
[22] Eur. Phys. J. C 67 (2010) 637–686. [arXiv:0906.1833 [hep-ph]]
[23] S. Salur, Full jet reconstruction in heavy ion collisions, Nuclear Physics A.
[24] $k_t$ jet clustering algorithm, Acta Univ. Sapientiae, Informatica.
[25] R. Forster, Á. Fülöp, Hierarchical $k_t$ jet clustering for parallel architectures, Acta Univ. Sapientiae, Informatica.
[26] S. Catani, Yu. L. Dokshitzer, M. H. Seymour, B. R. Webber, Longitudinally-invariant $k_\perp$-clustering algorithms for hadron-hadron collisions, Nuclear Physics B 406 (1993) 187–224.
[27] Computer Physics Communications 49 (1988) 429–448.
[28] Nuclear Instruments and Methods A 279 (1988) 537.
[29] Computer Physics Communications (1999) 219.
[30] L. Lönnblad, C. Peterson, T. Rögnvaldsson, Using neural networks to identify jets, Nuclear Physics B 349 (1991) 675–702.
[31] Journal of Physics: Conference Series (2014) 012023.
[32] Nuclear Instruments and Methods in Physics Research A (2013) 363–365.
[33] Nuclear Instruments and Methods in Physics Research A (1995) 14–20.
[34] Phys. Rev. D.
[35] J. High Energy Physics (2017) 2017:110.
[37] SuiteSparse Matrix Collection, https://sparse.tamu.edu.
[40] Volodymyr Mnih et al., Playing Atari with deep reinforcement learning, 2013, arXiv:1312.5602.
[43] Keras: the Python deep learning API, https://keras.io.