Analysis of association football playing styles: an innovative method to cluster networks

Abstract

In this work we develop an innovative hierarchical clustering method to divide a sample of undirected weighted networks into groups. The methodology consists of two phases: the first phase is aimed at putting the single networks in a broader framework by including the characteristics of the population in the data, while the second phase creates a subdivision of the sample on the basis of the similarity between the community structures of the processed networks. Starting from the representation of the team's playing style as a network, we apply the method to group the Italian Serie A teams' performances and consequently detect the main 15 tactics shown during the 2015-2016 season. The information obtained is used to verify the effect of the styles of play on the number of goals scored, and we prove the key role of one of them by implementing an extension of the Dixon and Coles model (Dixon and Coles, 1997).



Jacopo Diquigiovanni, Bruno Scarpa

In recent years the exponential growth of available data and the progressive opening up to 'statistical culture' by football operators have led to the development of several studies in this field (Anderson and Sally, 2013). The unique nature of association football means that simple statistics do not allow a detailed measurement of team performances: the signal contained in the data is hard to extract since many accidental events may increase the noise, especially if the data consist only of low counts such as the number of goals scored during the match (Peña and Touchette, 2012).

As a consequence, the playing style (or tactic) is certainly an element of primary importance in this sport (Peña and Navarro, 2015). Generally speaking, it can be defined as a combination of two different aspects: the offensive playing style, defined as the way a team moves the ball on the pitch, and the defensive playing style, defined as the way a team defends against the way the opposing team moves the ball on the pitch. In this work we focus on the former, so in the rest of the discussion the term 'playing style' refers only to the offensive playing style. Over time, some studies have addressed this issue: for example, Hughes and Franks (2005) compare the length of passing sequences, Gyarmati et al. (2014) analyse the passing structures to find similarities and differences between teams, Cintia et al. (2015a) and Cintia et al. (2015b) propose a pass-based performance indicator to predict the outcome of the matches, and Peña (2014) studies possession by using a Markov process.

In statistical terms, the playing style can be adequately represented by a weighted network (Brandt and Brefeld, 2015; Clemente et al., 2015b). Most of the papers in this field consider a team's players as nodes and the passes between them as edges to extract information on the tactics (Clemente et al., 2015a; Peña and Touchette, 2012; Pina et al., 2017). Unfortunately, this formulation makes comparisons between different playing styles less than immediate, as the players vary from squad to squad. In view of this, this paper considers networks whose nodes are different areas of the pitch and whose edges describe the movements of the ball between these areas; in so doing the playing styles are properly summarised and comparisons become possible, because the pitch is shared between the teams.

The aim of the paper is to group the networks in order to compare the different styles played by football teams during the season. To do this, we propose a novel clustering method for network data, the aim of which is to divide the $n$ weighted networks into clusters according to a specific criterion. The authors are not aware of specific methods to detect clusters of networks, although some models (e.g., the Bayesian model proposed by Durante et al., 2017, for binary networks) can be applied for this purpose, so the present paper offers an initial proposal in this field of research. In this scenario, two aspects should be considered: the network is a set of connected nodes, but also a unit drawn from a more general population of networks. In view of the above, after the preprocessing phase that puts the networks in a sample framework, the method defines the similarity between networks on the basis of the similarity between community structures detected by the Louvain method (Blondel et al., 2008).
The methodology has a hierarchical agglomerative structure: from the initial situation in which each unit is assigned to a separate group, the process creates a sequence of nested subdivisions of the data.

The method, by clustering the performances of the teams as networks, identifies the main styles of play, and this information is used to verify whether and how a team's playing style affects the number of goals scored by implementing an extension of the Dixon and Coles model (Dixon and Coles, 1997).

The paper is organised as follows: in Section 2 we create the networks from the available data; in Section 3 the clustering method is developed; in Section 4 the method is applied to the analysis of tactics and in Section 5 an overview of the main results is provided.

Different approaches are available to analyze association football tactics (Anderson and Sally, 2013). Among them, the representation of the playing style as a network is particularly useful (Peña and Touchette, 2012). In this work, for each combination of match and team (i.e., for each of the two teams playing a specific match) the playing style is represented by a network whose nodes are the different areas into which the pitch is divided and whose edges describe the movements of the ball between these areas: this specification is preferable to the one that considers the players as nodes and the passes between them as edges (Clemente et al., 2015a) because it allows immediate comparison between different tactics, since the nodes (i.e. the areas of the pitch) are shared between the teams. Specifically, we consider weighted undirected networks (Newmark, 2010a).

The first factor to consider is the shape and the number of areas $K$ identifying the different nodes. As regards the first aspect, for the sake of simplicity the areas are obtained by dividing both the length and the width of the pitch into equal segments (Borrie et al., 2002; Narizuka et al., 2014).
As regards the second aspect, in a preliminary study we compared the partitions obtained by the clustering method presented in Section 3 for various values of $K$ ($K = 9$, $K = 12$, $K = 18$), finding deep differences between the three scenarios. This is a critical issue, since we are not aware of specific information with which to set $K$: consequently, the choice of $K$ is based purely on subjective considerations, and we choose $K = 9$ because it is associated with the best-known subdivision of the length of the pitch (defense zone - midfield zone - attack zone) and with the best-known subdivision of the width of the pitch (right zone - central zone - left zone).

The available data - provided by InStat (http://instatfootball.com/) - refer to the 380 matches of the Italian Serie A TIM 2015-2016 season; since each match is made up of two distinct networks - one for the home team and one for the away team - the sample size is $n = 760$.

For each combination of match and team, the available data record the spatial coordinates $(x, y)$ of specific plays made by the team's players during the match, with the $x$ position along the horizontal axis (i.e., the length of the pitch) and the $y$ position along the vertical axis (i.e., the width of the pitch). These plays are divided into 12 categories, which can be summarised in four macro categories:

• Passes: the starting position of a pass. It includes: accurate passes, inaccurate passes, assists, accurate crosses and inaccurate crosses.
• Dribbling: the starting position of a dribble. It includes: successful dribbles and unsuccessful dribbles.
• Tackles: the position of a tackle. It includes: successful tackles, unsuccessful tackles, challenges won and challenges lost.
• Shots: the position of the shot.

Starting from the spatial coordinates, each play is assigned to one of the nine areas, labeled by the numbers from 1 to 9 (see Figure 1).
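The assignment of a play to an area can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `zone_of` is hypothetical, the serpentine numbering (1-2-3 in defense, 4-5-6 in midfield, 7-8-9 in attack) is inferred from the zone descriptions in Appendix A rather than read from Figure 1, and the convention that low `y` corresponds to the right-hand side of the attacking team is an assumption.

```python
PITCH_LENGTH = 105.0  # metres, along x (teams attack towards increasing x)
PITCH_WIDTH = 68.0    # metres, along y


def zone_of(x, y, n_length=3, n_width=3):
    """Return the 1-based area label of a play at coordinates (x, y).

    Assumptions (not confirmed by the source): low y is the team's right
    side, and labels follow a serpentine order so that area 3 is left
    defense, area 4 left midfield and area 7 right attack.
    """
    # Index of the segment along the length (0 = defense) and the width
    # (0 = right side); the min() keeps plays on the far boundary inside.
    col = min(int(x / (PITCH_LENGTH / n_length)), n_length - 1)
    row = min(int(y / (PITCH_WIDTH / n_width)), n_width - 1)
    # Odd columns are numbered in the opposite direction (serpentine order).
    if col % 2 == 1:
        row = n_width - 1 - row
    return col * n_width + row + 1


centre_circle = zone_of(52.5, 34.0)   # central midfield
left_defense = zone_of(10.0, 60.0)
right_attack = zone_of(100.0, 5.0)
```

With a different labelling in Figure 1 only the last line of the function would change; the segment indices `col` and `row` follow directly from dividing the length and the width into equal parts.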
Hence the weight of the edge linking node $i$ and node $j$ is the number of pairs of consecutive plays made by the players of the team, without interruption from the players of the opposing team, which take place in areas $i$ and $j$ respectively. If an event is neither preceded nor followed by a teammate's play, it is ignored as an isolated event; moreover, the proposed structure allows the creation of self-loops when $i = j$. The main characteristics of the networks are summarised in Appendix A.

Naturally, the networks are only an approximation of the playing styles: the data do not consider the movements of the players when in possession of the ball. As a consequence, the ball is considered to have passed only through areas $i$ and $j$ in the time between two consecutive events which have occurred in those areas. Based purely on football considerations, one is justified in expecting the teams whose playing style is strongly characterised by players moving with the ball to differ from the teams where this is more limited; the level of accuracy of the approximation therefore changes according to the unit considered. Nevertheless, the number of plays recorded by our data during the season is informative with regard to this loss of information: the average number of events during the 380 matches, taking into account the effective playing time of an Italian football match (Santini, 2014), is 23.96 per minute (sd = 1.[…]

Figure 1. Division of the pitch into nine areas. The pitch measurements are 105 × 68 m: the areas are obtained by dividing both the length and the width of the pitch into three equal segments. The teams attack from left to right.

In this section we propose a clustering method for network data, the aim of which is to divide the $n$ undirected weighted networks into groups according to a specific criterion: the only constraint required is that the statistical units share the set of nodes $\mathcal{K} = \{k_1, k_2, \dots, k_K\}$.

The procedure has a hierarchical structure: in contrast to the partitioning approach, in which at each step the statistical units are reallocated among a number of clusters fixed a priori, the hierarchical approach creates nested subdivisions of the data (for this and further differences see Kaufman and Rousseeuw, 1990). The method is agglomerative: starting from an initial state in which the number of groups is equal to the number of networks, the algorithm continues until all the units belong to the same group. As a consequence, the partition at step $h$ can be obtained from the previous one by merging two clusters. We use an ad hoc measure of similarity to compare the groups, and the number of clusters - if it is not known a priori - is defined by the analysis of the corresponding dendrogram.
In addition, the methodology consists of two phases: the preprocessing phase, which requires the specification of a threshold, and the merging phase. A simulation study is provided in Appendix B.

3.1 Similarity between networks

Focusing on a single network, the core of the cluster analysis is represented by a set of procedures known as analysis of community structure; it is essential in many applications to detect communities of nodes with frequent connections within communities and sparse connections between communities in order to find latent relationships in the data (Newmark, 2010b). In this context, the Louvain method (Blondel et al., 2008) is certainly a widely used approach, and it represents the starting point of this paper.

Focusing now on a sample of $n$ networks with $K$ shared nodes, a naive clustering method consists of detecting the $n$ community structures and merging, step by step, the networks whose partitions are most similar. The limit of this approach is clear, since every unit is taken individually regardless of its close relationship with the rest of the data. In this scenario, two aspects should be taken into account: the network is considered on the one hand as a set of connected nodes, and on the other as a unit drawn from a more general population of networks. Regarding the latter, it is essential to stress that:

• The average number of connections can vary between the networks, with some networks sparsely connected and others densely connected.
• The weight of the edges can vary within the networks.
• The weight of a specific edge can vary between the networks.

As a consequence, the community structures can differ considerably.

In view of the above, it is first of all necessary to define the concept of similarity between networks.
The method considers the community structures detected by the Louvain method after including the population's characteristics in the units: in other words, the similarity between networks $R_t$, $R_s$ is determined according to the similarity of the community structures of the networks $R'_t$, $R'_s$, with the network $R'$ representing an appropriate processing of the starting network $R$.

The procedure to obtain $R'$ is described below. Considering the sequence of weights linked to the edge connecting nodes $i$ and $j$ in the $n$ networks, $w_{ij} = (w^1_{ij}, \dots, w^n_{ij})$, we define:

$$b_{ij} = \max(w^1_{ij}, \dots, w^n_{ij}), \qquad a_{ij} = \min(w^1_{ij}, \dots, w^n_{ij}) \tag{1}$$

with $i, j = 1, \dots, K$, and then normalize the vector $w_{ij}$:

$$u^t_{ij} = \begin{cases} \dfrac{w^t_{ij} - a_{ij}}{b_{ij} - a_{ij}} \in [0, 1] & \text{if } a_{ij} \neq b_{ij}, \\[2ex] 0.5 & \text{if } a_{ij} = b_{ij}, \end{cases} \qquad t = 1, \dots, n \tag{2}$$

For the sake of simplicity, if $a_{ij} = b_{ij}$ the value $u^t_{ij}$ is set equal to the expected value of a random variable with continuous uniform distribution $U(0, 1)$. Defining $a$ and $b$ according to the pair of nodes considered allows the distributive differences between the sequences of weights $w_{ij}$ to be included in the analysis, an essential topic which has already been discussed.

Unfortunately, the normalization is functional but not comprehensive because another problem commonly arises in this context. Consider the following example: let $K = 5$ and let $R_1$ be a network in which all the normalized weights are extremely low ($u_{1,l} = 0.05$) except for the edge between node 1 and node 2, which weighs twice as much as the others ($u_{1,h} = 0.10$; see the bottom left of Figure 2). Let $R_2$ be a network in which all the normalized weights are definitely higher ($u_{2,l} = 0.40$) and in which, as in $R_1$, the edge between the first two nodes weighs twice as much ($u_{2,h} = 0.8$; see the top of Figure 2). The community structures of the two networks are identical (see the bottom right of Figure 2): in each case the first two nodes belong to a cluster distinct from the one containing the remaining nodes, and consequently the two units have maximum similarity, assuming of course the use of a measure of similarity that assigns maximum value if the partitions are identical. However, in the first case the group including nodes 1 and 2, considering the particularly low normalized quantity $u_{1,h}$, is found only because the other weights of the network are even smaller; evidently in this situation detecting this cluster is counterproductive, as $u_{1,h}$ is much lower than the other connections observed in the sample between the first pair of nodes. The reason for this critical issue lies in the use of the Louvain method which, by construction, compares the edges within the network regardless of the broader framework of this paper.

Figure 2. Example of problem in the application of Louvain method.

For this reason, a possible approach to limit the problem consists of introducing a threshold $s \in [0, 1]$: starting from the value $u^t_{ij}$ in (2) we obtain the value $w^{\prime t}_{ij}$ defined as follows:

$$w^{\prime t}_{ij} = \begin{cases} u^t_{ij} & \text{if } u^t_{ij} > s, \\ 0 & \text{if } u^t_{ij} \le s, \end{cases} \qquad t = 1, \dots, n \tag{3}$$

Hence the processed network $R'$ has the same nodes as the starting network $R$, whereas the weights are $w'_{ij}$ instead of $w_{ij}$. The choice of the threshold plays a central role: numerically, formula (3) does not alter the weights greater than $s$, while it sets all the others to zero. In view of the above, on the one hand a high threshold value ensures that only the communities characterised by remarkable connections for the population are detected; on the other hand it does not discriminate between weights less than or equal to $s$.
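The effect of formula (3) on the two networks of the example can be checked with a few lines of code. This is only an illustrative sketch: the helper names are hypothetical and the threshold value 0.20 is chosen for the illustration, it is not taken from the paper.

```python
from itertools import combinations


def example_network(low, high, k=5):
    """Normalized weights of a K-node network: every edge weighs `low`
    except the edge between node 1 and node 2, which weighs `high`."""
    u = {edge: low for edge in combinations(range(1, k + 1), 2)}
    u[(1, 2)] = high
    return u


def threshold_weights(u, s):
    """Formula (3): keep a normalized weight if it exceeds s, else set it to 0."""
    return {edge: (w if w > s else 0.0) for edge, w in u.items()}


r1 = example_network(0.05, 0.10)  # weak connections everywhere
r2 = example_network(0.40, 0.80)  # same pattern, much stronger connections

s = 0.20  # illustrative threshold (an assumption, not a value from the paper)
r1_processed = threshold_weights(r1, s)
r2_processed = threshold_weights(r2, s)
```

After thresholding, every edge of the first network is set to zero, so the spurious community formed by nodes 1 and 2 can no longer be detected, while the second network is left unchanged.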
Conversely, a near-zero threshold value involves no loss of information but leaves the problem shown in the example unresolved.

At the interpretative level, the two proposed adjustments - normalization first, followed by the introduction of a threshold - relate to two complementary and jointly necessary aspects: the aim of the former is to unify the range of observable values so as not to penalise the connections between pairs of nodes that are sparser than others due to the population's characteristics; conversely, the aim of the latter is to curb the negative implications of the Louvain method due to its application network by network.

The choice of $s$ represents the last fundamental step in the preprocessing phase. Considering the possible distributive differences between the sequences of weights $w_{ij}$, it is reasonable to vary $s$ according to the pair $(i, j)$. We therefore define:

$$s_{ij} = q(u_{ij}, \alpha), \qquad i, j = 1, \dots, K \tag{4}$$

with $u_{ij} = (u^1_{ij}, \dots, u^n_{ij})$ and $q(x, \alpha)$ the quantile of order $\alpha$ of $x$. Formula (4) appears to be a reasonable compromise: the threshold can vary depending on the distributive characteristics of the vector $u_{ij}$ despite the specification of a single scalar $\alpha$. Formally, formula (3) is updated as follows:

$$w^{\prime t}_{ij} = \begin{cases} u^t_{ij} & \text{if } u^t_{ij} > s_{ij}, \\ 0 & \text{if } u^t_{ij} \le s_{ij}, \end{cases} \qquad t = 1, \dots, n \tag{5}$$

Since cluster analysis is typically an unsupervised learning technique, the choice of $\alpha$ frequently cannot be made by using a training set (see, e.g., Hastie et al., 2009): as a consequence, although a simulation study (see Appendix B) may provide some useful information, the value of the threshold should be selected on the basis of the analyst's choice to place more emphasis on one of the two abovementioned aspects.

Starting from the new networks, it is possible to detect the $n$ community structures present in the sample.
As the methodology aims to create groups composed of networks with similar connections between nodes, the weight of every self-loop is set to zero before the application of the Louvain method. Indeed, two networks whose adjacency matrices share the entries outside the main diagonal could present deeply different community structures if the diagonal elements are considered, an aspect that inevitably undermines the reliability of the clustering method. Clearly, the interpretation of the community structures changes radically when considering the processed networks rather than the starting ones: any community denotes a relationship between nodes which is particularly surprising for the population and not necessarily identifiable by the analysis of the original units.

Given the nature of the methodology, it is now possible to justify the use of undirected networks, since the use of directed ones does not bring improvements. Theoretically, the inclusion of the edges' direction adds useful information because the role of connection $i \to j$ is different from that of connection $j \to i$. However, the proposed method detects only the community structures that, by definition, do not have direction, and therefore it cannot, by construction, capture this aspect. As a consequence, for the sake of simplicity we have considered undirected networks.

The tool used to define the similarity between a pair of partitions is the Adjusted Rand Index (or ARI, Hubert and Arabie, 1985), the most common corrected-for-chance version of the Rand Index (Rand, 1971). By calculating this index for each pair of units it is possible to create a similarity matrix and so to merge the groups using the UPGMA method (Sokal and Michener, 1958): the procedure ends when all the networks belong to the same cluster.

The choice of the number of groups $N_G$ represents the last fundamental step in the merging phase.
In this context, the evaluation of the difference between the maximum values of similarity (i.e., the maximum values of ARI) in two successive steps of the method is common practice for hierarchical procedures (see, e.g., Azzalini and Scarpa, 2012): for example, a significant drop in similarity indicates that the clusters are too different to proceed with subsequent unions. Obviously, as often happens in this field, the criterion cited can be accompanied by qualitative evaluations by the analyst.

A brief overview of the procedure is provided in Algorithm 1. Let $\mathcal{R} = \{R_1, \dots, R_n\}$ be a sample of $n$ undirected weighted networks, with $\mathcal{K} = \{k_1, k_2, \dots, k_K\}$ the set of nodes shared between the networks. The value $w^t_{ij}$ is the weight of the edge connecting node $i$ and node $j$ in network $t$, with $t = 1, \dots, n$ and $i, j = 1, \dots, K$. The absence of a connection between a pair of nodes corresponds to a zero weight for that edge. Let $\alpha$ also denote the value selected for the order of the quantile. The algorithm is structured as follows: in the first phase, calculate the quantity $w^{\prime t}_{ij}$ by the normalization of the data and the introduction of the threshold (line 3), and so obtain the new sample $\mathcal{R}' = \{R'_1, \dots, R'_n\}$ (line 4). In the second phase, set the self-loops of the networks to zero (line 6), detect the community structure of every network using the Louvain method (line 7) and create the similarity matrix (line 8). Finally, two steps proceed iteratively: obtain the pair of groups with maximum similarity using the UPGMA method (if it is not unique, the pair is randomly chosen from those that maximize the ARI) and merge the pair of groups selected at the previous step (lines 9-14). The procedure ends when all the networks belong to the same group.
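The first phase described above, i.e. formulas (1), (2), (4) and (5), can be sketched in plain Python. This is a minimal sketch under stated assumptions rather than the authors' implementation: the names `empirical_quantile` and `preprocess` are hypothetical, the networks are assumed to be given as K x K nested lists of weights, and the quantile is taken as the inverse of the empirical distribution function, since the source does not specify which quantile definition is used.

```python
import math


def empirical_quantile(values, alpha):
    """Quantile of order alpha as the inverse empirical CDF.

    Assumption: the paper does not state which sample-quantile
    definition it uses; this is one common choice.
    """
    ordered = sorted(values)
    rank = max(math.ceil(alpha * len(ordered)), 1)
    return ordered[rank - 1]


def preprocess(networks, alpha):
    """Return the processed weights w' for a list of K x K weight matrices."""
    n, k = len(networks), len(networks[0])
    processed = [[[0.0] * k for _ in range(k)] for _ in range(n)]
    for i in range(k):
        for j in range(k):
            w = [net[i][j] for net in networks]
            a, b = min(w), max(w)                        # formula (1)
            if a != b:
                u = [(x - a) / (b - a) for x in w]       # formula (2)
            else:
                u = [0.5] * n                            # mean of U(0, 1)
            s_ij = empirical_quantile(u, alpha)          # formula (4)
            for t in range(n):
                # formula (5): weights not exceeding the threshold become 0
                processed[t][i][j] = u[t] if u[t] > s_ij else 0.0
    return processed


# Toy sample: four networks on two nodes (weights chosen for illustration).
toy_sample = [
    [[1, 0], [0, 1]],
    [[1, 2], [2, 1]],
    [[1, 4], [4, 1]],
    [[1, 10], [10, 1]],
]
toy_processed = preprocess(toy_sample, alpha=0.5)
edge_01 = [net[0][1] for net in toy_processed]
```

In the toy sample the edge between the two nodes keeps only its two largest normalized weights, and the constant self-loops are set to zero because a constant sequence never exceeds its own quantile.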
The first aspect to consider in the application of the clustering method is the choice of the threshold $s$: since we have neither a training set nor external information with which to set $\alpha$, the choice is based on the balance between the two contrasting features presented in Section 3.2. The selected value for $\alpha$ is 0.95. Such a high threshold sacrifices strong relationships detected in the data in order to obtain community structures characterised by exceptionally strong connections between nodes: in so doing, the problem related to the network by network application of the Louvain method is considerably reduced. On the other hand, the methodology does not consider the obvious heterogeneity in the large set of values which are less than or equal to the threshold.

Algorithm 1 Clustering method for sample of networks
Start: each network is assigned to a separate group
procedure First phase
    for t in 1 : n, i in 1 : K, j in 1 : K do
        $w^{\prime t}_{ij}$ ← Formulas (1), (2), (4), (5)
    for t in 1 : n obtain $R'_t$ new network, with $\mathcal{K}$ as set of nodes and weights $w^{\prime t}_{ij}$
procedure Second phase
    for i in 1 : K do $w'_{ii}$ ← 0
    for t in 1 : n do $c_t$ ← community structure of $R'_t$ by Louvain Method
    for $t_1$ in 1 : (n − 1), $t_2$ in ($t_1$ + 1) : n do $s(R'_{t_1}, R'_{t_2})$ ← ARI($c_{t_1}$, $c_{t_2}$)
    while number of groups ≠ 1 do
        if pairs of groups with max similarity using the UPGMA method > 1 then
            select a pair of groups randomly from those that maximize the ARI
        else {the pair of groups with max similarity is unique}
            select it
        end if
        Merge the pair of groups selected
    end while
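The second phase of Algorithm 1 can be sketched as follows, taking the community structures as given (the Louvain step itself is not reimplemented here). The code is an illustrative sketch, not the authors' implementation: the function names are hypothetical, partitions are assumed to be lists of community labels (one per node), and the naive search over all pairs of groups is written for readability rather than speed.

```python
import random
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand Index (Hubert and Arabie, 1985) between two partitions."""
    n = len(labels_a)
    sum_ij = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:   # both partitions trivial and identical
        return 1.0
    return (sum_ij - expected) / (max_index - expected)


def similarity_matrix(partitions):
    """Pairwise ARI between the community structures of the networks."""
    n = len(partitions)
    sim = [[1.0] * n for _ in range(n)]
    for t1 in range(n - 1):
        for t2 in range(t1 + 1, n):
            ari = adjusted_rand_index(partitions[t1], partitions[t2])
            sim[t1][t2] = sim[t2][t1] = ari
    return sim


def upgma_merges(sim, rng=None):
    """Average-linkage agglomeration on a similarity matrix.

    Returns one (group_a, group_b, similarity) tuple per merger; ties for
    the maximum similarity are broken at random, as in Algorithm 1.
    """
    rng = rng or random.Random(0)
    groups = [(t,) for t in range(len(sim))]
    history = []
    while len(groups) > 1:
        best, candidates = None, []
        for p in range(len(groups)):
            for q in range(p + 1, len(groups)):
                total = sum(sim[t][s] for t in groups[p] for s in groups[q])
                avg = total / (len(groups[p]) * len(groups[q]))
                if best is None or avg > best:
                    best, candidates = avg, [(p, q)]
                elif avg == best:
                    candidates.append((p, q))
        p, q = rng.choice(candidates)
        history.append((groups[p], groups[q], best))
        merged = groups[p] + groups[q]
        groups = [g for idx, g in enumerate(groups) if idx not in (p, q)] + [merged]
    return history


# Three toy community structures on five nodes (labels are arbitrary).
toy_partitions = [[0, 0, 1, 1, 1], [1, 1, 0, 0, 0], [0, 1, 0, 1, 0]]
toy_history = upgma_merges(similarity_matrix(toy_partitions))
```

The first two toy partitions coincide up to a relabelling of the communities, so they are merged first with ARI equal to one; the sequence of similarities stored in `toy_history` is the quantity inspected when choosing the number of groups.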

Once the method is applied, the analysis of the dendrogram provides useful indications about the procedure: in the first phase the examination of the number of mergers with maximum similarity, i.e. the number of steps whose maximum ARI is equal to one, is informative. The value found is 519, and the corresponding partition consists of 59 clusters with more than one unit and 182 singletons. This subdivision shows all the 241 different playing styles of the Italian Serie A TIM 2015-2016 season: the following mergers group these tactics, producing clusters characterised by increasing heterogeneity.

The final selection of the number of groups is based on the balance of statistical criteria and practical necessities. As explained in Section 3.3, the evaluation of the difference between the maximum values of similarity in two successive steps of the method is common practice when choosing $N_G$ which, on the other hand, should not be too high, so that the partition obtained can be analysed. In the case of 26 groups, the maximum ARI is less than 0.10 (ARI = 0.[…] $N_G = 15$ in order, at least, to obtain a manageable number of clusters. The detailed description of the groups is provided in Appendix C.

Modelling the final result of a football match represents a field of deep interest in sports analysis. Moving from the paper written by Maher (1982), several authors have proposed increasingly complex models: Dixon and Coles (1997), Baio and Blangiardo (2010) and Koopman and Lit (2015) are only a few examples. The aim of this section is to verify whether and how a team's playing style affects the number of goals scored. In a preliminary study we considered a model which is able to bring together the main features of Maher's approach and the information about playing styles, i.e. the Poisson regression with canonical link function. This simple model seems to suggest that the on-the-wings playing style (see Appendix C) affects the number of goals scored (for all the details, contact J. Diquigiovanni).
This explorative model undoubtedly represents an oversimplification of the phenomenon, and it is unnecessary to focus on it: to verify the role of group 15, an extension of the Dixon and Coles model is implemented as follows.

Let $X_k \sim \mathrm{Poisson}(\lambda_k)$ be the number of goals scored by the home team in match $k$, with $k = 1, \dots, 380$, and let $Y_k \sim \mathrm{Poisson}(\mu_k)$ be the number of goals scored by the away team on the same occasion; therefore a home team $i$ and an away team $j$ are implicitly assigned to every match $k$, with $1 \le i \neq j \le 20$. For convenience, the matches are considered chronologically and the season is divided into a series of half-weekly time points. Dixon and Coles (1997) propose the use of a 'pseudolikelihood' $L_t(\alpha_i, \beta_i, \rho, \gamma;\ i = 1, \dots, 20)$ for each time point $t$, with:

$$\log(\lambda_k) = \gamma + \alpha_{i(k)} + \beta_{j(k)}, \qquad \log(\mu_k) = \alpha_{j(k)} + \beta_{i(k)} \tag{6}$$

where $\gamma$ is the parameter which allows for the home effect, $\alpha_{i(k)}$ and $\beta_{i(k)}$ are the parameters which measure respectively the attack and defence rates of the teams, and $i(k)$, $j(k)$ are the indices which denote the home and away teams.

To verify the effect of the on-the-wings playing style on the number of goals scored, the suggested adjustment is conceptually simple. Starting from equations (6), we obtain:

$$\log(\lambda_k) = \gamma + \alpha_{i(k)} + \beta_{j(k)} + \delta c_{i(k)}, \qquad \log(\mu_k) = \alpha_{j(k)} + \beta_{i(k)} + \delta c_{j(k)}$$

with $c_{i(k)} = 1$ if the playing style of team $i$ during match $k$ is assigned to group 15 and 0 otherwise, and with $\delta$ the effect of this tactic on the number of goals scored. In addition, for convenience, the constraint $-\sum_{i=1}^{19} \alpha_i = \alpha_{20}$ is used instead of the one used by Dixon and Coles, $\sum_{i=1}^{20} \alpha_i = 20$.

In contrast to the original model, the time points are not homogeneously distributed over the season: instead, the specific day of the year on which match $k$ takes place is considered. This adjustment is due to the fact that nowadays, unlike when Dixon and Coles carried out their study, the teams play almost every day of the week, and so a more precise subdivision of the season is recommended.

Since the quantity of interest is $\delta$, the inference focuses only on this parameter. Starting from the composite profile likelihood (see, e.g., Molenberghs and Verbeke, 2005) defined as:

$$L^P_t(\delta) = L_t(\delta, \hat{\alpha}_{i\delta}, \hat{\beta}_{i\delta}, \hat{\rho}_\delta, \hat{\gamma}_\delta) = L_t(\delta, \hat{\theta}_\delta)$$

with $\theta = (\alpha_i, \beta_i, \rho, \gamma)$ the set of nuisance parameters, it is possible to construct confidence intervals on the basis of the composite likelihood ratio statistic (Varin et al., 2011).
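The extended log-linear predictors introduced above can be evaluated with a short function. This is a sketch for illustration only: the function name `expected_goals` and all parameter values are hypothetical, and the remaining ingredients of the Dixon and Coles pseudolikelihood (the dependence parameter $\rho$ and the time weighting) are deliberately left out.

```python
import math


def expected_goals(*, gamma, alpha_home, beta_home, alpha_away, beta_away,
                   delta, c_home, c_away):
    """Poisson means (lambda_k, mu_k) of the extended model.

    c_home / c_away are 1 if the team's playing style in the match is
    assigned to group 15 (on-the-wings), 0 otherwise.
    """
    log_lambda = gamma + alpha_home + beta_away + delta * c_home
    log_mu = alpha_away + beta_home + delta * c_away
    return math.exp(log_lambda), math.exp(log_mu)


# Illustrative parameter values (assumptions, not estimates from the paper).
params = dict(gamma=0.3, alpha_home=0.1, beta_home=0.05,
              alpha_away=0.0, beta_away=-0.2, delta=0.25)

lam_other, mu_other = expected_goals(**params, c_home=0, c_away=0)
lam_wings, mu_wings = expected_goals(**params, c_home=1, c_away=0)
ratio = lam_wings / lam_other
```

Under this specification, adopting the style of group 15 multiplies a team's expected number of goals by $e^{\delta}$ and leaves the opponent's expected number of goals unchanged, which is why a positive $\delta$ is read as a positive effect of the tactic.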
Let us define $\psi = (\delta, \theta)$, $u_t(\psi) = \nabla_\psi \log L_t(\psi) = \nabla_\psi l_t(\psi)$, the sensitivity matrix at time $t$ as $H_t(\psi) = E_\psi\{-\nabla_\psi u_t(\psi)\}$, the variability matrix at time $t$ as $J_t(\psi) = \mathrm{var}_\psi\{u_t(\psi)\}$, and the Godambe information matrix (Godambe, 1960) at time $t$ as $G_t(\psi) = H_t(\psi) J_t(\psi)^{-1} H_t(\psi)$; let $H^{\delta\delta}_t$ ($J^{\delta\delta}_t$, $G^{\delta\delta}_t$ respectively) denote the element of the inverse of $H_t(\psi)$ ($J_t(\psi)$, $G_t(\psi)$ respectively) pertaining to $\delta$. According to the asymptotic result proposed by Satterthwaite (1946), the confidence intervals are then defined as follows:

$$\left\{ \delta : 2\left( l^P_t(\hat{\delta}) - l^P_t(\delta) \right) \left( H^{\delta\delta}_t \right)^{-1} G^{\delta\delta}_t < \chi^2_{1;\,1-\alpha} \right\}$$

Since there is only one parameter of interest, the solution provided corresponds to that established by Geys et al. (1999). From a computational point of view, Varin et al. (2011) propose the following quantities to estimate the sensitivity and variability matrices when the sample size is large:

$$\hat{H}_t(\psi) = -\frac{1}{|A_t|} \sum_{k \in A_t} \nabla u_k(\hat{\psi}_{CL}), \qquad \hat{J}_t(\psi) = \frac{1}{|A_t|} \sum_{k \in A_t} u_k(\hat{\psi}_{CL})\, u_k(\hat{\psi}_{CL})^T$$

with $A_t = \{k : t_k < t\}$, $|A_t|$ the cardinality of $A_t$, $u_k(\psi)$ the score associated with the log-likelihood term $l_k(\psi)$ such that $l_t(\psi) = \sum_{k \in A_t} l_k(\psi)$, and $\hat{\psi}_{CL}$ the maximum composite likelihood estimate.

As a consequence, in order to have a satisfactory sample size, the confidence intervals are computed from February 14th, 2016 ($t = 61$), whereas the start of the season is used to optimize the choice of the parameter $\xi$ through the procedure described by Dixon and Coles; the selected value is 0.003 and this remains constant for the whole season. In addition, the last five matches of each team are not considered, since once a team has achieved its goal - whether this is to avoid relegation or to win the league - a lack of effort may affect its performances. As a result, the confidence intervals are computed up to April 17th, 2016 ($t = 84$), for a total of 24 distinct matchdays: note that, by construction, the progressive increase in the data considered causes a decrease in the width of the intervals, with obvious improvements in inference.

Figure 3. 95% confidence intervals and point estimates (horizontal dashes) for parameter $\delta$. The dashed lines divide the time points into months (February-March-April). Horizontal axis: time points; vertical axis: 95% confidence intervals.

Figure 3 shows the 95% confidence intervals. The results should be approached with particular caution since the composite profile likelihoods are not independent: nevertheless, the evidence in the period considered supports fairly clear inferential conclusions.
Indeed, all the intervals include only positive values: the on-the-wings playing style seems to affect the number of goals scored positively. While maintaining its typical oscillatory trend, the sequence of point estimates is a decreasing one: a possible explanation is that, after an initial adjustment period, the teams take technical-tactical precautions to limit this tactic, causing a decrease in its attacking efficiency.

We have developed an innovative hierarchical clustering method to divide a sample of undirected weighted networks into clusters. The procedure represents a flexible solution, as the choice of the threshold $s$ allows the characteristics of the population to be managed on a case-by-case basis, and so a varied set of scenarios can be properly addressed. On the other hand, the balance of the trade-off between the loss of information and the reliability of community structures is a potential limit in an unsupervised context: as illustrated through simulations, a good choice of $s$ produces very good results, whereas a poor one adversely affects the partition.

The application of the methodology to the analysis of playing styles shows decidedly interesting results: building up the offensive manoeuvre from the lateral zones of the field has a positive effect on the number of goals scored by a team. The high internal heterogeneity of the groups - necessary to obtain a manageable number of clusters - obviously lessens the effectiveness of the approach, and consequently the effect of the detected tactics on the final outcome of a match.

Unfortunately, the adjustment made to the Dixon and Coles model does not allow the final result of a game to be forecast, as the teams' playing styles are available only at the end of the match.
To use the proposed method in this context, we planto construct a model to predict the tactic of a team on the basis of its previousperformances; additionally, thanks to the proliferation of in-running betting, it isalso possible to use the information about the playing style during the first half ofthe match to predict the overall tactic in a certain game. Supplementary material

Appendix A

The following analysis focuses on the features that may mark a generic population of networks (see Section 3.1).

The first aspect to investigate is the average weight of the different units; since the Italian teams differ considerably as regards passing ability, one is justified in expecting this value to vary according to the network considered. Figure 4 shows the kernel density estimation of the average weight according to the team's position in the standings at the end of the season. Specifically, the teams are divided into four categories: high ranking (positions 1-5), middle-high ranking (positions 6-10), middle-low ranking (positions 11-15) and low ranking (positions 16-20). The figure seems to suggest that teams with middle-high and middle-low rankings have similar distributions, whereas the top teams differ greatly from the teams in the relegation zone. Indeed, the first quintile of the rightmost distribution is greater than the ninth decile of the leftmost one.

Figure 4. Kernel density estimation of the average weight of the edges (x-axis: mean of edges' weight; y-axis: kernel density estimation) for high ranking teams (black line), middle-high ranking teams (red line), middle-low ranking teams (blue line) and low ranking teams (green line). The dashed lines display the medians, while the continuous lines display the first quintile (black line) and the ninth decile (green line) of the respective distributions.

Another element of particular interest concerns the connections between the areas of the pitch: in this case too there is high heterogeneity, with an average value ranging from 0.11 - observable in the relationship between zone 3 (left defense) and zone 7 (right attack), see Figure 1 - to 25.59, observable in the relationship between zone 4 (left midfield) and zone 5 (central midfield). In particular, the two edges linking nodes (3,7) and (1,9) - left defense/right attack and right defense/left attack respectively - present a highly asymmetrical distribution of weights: in addition to the inflation of zero (91% and 89% respectively), they are characterised by the minimum variance (0.12 and 0.15 respectively) and the lowest maximum value, i.e. three. This evidence is not completely surprising as the nodes involved refer to areas diametrically opposed on the pitch: the almost complete absence of connections suggests that the ones which were detected are accidental and fully assimilable to the long passes which ended up off the pitch (and so were not counted).
As a consequence, in the application of the clustering method all the weights of the abovementioned edges are set to zero in order not to modify the community structures on the basis of fortuitous events.

The graphical-descriptive analysis can also be informative with regard to a question of vital importance concerning the utility of the proposed methodology: the approach described in Section 3 requires a preprocessing phase aimed at managing the characteristics of the sample which, if it is of little use, represents a meaningless complication compared to a simpler comparison between the community structures of the starting networks. Let us consider two contrasting teams as regards the quality of ball possession: Napoli and Frosinone. The former showed incredibly dense possession, as evidenced by the maximum value for the median, third quartile and maximum concerning the average number of connections between nodes during a match. On the other hand the latter, who were relegated to Serie B ConTe.it at the end of the season, showed sporadic possession, as evidenced by the minimum value for the first quartile, mean, median and third quartile concerning the same quantity. For each of the 38 matches played by each of the two teams, it is possible to calculate the ratio of a specific edge's weight to the overall weight of the network in order to find differences between the two teams without considering the overall number of connections. By comparing the average values of these quantities between the two teams, it is noticeable that 10 of the 11 edges with the lowest relative weight (accounting for 1.52% and 3.57% respectively of Napoli's and Frosinone's overall weight) are common to the two teams, as are 10 of the 11 connections with the highest relative weight (accounting for 74.28% and 70.26% respectively). Table 1 summarises this evidence.

Table 1. Labels and relative weights of the edges with the lowest and highest relative weight for Napoli and Frosinone.

Lowest relative weight              | Highest relative weight
Edge    % Napoli    % Frosinone     | Edge    % Napoli    % Frosinone
(4,1)   -           0.404           | (5,4)   14.78       7.48
(9,6)   0.410       0.373           | (6,5)   10.77       6.81
(9,3)   0.207       0.446           | (9,4)   9.53        7.05
(6,3)   0.207       0.287           | (7,6)   6.31        7.28
(9,2)   0.191       0.454           | (4,3)   6.13        6.79
(8,2)   0.122       -               | (3,2)   5.15        6.96
(7,2)   0.098       0.455           | (6,1)   4.79        5.97
(7,1)   0.082       0.365           | (9,8)   4.65        -
(8,3)   0.074       0.201           | (5,2)   4.37        6.97
(9,1)   0.051       0.157           | (2,1)   3.97        5.85
(8,1)   0.039       0.267           | (8,7)   3.83        4.47
(7,3)   0.036       0.160           | (6,2)   -           4.63
Total   1.52        3.57            | Total   74.28       70.26

In view of the above, there is a clear consideration to be made: the directions in which the playing styles are mainly developed are shared by the teams due to the football-specific dynamics, i.e. in statistical terms due to the characteristics of the population studied. As a consequence, a clustering method considering the broader framework in which the single network is placed is strongly recommended.
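The relative weights discussed above (the ratio of an edge's weight to the overall weight of the network, averaged over a team's matches) can be obtained in a few lines. The sketch below is a minimal Python illustration, not the authors' code: it assumes each match is stored as a dictionary mapping an edge (a pair of zones) to its weight, it treats an edge that is absent from a match as having weight zero, and the function names and toy numbers are made up for the example.

```python
from collections import defaultdict


def relative_weights(match):
    """Return each edge's share of the total weight of one match network."""
    total = sum(match.values())
    if total == 0:
        raise ValueError("network has no connections")
    return {edge: weight / total for edge, weight in match.items()}


def mean_relative_weights(matches):
    """Average the per-match shares, in percent.

    An edge that does not appear in a match contributes a share of 0.
    """
    sums = defaultdict(float)
    for match in matches:
        for edge, share in relative_weights(match).items():
            sums[edge] += share
    return {edge: 100.0 * s / len(matches) for edge, s in sums.items()}


# Hypothetical toy data: two matches, edges keyed by (zone, zone) pairs.
toy_matches = [
    {(5, 4): 30, (6, 5): 10},
    {(5, 4): 10, (6, 5): 10},
]
shares = mean_relative_weights(toy_matches)
```

Because each match is normalised by its own total before averaging, a team with many passes and a team with few passes are compared on the directions of play only, which is the point of the comparison between Napoli and Frosinone.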

Appendix B

The following brief simulation study is aimed at evaluating the suitability of the proposed clustering method. In particular, the impact of the value chosen for α on the reliability of the methodology is analyzed. Moreover, the results are compared to those obtained by an alternative approach.

Scenarios

Let $R = \{R_1, \ldots, R_n\}$ be a sample of n undirected weighted networks, with $\mathcal{K} = \{k_1, k_2, \ldots, k_K\}$ the set of nodes shared between the networks. In addition, let the sample be partitioned into two groups, with $n_1$ and $n_2$ the respective sizes. In all the scenarios, we set $n = 100$, $K = 30$, $n_1 = n_2 = 50$. For the sake of simplicity, the nodes are labelled with numbers from 1 to 30 and with an even/odd node we refer to a node labelled with an even/odd number, respectively.

The study is structured into two scenarios, subdivided in their turn into two subscenarios of differing complexity. A graphical representation of the first scenario is provided in Figure 5, of the second one in Figure 6.

Figure 5. Scenario 1. A network belonging to the first group (at the top) and a network belonging to the second group (at the bottom) are represented: the dashed lines represent connections generated independently from X, the continuous lines represent connections generated independently from Y.

In the first scenario, the networks belonging to the first group are characterised by connections between nodes generated independently from a discrete random variable X, whereas those belonging to the second group are characterised by connections between nodes generated independently from a discrete random variable Y. Thus:

• First subscenario: D_X = { , , , } the support of X, D_Y = { , , , } the support of Y and π_X = π_Y = { . , . , . , . } the probability vector of X and Y.
• Second subscenario: D_X = { , , , , } the support of X, D_Y = { , , , , } the support of Y and π_X = π_Y = { . , . , . , . , . } the probability vector of X and Y.

The community structures are completely random for all the units, but the networks of the first group differ from those of the second group with regard to their overall weight. The second subscenario displays a higher complexity than the first one since the difference in the overall weight between the two groups drops. The purpose of this scenario is to replicate one of the features shown in Appendix A, namely that the average weight of the different units varies according to the team considered.

In the second scenario:

• the weight of each edge linking two even nodes is generated independently from the discrete random variable X in the first group and from the discrete random variable Y in the second group.
• the weight of each edge linking two odd nodes is generated independently from the discrete random variable X in the first group and from the discrete random variable Y in the second group.
• the weight of each edge linking an even node with one of the first eight odd nodes {1, 3, ..., 15} is generated independently from X in the first group and from Y in the second group.
• the weight of each edge linking an even node with one of the remaining odd nodes {17, 19, ..., 29} is generated independently from X in the first group and from Y in the second group.

For the sake of simplicity, let us define $E_{X_1}$ ($E_{X_2}$, $E_{Y_1}$, $E_{Y_2}$ respectively) the set of edges whose weights are generated from $X_1$ ($X_2$, $Y_1$, $Y_2$ respectively). Thus:

• First subscenario: D_{X_1} = { , , }, D_{X_2} = { , , }, D_{Y_1} = { , , }, D_{Y_2} = { , , } the supports of $X_1$, $X_2$, $Y_1$, $Y_2$; π_{X_1} = { . , . , . }, π_{X_2} = { . , . , . }, π_{Y_1} = { . , . , . }, π_{Y_2} = { . , . , . } the probability vectors of $X_1$, $X_2$, $Y_1$, $Y_2$.
• Second subscenario: D_{X_1} = { , , , , , }, D_{X_2} = { , , , , , }, D_{Y_1} = { , , , , , }, D_{Y_2} = { , , , , , } the supports of $X_1$, $X_2$, $Y_1$, $Y_2$; π_X = π_Y = { . , . , . , . , . , . }, π_X = π_Y = { . , . , . , . , . , . } the probability vectors of $X_1$, $X_2$, $Y_1$, $Y_2$.

The second subscenario displays a higher complexity than the first one since the difference in the weights of the edges between the two groups drops; the purpose of this scenario is to replicate another one of the features shown in Appendix A, namely the one where the teams share the directions in which the playing styles are mainly developed but where some different trends between the teams can be detected in the densely and sparsely connected areas of the pitch.

Every scenario is developed for three values of α: 0.3, 0.7, 0.95. The methodology is compared to the naive approach introduced in Section 3.1: it consists of considering each starting network on its own, detecting its community structure and merging, step by step, the networks whose partitions are most similar. Even in this case every self-loop is set to zero for the reason described in Section 3.3. The suitability of the methods is examined by comparing the partition obtained requiring $N_G = 2$ with the real subdivision of the sample; to do so we use the ARI (Hubert and Arabie, 1985). The simulations are carried out in the programming language R (R Core Team, 2016), the detection of community structures uses the igraph package (Csardi and Nepusz, 2006) and the computation of the ARI uses the pdfCluster package (Azzalini and Menardi, 2014). Every scenario is evaluated using N = 1000 replications.

Results
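The ARI used to compare the partition obtained with two groups against the real subdivision of the sample can be computed directly from the contingency table of the two partitions. The following Python sketch is illustrative only: the study itself relies on R and the pdfCluster package, and the function name and the example clustering output below are assumptions made for this example.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index (Hubert and Arabie, 1985) between two partitions."""
    if len(labels_a) != len(labels_b):
        raise ValueError("partitions must label the same units")
    n = len(labels_a)
    if n < 2:
        raise ValueError("at least two units are required")
    # Pairs of units placed together in both partitions, in A only, in B only.
    pairs_joint = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    pairs_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    pairs_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = pairs_a * pairs_b / comb(n, 2)  # value expected by chance
    maximum = (pairs_a + pairs_b) / 2          # value under perfect agreement
    if maximum == expected:  # degenerate case, e.g. both partitions trivial
        return 1.0
    return (pairs_joint - expected) / (maximum - expected)


# True split used in the simulation design: n1 = n2 = 50.
truth = [1] * 50 + [2] * 50
# Hypothetical clustering output with two groups that misplaces five networks.
found = [1] * 45 + [2] * 55
ari = adjusted_rand_index(truth, found)
```

The index equals 1 when the two partitions coincide up to a relabelling of the groups and is close to 0 when the agreement is no better than chance, which is why it is a convenient summary across the 1000 replications.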

Figure 7 displays the box-plots of the ARI concerning the 1000 replications for each scenario. The first scenario shows an improvement in the method when the highest value for α is considered: the choice of α = 0.95 allows only the particularly surprising connections for the population to be included and therefore enables the differences between the two groups to be detected. The low values of the agreement with the real partition are easily explained since the arbitrary nature of community structures involves networks belonging to the same group having, by construction, low similarity. A decidedly positive aspect is noticed when moving through subscenarios: the performance of the procedure remains almost unchanged despite the fact that the groups are less dissimilar.

Figure 6. Scenario 2. A network belonging to the first group (at the top) and a network belonging to the second group (at the bottom) are represented: the continuous red lines represent connections generated independently from $X_1$, the dashed black lines represent connections generated independently from $X_2$, the dashed red lines represent connections generated independently from $Y_1$, the continuous black lines represent connections generated independently from $Y_2$.

Figure 7. Results of the simulation study. (Box-plots of the ARI, on a scale from 0 to 1, by method: Naive, α = 0.3, α = 0.7, α = 0.95. Panels: Scenario 1 - Subscenario 1, Scenario 1 - Subscenario 2, Scenario 2 - Subscenario 1, Scenario 2 - Subscenario 2.)

The second scenario shows a complete failure in the naive approach: the fact that $E_{X_1} = E_{Y_1}$ and $E_{X_2} = E_{Y_2}$ between groups involves the community structures between the two subpopulations being identical, and so the normalization of the weights seems to be a fundamental adjustment to improve the quality of the partitions. The remarks concerning the role of α vary according to the subscenario considered: in the first one the choices of α = 0.30 and α = 0.95 allow the differences between the two groups to be managed, whereas the intermediate value α = 0.70 produces decidedly unsatisfactory results. On the other hand, in the second one the differences between groups are so minimal that only a value as high as α = 0.95 achieves the goal.

Overall, the methodology provides a reasonable compromise in the handling of the different aspects characterising the clustering of networks, but the impact of the value of α on the results represents an undeniable limit in an unsupervised context.

Appendix C

The objective criterion used to identify the main characteristics of a specific group consists of the evaluation of the percentage of times that, considering the networks of that cluster, pairs and triads of nodes are allocated to the same community: a high value for this quantity means that the group is made up of playing styles with frequent connections between those pairs and triads of areas. A brief description of the groups is provided below (for all the two-dimensional and three-dimensional arrays reporting the described percentages, contact J. Diquigiovanni). For the sake of clarity, the clusters are divided into 6 categories identifying the main macro typologies of playing styles. Figure 8 shows the number of networks allocated to each group for each team.
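The criterion described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' implementation: each network of a cluster is assumed to be summarised by a dictionary mapping a node (pitch zone) to the label of its community, and the function name and toy memberships are hypothetical.

```python
from itertools import combinations


def coallocation_percentages(memberships, size=2):
    """Percentage of networks in which each group of `size` nodes shares a community.

    `memberships` holds one dict per network, mapping node label -> community label.
    Use size=2 for pairs of nodes and size=3 for triads.
    """
    nodes = sorted(memberships[0])
    counts = {group: 0 for group in combinations(nodes, size)}
    for membership in memberships:
        for group in counts:
            communities = {membership[node] for node in group}
            if len(communities) == 1:  # all nodes of the group are together
                counts[group] += 1
    return {group: 100.0 * c / len(memberships) for group, c in counts.items()}


# Hypothetical cluster of three networks on four pitch zones.
cluster = [
    {1: "a", 2: "a", 3: "a", 4: "b"},
    {1: "a", 2: "a", 3: "b", 4: "b"},
    {1: "a", 2: "b", 3: "c", 4: "d"},
]
pairs = coallocation_percentages(cluster, size=2)
triads = coallocation_percentages(cluster, size=3)
```

In this toy cluster zones 1 and 2 share a community in two of the three networks, so that pair would be the one used to describe the group.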

Sparse playing style

• Group 1, made up of 347 networks. The group includes almost half of the units, which are characterised by the absence of any significant connection between the areas of the pitch; the size of the cluster is not entirely surprising as the value chosen for α is particularly high. It is mainly made up of low ranking teams, which are allocated to this group 67% of the time; on the other hand, Napoli (four out of 38 networks), Roma (five out of 38 networks) and Fiorentina (five out of 38 networks) are the least involved teams.

Long ball playing style

• Group 2, made up of 41 networks. This playing style is characterised by long passes between the right defense zone and the central attack zone (1-8). Despite the fact that the group is mainly made up of middle-low and low ranking teams (66%), there is no lack of high ranking teams (20%).
• Group 3, made up of 24 networks. This playing style is characterised by long passes involving the central defense zone (2-8, 2-9). The low ranking teams are the most numerous in the group (38%).
• Group 4, made up of 8 networks. This playing style is characterised by long passes between the central defense zone and the right attack zone (2-7): all the networks of the group share this community. The use of this tactic is distributed homogeneously between the teams.

Figure 8. Number of networks allocated, for each team, to each group. (x-axis: groups 1 to 15; y-axis: teams, listed as Verona, Frosinone, Carpi, Udinese, Palermo, Sampdoria, Bologna, Atalanta, Torino, Empoli, Genoa, Chievo, Lazio, Milan, Sassuolo, Fiorentina, Inter, Roma, Napoli, Juventus.)

Dense playing style

• Group 5, made up of 38 networks. This playing style is characterised by ball possession in the defense zone (1-3, 1-2-3). The high and middle-high ranking teams are the most numerous in the group (71%).
• Group 6, made up of 46 networks. This playing style is characterised by connections between the central defense zone and the lateral midfield zones (2-4, 2-6, 2-4-6). The low ranking teams are the least numerous in the group (11%).
• Group 7, made up of 21 networks. This playing style is characterised by connections between the central defense zone and the central/right midfield zones (2-5, 5-6, 2-5-6). The use of this tactic is distributed homogeneously between the teams.
• Group 8, made up of 34 networks. This playing style is characterised by ball possession in the offensive zones (4-7, 8-9). The high and middle-high ranking teams are the most numerous in the group (68%).
• Group 9, made up of 50 networks. This playing style is characterised by ball possession in the offensive zones (4-9, 5-7, 6-7, 4-8-9, 5-6-7). The high ranking teams are the most numerous in the group (64%).

Mixed playing style

• Group 10, made up of 28 networks. This playing style is characterised by long passes between the left defense zone and the left attack zone (3-9) and by connections between the left defense zone and the central midfield zone (3-5). The high and middle-high ranking teams are the most numerous in the group (68%).
• Group 11, made up of 23 networks. This playing style is characterised by long passes between the left defense zone and the central attack zone (3-8) and by connections between the right defense zone and the central midfield zone (1-5). The use of this tactic is distributed homogeneously between the teams.

Rapid attack playing style

• Group 12, made up of 13 networks. This playing style is characterised by connections between the right midfield zone and the central attack zone (6-8). The low ranking teams are the most numerous in the group (46%).
• Group 13, made up of 16 networks. This playing style is characterised by connections between the central midfield zone and the central attack zone (5-8). The low ranking teams are not present in this group.
• Group 14, made up of 19 networks. This playing style is characterised by connections between the left midfield zone and the central attack zone (4-8) and between the lateral zones of the attack (7-9). The high and middle-high ranking teams are the most numerous in this group (74%).

On-the-wings playing style

• Group 15, made up of 52 networks. This playing style is characterised by connections between the lateral zones of the defense and the adjacent lateral zones of the midfield (1-6, 3-4): 87% of the networks present at least one of the two communities. The high ranking teams are the most numerous (50%), with Fiorentina (nine networks) and Napoli (six networks) the most involved teams. In view of the importance of this cluster (see Section 4.2), a graphical representation is provided in Figure 9.

Figure 9. Graphical representation of the on-the-wings playing style.

In view of the above, it is now possible to extrapolate some information about either the league or the teams. For example, Juventus were on an incredible winning streak (26 out of 28 matches) after a disastrous start to the season characterised by only 12 points in the first ten matches. Consequently, it is certainly surprising that only two of the ten matches in which Juventus displayed a sparse playing style (Group 1) refer to that critical starting period. This evidence, in spite of its simplicity, seems to confirm a cliché on Italian football: the team which plays better does not always win.

Acknowledgements

This work was partially funded by grant CPDA154381/15 from the University of Padova, Italy. The authors are grateful to Lazar Petrov, Italian lead of InStat, and Lorenzo Favaro, CEO of SportAnalisi, who provided data. We also thank Daniele Durante and Nicola Sartori for fundamental comments and advice, and David Dandolo for providing the code on the Dixon and Coles model (Dandolo, 2017).

References

Anderson, C. and Sally, D. (2013). The numbers game: Why everything you know about soccer is wrong. Penguin.

Azzalini, A. and Menardi, G. (2014). Clustering via nonparametric density estimation: The R package pdfCluster. Journal of Statistical Software, (11), 1–26.

Azzalini, A. and Scarpa, B. (2012). Data analysis and data mining: An introduction. Oxford University Press USA.

Baio, G. and Blangiardo, M. (2010). Bayesian hierarchical model for the prediction of football results. Journal of Applied Statistics, (2), 253–264.

Blondel, V. D., Guillaume, J.-L., Lambiotte, R., and Lefebvre, E. (2008). Fast unfolding of communities in large networks. Journal of statistical mechanics: theory and experiment, (10), P10008.

Borrie, A., Gudberg K. J., and Magnus S. M. (2002). Temporal pattern analysis and its applicability in sport: an explanation and exemplar data. Journal of sports sciences, (10), 845–852.

Brandes, U., Delling, D., Gaertler, M., Görke, R., Hoefer, M., Nikoloski, Z., and Wagner, D. (2006). Maximizing modularity is hard. arXiv preprint physics/0608255.

Brandt, M. and Brefeld, U. (2015). Graph-based approaches for analyzing team interaction on the example of soccer. In Proceedings of the ECML/PKDD Workshop on Machine Learning and Data Mining for Sports Analytics.

Cintia, P., Giannotti, F., Pappalardo, L., Pedreschi, D., and Malvaldi, M. (2015a). The harsh rule of the goals: Data-driven performance indicators for football teams. In , pages 1–10. IEEE.

Cintia, P., Rinzivillo, S., and Pappalardo, L. (2015b). A network-based approach to evaluate the performance of football teams. In Proceedings of the ECML/PKDD Workshop on Machine Learning and Data Mining for Sports Analytics.

Clemente, F. M., Couceiro, M. S., Martins, F. M. L., and Mendes, R. S. (2015a). Using network metrics in soccer: a macro-analysis. Journal of human kinetics, (1), 123–134.

Clemente, F. M., Martins, F. M. L., Kalamaras, D., Oliveira, J., Oliveira, P., and Mendes, R. S. (2015b). The social network analysis of Switzerland football team on FIFA World Cup 2014. Journal of Physical Education and Sport, (1), 136.

Csardi, G. and Nepusz, T. (2006). The igraph software package for complex network research. InterJournal, Complex Systems, 1695. URL http://igraph.org.

Dandolo, D. (2017). Modellazione statistica di risultati calcistici. Bachelor's thesis.

Dixon, M. J. and Coles, S. G. (1997). Modelling association football scores and inefficiencies in the football betting market. Journal of the Royal Statistical Society: Series C (Applied Statistics), (2), 265–280.

Durante, D., Dunson, D. B., and Vogelstein, J. T. (2017). Nonparametric Bayes modeling of populations of networks. Journal of the American Statistical Association, pages 1–15.

Geys, H., Molenberghs, G., and Ryan, L. M. (1999). Pseudolikelihood modeling of multivariate outcomes in developmental toxicology. Journal of the American Statistical Association, (447), 734–745.

Godambe, V. P. (1960). An optimum property of regular maximum likelihood estimation. The Annals of Mathematical Statistics, (4), 1208–1211.

Gyarmati, L., Kwak, H., and Rodriguez, P. (2014). Searching for a unique style in soccer. arXiv preprint arXiv:1409.0308.

Hastie, T., Tibshirani, R., and Friedman, J. (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction (Second Edition). Springer Verlag, New York.

Hubert, L. and Arabie, P. (1985). Comparing partitions. Journal of classification, (1), 193–218.

Hughes, M. and Franks, I. (2005). Analysis of passing sequences, shots and goals in soccer. Journal of sports sciences, (5), 509–514.

Kaufman, L. and Rousseeuw, P. J. (1990). Finding groups in data: an introduction to cluster analysis, volume 344. John Wiley & Sons.

Koopman, S. J. and Lit, R. (2015). A dynamic bivariate Poisson model for analysing and forecasting match results in the English Premier League. Journal of the Royal Statistical Society: Series A (Statistics in Society), (1), 167–186.

Maher, M. J. (1982). Modelling association football scores. Statistica Neerlandica, (3), 109–118.

Molenberghs, G. and Verbeke, G. (2005). Pseudo-likelihood. In Molenberghs, G. and Verbeke, G., editors, Models for discrete longitudinal data, pages 189–202. Springer.

Narizuka, T., Ken Y., and Yoshihiro Y. (2014). Statistical properties of position-dependent ball-passing networks in football games. Physica A: Statistical Mechanics and Its Applications, 157–168.

Newman, M. (2010a). Mathematics of networks. In Newman, M., editor, Networks: an introduction, pages 109–167. Oxford University Press.

Newman, M. (2010b). Matrix algorithms and graph partitioning. In Newman, M., editor, Networks: an introduction, pages 345–394. Oxford University Press.

Peña, J. L. (2014). A Markovian model for association football possession and its outcomes. arXiv preprint arXiv:1403.7993.

Peña, J. L. and Navarro, R. S. (2015). Who can replace Xavi? A passing motif analysis of football players. arXiv preprint arXiv:1506.07768.

Peña, J. L. and Touchette, H. (2012). A network theory analysis of football strategies. arXiv preprint arXiv:1206.6904.

Pina, T. J., Paulo, A., and Araújo, D. (2017). Network characteristics of successful performance in association football. A study on the UEFA Champions League. Frontiers in psychology, 1173.

R Core Team (2016). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria.

Rand, W. M. (1971). Objective criteria for the evaluation of clustering methods.