Optimising Long-Term Outcomes using Real-World Fluent Objectives: An Application to Football
Ryan Beal, Georgios Chalkiadakis, Timothy J. Norman, Sarvapali D. Ramchurn
Ryan Beal, University of Southampton, UK, [email protected]
Georgios Chalkiadakis, University of Crete, Greece, [email protected]
Timothy J. Norman, University of Southampton, UK, [email protected]
Sarvapali D. Ramchurn, University of Southampton, UK, [email protected]
ABSTRACT
In this paper, we present a novel approach for optimising long-term tactical and strategic decision-making in football (soccer) by encapsulating events in a league environment across a given time frame. We model the teams' objectives for a season and track how these evolve as games unfold to give a fluent objective that can aid decision-making in games. We develop Markov chain Monte Carlo and deep learning-based algorithms that make use of the fluent objectives in order to learn from prior games and other games in the environment and increase the teams' long-term performance. Simulations of our approach using real-world datasets from 760 matches show that, by using optimised tactics with our fluent objective and prior games, we can on average increase teams' mean expected finishing distribution in the league by up to 35.6%.
ACM Reference Format:
Ryan Beal, Georgios Chalkiadakis, Timothy J. Norman, and Sarvapali D. Ramchurn. 2021. Optimising Long-Term Outcomes using Real-World Fluent Objectives: An Application to Football. In Proc. of the 20th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2021), Online, May 3–7, 2021, IFAAMAS, 9 pages.
1 INTRODUCTION

There are many examples in the real world of agents or teams of agents aiming to optimise their performance over long periods of time. These often involve a series of multi-step games that feed into one another, as well as other factors in the wider environment. Examples of this include security games, where agents aim to constantly protect facilities against attackers that are able to change their tactics and decisions [14, 18, 23], as well as the stock market, where agents aim to continually make optimal decisions to make profits in fluid real-world environments [1, 13, 16].

In this paper, we focus on the long-term optimisation of decision-making in team sports, specifically in games of Association Football (soccer), referred to as just "football" throughout this paper. Although the models could be applied in a number of domains, football presents us with an interesting challenge where a team of human agents competes against other teams of agents across long periods, and the success of teams is judged not only in individual games but on how they perform over a season in a league format (supported by many years of real-world datasets). This means that there is a set of teams who each season play every other team twice, both home and away. Teams are awarded points based on winning, losing or drawing, and at the end of the season teams are awarded prize money and other incentives based on the points gained in comparison to all other teams in the league rankings/standings. Past work in this area has focused on optimising performance in individual games [3] or on extracting the contribution of individual players [4, 7, 9]. However, to date, there is no formal model for optimising team performance and tactical decision-making over a longer period of time.

Against this background, we propose a formal model for optimising the long-term performance of football teams and how they can extract knowledge from other games in the league environment. We introduce the novel notion of a fluent objective, which is a sequence of "objective variables", each one corresponding to a particular point in the agent's planning horizon (i.e., a game in the game season). We should also clarify that these variables can take the form of a broader goal (e.g., win the league or do not get relegated). We use Markov chain Monte Carlo simulations to look ahead into the future, allowing us to set realistic, achievable objectives which add more context to our tactical decision-making in individual games. We also take inspiration from observational learning [2, 6, 11] to help teams extract information from other games that happen in the environment and from past games they have played themselves. This is used to identify tactical decisions that boost the chances of gaining positive results against given oppositions. As the season progresses, teams learn more as more games unfold; we encapsulate this into our modelling.
Thus, this paper advances the state of the art in the following ways:

(1) We propose a mathematical model for optimising the long-term performance of human teams and apply this to the game of football.
(2) Using real-world data from 760 real-world football games from the past two seasons of the English Premier League (EPL), we can set the fluent objective based on accurate league simulations and further improve individual game payoffs by using knowledge from prior games. In particular, we show that we can increase teams' finishing position on average by up to 2.9 ranks (out of 20).
(3) By using a fluent objective and prior game knowledge we are able to show an increased probability of improved long-term performance in real-world football teams (by up to 35.6%).

Our results show that by looking ahead and thinking about long-term goals, teams can add more context to the tactical decisions that are made for individual games, and thus are more likely to achieve the long-term objectives that they have set.

The rest of this paper is structured as follows: in Section 2 we provide a background, and in Section 3 we discuss how we model long-term performance. In Sections 4 and 5 we discuss how we calculate the fluent objective and learn from prior games, respectively. We run simulation experiments on our models in Section 6 and discuss these in Section 7. Finally, Section 8 concludes.

2 BACKGROUND

In this section, we review related literature showing other examples of modelling real-world problems. We also give an overview of why long-term football tactics are important and what is involved, and discuss how this is approached for individual games in [3].
Here, we explore work related to modelling long-term flowing games, such as a sports league, as well as giving some background on the sports tactics literature.
As far as we are aware, the notion and modelling of fluent objectives in this paper, which allows us to optimise long-term performance, is entirely novel. However, it was inspired by work on situations and fluents in first-order logic and the situation calculus [15]. We see this approach being used to create a model for environmental context in [21], where the authors' model enables context awareness to help build context-aware applications. Similarly, in our model we aim to gain context of the other teams in the environment to help make decisions based on the future league standings. Agents also react to situations in their environment in [25], where agents react to the ever-changing variables of the stock market.

In our work, we also aim to learn from prior games and other games that happen in the environment to gain a better understanding of what tactics work against given opponents. This is closely related to the work presented in [6], where the authors explore the notion of "observational learning", which is a type of learning that occurs as a function of observing, retaining and imitating the behaviour of another agent. This is applicable to football, as if we observe another team perform well against an opponent then we may want to imitate their tactics to help us win. Other examples of this type of work are shown in [19, 22, 24].
In the sports domain, there are examples of work focused on team tactics and decision-making in football and other team sports [5]. In terms of long-term decision-making, the key example of agents being used to optimise this in sport is [17], which presents a successful model for competing in fantasy football games (https://fantasy.premierleague.com/help/rules). Here, the authors use machine learning to predict the performance of individual players and then use deep reinforcement learning to optimise decisions on a week-by-week basis, looking ahead to maximise their chances of success. By doing so, they rank in the top 1% of human players. In our work, we take inspiration from this in the real world to help human coaches and managers make decisions about human footballers.

We also see examples of tactical papers for sport in [12], exploring different risk strategies for play-calling in American Football, while key football papers that help improve human performance and identify high-performing players and pairs of players are shown in [4, 7, 9].

To provide more intuition around long-term decision-making, in the next subsection we give a background on football tactics and their importance to the game, as well as the league structure.

In football, individual games are incredibly important, but what is often overlooked tactically is the impact that each game has over a longer period of time and on the overall league standings. The final league standings give the final position of all teams in a league based on the points they have gained over an N-game season. In a standard football league (e.g., the English Premier League or German Bundesliga), across a season each team plays each other team twice (once home and once away); a win is worth 3 points, a draw 1 point and a loss no points.
There are huge intrinsic and financial gains to be made by finishing higher up the table, and there are certain milestones that teams aim for to boost their success, such as qualification for European competitions.

The season is often broken down into "game-weeks", in which all teams play a game. We can therefore break down the season into these game-weeks as incremental steps in a game. In each week our team plays a game and a number of other games also take place. We therefore want to maximise our own performance in our game and learn from other games for the future, when we play those teams (see Figure 1). In this paper, we thus aim to model teams' tactical decisions based on the overall league environment, using fluent objectives to add context to our decisions and prior game knowledge to imitate other successful teams. In the next section, we discuss the model that this paper builds on for optimising tactical decision-making in individual games.
The modelling presented in this paper extends the formal model for football presented in [3] for optimising the tactics in an individual game. In [3], the authors use a multi-step game to represent the pre-match tactical decisions that are made, using a Bayesian game (representing the unknowns of opposition decisions); this then feeds into the in-match decisions, which are modelled as a stochastic game (representing the score-line states in a game). Using these models, teams are able to optimise their tactics by up to 16.1%.

In this paper, we extend that model by adding the context of the wider environment of the league. By using our fluent objective and prior game weightings we can further optimise these tactics, to not only improve the chances of a positive result in the individual game but also improve the long-term performance of the team in the league standings. (Estimates of the financial value of each league place are given at http://eightyfivepoints.blogspot.com/2018/03/show-me-money-how-much-is-each-premier.html.)

3 MODELLING LONG-TERM TEAM PERFORMANCE
In this section, we discuss how we model the long-term performance of football teams over a season and identify how we can use fluent objectives and learn from games to optimise the long-term performance of a team. At the start of a given season or competition, a team will have some aim of how well they want to do and what they want to achieve. In a knockout-style cup competition such as the FIFA World Cup or English FA Cup, every team is aiming to win every game, as this is the only way to win overall; there are no prizes for second place. Across a full season, however, there are a number of objectives that a team can have that will help maximise the financial gains and reputation of the team. For example, as discussed in Section 2.2, in the English Premier League there is always only one winner, but there are also benefits to finishing in the top 4, top 7 and avoiding finishing in the bottom 3. We therefore model an entire season in football in a way that could be applied to help optimise teams' long-term performance in any league across the world and at any level.
In Figure 1 we show the structure of our model for an entire season in football. This style of model could also be applied in security games or for emergency response, where we aim to optimise the performance of teams of agents in evolving environments with ever-changing objectives [20, 23]. We build on the multi-step (Bayesian into stochastic) games for optimising single-game tactics to help teams achieve their objectives in an N-game season. There is a sequence of steps that we highlight, showing how each one feeds into the next. We also show how a team's pre-season objective can be fed into the first game, which in turn can use this to aid the tactical decision-making process, as well as the parameters we learn while playing each game (e.g., certain tactics that work well against certain teams).

Both the pre-match Bayesian game and the in-match stochastic game can use the objective to help set the risk parameters and select the tactics that will best help the team in the overall environment of the league. This objective then changes as the season progresses and teams aim for different levels of achievement, therefore making this a fluent objective; e.g., a team may have had high hopes at the start of the season of winning the league, but if they have a poor start they may have to update their objective to ensure they finish in the top 4. As we show in Figure 1, the pre-season objective is set as O_0; this then changes each game-week as the environment around the team develops, changing to O_1 after game-week 1, O_2 after game-week 2, and so on until the final in-season objective O_{N-1} the week before the final game of the season. The final fluent objective, O_N, corresponds to the overall end-of-season outcome (S_O), which we can compare to the fluent objective at each game-week to assess the team's performance across the season. As discussed in Section 3.2, the O_x and S_O variables might not have distinct values (i.e., maybe O_0 = O_1, and so on).

We also consider how we can learn from the games that are played as the season progresses. As we play each game we learn something new, both about what works for our own team and what works against a given opposition. We therefore learn parameters from each game that we carry forward through each game-week and, similarly to the fluent objective, update each week. For example, we may find that when our team uses a given formation against a certain style of opponent we see better results. As we show in Figure 1, this is encapsulated by a prior knowledge parameter P, which is updated after each game we play, where P_1 is the value after game-week 1, P_2 after game-week 2, and so on until the penultimate game-week of the season, P_{N-1}. We explain the precise form of the P parameter in Section 3.3 below.

Finally, we must consider the other games that are happening each week in the league environment: G_N is the set of other games in game-week N, where G_N = {G_1, G_2, ..., G_z} and z is the number of other games played in that week. Within each game-week, all other teams also play one another, so that at the end of the season each team has played every other team twice (once at home and once away). For example, in the EPL there are 20 teams in the league; each team plays the other 19 teams twice, which is 38 games. In the EPL there are a total of 380 games, so there are 342 that do not involve the team that we are focused on for our optimisation. These games are observable, so we can learn from each one, which in turn affects our fluent objective O and what we learn after each game-week, P.
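As a concrete illustration of the bookkeeping above, the per-game-week state could be sketched as a simple record. All field names and types here are our own illustrative assumptions, not structures from the paper:

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class GameWeekState:
    """State carried from one game-week to the next (illustrative sketch)."""
    week: int                         # game-week index n
    objective: str                    # fluent objective O_n, e.g. "top_half"
    prior_weights: List[List[float]]  # prior-knowledge matrix P_n
    # G_n: the other games observed this week, as (home, away) pairs
    other_games: List[Tuple[str, str]] = field(default_factory=list)


# e.g. the pre-season state before game-week 1
state = GameWeekState(week=0, objective="top_half", prior_weights=[[1.0]])
```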
As discussed in Section 2.2, the outcomes of the other games affect the league table, with teams gaining 3 points for a win and 1 point for a draw. We therefore must consider the other teams' performances when setting O. We can also observe other games tactically to learn what styles and formations work best against given teams; this is how we learn P from prior games. In the following subsections, we go into more detail regarding how we model the fluent objective O and how we learn from prior games P.

At the start of each season, a team will have some objective for what they are looking to achieve in the coming season. These goals are decided based on several factors, such as previous-season performance and money invested into the team. The goals are usually set by the owners/directors of the team and are based on their subjective opinions of how their team should perform and where they should place in the league against the other teams. The opinions of what the team should achieve then change over the season, which can drive key decisions such as a change of coach/manager for an under-performing team, or investing more money into an over-performing team so they achieve a European place, which comes with huge financial gains. In other settings, these types of objectives could be the defence of a given target or the rescue of a person.

Our model for the fluent objective can objectively evaluate how we expect a team to perform over a season and allow teams to change their tactical decision-making based on this. There are two different objectives that can be set: a more granular objective of the expected league position, and an objective of what could be achieved in terms of broader incentives in the league (e.g., avoiding relegation or qualifying for European competitions).
In this paper, we focus on the latter and define the set of possible objectives as O = {o_1, o_2, ..., o_k}, where k is the number of different objectives. An example of the set of objectives (more accurately, the set of values that an O_x objective variable can take) in the EPL would be:

• Winning the League (o_1): Awarded to the team who finishes top of the league.
• Qualifying for the Champions League (o_2): Awarded to the top 4 teams, so in this case the objective relates to teams finishing 2nd-4th.
• Qualifying for the Europa League (o_3): Another European competition, usually awarded to teams who finish between 5th-7th.
• Top Half Finish (o_4): The financial benefits of finishing higher in the league are huge, and therefore teams often aim to finish in the top half of the table (higher than 10th).
• Avoiding Relegation (o_5): The bottom 3 teams (18th-20th) in the EPL are relegated to the English Football League (EFL) Championship, which is the second division of the English football leagues.

Figure 1: Sequence of Multi-Games Across a Season.

To set the objective, we simulate how we expect the season to unfold and create a distribution D that allows us to use a Maximum a Posteriori (MAP) estimation [10] for the probability of the team finishing in each position. This allows us to calculate a set of probabilities of a team achieving each objective, P = {p(o_1), p(o_2), ..., p(o_k)}. We then set O_0 (the pre-season objective) as the most likely objective that can be achieved by the team that season.

This process can then be re-run after each game-week is completed to give the fluent objectives O_1 to O_{N-1}. Our simulation of the league includes the real results, which get more accurate as the season progresses and we learn more about each team. This means we have a fluent objective that changes as the season progresses. At the end of the season, we can compare O_0 to O_{N-1} with the final outcome S_O that the team achieves.

As well as the fluent objective, we can also improve the tactical decision-making in our Bayesian and stochastic games by adding prior knowledge P that we learn after each game we play and observe.
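The MAP objective-setting step described above can be sketched as follows. The position bands mapping league places to the objectives o_1-o_5, and all names, are our own illustrative reading of the definitions given:

```python
# Position bands for the EPL objectives described above (1-indexed league
# places; the exact bands are our interpretation of the text).
OBJECTIVE_BANDS = {
    "win_league": [1],
    "champions_league": [2, 3, 4],
    "europa_league": [5, 6, 7],
    "top_half": [8, 9, 10],
    "avoid_relegation": list(range(11, 18)),
}


def set_fluent_objective(position_dist):
    """MAP estimate: the objective whose position band carries the most
    posterior probability mass.

    position_dist[k] = P(finishing in place k+1), a length-20 list.
    """
    probs = {
        name: sum(position_dist[p - 1] for p in band)
        for name, band in OBJECTIVE_BANDS.items()
    }
    return max(probs, key=probs.get)
```

The bands deliberately exclude 18th-20th: a team projected there has no band to aim for beyond avoiding relegation, mirroring the discussion above.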
In more general terms, we aim to observe and learn from other successful agents and our own actions. This could also be applicable to swarms of UAVs, or to imitating other agents trading in financial market settings.

We learn a set of weights W that relate to how effective the style/formation pairs (actions that are made in the multi-step games) that we select in our games are against given opposition style/formation pairs. These weights are initially set to 1 and are then increased if found to be effective and decreased if found to be ineffective. They can be updated after each game-week and also updated from the other games that we observe. Our P value is defined in Equation 1:

P = \begin{pmatrix} w_{11} & w_{12} & \cdots & w_{1j} \\ w_{21} & w_{22} & \cdots & w_{2j} \\ \vdots & \vdots & \ddots & \vdots \\ w_{i1} & w_{i2} & \cdots & w_{ij} \end{pmatrix}   (1)

where w ∈ W and i/j is the number of possible style/formation pairs. The columns represent the style/formation pair selected by our team and the rows represent the style/formation pair selected by the opposition (e.g., w_{ij} is how effective our style/formation pair i is against an opposition using style/formation pair j).

In the following sections, we give more detail on how we calculate our fluent objective O and how we learn the weights that make up P. We explore how these are used in the individual football match multi-step game (discussed in Section 2.3) to further optimise the tactical decision-making process.

4 CALCULATING THE FLUENT OBJECTIVE

In this section, we discuss how we simulate seasons, calculate the fluent objective, and how this can be used to optimise game tactics.
When we simulate the season outcomes and calculate the distributions of where we expect the team to finish, we are interested in predicting all remaining games in the season, for both our team and all other teams in the league. To do this, we first look at single-game prediction, which is discussed in the next subsection.
To predict the outcomes of single games in the league, we use the model defined in [3] for calculating the single-game payoffs. The model uses the teams' tactical styles, potential formations and team strengths to give probabilities of a team winning the game. The set of features used are: home team style, away team style, home team formation and away team formation; team strengths are calculated using the outputs of the model described in [8]. The target class is the final result of the game: home team win, away team win or a draw. Using these features, we train a multi-class classification deep neural network. The network is trained using stochastic gradient descent with a categorical cross-entropy loss function (Equation 2) and a soft-max activation function:

-\frac{1}{N} \sum_{i=1}^{N} \log p_{\text{model}}[y_i \in O_{y_i}]   (2)

where N is the number of games that we use to train the model and p_{model}[y_i ∈ O_{y_i}] is the probability that y_i is in the class O_{y_i}. This model takes the given teams, possible playing styles and possible formations, and estimates the probability of winning, drawing or losing the game. Using these probabilities, we can simulate the outcome of the entire season, as discussed in the next subsection.

To simulate the remaining games of the season, we use the real-world fixture list to ensure that the ordering of the games is correct. We then find the probability of a home win, away win and draw in each game and use a Markov chain Monte Carlo simulation [26] to simulate all remaining games, totalling up the points that each team will gain (3 points for a win, 1 for a draw and 0 for a loss). This works well as it emulates the randomness that we see in real-world football games. We repeat this process 100,000 times for each simulation, which allows us to derive a distribution for the probability that a team will finish in each place in the final league standings. An example of this distribution is shown in Figure 2.
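The season-simulation loop described above can be sketched as follows. This is a minimal Monte Carlo version: the match-outcome probabilities are plain inputs here, whereas in the paper they come from the prediction network, and all names are our own:

```python
import random
from collections import defaultdict


def simulate_season(current_points, fixtures, outcome_probs,
                    n_runs=10_000, seed=0):
    """Monte Carlo simulation of the remaining fixtures.

    current_points: {team: points gained so far}
    fixtures: list of (home, away) games still to play, in fixture order
    outcome_probs: {(home, away): (p_home_win, p_draw, p_away_win)}
    Returns {team: [P(finish 1st), P(finish 2nd), ...]}.
    """
    rng = random.Random(seed)
    teams = sorted(current_points)
    finish_counts = defaultdict(lambda: [0] * len(teams))
    for _ in range(n_runs):
        pts = dict(current_points)
        for (home, away) in fixtures:
            p_h, p_d, _ = outcome_probs[(home, away)]
            r = rng.random()
            if r < p_h:
                pts[home] += 3          # home win
            elif r < p_h + p_d:
                pts[home] += 1          # draw
                pts[away] += 1
            else:
                pts[away] += 3          # away win
        table = sorted(teams, key=lambda t: pts[t], reverse=True)
        for place, team in enumerate(table):
            finish_counts[team][place] += 1
    return {t: [c / n_runs for c in counts]
            for t, counts in finish_counts.items()}
```

Note that ties on points are broken arbitrarily here; a fuller version would use goal difference, as real leagues do.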
Once we have calculated the distributions of possible place outcomes from the MCMC simulation, we use a Maximum a Posteriori (MAP) estimation [10] to set the fluent objective. To do this, we use the posterior distribution to find interval estimates of the final position of the team in the league. We use the position intervals for the objectives discussed in Section 3.2 and find the o_k ∈ O that maximises the posterior PDF. This sets the objective O_n that is used in game-week n, updated after each game-week.

Figure 2: Example League Outcome Probability Distribution (probability (%) against final league position).

Once we have set the fluent objective, we can use it when optimising the team tactics in the multi-step game for individual game tactics in that game-week. In the pre-match Bayesian game outlined in [3], Beal et al. present 3 options that can be used depending on the overall environment. Here, we present modified, novel notions of these options, which now employ the fluent objective.

• Best Response:
Used to maximise the chances of winning a game. This option is selected if a team is currently not on track to achieve their objective for the season and must win games to be able to achieve their goals.
• Spiteful: Used to minimise the chances of the opposition winning the game (and therefore improve your chances of drawing/winning). This option is selected if a team is well ahead of their objective, and by preventing losing the game they are more likely to stay on track for their objective across the season.
• Expectimax: This is a mixture of the two above and factors both into account (mathematically defined in [3], where it is referred to as "minmax"; we rename it since the approach does not align with the usual meaning of the term "minimax" or "minmax" in game theory). This is selected if a team is on track for their objective and is aiming to stay that way.

In terms of the in-match stochastic game that is also defined in [3], there are two options that can be selected when making in-match decisions.

• Aggressive Approach: This is set if a team is losing/drawing a game and wants to win. It will maximise the chance of a team moving to a more positive state. Therefore, if we know that the objective is to win and gain three points we will select this approach.
• Reserved Approach: This is set if a team is winning/drawing and is happy with its current state. It is used to maximise the chances of staying in the current state. Therefore, this is used if winning, or if a point is a good result in the overall environment in relation to the objective.

In the next section, we move on to assess how we can learn from prior games and other games in the environment, and how this can be added to our decision-optimisation model.
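How the fluent objective might drive the choice among the pre-match options above can be sketched as follows; the probability thresholds and function name are illustrative assumptions, not values from the paper:

```python
def choose_pre_match_option(p_objective, on_track=0.5, margin=0.1):
    """Pick one of the three pre-match options from the current estimated
    probability of meeting the fluent objective (thresholds illustrative)."""
    if p_objective < on_track - margin:
        return "best response"   # behind the objective: must win games
    if p_objective > on_track + margin:
        return "spiteful"        # comfortably ahead: avoid losing
    return "expectimax"          # on track: balance both considerations
```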
5 LEARNING FROM PRIOR GAMES

In this section, we discuss how we can learn from completed prior games, both those that we play and those that other teams in the league play. This allows us to find the formation/style combinations that work best against a given formation/style combination that an opposition team may use. To do this, we learn a matrix of weights P that corresponds to the estimated successes of the formation/style combinations. To estimate each of the weights w ∈ P, we factor in both the games that we have played and the games that we have observed. Each weight w corresponds to how effective a given formation/style combination is against a given opposition formation/style. These are computed using Equation 3, where we look at the games won when using formation/style x against the given opposition formation/style y, both in games we have played (first fraction) and in games we have observed (second fraction):

w_{xy} = \left( \frac{\mathit{games\ won}}{\mathit{games\ played}} + \frac{\mathit{observed\ games\ won}}{\mathit{observed\ games}} \right) \div 2   (3)

The weights in P are updated after each game-week, so they should become more accurate across the season. In game-week 1, all weights can either be set to 1 or be carried over from the previous season. In the next subsection, we outline how P is used to optimise the pre-game tactics in the Bayesian game and the in-match decisions in the stochastic game.

Once we have computed the weights that make up P, these can be used when making our pre-match decisions in our Bayesian game. In the optimisation model, a payoff table is computed for each combination of opposition actions to give the probability of the match outcomes based on their selected action of styles S and formations f, where h is a home win, d a draw and a an away win. The payoff for the team is the weighted sum of win and draw probabilities that we store in a payoff table made up from the different decisions that we can make. We can then apply the computed weights in P to the payoff table to weight each payoff depending on how successful these combinations have been in prior games and in observed games.
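A minimal sketch of the weight matrix P, the Equation 3 update, and its application to a payoff table. The pair count, the example numbers and the final averaging by two are our assumptions in reading the equation, and the payoff table here is hypothetical:

```python
N_PAIRS = 4  # assumed number of style/formation pairs

# Game-week 1: all weights start at 1 (the matrix P of Equation 1).
P = [[1.0] * N_PAIRS for _ in range(N_PAIRS)]


def update_weight(games_won, games_played, obs_games_won, obs_games):
    """Equation 3 (our reading): average our own win rate with the win rate
    observed in other teams' games for the same pair-vs-pair matchup."""
    own = games_won / games_played if games_played else 0.0
    observed = obs_games_won / obs_games if obs_games else 0.0
    return (own + observed) / 2


# e.g. our pair 2 vs opposition pair 0: we won 2 of 4 such games ourselves
# and saw 1 win in 2 observed games, giving a weight of 0.5
P[2][0] = update_weight(2, 4, 1, 2)


def weight_payoffs(payoffs, P):
    """Apply P element-wise to the pre-match payoff table: entry [i][j] is
    our payoff when we pick pair i and the opposition picks pair j, so a
    combination that has never worked (weight 0) is never selected."""
    return [[payoffs[i][j] * P[i][j] for j in range(len(P[0]))]
            for i in range(len(P))]
```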
Therefore, we can optimise the tactical decision based on the weighted payoffs in these tables, using the best-response, spiteful or expectimax approaches, which are decided based on our fluent objective. This means that if a formation/style combination has never worked in games we have played or observed, the payoff will be weighted by 0 and never be selected. The same approach can be applied when changing the formation and style in the in-match stochastic game: each change made can be weighted by the corresponding element of P.

In the next section, we perform a number of experiments on our models and assess the performance over a whole season, as well as how the inclusion of O and P each game-week can help teams improve their performance and meet their objectives.

6 EMPIRICAL EVALUATION

To evaluate our models, we use a dataset collected from two seasons (2017/18 and 2018/19) of the English Premier League (EPL). The dataset breaks down each of the games into an event-by-event analysis, where each event gives different metrics including the event type (e.g., pass, shot, tackle), the pitch coordinates of the event and the event outcome. This type of dataset is industry-leading in football and used by top professional teams. Thus, it is a rich real-world dataset that allows us to rigorously assess the value of our model.
Here, we test our fluent objective model at each game-week. Firstly, we evaluate the individual-game prediction model that is used to feed the probabilities of outcomes into our season simulation. Secondly, we evaluate our season-simulation prediction model, a Markov chain Monte Carlo (MCMC) simulation, with respect to its accuracy as the season progresses. In Experiment 2, we test our MAP estimator for setting fluent objectives at each game-week.

To predict the outcome probabilities of individual games, we use the deep learning neural network model that calculates payoffs in the Bayesian game. Over the past two EPL seasons, the accuracy of the model is 72.99%, with a precision of 69.48%, recall of 59.5% and F1 score of 59.82%. This model is used to calculate the probability distribution used in our MCMC model for the entire season.

We then run a number of experiments of our MCMC simulation of a season. We predict all remaining games 100,000 times and find the most likely league standings after 38 game-weeks. We can compare this to the final league standings and measure the differences. In Figure 3, we show the average over all clubs of the absolute difference between their actual finishing position and their predicted finishing position. This is run after each game-week, so that we have more information about the games that have already been completed. Week 0 is the prediction before any games have been played and week 37 is the final prediction, after 37 out of 38 games have been played.
Figure 3: 2018/19 EPL Actual League Standings vs MCMC Predictions (average position difference per week, with moving average).

(Footnote: We use a fully-connected feed-forward NN with 3 layers and a ReLU activation function.)

As shown in Figure 3, in the first half of the season the league standings remain fairly unpredictable due to the number of different possible combinations that we are attempting to predict: there are a total of 20! ≈ 2.4 × 10^18 different combinations of team order in which the league could finish. We do see, however, that as the season unfolds and we have a better idea of team performance, the simulation accuracy improves. This is also to be expected, as we are simulating fewer games later into the season and have more evidence from those having taken place in the real world. This shows that we have a suitable method to extract a distribution of where we expect a team to finish, and can therefore derive the fluent objective using a MAP estimation. This is shown in the next experiment.
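The season-simulation loop can be sketched as follows. This is a simplified version under our own assumptions (three-way outcome probabilities per fixture, points-only tiebreaking, and far fewer runs than the 100,000 used in the paper), not the authors' code:

```python
import random
from collections import Counter

def simulate_season(points, fixtures, probs, runs=10000, seed=0):
    """Monte Carlo simulation of the remaining fixtures.
    points  : dict team -> current points
    fixtures: list of (home, away) games still to play
    probs   : dict (home, away) -> (p_home_win, p_draw, p_away_win)
    Returns a dict team -> Counter of finishing positions over all runs."""
    rng = random.Random(seed)
    finishes = {t: Counter() for t in points}
    for _ in range(runs):
        table = dict(points)
        for (h, a) in fixtures:
            r = rng.random()
            p_home, p_draw, _ = probs[(h, a)]
            if r < p_home:
                table[h] += 3          # home win
            elif r < p_home + p_draw:
                table[h] += 1          # draw
                table[a] += 1
            else:
                table[a] += 3          # away win
        ranking = sorted(table, key=table.get, reverse=True)
        for pos, team in enumerate(ranking, start=1):
            finishes[team][pos] += 1
    return finishes

# Tiny worked example: with a certain home win, team A always finishes 1st.
finishes = simulate_season(
    points={"A": 10, "B": 8, "C": 0},
    fixtures=[("A", "B")],
    probs={("A", "B"): (1.0, 0.0, 0.0)},
    runs=1000)
```

Normalising each team's `Counter` by the number of runs yields the finishing-position distribution from which the fluent objective is derived.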
To test our MAP estimation, after each game-week simulation we set the fluent objective for all 20 EPL teams. We then assess whether each objective was met at that game-week and report the percentage of teams that were successful in meeting their objectives. This is shown in Figure 4, where week 0 is the prediction before any games and week 37 is the final prediction.

Figure 4: Accuracy of Setting the Fluent Objective (2018/19 EPL Season).
As we can see in Figure 4, the fluent objective accuracy rises as the season progresses, and from week 15 onwards the accuracy of the fluent objective setting rises more clearly. This shows that we can set realistic objectives to aim for as the season progresses, in relation to the actual league outcomes and what was achieved by the teams. One thing to note in this experiment is that not every team in the league can meet its objective, as there may be more teams aiming for an outcome than can achieve it (e.g., 3 teams aiming to win the league). Also, 3 teams must always be relegated, which the minimum objective is to avoid; this means that even in the best case only 85% of teams (17 out of 20) will achieve their objective. We find that in weeks 36 and 37, we reach this maximum of 85% of teams meeting their objectives.
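The MAP step can be sketched as follows: sum the simulated finishing-position distribution over a set of objective bands and pick the band with the greatest posterior mass. The band boundaries below are our own hypothetical choices (the paper does not enumerate them here):

```python
# Hypothetical objective bands for a 20-team league (illustrative only).
BANDS = {"win the league": range(1, 2),
         "top four": range(2, 5),
         "top half": range(5, 11),
         "avoid relegation": range(11, 18),
         "relegation battle": range(18, 21)}

def map_objective(position_probs):
    """position_probs: dict position (1-20) -> probability.
    Returns the objective band with the highest posterior mass (MAP)."""
    mass = {band: sum(position_probs.get(p, 0.0) for p in positions)
            for band, positions in BANDS.items()}
    return max(mass, key=mass.get)

# Example distribution concentrated around positions 5-7.
dist = {1: 0.01, 2: 0.05, 3: 0.10, 4: 0.14, 5: 0.30, 6: 0.25, 7: 0.15}
objective = map_objective(dist)  # "top half" carries the most mass (0.70)
```

Re-running this after every game-week, as the distribution sharpens, is what makes the objective "fluent".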
To test the impact of the addition of the weights 𝑤 that we estimate in 𝑃, we evaluate how the weights are able to boost our ability to predict the outcomes of games based on the tactical decisions, and therefore improve our payoff model. (Footnote: The vast number of possible combinations is why we use position differences rather than the overall accuracy of the entire standings after each game-week.) To evaluate our 𝑃 weights, we compare the accuracy of the predictions of the model presented in [8] both with and without 𝑃 (this model makes up part of the feature set that is used for calculating the payoffs). We then assess the differences in terms of the models' ability to accurately predict the outcome of a game, running the tests over 1046 games. In both cases, the prediction is the result (home win, away win or draw) that is given the highest probability. The results from this experiment are shown in Figure 5.

Figure 5: Payoff Model Performance Comparison (accuracy, precision, recall and F1 score, with and without 𝑃).
As we can see in Figure 5, by using the weights in 𝑃 we are able to boost the accuracy of the model, and therefore the accuracy of our payoffs, achieving a gain of 1.76%. We also see an increase in the precision, recall and F1 score of our model, by 1.50%, 1.72% and 1.27% respectively. (Footnote: The precision, recall and F1 score are computed as a weighted average of the ability to predict each outcome, using SciKit Learn's multi-class support.) Even though this represents a fairly small increase to the results of the model in [8], it shows that by learning from what tactics have worked (both for your team and for others), we can boost our ability to calculate the tactical decision pay-off and therefore our ability to optimise the decisions made. Over a large span of time, such as a 38 game-week season, a 1.76% boost in performance could be the difference between finishing a place higher in the league, which can bring huge financial gain and help a team achieve its set fluent objective.

Our final experiment assesses how we incorporate the fluent objective 𝑂 and the weights in 𝑃 into the tactical decision-making optimisation model presented in [3], and evaluates how this improves team performance to help teams meet their objectives. To test this, we simulate an entire season week by week and apply our model to a single team in the simulation. After each game-week we simulate the remaining games and recalculate 𝑂 and 𝑃, as outlined in Figure 1. We then compare our results using the new model across a simulated season against a simulation where we do not use 𝑂 and 𝑃. We show the results when running separate simulations for a set of different teams (the team we use is the only team using the new model in each simulation) in Figure 6. (Footnote: We use the bottom 8 teams in the 2018/19 EPL season to show that we can improve their performance.) We show the average difference in the mean expected finishing position from the distribution of each team that we run our season simulation for, both using the new model and without.

Figure 6: Payoffs of Real-World vs. Optimised Decisions (average difference in final position, with and without 𝑃 and 𝑂).
This shows how our model can improve the probability of teams' finishing positions: on average there is a 2.90-position improvement when using 𝑂 and 𝑃 compared to without, for our test set of teams. This is achieved because, by using 𝑂 and 𝑃, teams can add more context to their decisions; by selecting the optimal tactics each week in the simulation using the model in [3], we would also expect to see a boost to performance. Below, we highlight an example of the distribution improvement in the simulation when aiming to optimise the performance of Southampton FC (the only team using the optimisation model in the simulation). Figure 7 shows the distribution with 𝑂 and 𝑃 applied and not applied.

Figure 7: Example League Outcome Probability Distribution for Southampton FC in 2018/19 (probability of each final league position, with and without 𝑂 and 𝑃).
As we can see from the example shown in Figure 7, we can use the fluent objectives to help teams boost their probabilities of winning the games that matter, and thus boost their expected finishing position, increasing the mean of the expected finishing distribution by up to 35.6%. We see similar improvements across our test set of teams. In the next section, we further discuss these results, the real-world implications and some further findings.
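To make the "mean of the expected finishing distribution" concrete, the quantity is simply the expectation of the league-position distribution, and the percentage gain is the relative shift of that mean. The distributions below are made up for illustration and are not Southampton's actual numbers from Figure 7:

```python
def mean_position(dist):
    """Expected finishing position from a dict position -> probability."""
    return sum(pos * p for pos, p in dist.items())

# Illustrative distributions only (not the paper's real data).
without = {14: 0.2, 15: 0.3, 16: 0.3, 17: 0.2}
with_op = {8: 0.2, 9: 0.3, 10: 0.3, 11: 0.2}

mu_without = mean_position(without)  # 15.5
mu_with = mean_position(with_op)     # 9.5
improvement = (mu_without - mu_with) / mu_without * 100  # about 38.7%
```

A lower mean position is better, so a positive relative shift of the mean corresponds to the kind of improvement reported for the test set of teams.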
One interesting finding from further experiments arises when we simulate the season with all teams using the model discussed in this paper to select their tactics. When we run this simulation, we find that the effects cancel each other out, and the final standings are very similar to what we see when we run the simulation without the new fluent objective and prior game weights. There is a boost of under 1 position on average per team when every team uses the model in the same season. This shows that teams can gain a boost in their performance over the season, but only if they utilise the game-theoretic approaches while all others do not.

Another observation in our results comes when we compare the increase in the positional distribution between the stronger top-half teams and the teams in the lower half of the league who are aiming to stay in the division. When using the model for the latter, we observe a substantial boost of up to 35.6% in long-term performance. This may be because the algorithm helps teams using the new model gain positive results in the closer games at the bottom of the table, when playing teams of similar ability: taking all 3 points for themselves and thus preventing their rivals from gaining any. In turn, higher up the league, teams often win the games they are expected to win against weaker opposition, so the performance boost is lower.

It is also worth noting that across the season there are a number of other variables that can affect team decision-making, both tactically and off the pitch. As teams re-assess their objectives during the season, there are decisions off the pitch that can help boost their performance, as well as the tactical decision optimisation that helps on it. One example is a change of managers/coaches; this is often a measure taken by an underperforming team and can help boost performance.
If a team is doing well and wants to push higher up the table, or is struggling and needs new players, then during January teams are able to invest money in new players to improve their squad. These types of decisions could be added into the model to help decision-makers at clubs decide when to invest more money or make changes.
This paper presents a novel model for the long-term tactical decisions that are made in football, helping teams to optimise their decisions by adding more long-term context. We introduce the concept of a fluent objective that allows us to re-evaluate team performance and base decisions on a wider environment. We find that we can build models that predict the final outcome of the league table on a regular basis, and then use a MAP estimation to effectively set the fluent objective each week. We also learn from other games that happen in the overall environment and find that this can boost the performance of pay-off models in our multi-step games. Overall, we find that our model can be used by football teams who are looking to improve their overall expected league position (on average it improves teams by 2.90 positions), and we show that the concept of a fluent objective can help to optimise long-term performance in a competitive league setting.

Given the success we show when using fluent objectives in an application to football, in future work we intend to test our approach in other domains. For example, fluent objectives could be used in security games and UAV swarms, as the objectives there also often change over a given time frame. This testing will help to further verify how the modelling of objectives can aid long-term performance. We also aim to further improve our 𝑃 weights with applications of the observational learning and reinforcement learning approaches presented in [6]. Finally, the reinforcement learning techniques presented in [17, 24] could be used to further optimise team performance.

ACKNOWLEDGMENTS
We would like to thank the reviewers for their comments. This research is supported by the AXA Research Fund and the EPSRC NPIF doctoral training grant number EP/S515590/1.
REFERENCES
[1] Per Bak, Maya Paczuski, and Martin Shubik. 1997. Price variations in a stock market with many agents. Physica A: Statistical Mechanics and its Applications.
[2] The International Encyclopedia of Communication (2008).
[3] Ryan Beal, Georgios Chalkiadakis, Timothy J. Norman, and Sarvapali D. Ramchurn. 2020. Optimising Game Tactics for Football. In Proceedings of the 19th International Conference on Autonomous Agents and MultiAgent Systems. 141–149.
[4] Ryan Beal, Narayan Changder, Timothy J. Norman, and Sarvapali D. Ramchurn. 2020. Learning the Value of Teamwork to Form Efficient Teams. In Proceedings of AAAI 2020. 7063–7070.
[5] Ryan Beal, Timothy J. Norman, and Sarvapali D. Ramchurn. 2019. Artificial intelligence for team sports: a survey. The Knowledge Engineering Review.
[6] In Proceedings of the 18th International Conference on Autonomous Agents and MultiAgent Systems. 1117–1124.
[7] Tom Decroos, Lotte Bransen, Jan Van Haaren, and Jesse Davis. 2020. VAEP: An Objective Approach to Valuing On-the-Ball Actions in Soccer. In Proceedings of the 29th International Joint Conference on Artificial Intelligence.
[8] Mark Dixon and Stuart Coles. 1997. Modelling Association Football Scores and Inefficiencies in the Football Betting Market. Journal of the Royal Statistical Society: Series C (Applied Statistics) 46, 2 (1997), 265–280.
[9] Javier Fernández, Luke Bornn, and Dan Cervone. 2019. Decomposing the Immeasurable Sport: A deep learning expected possession value framework for soccer. MIT Sloan Sports Analytics Conference.
[10] IEEE Transactions on Speech and Audio Processing 2, 2 (1994), 291–298.
[11] Min Jang and Sungzoon Cho. 1999. Ensemble learning using observational learning theory. In IJCNN'99. International Joint Conference on Neural Networks. Proceedings (Cat. No. 99CH36339), Vol. 2. IEEE, 1287–1292.
[12] Jeremy D. Jordan, Sharif H. Melouk, and Marcus B. Perry. 2009. Optimizing football game play calling. Journal of Quantitative Analysis in Sports 5, 2 (2009).
[13] Gary Kagan, Herbert Mayo, and Robert Stout. 1995. Risk-adjusted returns and stock market games. The Journal of Economic Education 26, 1 (1995), 39–50.
[14] Christopher Kiekintveld, Manish Jain, Jason Tsai, James Pita, Fernando Ordóñez, and Milind Tambe. 2009. Computing optimal randomized resource allocations for massive security games. In Proceedings of The 8th International Conference on Autonomous Agents and Multiagent Systems - Volume 1. 689–696.
[15] Fangzhen Lin. 2008. Situation calculus. Foundations of Artificial Intelligence.
[16] Nature.
[17] In Twenty-Sixth AAAI Conference on Artificial Intelligence.
[18] Praveen Paruchuri, Jonathan P. Pearce, Janusz Marecki, Milind Tambe, Fernando Ordonez, and Sarit Kraus. 2008. Playing games for security: An efficient exact algorithm for solving Bayesian Stackelberg games. In Proceedings of the 7th International Joint Conference on Autonomous Agents and Multiagent Systems - Volume 2. International Foundation for Autonomous Agents and Multiagent Systems, 895–902.
[19] Bilal Piot, Matthieu Geist, and Olivier Pietquin. 2013. Learning from demonstrations: Is it worth estimating a reward function?. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Springer, 17–32.
[20] Sarvapali D. Ramchurn, Trung Dong Huynh, Feng Wu, Yukki Ikuno, Jack Flann, Luc Moreau, Joel E. Fischer, Wenchao Jiang, Tom Rodden, Edwin Simpson, et al. 2016. A disaster response system based on human-agent collectives. Journal of Artificial Intelligence Research 57 (2016), 661–708.
[21] Anand Ranganathan and Roy H. Campbell. 2003. An infrastructure for context-awareness based on first order logic. Personal and Ubiquitous Computing 7, 6 (2003), 353–364.
[22] Stuart Russell. 1998. Learning agents for uncertain environments. In Proceedings of the Eleventh Annual Conference on Computational Learning Theory. 101–103.
[23] Eric Anyung Shieh, Bo An, Rong Yang, Milind Tambe, Craig Baldwin, Joseph DiRenzo, Ben Maule, and Garrett Meyer. 2012. PROTECT: An application of computational game theory for the security of the ports of the United States. In Twenty-Sixth AAAI Conference on Artificial Intelligence.
[24] David Silver, Aja Huang, Chris J. Maddison, Arthur Guez, Laurent Sifre, George Van Den Driessche, Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, et al. 2016. Mastering the game of Go with deep neural networks and tree search. Nature.
[25] IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics) 33, 2 (2003), 188–201.
[26] Jasper A. Vrugt, James M. Hyman, Bruce A. Robinson, Dave Higdon, Cajo J. F. Ter Braak, and Cees G. H. Diks. 2008.