IEEE TRANSACTIONS ON GAMES (PREPRINT)
Adaptive Music Composition for Games
Patrick Hutchings, Jon McCormack
Abstract—The generation of music that adapts dynamically to content and actions has an important role in building more immersive, memorable and emotive game experiences. To date, the development of adaptive music systems for video games is limited by both the nature of algorithms used for real-time music generation and the limited modelling of player action, game world context and emotion in current games. We propose that these issues must be addressed in tandem for the quality and flexibility of adaptive game music to significantly improve. Cognitive models of knowledge organisation and emotional affect are integrated with multi-modal, multi-agent composition techniques to produce a novel Adaptive Music System (AMS). The system is integrated into two stylistically distinct games. Gamers reported an overall higher immersion and correlation of music with game-world concepts with the AMS than with the original game soundtracks in both games.
I. INTRODUCTION

VIDEO GAME experiences are increasingly dynamic and player directed, incorporating user-generated content and branching decision trees that result in complex and emergent narratives [1]. Player-directed events can be limitless in scope, but their significance to the player may be similar to or greater than those directly designed by the game's developers. Game music, however, continues to have far less structural dynamism than many other aspects of the gaming experience, limiting the effects that unique, context-specific music scores can add to gameplay. Why shouldn't unpredicted, emergent moments have unique musical identities? As the soundtrack is an inseparable facet of the shower scene in Hitchcock's 'Psycho', context-specific music can help create memorable and powerful experiences that actively engage multiple senses. Significant, documented effects on memory [2], immersion [3] and emotion perception [4] can be achieved by combining visual content and narrative events with music containing specific emotive qualities and associations. In games, music can also be assessed in terms of narrative fit and functional fit - how sound supports playing [5]. The current standard practice of constructing music for games by starting music tracks or layers with explicitly defined event triggers allows a range of expected situations to be scored with a high confidence of musical quality. However, this introduces extensive repetition and an inability to produce unique musical moments for a range of unpredicted events.

In this paper we present an adaptive music system (AMS) based on cognitive models of emotion and knowledge organisation in combination with a multi-agent algorithmic music composition and arranging system. We propose that a missing, essential step in producing affective music scores for video games is a communicative model for relating events, contents and moods of game situations to the emotional language of
music. This would support the creation of unique musical moments that reference past situations and established relationships without excessive repetition. Moreover, player-driven events and situations not anticipated by the game's developers are accommodated.

The spreading activation model of semantic content organisation used in cognitive science research is adopted as a generalised model that combines explicit and inferred relationships between emotion and objects and environments in the game world. Context descriptions generated by the model are fed to a multi-agent music composition system that combines recurrent neural networks with evolutionary classifiers to arrange and develop melodic contents sourced from a human composer (see Figure 1).

P. Hutchings and J. McCormack are with Monash University.

II. RELATED WORK
This research is primarily concerned with 'Experience-Driven Procedural Content Generation', a term coined by Yannakakis and Togelius [6] and applied to many aspects of game design [7]. As such, the research process and methodology are driven by questions of the experience of music and games, individually and in combination. We frame video games as virtual worlds and model game contents from the perspective of the gamer. The virtual worlds of games include their own rules of interaction and behaviour that overlap and contrast with the physical world around us, and cognitive models of knowledge organisation, emotion and behaviour give clues about how these worlds are perceived and navigated. This framing builds on current academic and commercial systems of generating music for games based on game-world events, contents and player actions.

Music in most modern video games displays some adaptive behaviour. Dynamic layering of instrument parts has become particularly common in commercial games, where instantaneous environment changes, such as speed of movement, number of competing agents and player actions, add or remove sonic layers of the score. Tracks can be triggered to play before or during a scripted event, or modified with filters to communicate gameplay concepts such as player health. Details of the music composition systems from two representative popular games, as revealed through creator interviews, show the state of generative music in typical, large-budget releases.
Red Dead Redemption [8] is set in a re-imagined wild American west and Mexico. The game is notable in its detailed layering of pre-recorded instrument tracks to accompany different gameplay events. To accommodate various recombinations of instrument tracks, most of the music sequences were written in the key of A minor at a tempo of 130 beats per minute [9]. Triggers such as climbing on a horse, enemies appearing and time of day in the game world add instrumental layers to the score to build up energy, tension or mood.
Preprint of: P. Hutchings and J. McCormack, 'Adaptive Music Composition for Games', IEEE Transactions on Games, July 2019, pp. 1-11, doi: 10.1109/TG.2019.2921979
Fig. 1. Architecture of the Adaptive Music System (AMS) for modelling game-world context and generating context-specific music.
No Man's Sky [10] utilises a generative music system and procedural content generation of game worlds. Audio director Paul Weir has described the system as drawing from hundreds of small, carefully isolated fragments of tracks that are recombined following a set of rules built around game-world events. Weir mentions the use of stochastic choices of melodic fragments but has explicitly denied the use of genetic algorithms or Markov chains [11].

Adaptive and generative music systems for games have an extensive history in academic research. From 2007 to 2010 a number of books were written on the topic [12]-[14], but over the last decade, translation from research to wide adoption in commercial games has been slow.

Music systems have been developed as components of multi-functional procedural content generation systems. Extensions of the dynamic layering approach were implemented in Sonancia [15], which triggers audio tracks when the player enters rooms generated with associated affect descriptions.

Recent works in academia have aimed to introduce the use of emotion metrics to direct music composition in games, but typically with novel contributions in either composition algorithms or modelling context in games. Eladhari et al. utilised a simple spreading activation model with emotion and mood nodes in the Mind Music system, which was used to inform musical arrangements for a single character in a video game [16]. However, like many adaptive music systems for video games presented in the academic literature [15], [17]-[19], there is no evaluation through user studies or predefined metrics of quality to judge the effectiveness of the implementation or underlying concepts.

Recent work by Ibáñez et al. [19] used an emotion model to direct music scoring decisions based on game dialogue, mediated with primary, secondary and tertiary activations of discrete emotion categories. The composition system used pre-recorded tracks that were triggered by the specific emotion description. In contrast, Scirea et al. [20] developed a novel music composition model, MetaCompose, that combines graph traversal-based chord sequence generation with an evolutionary system of melody generation. Their system produces music for general affective states, either negative or positive, through the use of dissonance.

This research treats the modelling of relationships between elements of game-worlds and emotion as fundamental to music composition, as they are key aspects of the compositional process of expert human composers. As the famed Hollywood film composer Howard Shore states: "I want to write and feel the drama. Music is essentially an emotional language, so you want to feel something from the relationships and build music based on those feelings." [21].

III. ALGORITHMIC TECHNIQUES
Algorithmic music composition and arranging with computers has developed alongside, but mostly separately from, video games. While algorithms have become more advanced, most techniques lack the expressive control and musical consistency desired in a full composition system for use in commercial video games. Algorithmic composition techniques exhibit a range of strengths and weaknesses, particularly in regard to consistency, transparency and flexibility.

Evolutionary systems have been popular in music generation, with a number of new evolutionary music systems developed in the 1990s and early 2000s in particular [22], [23], allowing for optimised solutions to problems that cannot be solved efficiently or that have changing, unpredictable dynamics. Evolutionary music systems often utilise expert knowledge of music composition in the design of fitness functions and can produce complexity from a small set of rules. These properties result in greater flexibility, but with reduced predictability or consistency.

Wilson's XCS is an eXtended learning Classifier System that uses an evolutionary algorithm to evolve rules, or classifiers, that describe actions to be taken in a given situation [24]. In XCS, classifiers are given a fitness score based on the difference between the reward expected for a particular action and the reward received. By adapting to minimise predictive error rather than just maximising reward, classifiers that are only rewarded well in specific and rare contexts can remain in the population. This helps prevent convergence to local minima, which is particularly important in the highly dynamic context of music. XCS has been successfully used in creative applications to generate visuals and sound for dynamic artificial environments [25], taking advantage of these properties.

Neural networks have been used for music generation since the early 1990s [26]-[28] but have recently grown in popularity due to improvements in model design and growth of parallel compute power in PCs with graphics processing units (GPUs). Surprisingly musical results have come from applying NLP neural network models to music sequence generation [29], [30]. These models have proven effective at structuring temporal events, including monophonic melody and chord sequence generation, but suffer from a lack of consistency, especially in polyphonic composition. Deep learning
with music audio recordings is a growing area of analysis and content generation [31], but is currently too resource intensive to be used for real-time generation.

IV. THE EXPERIENCE OF INTERACTION
Most games are interactive and infer some intention or aspect of the gamer experience through assumptions of how the game action might be affecting the gamer. Such assumptions are informed by internal models that game designers have established about how people experience interactive content, and can be formalised into usable models. Established methodologies for analysing experience have led to demonstrated, measurable psychological effects when interacting with virtual environments.

Sundar et al. [32] outlined an agency-based theory of interactive media and found empirical evidence suggesting that agency to change interactive media has strong psychological effects connected to identity, sense of control and persuasiveness of messages. The freedoms of the environment result in behavioural change. The user is a source of experience, knowledge and control in the virtual world, not just an observer.

Ryan et al. [33] listed four key challenges for interactive emergent narrative: modular content, compositional representational strategies, story recognition, and story support. These challenges also affect dynamic music composition and can all be assisted by effective knowledge organisation. By organising game content and events as units of knowledge connected by their shared innate properties and by their contextual role in the player's experience, new game states can be analysed as a web of dynamic contents and relationships.

A. Knowledge Organisation
Models of knowledge organisation aim to represent knowledge structures. Buckley [34] outlined properties of knowledge structures within the context of games: "... knowledge structures (a) develop from experience; (b) influence perception; (c) can become automated; (d) can contain affective states, behavioural scripts, and beliefs; and (e) are used to guide interpretations and responses to the environment."

The spreading activation model [12] is a knowledge organisation model that was first developed and validated [35] to analyse semantic association [36], but has also been used as a model for other cognitive systems including imagery and emotion [37], supporting its suitability for use in video games. The model is constructed as a graph of concept nodes connected by weighted edges representing the strength of the association between the concepts. When a person thinks of a concept, concepts connected to it are activated to a degree proportional to the weight of the connecting edge. Activation spreads as a function of the number of mediating edges and their weights. The spreading activation model conforms to the observed phenomenon whereby words are recalled more quickly when they are primed with a similar word directly beforehand. However, it suffers as a predictive tool because it posits that the model network structure for each person is unique, so without mapping out an individual's associations it isn't possible to accurately predict how activation will spread.

Spreading activation models don't require logical structuring of concepts into classes or defining features, making it possible to add content based on context rather than structure. For example, if a player builds a house out of blocks in Minecraft, it does not need to be identified as a house. Instead, its position in the graph could be inferred from the time characters spend near it, activities carried out around it, or known objects stored inside it. Heuristics based on known mechanisms of association can be used to turn observations into associations of concepts. For example, co-presentation of words or objects can create associations through which spreading activation has been observed [38].

While the spreading activation model was originally applied to semantic contents, it has been successfully used to model activation of non-semantic concepts, including emotion [39].
B. Emotion
Despite a general awareness of the emotive quality that video games can possess [40], it is uncommon for games to include any formal metrics of the intended emotional effect of the game, or the inferred emotional state of the gamer. Yet emotion perception plays a pivotal role in composing music for games [41], so having a suitable model for representing emotion is critical for any responsive composition system.

The challenge of implementing an emotion model in a game to assist in directing music scoring decisions is made greater by fundamental issues of emotion modelling itself: there is no universally accepted model of human emotion among cognitive science researchers. Generalised emotion models can be overly complex or lacking in expressive detail, and domain-specific models lack validation in other domains. Several music-specific models and measures of emotion have been produced [42]-[44]. The Geneva Emotions in Music Scales (GEMS) [44] were developed for describing emotions that people feel and perceive when listening to music, but have been shown to have reduced consistency with music styles outside of the western classical tradition [45]. Music-specific emotion models lack general applicability in describing a whole game context as they are based on a music listening-only modality.

A spreading activation network that contains vertices representing emotions can be used to model how recalling certain objects can stimulate affect. In video games this can be exploited by presenting particular objects to try to trigger an affective response, or by trying to affect the user in a way that activates their memory of specific objects. An object that the user associates with threat, such as a fierce enemy, could not only activate an affect of threat for the player but also other threatening objects in their memory through mediated activation.

V. EMOTION PERCEPTION EXPERIMENT
For our system, a basic emotion model that uses five affect categories of happiness, fear, anger, tenderness and sadness was adopted, based on Juslin's observation that these are the most consistently used labels in describing music across multiple music listener studies [46], as well as commonly appearing in a range of basic emotion models. Unlike the frequently utilised Russell's Circumplex model of affect [47], this model allows for representations of mixed concurrent emotions, which is a known affective property of music [48].
Excitement was added as a category following consultation with professional game composers, which revealed excitement to be an important aspect of emotion for scoring video games not covered in the basic affect categories. These categories are not intended to represent a complete cognitive model, but rather a targeted selection of terms that balance expressive range with generalisability within the context of scoring music for video games. Although broken down into discrete categories, their use in a spreading activation model requires an activation scale, so they can be understood as a six-dimensional model of emotion, removing issues of reduced resolution found in standard discrete models, highlighted in Eerola and Vuoskoski's study on discrete and dimensional emotion models in music [49].

A listener study was conducted to inform the AMS presented and evaluated in this paper. The study was designed to look for correlations between music properties and perception of emotion in listeners over a range of compositions with different instrumentation and styles. It was shared publicly on a website and completed by 134 participants.

Thirty original compositions were created, in jazz, rock, electronic and classical music styles with diverse instrumentations, each 20-40 seconds in length. The compositions were manually notated and performed using sampled and synthesised instruments in Sibelius and Logic Pro software.

Participants were asked to listen to two tracks and identify the track in which they perceived the higher amount of one emotion from the six discrete affect categories, presented as text in the middle of the interface with play buttons for each track. Pairwise assessment is an established approach for ranking music tracks by emotional properties [50]. When both tracks had been listened to in their entirety, participants could respond by clicking one button: Track A, Track B or Draw. Track A was selected randomly from a pool of tracks that had not been played in the last 10 comparisons.

The Glicko-2 chess rating system [51] was adopted for ranking tracks in each of the six affect categories. With Glicko-2 ranking, players that are highly consistent have a low rating volatility and are less likely to move up or down in rank due to the results of a single match. Each track had a rating, rating deviation and rating volatility, initialised with the default Glicko-2 values of 1500, 350 and 0.06 respectively. The system adjusts the ratings, deviations and volatilities of tracks after they have been compared with each other, and ranks them by their rating.

Features of the note-by-note symbolic representations of the tracks were analysed for correlation with rankings in each category, summarised in Appendix Table A1. Analysis of track rankings and the musical content of each track revealed a range of music properties (see Appendix Table A2) that correlate with perception of specific emotions. These correlations were used in the design of the AMS to help it produce mood-appropriate music.

VI. AMS ARCHITECTURE
An AMS for scoring video games with context-specific, affective music was developed and integrated into two stylistically distinct games. For testing with different, pre-built games, the AMS was developed as a stand-alone package that receives messages from video games in real time to generate a model of the game state and output music. The system consists of three key components:

1) A spreading activation model of game context;
2) The six-category model of emotion for describing game context;
3) An adaptive system for composing and arranging expert-composed music fragments using these two models.
A. Spreading Activation
A model of spreading activation was implemented as a weighted, undirected graph G = (V, E), where V is a set of vertices and E a set of edges, using the Python library NetworkX.

Fig. 2. Visualisation of the spreading activation model during testing with a role-playing video game. Vertex activation is represented by greyscale shading, linearly ramping from white to dark grey. Edge weights are represented by edge width.
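To make the graph mechanics concrete, the following is a minimal, dependency-free sketch of the context model. The AMS itself uses NetworkX; the class and method names here are illustrative assumptions, with the activation-spreading and fading rules taken from this section (activation spreads in proportion to edge weight, keeping the higher value; activations fade at 0.1 per second and inferred edge weights at 0.01 per second).

```python
class SpreadingActivationGraph:
    """Sketch of the AMS context model: a weighted, undirected graph
    whose vertices hold activation values in [0, 100]."""

    def __init__(self):
        self.activation = {}   # vertex -> activation in [0, 100]
        self.weight = {}       # frozenset({a, b}) -> edge weight in [0, 1]

    def activate(self, concept, level):
        # Create the vertex on first mention, then raise its activation.
        current = self.activation.get(concept, 0.0)
        self.activation[concept] = max(current, min(level, 100.0))

    def associate(self, a, b, w):
        # Explicitly defined association between two concepts.
        self.activation.setdefault(a, 0.0)
        self.activation.setdefault(b, 0.0)
        self.weight[frozenset((a, b))] = w

    def spread(self):
        # Each active vertex pushes activation along its edges in
        # proportion to the edge weight; a neighbour keeps whichever
        # value is higher (its own or the spread value).
        for edge, w in self.weight.items():
            a, b = tuple(edge)
            for src, dst in ((a, b), (b, a)):
                spread_val = self.activation[src] * w
                if spread_val > self.activation[dst]:
                    self.activation[dst] = spread_val

    def fade(self, dt, vertex_rate=0.1, edge_rate=0.01):
        # Activations and edge weights decay over time. (In the paper,
        # explicitly defined edges are exempt from fading; that
        # distinction is omitted here for brevity.)
        for v in self.activation:
            self.activation[v] = max(0.0, self.activation[v] - vertex_rate * dt)
        for e in self.weight:
            self.weight[e] = max(0.0, self.weight[e] - edge_rate * dt)
```

Using the paper's own example: a vertex with activation 50 connected by an edge of weight 0.25 activates its neighbour to 50 × 0.25 = 12.5.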
Each vertex represents a concept or affect, with the weight of the vertex w_V : V → A (for activation A ∈ ℝ) representing activation between the values 0 and 100. Normalised edge weights w_E : E → C (for association strength C ∈ ℝ) represent the level of association between vertices and are used to facilitate the spreading of activation. Three node types are used to represent affect (A), objects (O) and environments (N), such that V = A ∪ O ∪ N. The graph is not necessarily connected, and edges never form between affect vertices.

A new instance of the graph with the six affect categories of sadness, happiness, threat, anger, tenderness and excitement is spawned whenever a new game is started. The model is used to represent game context in real time; regular messages from the game are used to update the graph. These updates can add new vertices, add new edges, or update edge weights. An OSC [52] client was added to the game engines to communicate the game state as lists of content activations and relationships for this purpose.

Fig. 3. System diagram for the AMS.

Every 30 ms, the list of messages received by the OSC server is used to update the activation values of vertices in the graph. If a message showing activation of a concept is received and the concept does not yet exist in the graph, a new vertex representing that concept is created. At each update of the graph, edges from any vertex with activation above 0 are traversed and connected vertices are activated proportionally to the connecting edge weights. For example, if vertex A has an activation of 50 and is connected to vertex B with an edge of weight 0.25, then vertex B would be activated to a minimum value of 50 × 0.25 = 12.5. If vertex B is already activated above this value, then no change is made to its activation.

Determining activation of concepts and affect categories from gameplay depends on the mechanics of the game, and would ideally be a consideration at all stages of game development.

Edges are formed between vertices using one of two methods. Inferred edges are developed from co-activation of concept vertices: when two concepts have an activation above 50 at the same time, an edge is formed between them. Edges can also be assigned from game messages, allowing game creators to explicitly define associations. For example, an edge between the vertex for the character 'Grandma' and the affect category 'Tenderness' can be assigned by the game creators and communicated in a single message.

Activation of concepts, and edge weights, fade over time. The fading rate can be determined by the game length, number of concepts or game style as desired. For the games tested, edge weights did not fade when the association was explicitly defined; inferred associations faded at a rate of 0.01 per second, while vertex activations faded at a rate of 0.1 per second.

Melodic themes were precomposed for a subset of known objects in the game. These themes were implemented as properties of object vertices (Eqn. 1):

v_t = theme_i, ∀ v ∈ O    (1)

At every frame, the knowledge graph is exposed to the music composition system and new music is generated to accompany gameplay.

B. Music Composition
A multi-agent composition system (Figure 3) was developed based on observations of the group dynamics of improvising musicians and the relative strengths and weaknesses of different algorithmic composition techniques for distinct roles in the composition process. The agent architecture was developed using a design science methodology [53] with iterative design cycles. A combination of music theory, including descriptive 'rules' of harmonic tension and frequently used melody manipulation techniques, and calibration based on aural assessment of real-world performance guided these cycles. The agents were assessed through a series of user studies, which provided feedback to support or reject more general aspects of the approaches used.
1) Multi-agent Dynamics:
Co-agency is the concept of having individual agents that work towards a common goal but have the agency to decide how to work towards that goal. It is observed in team sports, where players rely on their own skills, knowledge and situational awareness to decide what actions to take, but work as a team to win the game. Some teams have a 'game plan' which gives soft rules for how co-agency will be facilitated, but still leaves room for independent thought and action. Co-agency is also evident in improvising music ensembles [54]. Musicians add their own creative input to create original compositions as a group, using a starting melody, chord progression or rhythmic feel as their 'game plan'.

In the AMS, multiple software agents are used to generate musical arrangements. Agents do not represent performers, but rather roles that a player or multiple players in an improvising group may take. Agents were designed to have either Harmony, Melody or Percussive Rhythm roles for developing polyphonic compositions with a mixture of percussive and pitched instruments. Agents generate two measures' worth of content at a time.
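The role-based agent design can be sketched as a minimal interface. The class names, the two-measure `generate` contract, and the placeholder confidence values are assumptions for illustration; the leader-selection rule (the most confident agent leads at any given time) is described later in this section.

```python
from abc import ABC, abstractmethod

class Agent(ABC):
    """An AMS agent produces two measures of content at a time and
    reports a confidence score used to select a 'leader' agent."""

    @abstractmethod
    def generate(self, context):
        # Return two measures of musical content for this role.
        ...

    @abstractmethod
    def confidence(self):
        # Higher values mean the agent is more certain of its output.
        ...

class HarmonyAgent(Agent):
    def generate(self, context):
        return ["Am", "E7"]   # placeholder chord progression

    def confidence(self):
        return 0.9            # e.g. top softmax probability of the RNN

class MelodyAgent(Agent):
    def generate(self, context):
        return [(69, 1.0)]    # placeholder melody fragment

    def confidence(self):
        return 0.6

def choose_leader(agents):
    # The most confident agent leads; the others arrange around it.
    return max(agents, key=lambda a: a.confidence())
```

With these placeholder scores, the harmony agent would lead and the melody agent would search for a harmonically suitable melody.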
TABLE I
HARMONIC CONTEXT REPRESENTATION AS A 2D MATRIX FOR AN EXAMPLE CHORD PROGRESSION

      C7                    E7
B     0.3  0.3  0.3  0.3    0.8  0.8  0.8  0.8
A     ...
2) Harmony Agent:
The harmony agent utilised an RNN model designed as an extension to the tested harmony system presented by Hutchings and McCormack [55]. The RNN contains two hidden layers, and an extensive hyperparameter search led to the use of gated recurrent units (GRU) with 192 units per layer.

Chord symbols were extracted from 1800 folk compositions from the Nottingham Music Database found at http://abc.sourceforge.net/NMD/, 2000 popular songs in rock, electronic, rhythm and blues and pop styles from the Wikifonia database (no longer publicly available), and 2986 jazz standards from the collection of the 'Real Book' series of jazz standards [56]. To aid the learning of chord patterns by style and encode time signature information into the training data, barlines were replaced with word tokens representing the style of music: pop, rock, jazz or folk.

Dynamic unrolling produced significantly faster training times, and a lower perplexity was achieved through the implementation of peep-hole mechanisms. Encoding of chord symbol tokens aided training, with best results achieved using 100-dimension encoding. The final layer of the network utilised softmax activation to give a likelihood value for any of the chord symbols in the chord dictionary to come next for a given input sequence. This allows the harmony agent to find likely next chords, add them to the chord progression of a composition and feed them back into itself to generate further chords for the composition.

The output of the softmax layer is used as an agent confidence metric, as used by Hutchings and McCormack [55]. The confidence of the agent is compared with that of other agents in the AMS to establish a 'leader' agent at any given time. When the harmony agent has a higher confidence score than the first melody agent, the harmony agent becomes the leader and the melody agent searches for a melody that is harmonically suitable. If the harmony agent's confidence is lower, less likely chords are progressively considered as needed to harmonically suit the melody.

The harmony agent does not add any notes to the game score; instead it produces a harmonic context which is used by multiple melody agents to write phrases for all the pitched instruments in the score. The harmonic context is a 2D matrix of values that represent which pitches most strongly signify a chord and key over the course of a musical measure (Table I).
For example, a row representing the pitch 'C' would be populated with high values when the token output of the harmony agent is a C major chord. In this situation the row representing the pitch 'E flat' would have low values, as hearing an E flat would suggest a C minor chord. These values are formulated as resource scores that the melody agents can use to decide which combination of melodic notes will best reflect the chord and key. The matrix has 12 rows representing the 12 tones of the western equal temperament tuning system, from 'C' to 'B', and the columns represent subdivisions of beats across four measures. When the harmony agent generates chords for two measures, the matrix is populated with values that represent the available harmonic resource for the melody agents to use.

Root tones are assigned a resource value of 1.0 and chord tones a value of 0.8. All other values carry over from the previous measure to help establish tonality and are clamped to a range of 0-0.5 to prioritise chord tones. The first measure is initialised with all values set to 0.3.

The harmonic fitness score, H, is given by the average resource value of each cell K_i that a note from the melody fragment inhabits:

H = (1/L) Σ_{i=1}^{L} f(K_i)    (2)
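The harmonic context and fitness score can be sketched as follows. The data layout and function names are assumptions (the paper does not give an implementation), and the eight-column window is an assumed subdivision; the resource values 1.0, 0.8, the 0-0.5 carry-over clamp and the 0.3 initialisation are taken from the text above.

```python
PITCH_CLASSES = 12   # chromatic pitch classes C, C#, ..., B
COLUMNS = 8          # beat subdivisions in a two-measure window (assumed)

def build_context(chords, prev_last_column=None):
    """Populate the harmonic resource matrix for one two-measure window.

    `chords` maps each column index to a (root, chord_tones) pair of
    pitch classes, e.g. C major = (0, {0, 4, 7})."""
    if prev_last_column is None:
        # First measure: all values initialised to 0.3.
        prev_last_column = [0.3] * PITCH_CLASSES
    matrix = [[0.0] * COLUMNS for _ in range(PITCH_CLASSES)]
    for col in range(COLUMNS):
        root, tones = chords[col]
        for pc in range(PITCH_CLASSES):
            if pc == root:
                matrix[pc][col] = 1.0      # root tone
            elif pc in tones:
                matrix[pc][col] = 0.8      # other chord tones
            else:
                # Carry over, clamped to 0-0.5 to prioritise chord tones.
                matrix[pc][col] = min(prev_last_column[pc], 0.5)
    return matrix

def harmonic_fitness(matrix, melody_cells):
    # Eqn. 2: average resource value of the (pitch, column) cells that
    # the melody fragment inhabits.
    return sum(matrix[pc][col] for pc, col in melody_cells) / len(melody_cells)
```

A fragment landing only on root and chord tones of the current chord scores close to the maximum, which is why melody agents favour such placements.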
3) Melody Agents:
At any point of gameplay, the knowl-edge graph represents the activation of emotion, object andenvironment concepts. The melodic theme assigned to thehighest activated object is selected for use by melody agentswhen melodic content is needed in the score. A melody agentexists for each instrument defined in the score by the humancomposer and each agent adds melodic content every twomeasures.XCS is used to evolve rules for modifying pre-composedmelodic fragments using melody operators as actions. Itis used to reduce the search space of melody operationsto support real-time responsiveness of the system. Melodyfragments of one to four measures in length are composedby expert composers as themes for different concepts in thegame. A list of eight melody operators were used based ontheir prevalent use in music composition: Reverse, Diminish,Augment, Invert, Reverse-Diminish, Reverse-Augment, Invert-Diminish and Invert-Augment. Reversal reverses the order ofnotes, augmentation and diminution extend and shorten thelength of each note element-wise, respectively by a constantfactor and inversion inverts the pitch steps between notes. Theactivation of each affect category is represented with a 2 bitbinary number, to show concept activations of 0-24, 25-49,50-74 or 75-100 (see Section VI-A). A 6-bit unique theme id is appended to each of the emotion binary representations tocreate an 18-bit string representation of the environment.Calculations of reward can be modified based on experimen-tation or adjusted to the taste of the composer. The equationsimplemented as the default rewards were established usingthe results of the emotion survey (see Table A2) to determinenegative or positive correlation and then calibrated by ear (by
the lead author, who has professional experience composing for games), to find reasonable bounding ranges and coefficients for each property. Notes per second (n_s), mean pitch interval (p̄) and the ratio of diatonic to non-diatonic notes (d) in each phrase were used for the calculation of rewards, where e, h, s, te and th denote the activations of the excitement, happiness, sadness, tenderness and threat affect categories:

R_e = 0.□ − |(e/□) − (n_s − □)/□|
R_h = 0.□ − |h − d|
R_s = |(s/□) − (n_s − □)/□|
R_te = |(te/□) − (n_s − □)/□|
R_th = 0.□ − |(th/□) − (p̄/□)/□|
R = R_e + R_h + R_s + R_te + R_th

Only melodic fragments resulting from rules with rewards estimated above a threshold value (
R > 0.□ by default) are considered, and the resulting modified melody fragments are put through an exhaustive search of transpositions and temporal shifts to find a suitable fit in the score based on the harmonic context. If a suitable score is achieved, the melody fragment is placed into the music score and modifications are made to the harmonic context. Notes consume all resources from the context where they are added, and consume half of the resources of tones directly above and below and a diminished fifth away. By consuming resources from pitches with a high level of tension, notes with unpleasant clashes are avoided by other melody agents. This process is repeated for each melody agent every two measures. The first melody agent produces the melody line with the highest pitch, the second melody agent produces the lowest melodic line, and each subsequent melody agent sits progressively higher above the lowest line until the highest line is reached. This follows common harmonisation and counterpoint composition techniques. The maximum range, M_r, in semitones, between the highest note of the first melody agent and the lowest note of the second melody agent is determined by the number of melody agents in the system (N) and the style of music (see Eqn. 3). A 'style range-factor', S_r, was set to default values of 1.0, 0.8 and 0.7 for jazz, pop and folk music styles respectively, but can be adjusted by composers as desired.

M_r = 12 × S_r × N    (3)

By setting a minimum harmonic fitness threshold it is possible that not all agents will be used at all points of time. This is good orchestration practice in any composition, and is often observed in group improvisations, where musicians will wait until they feel they have something to contribute to the composition before playing. The harmony agent can generate chord progressions taking style labels as input (jazz, folk, rock and pop).
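The eight melody operators can be expressed as compositions of three primitives; a minimal sketch, assuming a (pitch, duration) note representation and a factor of 2 for augmentation and diminution:

```python
# Sketch of the eight melody operators used as XCS actions. A melody
# fragment is assumed to be a list of (midi_pitch, duration) pairs, and
# the augmentation/diminution factor of 2 is an illustrative assumption.

def reverse(fragment):
    """Reverse the order of notes."""
    return fragment[::-1]

def augment(fragment, factor=2):
    """Extend each note's length element-wise by a constant factor."""
    return [(p, d * factor) for p, d in fragment]

def diminish(fragment, factor=2):
    """Shorten each note's length element-wise by a constant factor."""
    return [(p, d / factor) for p, d in fragment]

def invert(fragment):
    """Invert the pitch steps between notes, keeping the first pitch."""
    first = fragment[0][0]
    return [(first - (p - first), d) for p, d in fragment]

# The four compound operators are compositions of the primitives above.
OPERATORS = {
    'Reverse': reverse,
    'Diminish': diminish,
    'Augment': augment,
    'Invert': invert,
    'Reverse-Diminish': lambda f: diminish(reverse(f)),
    'Reverse-Augment': lambda f: augment(reverse(f)),
    'Invert-Diminish': lambda f: diminish(invert(f)),
    'Invert-Augment': lambda f: augment(invert(f)),
}

theme = [(60, 1.0), (64, 0.5), (67, 0.5)]  # C, E, G
print(OPERATORS['Invert'](theme))  # [(60, 1.0), (56, 0.5), (53, 0.5)]
```

Because every operator preserves the number of notes, any operator output can still be scored against the harmonic context with Eqn. 2.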
In contrast, the melody agent generates musical suggestions and rates them using explicit metrics designed to represent different styles. Many composition factors influence style, including instrumentation, rhythmic emphasis, phrasing, harmonic complexity and melodic shape. Instrumentation and performance components such as swing can be managed by higher-level processes of the AMS, but the melody agents themselves should consider melodic shape, harmony and rhythmic qualities when adding to the composition. In this system a simple style score (P) is introduced that rates rhythmic density emphasis based on style, using ad-hoc parameters tuned by ear. It uses the notes per beat (n_b) and a binary value o_b for a phrase starting off the beat (1 = true, 0 = false). For jazz, P = |□ − n_b| + o_b. For rock and pop, P = |(1/N) − n_b|. For folk, P = |□ − n_b|. A final melody fitness score, M, is calculated as the sum of the style and harmonic fitness scores, i.e. M = H + P. For concepts that do not have a pre-composed theme, a theme is evolved when the first edge is connected to its vertex. Themes from the two closest object vertices, as calculated using Dijkstra's algorithm [57], are used to produce a pool of parents, with the manipulations used by the XCS classifiers applied to create the population in the pool. Parents are spliced, with a single-point note pitch or rhythm mutation probability of 0.1. The mutation rate is set to this relatively high value because the splicing and recombination of themes occurs through the operation of melody agents, meaning mutation is the only method of adding new melodic content to the generated theme.
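The splice-and-mutate theme generation described above might look like the following sketch; the note representation, pitch-step range and duration scaling are assumptions:

```python
# Sketch of evolving a theme for a concept with no pre-composed melody:
# parent themes are spliced at a random point, then a single-point pitch
# or rhythm mutation is applied with probability 0.1. The mutation ranges
# are illustrative assumptions.

import random

MUTATION_RATE = 0.1

def splice(parent_a, parent_b):
    """Single-point crossover of two themes (lists of (pitch, duration))."""
    point = random.randint(1, min(len(parent_a), len(parent_b)) - 1)
    return parent_a[:point] + parent_b[point:]

def mutate(theme):
    """With probability 0.1, change one note's pitch or one note's rhythm."""
    theme = list(theme)
    if random.random() < MUTATION_RATE:
        i = random.randrange(len(theme))
        pitch, dur = theme[i]
        if random.random() < 0.5:
            pitch += random.choice([-2, -1, 1, 2])  # small pitch step (assumed)
        else:
            dur *= random.choice([0.5, 2])          # halve or double (assumed)
        theme[i] = (pitch, dur)
    return theme

parent_a = [(60, 1.0), (62, 0.5), (64, 0.5), (65, 1.0)]
parent_b = [(67, 0.5), (65, 0.5), (64, 1.0), (62, 1.0)]
child = mutate(splice(parent_a, parent_b))
print(len(child))  # 4
```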
4) Percussive Rhythm:
A simple percussive rhythm agent based on the RNN model used by Hutchings [58] was implemented, with the lowest melody agent rhythmically doubled by the lowest percussive instrument and fed into the neural network to produce the other percussive parts in the specified style.

VII. IMPLEMENTATION IN VIDEO GAMES
To test the effectiveness of the AMS for the intended task of real-time scoring of game-world events, the model was implemented in two games: Zelda: Mystery of Solarus (MoS) [59] and StarCraft II [60]. The games were selected for having different mechanics, visual styles and settings, as well as having open source code or exposed game states that could be used to model game-world contents. The activation of objects and environments (Fig. 2), through explicit programmatic rules, spreads to affect categories, which in turn affect parameters of the musical compositions.

A list of key concepts that players would be likely to engage with over the first thirty minutes of gameplay was established for each game, and activations for concepts and emotions were calibrated. In Zelda: MoS, concepts such as 'Grandma' and 'Bakery' were given an activation of 100 when on screen, because there was only one of each of these object types in the game. Other objects such as 'Heart' and 'Big Green Knight' were given activation levels of 20 for appearing on screen, increased by 10 for each interaction the player character had with them. Attacking a knight, or being attacked by a knight, counted as an interaction.

For StarCraft II, threat was set by the proportion of enemy units in view compared to friendly units. The rate of wealth growth was chosen as an activator of happiness, with 5 activation points added for each 100 units of resources mined. Sadness activation increased by 5 points on the death of friendly units, tenderness activation increased by 5 points for units performing healing functions, and excitement activation
increased by 10 points based on the number of enemy units visible. Three environment nodes were created: the player's base, the enemy base and the wasteland in between.

TABLE II
ACCEPTED TERMS FOR ESTABLISHED CONCEPTS

Associated concept      Accepted terms
Zelda Outdoors          town, village, outdoors, outside
Zelda AMS Sword         sword, fighting, battle
Starcraft No conflict   building, constructing, mining, peace
Starcraft AMS SCV       SCV, building, constructing, mining

Participants were recruited through public social media pages for students at Monash University, and for gamers and game developers in Melbourne, Australia, and were divided into two groups, Group A and Group B, with seventeen participants in each. Experiments were conducted in an isolated, quiet room and questionnaires were completed unobserved and anonymously. The study was designed this way to avoid experience conflation from having the same game played with different music conditions. Group A played Zelda: MoS with its original score and StarCraft with a score generated with the AMS. Group B played Zelda: MoS with the AMS and StarCraft with the original score. The order in which the games were presented alternated for each participant, and participants were not given any information about the music condition or system in use.

Players played their first game for twenty minutes. After twenty minutes the game automatically stopped and the first game questionnaire appeared. Participants were asked whether they had played the game before, to describe the game and music with words, whether they noticed the mood of the music change during gameplay, and to list any game-related associations they had with a played music theme. One question of the questionnaire included a 10-second audio clip that the participant would use to report any game-world contents they associated with that music.
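The explicit concept-activation rules described above can be sketched as follows, using the Zelda: MoS values from the text; the clamp to a 0-100 range and the helper names are assumptions:

```python
# Sketch of the explicit programmatic activation rules for Zelda: MoS.
# The 0-100 clamp and function names are assumptions for illustration.

activations = {}

def activate(concept, amount):
    """Add activation to a concept, clamped to 0-100 (assumed)."""
    activations[concept] = min(100, activations.get(concept, 0) + amount)

def on_screen(concept, unique=False):
    """Unique objects (e.g. 'Grandma') activate fully; others get 20."""
    activate(concept, 100 if unique else 20)

def on_interaction(concept):
    """Each interaction (attacking or being attacked) adds 10."""
    activate(concept, 10)

on_screen('Grandma', unique=True)
on_screen('Big Green Knight')
on_interaction('Big Green Knight')  # player attacks the knight
on_interaction('Big Green Knight')  # knight attacks the player
print(activations)  # {'Grandma': 100, 'Big Green Knight': 40}
```

These activations are what spread through the knowledge graph to the affect categories that drive the composition agents.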
Before the study began, a list of key words was defined (see Table II) for each condition to classify a 'correct' association, meaning an association between music and content that co-occurred during gameplay, or content that was known to trigger the music. For Starcraft, a recording of the original score used during a conflict-free segment of gameplay was played to participants in group B, and the composer-written theme for 'SCV' units was played to participants in group A. For Zelda: MoS, a recording of the original score used in outdoor areas was played to participants in group B, and the composer-written theme for 'Sword' was played to participants in group A. Participants were then asked to provide ratings of immersion, correlation of music with actions, and answers to questions on the quality of the music itself for the gaming session with each game. After submitting answers, participants were invited to take a rest before completing another game session and questionnaire with the other game.
TABLE III
MANN-WHITNEY U ANALYSIS OF DIFFERENCE IN REPORTED IMMERSION AND CORRELATION OF EVENTS AND MUSIC BETWEEN ORIGINAL GAME SCORE AND AMS GAMEPLAY CONDITIONS

                 Immersion                Correlation
             z       p       η        z       p       η
Zelda      -1.17   0.121   0.042   -1.58   0.057   0.075
Starcraft  -1.62   0.053   0.079   -1.60   0.055   0.077
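The reported η values in Table III agree, to within rounding of the z scores, with the common rank-test effect-size approximation η² = z²/(N − 1) for N = 34 participants; this formula is an inference from the numbers, not one stated in the text:

```python
# Consistency check: Table III effect sizes are reproducible (to within
# rounding of the reported z scores) from eta^2 = z^2 / (N - 1), a common
# rank-test effect-size approximation. The formula is inferred, not stated.

def eta_squared(z, n):
    """Eta-squared effect size from a standardised rank-test z score."""
    return z * z / (n - 1)

N = 34  # 17 participants in each of Group A and Group B
for game, z_imm, z_corr in [('Zelda', -1.17, -1.58), ('Starcraft', -1.62, -1.60)]:
    print(f"{game}: immersion {eta_squared(z_imm, N):.3f}, "
          f"correlation {eta_squared(z_corr, N):.3f}")
```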
A. Results
Example videos of each game being played with music generated by the AMS are provided for reference. Most of the participants were unfamiliar with the games used in the study: 17% reported having played StarCraft II before, and only one participant (3%) had played Zelda: MoS.

A small positive effect (Table III) was observed in both reported immersion and correlation of music with events in AMS conditions compared to original-score conditions, for both games. Using a Wilcoxon signed-rank non-directional test, a significant positive increase in rank sum (p < 0.□) was observed for both groups in the reported perceived effect of music on immersion when the AMS was used, independent of game choice. Further, in no case did participants report that AMS-generated music 'greatly' or 'somewhat' reduced immersion.

Despite the increase in self-reported correlation of events and music, fewer 'correct' concept terms were listed for the short audio samples in the questionnaire in AMS conditions for both games. For Zelda, correct terms were reported by 70% of participants in original and 53% in AMS conditions. For Starcraft this rate was 59% for original and 41% for AMS.

B. Discussion
The study involved short interactions with a game, which do not represent the extended periods of time for which many people play games. This short period also limits the complexity of the spreading activation model that can be created, and the personalisation of the music experience that such complexity would bring. However, the study shows that the AMS can be successfully implemented in games and that self-reported immersion and correlation of events with music are increased using the AMS over short periods. This result supports the use of the spreading activation model to combine object, emotion and environment concepts. Listening to the music tracks, it becomes apparent that the overall musical quality of the AMS is not that of a skilled composer, but this goal is outside the scope of this project. Musical quality could be enhanced through improvements to the composition techniques within the framework presented.

VIII. CONCLUSIONS
In this paper we have documented a process of identifying key issues in scoring dynamic musical content in video games, finding suitable emotion models and implementing them in a video game scoring system. The journey through consideration of alternate modelling approaches, an exploratory study, a novel listener study and modelling approach, and implementation and testing in games demonstrates the multitude of academic and pragmatic concerns developers can expect to face in producing new emotion-driven, adaptive scoring systems.

By focusing on the criteria of the emotion model for the specific application, additional development was required in establishing a novel listener study design, but greater confidence in the suitability of the model was gained and positive real-world implementation results were achieved. The field of affective computer music could benefit from further exploration of designed-for-purpose data collection techniques with regard to emotion.

The results of the game implementation study suggest that using affect categories within a knowledge activation model, as a communicative layer between game-world events and the music composition system, can lead to improved immersion and perceived correlation of events with music for gamers, compared to current standard video-game scoring approaches.

We argue for the need for a fully integrated, end-to-end approach to adaptive game music that sees the music engine integrated at an early stage of development and built on models of cognition, music psychology and narrative. This approach has a broad and deep scope. As research in this area continues, expert knowledge and pragmatic design decisions will continue to be needed to produce actual music systems from models and theoretical frameworks. Individual design decisions in the implementation presented in this paper, including the choice of generative models and calibration of parameters, raise research questions that can be pursued within the general framework presented or in isolation.

Example videos: https://monash.figshare.com/s/1d863d9aa90ca4a97aab
A. Future Work
We intend to undertake further listener and gamer studies to establish more generalised correlations between music and emotion. These studies support the development of more advanced adaptive music systems. The listener study design presented in this paper gives us a suitable platform for collecting further data for a larger range of music compositions. Positive results from the gamer survey, based on modifying existing games not designed for the AMS, encourage ground-up implementations in new games for tighter integration with the mechanics of specific gameplay.

The AMS presented assumes that knowledge of upcoming events is not available. While this may be the case in some open-world games like Minecraft, many games have written narratives that can provide this knowledge a priori. Work by Scirea et al. has demonstrated benefits of musical foreshadowing in games [61], an important addition that could be implemented in this AMS.

ACKNOWLEDGEMENTS
This research was supported by an Australian Government Research Training Program (RTP) Scholarship.

APPENDIX

TABLE A1
LISTENER STUDY RANK OF TRACKS BY EMOTION

Track  Hap.  Exc.  Ang.  Sad.  Ten.  Thr.
1      25    22    7     1     15    4
2      7     6     25    28    14    26
3      8     25    26    11    1     27
4      4     10    10    27    26    3
5      9     14    17    10    5     23
6      14    5     1     29    29    14
7      15    7     22    23    18    13
8      2     11    30    20    3     28
9      13    16    8     8     22    20
10     17    9     5     25    25    1
11     22    21    20    5     7     24
12     18    23    19    2     4     17
13     29    20    16    4     8     21
14     6     27    12    15    11    19
15     16    12    27    19    16    9
16     28    30    21    17    6     29
17     5     3     6     30    28    15
18     24    28    11    9     20    16
19     11    15    18    21    21    12
20     20    18    4     18    24    6
21     10    4     28    22    9     30
22     26    19    3     16    19    2
23     19    17    15    3     13    7
24     21    1     14    26    30    10
25     30    13    24    7     2     22
26     27    2     2     13    27    5
27     1     8     9     24    23    11
28     23    26    13    6     10    18
29     3     24    29    14    17    25
30     12    29    23    12    12    8

TABLE A2
PEARSON'S CORRELATION COEFFICIENT (R) OF MUSIC PITCH PROPERTIES BY EMOTION CATEGORY. CORRELATIONS WITH P < □ IN BOLD. M = MEAN, SD = STANDARD DEVIATION, ST = SEMI-TONES

       Phrase length (M)  Phrase length (SD)  Pitch range (ST)  Pitch interval (M)  Diatonic (%)  Notes/s  Note velocity (M)
Ang.   0.032    0.249    0.200    0.326    -0.046    0.270    □
Exc.   0.256    0.184    -0.110   0.044    0.074    -0.779   -0.121
Ten.   -0.143   -0.378   -0.011   -0.240   0.016    -0.639   -0.465
Thr.   0.105    0.267    0.239    -0.191   0.214    □        □

REFERENCES

[1] S. J.-J. Louchart, J. Truesdale, N. Suttie, and R. Aylett, "Emergent narrative, past, present and future of an interactive storytelling approach," in Interactive Digital Narrative: History, Theory and Practice. Routledge, 2015, pp. 185–200.
[2] M. G. Boltz, "The cognitive processing of film and musical soundtracks," Memory & Cognition, vol. 32, no. 7, pp. 1194–1205, 2004.
[3] T. Sanders and P. Cairns, "Time perception, immersion and music in videogames," in Proceedings of the 24th BCS Interaction Specialist Group Conference. British Computer Society, 2010, pp. 160–167.
[4] J. A. Sloboda, "Music structure and emotional response: Some empirical findings," Psychology of Music, vol. 19, no. 2, pp. 110–120, 1991.
[5] I. Ekman, "A cognitive approach to the emotional function of game sound," in The Oxford Handbook of Interactive Audio, 2014.
[6] G. N. Yannakakis and J. Togelius, "Experience-driven procedural content generation," IEEE Transactions on Affective Computing, vol. 2, no. 3, pp. 147–161, 2011.
[7] N. Shaker, J. Togelius, and M. J. Nelson, Procedural Content Generation in Games. Springer, 2016.
[8] "Red dead redemption," Rockstar Games, 2010.
[9] XboxViewTV, "Red dead redemption - making of music vidoc."
[10] "No man's sky," Hello Games, 2016.
[11] M. Epstein, "How 'no man's sky' composes completely original music for every player," Digital Trends.
[12] A. M. Collins and E. F. Loftus, "A spreading-activation theory of semantic processing," Psychological Review, vol. 82, no. 6, p. 407, 1975.
[13] M. Grimshaw, Game Sound Technology and Player Interaction: Concepts and Developments. IGI Global, 2010.
[14] A. Farnell, "An introduction to procedural audio and its application in computer games," in Audio Mostly Conference, vol. 23, 2007.
[15] P. Lopes, A. Liapis, and G. N. Yannakakis, "Sonancia: Sonification of procedurally generated game levels," in Proceedings of the 1st Computational Creativity and Games Workshop, 2015.
[16] M. Eladhari, R. Nieuwdorp, and M. Fridenfalk, "The soundtrack of your mind: mind music-adaptive audio for game characters," in Proceedings of the 2006 ACM SIGCHI International Conference on Advances in Computer Entertainment Technology. ACM, 2006, p. 54.
[17] A. K. Hoover, W. Cachia, A. Liapis, and G. N. Yannakakis, "Audioinspace: Exploring the creative fusion of generative audio, visuals and gameplay," in International Conference on Evolutionary and Biologically Inspired Music and Art. Springer, 2015, pp. 101–112.
[18] N. I. Holtar, M. J. Nelson, and J. Togelius, "Audioverdrive: Exploring bidirectional communication between music and gameplay," in ICMC. Citeseer, 2013.
[19] M. L. Ibáñez, N. Álvarez, and F. Peinado, "Towards an emotion-driven adaptive system for video game music," in International Conference on Advances in Computer Entertainment. Springer, 2017, pp. 360–367.
[20] M. Scirea, J. Togelius, P. Eklund, and S. Risi, "Affective evolutionary music composition with metacompose," Genetic Programming and Evolvable Machines, vol. 18, no. 4, pp. 433–465, 2017.
[21] M. Morton, "Howard shore discusses the passion-play within the twilight saga: Eclipse score," https://bit.ly/2SpYWst, 2010.
[22] F. F. de Vega, C. Cotta, and E. R. Miranda, "Special issue on evolutionary music," Berlin, 2012.
[23] E. R. Miranda and J. Al Biles, Evolutionary Computer Music. Springer, 2007.
[24] S. W. Wilson, "Classifier fitness based on accuracy," Evolutionary Computation, vol. 3, no. 2, pp. 149–175, 1995.
[25] J. McCormack, "Artificial ecosystems for creative discovery," in Proceedings of the 9th Annual Conference on Genetic and Evolutionary Computation. ACM, 2007, pp. 301–307.
[26] M. C. Mozer, "Neural network music composition by prediction: Exploring the benefits of psychoacoustic constraints and multi-scale processing," Connection Science, vol. 6, no. 2-3, pp. 247–280, 1994.
[27] G. Papadopoulos and G. Wiggins, "AI methods for algorithmic composition: A survey, a critical view and future prospects," in AISB Symposium on Musical Creativity. Edinburgh, UK, 1999, pp. 110–117.
[28] J. McCormack, A. C. Eldridge, A. Dorin, and P. McIlwain, "Generative algorithms for making music: Emergence, evolution, and ecosystems," in The Oxford Handbook of Computer Music, R. T. Dean, Ed. New York; Oxford: Oxford University Press, 2009, pp. 354–379.
[29] D. Eck and J. Schmidhuber, "A first look at music composition using LSTM recurrent neural networks," Istituto Dalle Molle Di Studi Sull Intelligenza Artificiale, vol. 103, 2002.
[30] B. L. Sturm, J. F. Santos, O. Ben-Tal, and I. Korshunova, "Music transcription modelling and composition using deep learning," arXiv preprint arXiv:1604.08723, 2016.
[31] A. Van Den Oord, S. Dieleman, H. Zen, K. Simonyan, O. Vinyals, A. Graves, N. Kalchbrenner, A. W. Senior, and K. Kavukcuoglu, "Wavenet: A generative model for raw audio," in SSW, 2016, p. 125.
[32] S. S. Sundar, H. Jia, T. F. Waddell, and Y. Huang, "Toward a theory of interactive media effects (time)," The Handbook of the Psychology of Communication Technology, pp. 47–86, 2015.
[33] J. O. Ryan, M. Mateas, and N. Wardrip-Fruin, "Open design challenges for interactive emergent narrative," in International Conference on Interactive Digital Storytelling. Springer, 2015, pp. 14–26.
[34] K. E. Buckley and C. A. Anderson, "A theoretical model of the effects and consequences of playing video games," Playing Video Games: Motives, Responses, and Consequences, pp. 363–378, 2006.
[35] R. F. Lorch, "Priming and search processes in semantic memory: A test of three models of spreading activation," Journal of Verbal Learning and Verbal Behavior, vol. 21, no. 4, pp. 468–492, 1982.
[36] W. F. Battig and W. E. Montague, "Category norms of verbal items in 56 categories: a replication and extension of the Connecticut category norms," Journal of Experimental Psychology, vol. 80, p. 1, 1969.
[37] T. H. Carr, C. McCauley, R. D. Sperber, and C. Parmelee, "Words, pictures, and priming: on semantic activation, conscious identification, and the automaticity of information processing," Journal of Experimental Psychology: Human Perception and Performance, vol. 8, no. 6, pp. 757–777, 1982.
[38] R. Ratcliff and G. McKoon, "Priming in item recognition: Evidence for the propositional structure of sentences," Journal of Verbal Learning and Verbal Behavior, vol. 17, no. 4, pp. 403–417, 1978.
[39] G. H. Bower, "Mood and memory," American Psychologist, vol. 36, no. 2, p. 129, 1981.
[40] J. Sykes and S. Brown, "Affective gaming: measuring emotion through the gamepad," in CHI'03 Extended Abstracts on Human Factors in Computing Systems. ACM, 2003, pp. 732–733.
[41] S. Cunningham, V. Grout, and R. Picking, "Emotion, content, and context in sound and music," in Game Sound Technology and Player Interaction: Concepts and Developments. IGI Global, 2011, pp. 235–263.
[42] A. Gabrielsson, "Emotion perceived and emotion felt: Same or different?" Musicae Scientiae, vol. 5, no. 1 suppl, pp. 123–147, 2001.
[43] P. N. Juslin and P. Laukka, "Expression, perception, and induction of musical emotions: A review and a questionnaire study of everyday listening," Journal of New Music Research, vol. 33, no. 3, pp. 217–238, 2004.
[44] M. Zentner, D. Grandjean, and K. R. Scherer, "Emotions evoked by the sound of music: characterization, classification, and measurement," Emotion, vol. 8, no. 4, p. 494, 2008.
[45] A. Lykartsis, A. Pysiewicz, H. von Coler, and S. Lepa, "The emotionality of sonic events: testing the Geneva emotional music scale (GEMS) for popular and electroacoustic music," in The 3rd International Conference on Music & Emotion, Jyväskylä, Finland, June 11-15, 2013. University of Jyväskylä, Department of Music, 2013.
[46] P. N. Juslin, "What does music express? Basic emotions and beyond," Frontiers in Psychology, vol. 4, p. 596, 2013.
[47] J. A. Russell, "A circumplex model of affect," Journal of Personality and Social Psychology, vol. 39, no. 6, p. 1161, 1980.
[48] P. G. Hunter, E. G. Schellenberg, and U. Schimmack, "Mixed affective responses to music with conflicting cues," Cognition & Emotion, vol. 22, no. 2, pp. 327–352, 2008.
[49] T. Eerola and J. K. Vuoskoski, "A comparison of the discrete and dimensional models of emotion in music," Psychology of Music, vol. 39, no. 1, pp. 18–49, 2011.
[50] J. Madsen, B. S. Jensen, and J. Larsen, "Predictive modeling of expressed emotions in music using pairwise comparisons," in International Symposium on Computer Music Modeling and Retrieval. Springer, 2012, pp. 253–277.
[51] M. E. Glickman, "Example of the glicko-2 system," Boston University, 2012.
[52] M. Wright, "Open sound control: an enabling technology for musical networking," Organised Sound, vol. 10, no. 3, pp. 193–200, 2005.
[53] A. R. Hevner, "A three cycle view of design science research," Scandinavian Journal of Information Systems, vol. 19, no. 2, p. 4, 2007.
[54] R. K. Sawyer, "Group creativity: Musical performance and collaboration," Psychology of Music, vol. 34, no. 2, pp. 148–165, 2006.
[55] P. Hutchings and J. McCormack, "Using autonomous agents to improvise music compositions in real-time," in International Conference on Evolutionary and Biologically Inspired Music and Art. Springer, 2017, pp. 114–127.
[56] H. L. Corp, "The real book - volume vi," 2017.
[57] E. W. Dijkstra, "A note on two problems in connexion with graphs," Numerische Mathematik, vol. 1, no. 1, pp. 269–271, 1959.
[58] P. Hutchings, "Talking drums: Generating drum grooves with neural networks," arXiv preprint arXiv:1706.09558, 2017.
[59] "Zelda: Mystery of solarus," Solarus Team, 2011.
[60] "Starcraft II: Wings of liberty," Blizzard Entertainment, 2010.
[61] M. Scirea, Y.-G. Cheong, M. J. Nelson, and B.-C. Bae, "Evaluating musical foreshadowing of videogame narrative experiences," in