A Comparison of the Performance of Supervised and Unsupervised Machine Learning Techniques in Evolving Awale/Mancala/Ayo Game Player
International Journal of Game Theory and Technology (IJGTT), Vol.1, No.1, June 2013
Randle, O.A., Ogunduyile, O.O., Zuva, T., Fashola, N.A.
Tshwane University of Technology
College Campus, Department of Computer Science, Department of Computer Engineering, Soshanguve
ABSTRACT
Awale games have become widely recognized across the world for the innovative strategies and techniques used to evolve their agents (players), which have produced interesting results under various conditions. This paper compares the results of the two major classes of machine learning techniques, supervised and unsupervised, by reviewing their performance when using minimax search, an endgame database, a combination of both, or other techniques, and determines which techniques perform best.
KEYWORDS
Awale game, Supervised, Unsupervised, Minimax, Endgame
1. Introduction
Games are activities of interest to individuals of all ages, adults and children alike. Games are used to learn skills, to prepare for tactical activities such as military training, and to give individuals the ability to compete against each other [1,2]. Computer games are one aspect of machine learning, alongside robotics and computer vision [3], and machine learning is in turn an aspect of Artificial Intelligence (AI). Computer games include Baganom, Awale, and Chess [4]. African board games have helped children to count [5], to think intelligently, and to forecast. Awale belongs to the MANCALA family of games and is known by various names such as Ayo, Ayoayo, Awele, and Oware [2]. The aim of the game is to capture more seeds than the opponent.

Awale is a two-person zero-sum board game played on 12 pits arranged in two rows, conventionally called North and South, with 4 seeds in each pit at the beginning of a game [6]. To move, a player selects all the seeds from a non-empty pit on his row and sows them counter-clockwise, one seed per pit, skipping the starting pit [6]. If the last seed is sown into a pit on the opponent's row, leaving that pit with 2 or 3 seeds, the player captures the seeds in that pit and the seeds in the preceding pits on the opponent's row that contain 2 or 3 seeds (this is called the 2-3 capture rule).

Figure 1. A digital version of Awale Game [7]
A player cannot capture all the seeds on the opponent's row; he is obliged to make a move that leaves his opponent a legal move. This is called the golden rule. A controversial rule of Awale, yet to be resolved, concerns the case where a player cannot move in a way that gives his opponent a legal move: either the game is cancelled, or the player who caused the stalemate loses the game regardless of his score. The game ends when one of three events occurs:
• a player has captured more than 24 seeds, or
• both players have captured 24 seeds, leading to a draw, or
• a few seeds circulate endlessly on the board.
The third case is specialised as follows: if the few seeds left on the board can never be captured by either player, but both players will always have a legal move, the game ends and each player is awarded the seeds on his row.

Machine learning can be divided into supervised and unsupervised learning. Section 2 discusses supervised machine learning techniques, Section 3 discusses unsupervised machine learning techniques, and Section 4 compares the results of the various techniques that have been used to evolve Awale game players [2,7], in order to identify which components enhance performance. The techniques implemented to evolve Awale players can be classified by whether they rely on endgame databases or on the search technique implemented.
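The sowing and 2-3 capture rules described in this section can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used by any of the surveyed players. The board layout (indices 0-5 for South, 6-11 for North), the function names `sow` and `capture`, and the choice to forfeit a capture that would empty the opponent's row are all assumptions made for the example; the source only states that such a capture is not allowed.

```python
from typing import List, Tuple

PITS_PER_ROW = 6  # assumed layout: indices 0-5 are South, 6-11 are North


def sow(board: List[int], pit: int) -> Tuple[List[int], int]:
    """Sow all seeds from `pit` counter-clockwise, skipping the starting pit.

    Returns the new board and the index of the pit that received the last seed.
    """
    new_board = board.copy()
    seeds = new_board[pit]
    if seeds == 0:
        raise ValueError("cannot sow from an empty pit")
    new_board[pit] = 0
    position = pit
    while seeds > 0:
        position = (position + 1) % len(new_board)
        if position == pit:  # the starting pit is excluded from sowing
            continue
        new_board[position] += 1
        seeds -= 1
    return new_board, position


def capture(board: List[int], last: int, player: int) -> Tuple[List[int], int]:
    """Apply the 2-3 capture rule after a sowing that ended in pit `last`.

    `player` is 0 for South and 1 for North. Returns the board after the
    capture and the number of seeds captured.
    """
    opponent_start = PITS_PER_ROW * (1 - player)
    opponent_row = range(opponent_start, opponent_start + PITS_PER_ROW)
    new_board = board.copy()
    captured = 0
    position = last
    # Walk backwards through the opponent's row while pits hold 2 or 3 seeds.
    while position in opponent_row and new_board[position] in (2, 3):
        captured += new_board[position]
        new_board[position] = 0
        position -= 1
    # Assumed handling of the golden rule: a capture that would take every
    # seed on the opponent's row is forfeited and the board is left as sown.
    if captured > 0 and all(new_board[i] == 0 for i in opponent_row):
        return board.copy(), 0
    return new_board, captured


# South sows the 2 seeds in pit 5; the last seed lands in North's pit 7.
sown, last_pit = sow([0, 0, 0, 0, 0, 2, 1, 2, 0, 0, 0, 4], 5)
after, seeds_won = capture(sown, last_pit, player=0)
print(last_pit, seeds_won)  # 7 5
```

In the usage example, pits 7 and 6 end with 3 and 2 seeds, so both are captured for a total of 5 seeds, while the 4 seeds in pit 11 leave North with a legal move.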
2. Supervised Machine Learning Techniques (SMLT)
Supervised machine learning is based on the idea of creating a machine that can think or reason outside the box and produce hypotheses or results [8]. All supervised machine learning techniques follow a common set of design steps: problem definition, identification of required data [9], pre-processing [10], algorithm selection [11], training [12], and evaluation [7]. These techniques can be grouped into logic-based algorithms such as decision trees [13], perceptron-based techniques such as the single-layer perceptron [14], statistical learning algorithms such as Bayesian networks [15], instance-based learning, and support vector machines [8].
Supervised machine learning techniques have been used to evolve Awale game players, including Case Based Reasoning (CBR) [16], the Linear Discriminant Algorithm (LDA) [12], the Re-Assisted-Minimax algorithm (RAM) [4], the Genetic Algorithm (GA) [17], and Co-Evolution (Co-evo) [18]. These techniques have produced several results against the Awale shareware and have been discussed at length in previous studies [2,7], which analysed and investigated their limitations; their performance is analysed in Table 3.

The best performance for supervised learning has come from Case Based Reasoning (CBR) [16] and the refinement procedure called "casing" [14]. Both were able to defeat Awale comfortably at the grandmaster level. Casing is a combination of case-based reasoning [4] and perceptron learning, which acted as the basic move classification algorithm. During the training phase, this method assisted the evolved player by determining the source episodes that are the nearest neighbours of the target episode [7]. The similarity $\operatorname{sim}(x_i, y_j)$ between two episodes $x_i$ and $y_j$ was calculated using equation 1, which is the product-moment formula for the linear correlation coefficient [19]:

$$
\operatorname{sim}(x_i, y_j) = \frac{\sum_{k=1}^{m} (x_{ik} - x_i^{a})(y_{jk} - y_j^{a})}{\sqrt{\sum_{k=1}^{m} (x_{ik} - x_i^{a})^{2} \, \sum_{k=1}^{m} (y_{jk} - y_j^{a})^{2}}} \tag{1}
$$

The evolved player OPON (the name of the evolved player) defeated Awale at all stages; Table 1 shows its performance at the amateur and grandmaster levels. At the grandmaster stage it defeated Awale grandmaster by 25.17 points [14].

Table 1. The refinement process (Casing)
| CASING LEVEL | AVERAGE MOVES (STD) | SEEDS CAPTURED BY EVOLVED PLAYER (STD) | SEEDS CAPTURED BY AWALE (STD) |
|---|---|---|---|
| AMATEUR | 48.33 (18.8) | 25.17 (0.41) | 14.17 (1.60) |
| GRANDMASTER | 41.50 (2.74) | 25.50 (0.55) | 15.00 (1.00) |

The other successful technique is a combination of minimax search and case-based reasoning (CBR), which is based on the concept of using old techniques or ideas to solve a new problem. This technique uses a reasoner that remembers a previous problem and the solution that was used to solve it [19]. It furthermore combines equation 1 with the minimax search technique. At the testing phase a new episode is discovered and its similarities to the source episodes are calculated, where the similarity between the ith target episode $x_i = (x_{i1}, x_{i2}, x_{i3}, \ldots, x_{im})$ and the source episode $y_j = (y_{j1}, y_{j2}, y_{j3}, \ldots, y_{jm})$ of the jth class is computed. Note that the target episode with game value £ and similarity measure ‡ was selected. The similarity, denoted by $\operatorname{sim}(x_i, y_j)$, between $x_i$ and $y_j$ was calculated using the product-moment formula for the linear correlation coefficient.

In minimax search the value of a leaf is determined by the evaluator and is a number proportional to the probability of winning the game. The evaluator can be extended to the minimax function, which determines the value for each player at a node and is formally given in (2) as follows [30,31]:

$$
f(n) = \begin{cases}
\operatorname{eval}(n), & \text{if } n \text{ is a leaf node} \\
\max\{ f(c) \mid c \text{ is a child node of } n \}, & \text{if } n \text{ is a max node} \\
\min\{ f(c) \mid c \text{ is a child node of } n \}, & \text{if } n \text{ is a min node}
\end{cases} \tag{2}
$$

The function eval(n) scores the resulting board position at each leaf node n. The standard method of scoring is in terms of a linear polynomial [32]. It has been shown that every game tree algorithm constructs a superposition of a max ($T^{+}$) and a min ($T^{-}$) solution tree.
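The minimax function of equation 2 can be illustrated with a short recursive sketch. The `Node` class, the stored leaf scores, and the `evaluate` helper below are hypothetical stand-ins for a real Awale position and its evaluator; only the three-case recursion itself follows equation 2.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """A game-tree node; `value` is only meaningful at leaf nodes."""
    is_max: bool = True
    value: float = 0.0
    children: List["Node"] = field(default_factory=list)


def evaluate(node: Node) -> float:
    """Hypothetical evaluator eval(n): simply returns the stored leaf score."""
    return node.value


def minimax(node: Node) -> float:
    """Equation 2: eval(n) at a leaf, max over children at a max node,
    min over children at a min node."""
    if not node.children:
        return evaluate(node)
    child_values = [minimax(child) for child in node.children]
    return max(child_values) if node.is_max else min(child_values)


# A depth-2 tree: a max root above two min nodes with scored leaves.
tree = Node(is_max=True, children=[
    Node(is_max=False, children=[Node(value=3), Node(value=5)]),
    Node(is_max=False, children=[Node(value=2), Node(value=9)]),
])
print(minimax(tree))  # 3
```

The two min nodes evaluate to 3 and 2, so the max root takes the value 3. A practical player would cut the recursion off at a fixed depth and call the evaluator on the board reached there.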
The equivalent evaluator is the following Stockman equality [33]:

$$
f(n) = \begin{cases}
\max\{ g(T^{-}) \mid T^{-} \text{ is a min tree rooted in } n \} \\
\min\{ g(T^{+}) \mid T^{+} \text{ is a max tree rooted in } n \}
\end{cases} \tag{3}
$$

where the function g is defined by [18]:

$$
\begin{aligned}
g(T^{+}) &= \max\{ f(c) \mid c \text{ is a terminal in } T^{+} \} \\
g(T^{-}) &= \min\{ f(c) \mid c \text{ is a terminal in } T^{-} \}
\end{aligned} \tag{4}
$$

Conventionally, the basic idea of the minimax algorithm corresponds to the following optimization procedure. The Max player tries as much as possible to increase the minimum value of the game, while Min tends to decrease its maximum value at node n, as both players play towards optimality. The entire process can be formally described by the extended Stockman formula (5):

[Equation 5: extended Stockman formula, giving f(n) case-wise for max and min nodes in terms of the values f(c) of the child nodes c of n]

The minimax search equation is combined with equation 1 to evolve the player, where $x_i^{a}$ and $y_j^{a}$ are the average values of $x_i$ and $y_j$, respectively, and m is the number of pits on the Ayo board. Furthermore, a tournament was conducted between Minimax, Minimax-CBR and Awale (grandmaster), and the results are shown in Table 2 [16]. The results show that CBR defeated all its opponents.

Table 2. Case Based Reasoning
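| FIRST PLAYER | SECOND PLAYER | SEEDS, FIRST PLAYER (STD) | SEEDS, SECOND PLAYER (STD) | MOVES (STD) | OVERRIDES (STD) |
|---|---|---|---|---|---|
| MINIMAX | AWALE | 16.00 (5.27) | 26.50 (0.53) | 68.00 (45.33) | NOT APPLICABLE |
| MINIMAX | MINIMAX-CBR | 7.00 (3.16) | 28.00 (3.16) | 38.50 (11.92) | 10.10 (2.23) |
| MINIMAX-CBR | AWALE | 25.50 (0.53) | 15.00 (1.05) | 42.70 (2.31) | 24.00 (2.11) |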
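The similarity measure of equation 1, which the CBR-based players use to find the source episode closest to a target episode, can be sketched as follows. The function names `sim` and `closest_episode`, the representation of an episode as a plain list of pit counts, and the decision to return 0.0 for a constant episode (where the correlation is undefined) are assumptions made for this illustration and do not come from the cited work.

```python
import math
from typing import Sequence


def sim(x: Sequence[float], y: Sequence[float]) -> float:
    """Product-moment correlation between two episodes of equal length m."""
    if len(x) != len(y):
        raise ValueError("episodes must have the same number of pits")
    m = len(x)
    x_avg = sum(x) / m  # x_i^a in equation 1
    y_avg = sum(y) / m  # y_j^a in equation 1
    numerator = sum((xk - x_avg) * (yk - y_avg) for xk, yk in zip(x, y))
    denominator = math.sqrt(
        sum((xk - x_avg) ** 2 for xk in x) * sum((yk - y_avg) ** 2 for yk in y)
    )
    if denominator == 0:
        # Assumption: a constant episode has no defined correlation, so it
        # is treated as having zero similarity to everything.
        return 0.0
    return numerator / denominator


def closest_episode(target: Sequence[float],
                    sources: Sequence[Sequence[float]]) -> int:
    """Index of the source episode most similar to the target episode."""
    return max(range(len(sources)), key=lambda j: sim(target, sources[j]))


target = [1, 2, 3, 4]
sources = [[4, 3, 2, 1], [2, 4, 6, 8], [1, 1, 1, 1]]
print(closest_episode(target, sources))  # 1
```

In the usage example the second source episode is a scaled copy of the target, so its correlation is the highest of the three and its index is returned. How the retrieved episode then overrides or confirms the minimax move is not specified in this section, so that step is left out of the sketch.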
3. Unsupervised Machine Learning Techniques (UMLT)
This form of learning is based on pattern recognition and clustering; it does not need the correct results during training. Its distinguishing characteristic is that it finds unrevealed patterns or hidden clusters in data sets, which help it reach the right results. It can be used to cluster the input data into classes on the basis of their statistical properties only. Clustering plays a significant role in unsupervised learning. Unsupervised learning refers to the problem of finding hidden structure in unlabelled data; examples include clustering (k-means, mixture models, hierarchical clustering) [20,21,22] and blind signal separation. The general technique used in unsupervised learning is described in Figure 2. The process can be grouped into 5 stages: Training, Feature vector, Machine learning algorithm, Model, and Better clustering classification [23].

Figure 2. Process of Unsupervised Machine Learning [23]
Most research has investigated supervised machine learning techniques and paid little attention to unsupervised learning techniques [8], but some researchers have used unsupervised techniques to evolve players that compete against the Awale shareware, such as Probabilistic Distance Clustering (PDC) [1], the Aggregate Mahalanobis Distance Function (ADMF) [24], and Retrograde Analysis (RA) [25,26]. All have produced impressive results [2], but retrograde analysis is the only known unsupervised learning technique that has defeated Awale grandmaster comfortably.

The retrograde analysis technique is applicable to search spaces that can be completely enumerated within the memory of a computer system [27]. RA first marks all end positions, such as checkmate, and then, by making moves backwards from the end positions, works its way to the positions farthest from them, determining on the way the game-theoretical value of every position in the search space. Retrograde analysis searches bottom-up, whereas other algorithms such as alpha-beta pruning, breadth-first search, and depth-first search work top-down. The advantage of RA is that the optimal solution is determined for each position in the state space [28], while top-down search techniques only provide the optimal solution for a single starting point and the positions on the solution path. The study constructed a database using Gödel numbers of the positions [29], which identified all the available positions in the database. The Gödel numbers were further modified to take unreachable positions into account.
Each database entry stored a score between -48 and +48 and occupied 7 bits.

The database created by [26] was used to replay and analyse the Awale games at the Computer Olympiad 2002, where the two strongest Awari/Awale programs competed against each other. The database performed very well and also overtook the playing strength of the world champion at the time, because it stored scores rather than the best moves [25]. The study found that it was not always clear which move to take, since multiple available moves had very good scores. RA [25,26] performed well against the Awale shareware using 889,063,398,406 positions, enumerating all the possible states that can occur in the game. The database took 51 hours to construct on a 144-processor 1 GHz Pentium III cluster equipped with 72 GB of main memory and a 2 Gb/s network. To verify the database, the results were compared with results from two algorithms, from different numbers of processors, and from other researchers, and all were consistent.

This technique has 2 major disadvantages: (1) it is too expensive to implement, since Awari/Awale positions number in the billions, so such methods cannot easily be implemented on a small-memory device such as a wireless handset [16]; and (2) it requires a huge amount of CPU time and internal memory, caused by several expensive operations applied at each entry.
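The bottom-up idea behind retrograde analysis can be shown on a game small enough to enumerate by hand. The sketch below is not the Awari database construction of [25,26]; it uses a hypothetical toy game (a pile of counters from which a player removes 1, 2 or 3, and the player who cannot move loses) and stores a win/loss label instead of a score. The names `MOVES`, `MAX_COUNTERS`, `retrograde` and `bits_needed` are assumptions for the example.

```python
from collections import deque
from typing import Dict, List

MOVES = (1, 2, 3)   # hypothetical toy game: remove 1, 2 or 3 counters
MAX_COUNTERS = 12   # positions are pile sizes 0..12


def successors(n: int) -> List[int]:
    return [n - m for m in MOVES if n - m >= 0]


def predecessors(n: int) -> List[int]:
    return [n + m for m in MOVES if n + m <= MAX_COUNTERS]


def retrograde() -> Dict[int, str]:
    """Label every position as a win or loss for the player to move."""
    value: Dict[int, str] = {}
    # Number of successors of each position not yet known to be wins.
    remaining = {n: len(successors(n)) for n in range(MAX_COUNTERS + 1)}
    queue = deque()
    # Step 1: mark the end positions; with no legal move the mover loses.
    for n in range(MAX_COUNTERS + 1):
        if remaining[n] == 0:
            value[n] = "loss"
            queue.append(n)
    # Step 2: work backwards from solved positions to their predecessors.
    while queue:
        position = queue.popleft()
        for pred in predecessors(position):
            if pred in value:
                continue
            if value[position] == "loss":
                value[pred] = "win"   # a move into a lost position wins
                queue.append(pred)
            else:
                remaining[pred] -= 1
                if remaining[pred] == 0:
                    value[pred] = "loss"  # every move leads to a win for the opponent
                    queue.append(pred)
    return value


def bits_needed(low: int, high: int) -> int:
    """Bits required to store one integer score in the range [low, high]."""
    return (high - low).bit_length()


values = retrograde()
print([n for n, v in values.items() if v == "loss"])  # [0, 4, 8, 12]
print(bits_needed(-48, 48))  # 7
```

Every position receives a value in a single backward sweep, which mirrors the advantage described above: the result is a complete table rather than the solution for one starting position. The second printed line reproduces the 7 bits per entry quoted in the text, since scores from -48 to +48 take 97 distinct values.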
4. Analysis of Supervised and Unsupervised Machine Learning Techniques Used in Evolving Awale Player
Table 3 provides a performance analysis of popular machine learning techniques at various stages of the game. It indicates which techniques were used to evolve each agent (evolved player), shows the performance of the various agents, and states whether the evolved player used the minimax search technique, an endgame database, or both. In the table the sign (√) indicates that the process was successful or that the technique or method was used, while the sign (×) indicates that it was unsuccessful or that the technique was not used.
Table 3. Performance of various Awale game players

| METHOD EVOLVED | MLT: SMLT | MLT: UMLT | TECHNIQUE: MINIMAX | TECHNIQUE: ENDGAME | STAGE: INITIATION | STAGE: BEGINNER | STAGE: AMATEUR | STAGE: GRANDMASTER |
|---|---|---|---|---|---|---|---|---|
| CBR | √ | × | √ | √ | √ | √ | √ | √ |
| RAM-BPR | √ | × | √ | √ | √ | √ | √ | × |
| RAM-PRIORITY | √ | × | √ | √ | √ | √ | √ | × |
| RAM-CASING | √ | × | √ | √ | √ | √ | √ | √ |
| ADMF | × | √ | √ | √ | √ | √ | √ | × |
| PDC | × | √ | √ | √ | √ | √ | √ | × |
| RA | × | √ | × | √ | √ | √ | √ | √ |
| CO-EVO | √ | × | √ | × | √ | √ | √ | × |
| GA | √ | × | √ | × | √ | √ | √ | × |
| NN | √ | × | × | × | √ | √ | √ | × |
| LDA | √ | × | √ | √ | √ | √ | √ | × |

All unsupervised machine learning techniques employed databases, which supported and enhanced their performance, unlike some supervised techniques, which did not use an endgame database. Moreover, no evolved player was able to defeat the Awale shareware (grandmaster) comfortably without using an endgame database to improve its performance. There is room for further improvement of the unsupervised machine learning techniques, provided they improve their endgame databases so as to enhance their performance against the Awale shareware.
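The two observations drawn from Table 3 can be checked mechanically by encoding the table as data. The sketch below transcribes the rows of Table 3; the `Player` record and its field names are assumptions introduced for the example, and the stage columns other than grandmaster are omitted because every player succeeded at them.

```python
from typing import Dict, NamedTuple


class Player(NamedTuple):
    supervised: bool        # True for SMLT, False for UMLT
    minimax: bool           # used minimax search
    endgame: bool           # used an endgame database
    beat_grandmaster: bool  # succeeded at the grandmaster stage


# Rows transcribed from Table 3.
TABLE_3: Dict[str, Player] = {
    "CBR":          Player(True,  True,  True,  True),
    "RAM-BPR":      Player(True,  True,  True,  False),
    "RAM-PRIORITY": Player(True,  True,  True,  False),
    "RAM-CASING":   Player(True,  True,  True,  True),
    "ADMF":         Player(False, True,  True,  False),
    "PDC":          Player(False, True,  True,  False),
    "RA":           Player(False, False, True,  True),
    "CO-EVO":       Player(True,  True,  False, False),
    "GA":           Player(True,  True,  False, False),
    "NN":           Player(True,  False, False, False),
    "LDA":          Player(True,  True,  True,  False),
}

# Claim 1: every unsupervised technique used an endgame database.
unsupervised_all_use_endgame = all(
    p.endgame for p in TABLE_3.values() if not p.supervised
)

# Claim 2: no player beat the grandmaster level without an endgame database.
winners = [name for name, p in TABLE_3.items() if p.beat_grandmaster]
winners_without_endgame = [n for n in winners if not TABLE_3[n].endgame]

print(unsupervised_all_use_endgame)  # True
print(winners)                       # ['CBR', 'RAM-CASING', 'RA']
print(winners_without_endgame)       # []
```

The table also shows that an endgame database alone is not sufficient: RAM-BPR, RAM-PRIORITY, ADMF, PDC and LDA all used one without succeeding at the grandmaster stage.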
Conclusion
This has been an interesting study and comparison of the various popular machine learning techniques used in evolving Awale game players. Further studies and investigations will take an in-depth look at the various algorithms that have been used and at the issues limiting their performance.
References
1. Randle, O.A., Zuva, K. (2012). Familiarising Probabilistic Distance Clustering System of Evolving Awale Player. International Journal on Applications of Graph Theory in Wireless Ad hoc Networks and Sensor Networks (GRAPH-HOC), Volume 4, Number 2/3, pp. 29-37.
2. Randle, O.A., Queen Sello Mirian, Ngungu Mercy (2013). An Overview of Unsupervised Machine Learning Techniques in Evolving Mancala Game Player. 3rd International Conference on Advances in Information and Mobile Communication, pp. 453-462. Springer Lecture Notes in Computer Science (LNCS) [in print].
3. Zuva, K., Modupe, A., Pretorious, A., Zuva, T., Mapuka, T. (2012). Effective and Efficient Dis(Similarity) and Retrieval Algorithms Used to Access Large Image Database. International Conference on Computer Science, Engineering and Technology (ICCSET).
4. Olugbara, O.O., Adigun, M.O., Ojo, S.O., Adewoye, T.O. (2006). An Investigation of Minimax Search Techniques for Evolving Ayo/Awari Player. Proceedings of IEEE-ICICT 4th International Conference on Information and Communication Technology, Cairo, Egypt.
5. Agbinya, J.I. (2010). Computer Board Games of Africa (Algorithms, Strategies, Rules).
6. Adewoye, T.O. (1990). On Certain Combinatorial Number Theoretic Aspects of the African Game of Ayo. AMSE Review, 14(2), 41-63.
7. Randle, O.A., Olugbara, O.O., Lall, M. (2013). An Overview of Supervised Machine Learning Techniques in Evolving Mancala Game Player. Proceedings of IEEE-ICCSEE 2nd International Conference on Computer Science and Electronic Engineering, Hangzhou, China, pp. 0978-0983.
8. Kotsiantis (2007). Supervised Machine Learning: A Review of Classification Techniques. Informatica, vol. 31, pp. 249-268.
9. Zhang, S., Zhang, C., Yang, Q. (2002). Data Preparation for Data Mining. Applied Artificial Intelligence, Volume 17, pp. 375-381.
10. Hodge, V., Austin, J. (2004). A Survey of Outlier Detection Methodologies. Artificial Intelligence Review, vol. 22, issue 2, pp. 85-126.
11. Japkowicz, N., Stephen, S. (2002). The Class Imbalance Problem: A Systematic Study. Intelligent Data Analysis, Vol. 6, No. 5.
12. Olugbara, O.O., Gbadeyan, J.A., Adewoye, T.O., Mbatu, K. (2006). Formal Characteristics of Ayo Game Using Linear Discriminant Function and Equivalence Relations. NTMCS Conference Proceedings.
13. Murthy (1998). Automatic Construction of Decision Trees from Data: A Multi-Disciplinary Survey. Data Mining and Knowledge Discovery, 2:345-389.
14. Rosenblatt, F. (1962). Principles of Neurodynamics. Spartan, New York.
15. Jensen, F. (1996). An Introduction to Bayesian Networks. Springer.
16. Olugbara, O.O., Adigun, M.O., Ojo, S.O., Adewoye, T.O. (2007). An Efficient Heuristic for Evolving an Agent in the Strategy Game of Ayo. International Computer Games Association Journal, 30, 92-96.
17. Daoud, M., Kharma, N., Haidar, A., Popoola, J. (2004). Ayo the Awari Player, or How Better Representation Trumps Deeper Search. Proceedings of the 2004 IEEE Congress on Evolutionary Computation, 1001-1006.
18. Davis, J.E., Kendall, G. (2002). An Investigation, Using Co-Evolution, to Evolve an Awari Player. In Proceedings of Congress on Evolutionary Computation (CEC 2002), pp. 1408-1413.
19. Kolodner, J.L. (1993). Case-Based Reasoning. Morgan Kaufmann.
20. MacQueen, J. (1967). Some Methods for Classification and Analysis of Multivariate Observations. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability.
21. Hartigan, J. Clustering Algorithms. New York.
22. Jardine, N., Sibson, R. (1971). Mathematical Taxonomy. New York, John Wiley and Sons.
23. Khaled, A.K., Jamie, A., Teixera de Silva (2006). Biology of Calendula Officinalis Linn: Focus on Pharmacology, Biological Activities and Agronomic Practices. Journal of Medicinal and Aromatic Plant Sciences and Biotechnology, pp. 12-25.
24. Randle, O.A. (2012). An Efficient Hybrid Algorithm to Evolve an Awale Player. Proceedings of ACM-CCSEIT 2nd International Conference on Computational Science, Engineering and Information Technology, pp. 434-438, ISBN: 978-1-4503-1310-0, Coimbatore, India.
25. Romein, J.W., Bal, H.C. (2002). Awari is Solved. International Computer Games Association Journal, 25(3), 162-165.
26. Romein, J.W., Bal, H.C. (2003). Solving the Game of Awari Using Parallel Retrograde Analysis. In Proceedings of IEEE Computer Society, Los Angeles, USA, 36(10), 23-33.
27. Bal, H., Allis, L.V. (1995). Parallel Retrograde Analysis on a Distributed System. Proceedings of ACM/IEEE Conference on Supercomputing.
28. Springer LNCS, 2063:89-98.
29. Allis, V., Herik, J.V.D., Herschberg, B. (1991). Which Games Will Survive? Heuristic Programming in Artificial Intelligence 2: The Second Computer Olympiad.