
Publication


Featured research published by Marcin Szubert.


IEEE Transactions on Computational Intelligence and AI in Games | 2013

On Scalability, Generalization, and Hybridization of Coevolutionary Learning: A Case Study for Othello

Marcin Szubert; Wojciech Jaśkowski; Krzysztof Krawiec

This study investigates different methods of learning to play the game of Othello. The main questions posed concern scalability of algorithms with respect to the search space size and their capability to generalize and produce players that fare well against various opponents. The considered algorithms represent strategies as n-tuple networks, and employ self-play temporal difference learning (TDL), evolutionary learning (EL) and coevolutionary learning (CEL), and hybrids thereof. To assess the performance, three different measures are used: score against an a priori given opponent (a fixed heuristic strategy), against opponents trained by other methods (round-robin tournament), and against the top-ranked players from the online Othello League. We demonstrate that although evolutionary-based methods yield players that fare best against a fixed heuristic player, it is the coevolutionary temporal difference learning (CTDL), a hybrid of coevolution and TDL, that generalizes better and proves superior when confronted with a pool of previously unseen opponents. Moreover, CTDL scales well with the size of representation, attaining better results for larger n-tuple networks. By showing that a strategy learned in this way wins against the top entries from the Othello League, we conclude that it is one of the best 1-ply Othello players obtained to date without explicit use of human knowledge.
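The n-tuple network representation used throughout these studies can be illustrated with a short sketch. This is not the papers' implementation: the board encoding, cell indices, and weight values below are invented for the example, and a real Othello player would use many longer tuples over an 8×8 board.

```python
def tuple_index(board, cells, n_values=3):
    """Encode the observed cells (empty=0, black=1, white=2) as a base-3
    number that indexes this tuple's lookup table of learned weights."""
    idx = 0
    for c in cells:
        idx = idx * n_values + board[c]
    return idx

def evaluate(board, ntuples):
    """Board value = sum of looked-up weights over all n-tuples."""
    return sum(weights[tuple_index(board, cells)]
               for cells, weights in ntuples)

# Toy example: a 4-cell "board" with a single 2-tuple observing cells 0 and 3.
board = [1, 0, 2, 1]               # black, empty, white, black
weights = [0.0] * (3 ** 2)         # 9 possible patterns of 2 ternary cells
weights[1 * 3 + 1] = 0.5           # pattern (black, black) gets weight 0.5
ntuples = [((0, 3), weights)]
print(evaluate(board, ntuples))    # → 0.5
```

Learning, whether by TDL, EL, or CEL, then amounts to adjusting the lookup-table weights; the representation itself encodes no Othello knowledge.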


Genetic and Evolutionary Computation Conference | 2013

Improving coevolution by random sampling

Wojciech Jaśkowski; Paweł Liskowski; Marcin Szubert; Krzysztof Krawiec

Recent developments cast doubt on the effectiveness of coevolutionary learning in interactive domains. A simple evolution with fitness evaluation based on games with random strategies has been found to generalize better than competitive coevolution. In an attempt to investigate this phenomenon, we analyze the utility of random opponents for one- and two-population competitive coevolution applied to learning strategies for the game of Othello. We show that if coevolution uses a two-population setup and also engages random opponents, it is capable of producing strategies as good as those of evolution with random sampling under the expected utility performance measure. To investigate the differences between the analyzed methods, we introduce the performance profile, a tool that measures a player's performance against opponents of various strength. The profiles reveal that evolution with random sampling produces players that cope well with mediocre opponents but play relatively poorly against stronger ones. This finding explains why, in the round-robin tournament, evolution with random sampling is one of the worst methods among all those considered in this study.


Genetic and Evolutionary Computation Conference | 2011

Learning n-tuple networks for Othello by coevolutionary gradient search

Krzysztof Krawiec; Marcin Szubert

We propose Coevolutionary Gradient Search, a blueprint for a family of iterative learning algorithms that combine elements of local search and population-based search. The approach is applied to learning Othello strategies represented as n-tuple networks, using different search operators and modes of learning. We focus on the interplay between the continuous, directed, gradient-based search in the space of weights and the fitness-driven, combinatorial, coevolutionary search in the space of entire n-tuple networks. In an extensive experiment, we assess both the objective and relative performance of the algorithms, concluding that hybridizing the search techniques improves convergence. The best algorithms not only learn faster than their constituent methods alone, but also produce top-ranked strategies in the online Othello League.


International Journal of Applied Mathematics and Computer Science | 2011

Evolving small-board Go players using coevolutionary temporal difference learning with archives

Krzysztof Krawiec; Wojciech Jaśkowski; Marcin Szubert

We apply Coevolutionary Temporal Difference Learning (CTDL) to learn small-board Go strategies represented as weighted piece counters. CTDL is a randomized learning technique which interweaves two search processes operating in the intra-game and inter-game modes. Intra-game learning is driven by gradient-descent Temporal Difference Learning (TDL), a reinforcement learning method that updates the board evaluation function according to differences observed between its values for consecutively visited game states. For the inter-game learning component, we provide a coevolutionary algorithm that maintains a sample of strategies and uses the outcomes of games played between them to iteratively modify the probability distribution according to which new strategies are generated and added to the sample. We analyze CTDL's sensitivity to all important parameters, including the trace decay constant that controls the lookahead horizon of TDL, and the relative intensity of intra-game and inter-game learning. We also investigate how the presence of memory (an archive) affects the search performance, and find that the archive-based approach is superior to the other techniques considered here, producing strategies that outperform a handcrafted weighted piece counter strategy and simple liberty-based heuristics. This encouraging result can potentially be generalized not only to other strategy representations used for small-board Go, but also to various games and a broader class of problems, because CTDL is generic and does not rely on any problem-specific knowledge.
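The intra-game TDL update for a linear evaluation function can be sketched as follows. This is a simplified TD(0) step (the paper uses eligibility traces governed by a trace decay constant); the feature vectors, learning rate, and toy transition are illustrative, not taken from the paper.

```python
def td0_update(w, phi_s, phi_s_next, reward=0.0, alpha=0.1, gamma=1.0):
    """One gradient-descent TD(0) step for a linear evaluation V(s) = w . phi(s):
    nudge the weights so V(s) moves toward reward + gamma * V(s')."""
    v_s = sum(wi * fi for wi, fi in zip(w, phi_s))
    v_next = sum(wi * fi for wi, fi in zip(w, phi_s_next))
    delta = reward + gamma * v_next - v_s           # temporal difference error
    return [wi + alpha * delta * fi for wi, fi in zip(w, phi_s)]

# Toy transition: two features (e.g. own and opponent piece counts),
# with a reward of 1 observed after the move.
w = td0_update([0.0, 0.0], phi_s=[3, 2], phi_s_next=[4, 2], reward=1.0)
print(w)    # weights grow in proportion to the features of the visited state
```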


Genetic and Evolutionary Computation Conference | 2015

High-Dimensional Function Approximation for Knowledge-Free Reinforcement Learning: a Case Study in SZ-Tetris

Wojciech Jaśkowski; Marcin Szubert; Paweł Liskowski; Krzysztof Krawiec

SZ-Tetris, a restricted version of Tetris, is a difficult reinforcement learning task. Previous research showed that, similarly to the original Tetris, value function-based methods such as temporal difference learning do not work well for SZ-Tetris. The best performance in this game was achieved by employing direct policy search techniques, in particular the cross-entropy method in combination with handcrafted features. Nonetheless, a simple hand-coded heuristic player scores even higher. Here we show that it is possible to equal its performance with CMA-ES (Covariance Matrix Adaptation Evolution Strategy). We demonstrate that further improvement is possible by employing a systematic n-tuple network, a knowledge-free function approximator, and VD-CMA-ES, a linear variant of CMA-ES for high-dimensional optimization. Last but not least, we show that a large systematic n-tuple network (involving more than 4 million parameters) allows the classical temporal difference learning algorithm to obtain average performance similar to that of VD-CMA-ES, but at 20 times lower computational expense, leading to the best policy for SZ-Tetris known to date. These results enrich the current understanding of the difficulty of SZ-Tetris, and shed new light on the capabilities of particular search paradigms when applied to representations of various characteristics and dimensionality.


Genetic and Evolutionary Computation Conference | 2013

Shaping fitness function for evolutionary learning of game strategies

Marcin Szubert; Wojciech Jaśkowski; Paweł Liskowski; Krzysztof Krawiec

In evolutionary learning of game-playing strategies, fitness evaluation is based on playing games with certain opponents. In this paper we investigate how the performance of these opponents and the way they are chosen influence the efficiency of learning. For this purpose we introduce a simple method for shaping the fitness function by sampling the opponents from a biased performance distribution. We compare the shaped function with existing fitness evaluation approaches that sample the opponents from an unbiased performance distribution or from a coevolving population. In an extensive computational experiment we employ these methods to learn Othello strategies and assess both the absolute and relative performance of the elaborated players. The results demonstrate the superiority of the shaping approach, which can be explained by means of performance profiles, an analytical tool that evaluates the evolved strategies using a range of variably skilled opponents.
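The shaping idea, drawing fitness-evaluation opponents from a biased rather than uniform distribution over a pool, can be sketched as below. The opponent pool, bias weights, and `play` function are hypothetical stand-ins for the paper's setup, not its actual components.

```python
import random

def shaped_fitness(player, opponent_pool, bias_weights, play,
                   n_games=10, rng=random):
    """Average game result over opponents sampled from a biased distribution
    (uniform weights would recover unbiased sampling)."""
    opponents = rng.choices(opponent_pool, weights=bias_weights, k=n_games)
    return sum(play(player, opp) for opp in opponents) / n_games

# Hypothetical pool of opponents of known strength; bias the sampling
# toward the stronger ones instead of drawing uniformly.
rng = random.Random(42)
pool = ["weak", "medium", "strong"]
bias = [1, 2, 4]
play = lambda player, opp: {"weak": 1.0, "medium": 0.6, "strong": 0.2}[opp]
f = shaped_fitness("learner", pool, bias, play, n_games=5, rng=rng)
print(f)    # an estimate of the shaped fitness, in [0.2, 1.0]
```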


European Conference on Applications of Evolutionary Computation | 2014

Multi-Criteria Comparison of Coevolution and Temporal Difference Learning on Othello

Wojciech Jaśkowski; Marcin Szubert; Paweł Liskowski

We compare Temporal Difference Learning (TDL) with Coevolutionary Learning (CEL) on Othello. Apart from using three popular single-criteria performance measures: (i) generalization performance or expected utility, (ii) average results against a hand-crafted heuristic, and (iii) the result of a head-to-head match, we compare the algorithms using performance profiles. This multi-criteria performance measure characterizes a player's performance in the context of opponents of various strength. The multi-criteria analysis reveals that although the generalization performance of players produced by the two algorithms is similar, TDL is much better at playing against strong opponents, while CEL copes better against weak ones. We also find that TDL produces less diverse strategies than CEL. Our results confirm the usefulness of performance profiles as a tool for comparing learning algorithms for games.


Congress on Evolutionary Computation | 2010

Coevolutionary Temporal Difference Learning for small-board Go

Krzysztof Krawiec; Marcin Szubert

In this paper we apply Coevolutionary Temporal Difference Learning (CTDL), a hybrid of coevolutionary search and reinforcement learning proposed in our former study, to evolve strategies for playing the game of Go on small boards (5×5). CTDL works by interlacing exploration of the search space provided by one-population competitive coevolution with exploitation by means of temporal difference learning. Despite using a simple representation of strategies (a weighted piece counter), CTDL proves able to evolve players that defeat solutions found by its constituent methods. The results of the conducted experiments indicate that our algorithm is superior to pure coevolution and pure temporal difference learning, both in terms of the performance of the elaborated strategies and the computational cost. This demonstrates the existence of a synergistic interplay between the components of CTDL, which we also briefly discuss in this study.
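The interlacing of the two search modes that defines CTDL can be sketched structurally. `td_phase` and `coevolution_phase` are placeholders for the TD self-play and coevolutionary generation described above, and the dummy components in the example exist only to show the control flow.

```python
def ctdl(population, td_phase, coevolution_phase, generations):
    """Interlace intra-game TD learning with inter-game coevolutionary search:
    each generation, every individual first learns by self-play, then the
    whole population is transformed by one coevolutionary generation."""
    for _ in range(generations):
        population = [td_phase(p) for p in population]  # exploitation (TDL)
        population = coevolution_phase(population)      # exploration (coevolution)
    return population

# Dummy components: "learning" increments a number, and the coevolutionary
# step is the identity, so each individual gains 1 per generation.
result = ctdl([0, 0], td_phase=lambda p: p + 1,
              coevolution_phase=lambda pop: pop, generations=3)
print(result)    # → [3, 3]
```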


International Journal of Applied Mathematics and Computer Science | 2016

The performance profile: A multi-criteria performance evaluation method for test-based problems

Wojciech Jaśkowski; Paweł Liskowski; Marcin Szubert; Krzysztof Krawiec

In test-based problems, solutions produced by search algorithms are typically assessed using average outcomes of interactions with multiple tests. This aggregation leads to information loss, which can render different solutions apparently indifferent and hinder comparison of search algorithms. In this paper we introduce the performance profile, a generic, domain-independent, multi-criteria performance evaluation method that mitigates this problem by characterizing the performance of a solution by a vector of outcomes of interactions with tests of various difficulty. To demonstrate the usefulness of this gauge, we employ it to analyze the behavior of Othello and Iterated Prisoner's Dilemma players produced by five (co)evolutionary algorithms as well as players known from previous publications. Performance profiles reveal interesting differences between the players, which escape the attention of the scalar performance measure of the expected utility. In particular, they allow us to observe that evolution with random sampling produces players coping well against mediocre opponents, while the coevolutionary and temporal difference learning strategies play better against high-grade opponents. We postulate that performance profiles improve our understanding of characteristics of search algorithms applied to arbitrary test-based problems, and can prospectively help design better methods for interactive domains.
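The profile computation itself is simple to sketch: group the tests by difficulty and average the solution's outcomes within each group. The bin edges and toy scores below are illustrative, not data from the paper.

```python
def performance_profile(outcomes, difficulties, edges):
    """Mean outcome of a solution against the tests falling in each
    difficulty bin [edges[k], edges[k+1]); None marks an empty bin."""
    profile = []
    for lo, hi in zip(edges, edges[1:]):
        bucket = [o for o, d in zip(outcomes, difficulties) if lo <= d < hi]
        profile.append(sum(bucket) / len(bucket) if bucket else None)
    return profile

# Toy data: scores against six opponents of known difficulty in [0, 1).
scores = [1.0, 0.9, 0.7, 0.6, 0.3, 0.1]
diffs = [0.1, 0.2, 0.4, 0.5, 0.8, 0.9]
profile = performance_profile(scores, diffs, edges=[0.0, 1/3, 2/3, 1.0])
print(profile)    # mean score against easy, medium, and hard opponents
```

The resulting vector makes visible exactly the distinction the abstract describes: two players with equal average score can have very different profiles across the difficulty bins.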


IEEE Transactions on Computational Intelligence and AI in Games | 2016

Coevolutionary CMA-ES for Knowledge-Free Learning of Game Position Evaluation

Wojciech Jaśkowski; Marcin Szubert

One weakness of coevolutionary algorithms observed in knowledge-free learning of strategies for adversarial games has been their poor scalability with respect to the number of parameters to learn. In this paper, we investigate to what extent this problem can be mitigated by using Covariance Matrix Adaptation Evolution Strategy, a powerful continuous optimization algorithm. In particular, we employ this algorithm in a competitive coevolutionary setup, denoting this setting as Co-CMA-ES. We apply it to learn position evaluation functions for the game of Othello and find that, in contrast to plain (co)evolution strategies, Co-CMA-ES learns faster, finds superior game-playing strategies and scales better. Its advantages come out into the open especially for large parameter spaces of tens of hundreds of dimensions. For Othello, combining Co-CMA-ES with experimentally-tuned derandomized systematic n-tuple networks significantly improved the current state of the art. Our best strategy outperforms all the other Othello 1-ply players published to date by a large margin, regardless of whether the round-robin tournament among them involves a fixed set of initial positions or the standard initial position but randomized opponents. These results show a large potential of CMA-ES-driven coevolution, which could presumably be exploited in other games as well.
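In a competitive one-population setup like the one described, each candidate's fitness can be derived from games against the other members of the same sample. A minimal round-robin sketch of that relative evaluation follows; the `beats` game is a placeholder, and the details of how CMA-ES consumes these fitness values are omitted.

```python
def round_robin_fitness(candidates, play):
    """Average result of each candidate in games against every other one."""
    n = len(candidates)
    totals = [0.0] * n
    for i in range(n):
        for j in range(n):
            if i != j:
                totals[i] += play(candidates[i], candidates[j])
    return [t / (n - 1) for t in totals]

# Placeholder "game": a candidate wins iff its number is larger. In a setup
# like Co-CMA-ES the candidates would be weight vectors sampled by CMA-ES
# and ranked by these relative results.
beats = lambda a, b: 1.0 if a > b else 0.0
print(round_robin_fitness([1, 2, 3], beats))    # → [0.0, 0.5, 1.0]
```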

Collaboration


Marcin Szubert's top co-authors and their affiliations.


Krzysztof Krawiec

Poznań University of Technology


Wojciech Jaśkowski

Poznań University of Technology


Paweł Liskowski

Poznań University of Technology


P. Gawron

Poznań University of Technology
