Stephen Dignum | Researchain

Publication

Featured research published by Stephen Dignum.

Genetic and Evolutionary Computation Conference | 2007

Generalisation of the limiting distribution of program sizes in tree-based genetic programming and analysis of its effects on bloat

Stephen Dignum; Riccardo Poli

Recent research [1] has found that standard sub-tree crossover with uniform selection of crossover points, in the absence of fitness pressure, pushes a population of GP trees towards a Lagrange distribution of tree sizes. However, the result applied only to the case of single-arity function plus leaf-node combinations, e.g., unary, binary or ternary trees. In this paper we extend those findings and show that the same distribution is also applicable to the more general case where the function set includes functions of mixed arities. We also provide empirical evidence that strongly corroborates this generalisation. Both predicted and observed results show a distinct bias towards the sampling of shorter programs irrespective of the mix of function arities used. Practical applications and implications of this knowledge are investigated with regard to search efficiency and program bloat. Work is also presented regarding the applicability of the theory to the traditional 90%-function 10%-terminal crossover node selection policy.

European Conference on Genetic Programming | 2007

On the limiting distribution of program sizes in tree-based genetic programming

Riccardo Poli; William B. Langdon; Stephen Dignum

We provide strong theoretical and experimental evidence that standard sub-tree crossover with uniform selection of crossover points pushes a population of a-ary GP trees towards a distribution of tree sizes of the form $\Pr\{n\} = (1 - a p_a) \binom{an+1}{n} (1 - p_a)^{(a-1)n+1} p_a^{n}$, where $n$ is the number of internal nodes in a tree and $p_a$ is a constant. This result generalises the result previously reported for the case a = 1.
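The size distribution quoted in this abstract can be evaluated numerically. The following is a minimal sketch, not the authors' code: the function name `lagrange_size_pmf` is hypothetical, `p_a` is treated as a free parameter, and the constraint `a * p_a < 1` is an assumption made here so that the leading factor stays positive.

```python
import math


def lagrange_size_pmf(n: int, a: int, p_a: float) -> float:
    """Pr{n} for a tree with n internal nodes, all of arity a (hypothetical helper)."""
    if n < 0 or a < 1 or not (0.0 < p_a < 1.0) or a * p_a >= 1.0:
        raise ValueError("need n >= 0, a >= 1, 0 < p_a < 1 and a * p_a < 1")
    # log C(a*n + 1, n), computed with log-gamma to avoid huge intermediate integers
    log_binom = (math.lgamma(a * n + 2)
                 - math.lgamma(n + 1)
                 - math.lgamma((a - 1) * n + 2))
    # log of (1 - p_a)^((a-1)n+1) * p_a^n, kept in log space to avoid underflow
    log_tail = ((a - 1) * n + 1) * math.log(1.0 - p_a) + n * math.log(p_a)
    return (1.0 - a * p_a) * math.exp(log_binom + log_tail)


if __name__ == "__main__":
    # Print the first few probabilities for binary trees with an assumed p_a.
    for n in range(6):
        print(n, round(lagrange_size_pmf(n, a=2, p_a=0.3), 4))
```

Working in log space is a design choice for numerical stability at large n; for small trees a direct product with `math.comb` would serve equally well.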

European Conference on Genetic Programming | 2008

Operator equalisation and bloat free GP

Stephen Dignum; Riccardo Poli

Research has shown that beyond a certain minimum program length the distributions of program functionality and fitness converge to a limit. Before that limit, however, there may be program-length classes with a higher or lower average fitness than that achieved beyond the limit. Ideally, therefore, GP search should be limited to program lengths that are within the limit and that can achieve optimum fitness. This has the dual benefits of providing the simplest/smallest solutions and preventing GP bloat, thus shortening run times. Here we introduce a novel and simple technique, which we call Operator Equalisation, to control how GP will sample certain length classes. This allows us to finely and freely bias the search towards shorter or longer programs and also to search specific length classes during a GP run. This gives the user total control over the program length distribution, thereby completely freeing GP from bloat. Results show that we can automatically identify potentially optimal solution length classes quickly using small samples and that, for particular classes of problems, simple length biases can significantly improve the best fitness found during a GP run.

Genetic Programming and Evolvable Machines | 2012

Operator equalisation for bloat free genetic programming and a survey of bloat control methods

Sara Silva; Stephen Dignum; Leonardo Vanneschi

Bloat can be defined as an excess of code growth without a corresponding improvement in fitness. This problem has been one of the most intensively studied subjects since the beginnings of Genetic Programming. This paper begins by briefly reviewing the theories explaining bloat, and presenting a comprehensive survey and taxonomy of many of the bloat control methods published in the literature through the years. Particular attention is then given to the new Crossover Bias theory and the bloat control method it inspired, Operator Equalisation (OpEq). Two implementations of OpEq are described in detail. The results presented clearly show that Genetic Programming using OpEq is essentially bloat free. We discuss the advantages and shortcomings of each different implementation, and the unexpected effect of OpEq on overfitting. We observe the evolutionary dynamics of OpEq and address its potential to be extended and integrated into different elements of the evolutionary process.

Information Processing and Management | 2012

Automatically structuring domain knowledge from text: An overview of current research

Malcolm Clark; Yunhyong Kim; Udo Kruschwitz; Dawei Song; Dyaa Albakour; Stephen Dignum; Ulises Cerviño Beresi; Maria Fasli; Anne N. De Roeck

This paper presents an overview of automatic methods for building domain knowledge structures (domain models) from text collections. Applications of domain models have a long history within knowledge engineering and artificial intelligence. In the last couple of decades they have surfaced noticeably as a useful tool within natural language processing, information retrieval and semantic web technology. Inspired by the ubiquitous propagation of domain model structures that are emerging in several research disciplines, we give an overview of the current research landscape and some techniques and approaches. We will also discuss trade-offs between different approaches and point to some recent trends.

European Conference on Genetic Programming | 2009

Extending Operator Equalisation: Fitness Based Self Adaptive Length Distribution for Bloat Free GP

Sara Silva; Stephen Dignum

Operator equalisation is a recent bloat control technique that allows accurate control of the program length distribution during a GP run. By filtering which individuals are allowed in the population, it can easily bias the search towards smaller or larger programs. This technique achieved promising results with different predetermined target length distributions, using a conservative program length limit. Here we improve operator equalisation by giving it the ability to automatically determine and follow the ideal length distribution for each stage of the run, unconstrained by a fixed maximum limit. Results show that in most cases the new technique performs a more efficient search and effectively reduces bloat, by achieving better fitness and/or using smaller programs. The dynamics of the self adaptive length distributions are briefly analysed, and the overhead involved in following the target distribution is discussed, advancing simple ideas for improving the efficiency of this new technique.
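The idea of filtering which individuals enter the population against a target length distribution can be illustrated roughly as follows. This is a simplified sketch under stated assumptions, not the published algorithm: the names `length_bin` and `filter_by_target`, the fixed-width binning, and the first-come acceptance rule are all hypothetical, and the real implementations may handle rejected candidates and fitness differently.

```python
from collections import Counter
from typing import Dict, Iterable, List


def length_bin(length: int, bin_width: int = 1) -> int:
    """Map a program length (>= 1) to a 0-based length-class index."""
    return (length - 1) // bin_width


def filter_by_target(lengths: Iterable[int], target: Dict[int, int],
                     bin_width: int = 1) -> List[int]:
    """Accept candidates in arrival order while their length class has room.

    `target` maps a length-class index to the number of individuals
    allowed in that class (an assumed encoding of the target distribution).
    """
    filled: Counter = Counter()
    accepted: List[int] = []
    for length in lengths:
        b = length_bin(length, bin_width)
        if filled[b] < target.get(b, 0):
            filled[b] += 1
            accepted.append(length)
    return accepted


if __name__ == "__main__":
    # Two slots for length 1, one slot for length 2, none for anything longer.
    print(filter_by_target([1, 1, 1, 2, 3, 2], {0: 2, 1: 1}))
```

Changing the counts in `target` is what shifts the search towards shorter or longer programs in this toy version.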

European Conference on Genetic Programming | 2008

Crossover, sampling, bloat and the harmful effects of size limits

Stephen Dignum; Riccardo Poli

Recent research [9,2] has enabled the accurate prediction of the limiting distribution of tree sizes for Genetic Programming with standard sub-tree swapping crossover when GP is applied to a flat fitness landscape. In that work, however, tree sizes are measured in terms of number of internal nodes. While the relationship between internal nodes and length is one-to-one for the case of a-ary trees, it is much more complex in the case of mixed arities. So, practically the length bias of subtree crossover remains unknown. This paper starts to fill this theoretical gap, by providing accurate estimates of the limiting distribution of lengths approached by tree-based GP with standard crossover in the absence of selection pressure. The resulting models confirm that short programs can be expected to be heavily resampled. Empirical validation shows that this is indeed the case. We also study empirically how the situation is modified by the application of program length limits. Surprisingly, the introduction of such limits further exacerbates the effect. However, this has more profound consequences than one might imagine at first. We analyse these consequences and predict that, in the presence of fitness, size limits may initially speed up bloat, almost completely defeating their original purpose (combating bloat). Indeed, experiments confirm that this is the case for the first 10 or 15 generations. This leads us to suggest a better way of using size limits. Finally, this paper proposes a novel technique to counteract bloat, sampling parsimony, the application of a penalty to resampling.

European Conference on Genetic Programming | 2008

The effects of constant neutrality on performance and problem hardness in GP

Edgar Galván-López; Stephen Dignum; Riccardo Poli

The neutral theory of molecular evolution and the associated notion of neutrality have interested many researchers in Evolutionary Computation. The hope is that the presence of neutrality can aid evolution. However, despite the vast number of publications on neutrality, there is still a big controversy on its effects. The aim of this paper is to clarify under what circumstances neutrality could aid Genetic Programming using the traditional representation (i.e. treelike structures). For this purpose, we use fitness distance correlation as a measure of hardness. In addition we have conducted extensive empirical experimentation to corroborate the fitness distance correlation predictions. This has been done using two test problems with very different landscape features that represent two extreme cases where the different effects of neutrality can be emphasised. Finally, we study the distances between individuals and global optimum to understand how neutrality affects evolution (at least with the one proposed in this paper).
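Fitness distance correlation is commonly defined as the Pearson correlation between sampled fitness values and the distance of each sample to the nearest global optimum. A minimal sketch of that calculation follows; the function name is hypothetical, and how distance between trees is measured is left to the caller, since the abstract does not specify it.

```python
import math
from typing import Sequence


def fitness_distance_correlation(fitness: Sequence[float],
                                 distance: Sequence[float]) -> float:
    """Pearson correlation between fitness values and distances to the optimum."""
    if len(fitness) != len(distance) or len(fitness) < 2:
        raise ValueError("need two equally long samples with at least 2 points")
    n = len(fitness)
    mean_f = sum(fitness) / n
    mean_d = sum(distance) / n
    cov = sum((f - mean_f) * (d - mean_d)
              for f, d in zip(fitness, distance)) / n
    var_f = sum((f - mean_f) ** 2 for f in fitness) / n
    var_d = sum((d - mean_d) ** 2 for d in distance) / n
    if var_f == 0.0 or var_d == 0.0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_f * var_d)
```

The sign convention depends on whether fitness is maximised or minimised, so any hardness threshold applied to the result should be chosen with that in mind.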

European Conference on Genetic Programming | 2010

Sub-tree swapping crossover and arity histogram distributions

Stephen Dignum; Riccardo Poli

Recent theoretical work has characterised the search bias of GP sub-tree swapping crossover in terms of program length distributions, providing an exact fixed point for trees with internal nodes of identical arity. However, only an approximate model (based on the notion of average arity) for the mixed-arity case has been proposed. This leaves a particularly important gap in our knowledge because multi-arity function sets are commonplace in GP and deep lessons could be learnt from the fixed point. In this paper, we present an accurate theoretical model of program length distributions when mixed-arity function sets are employed. The new model is based on the notion of an arity histogram, a count of the number of primitives of each arity in a program. Empirical support is provided and a discussion of the model is used to place earlier findings into a more general context.
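An arity histogram, as described here, counts the primitives of each arity in a program. The sketch below shows one way to compute it; the tuple-based tree encoding `(label, child_1, ..., child_k)` and the function names are assumptions made for illustration, not taken from the paper.

```python
from collections import Counter
from typing import Tuple

# Assumed encoding: (label, child_1, ..., child_k); a leaf is (label,).
Tree = Tuple


def arity_histogram(tree: Tree) -> Counter:
    """Count how many nodes of each arity appear in the tree."""
    hist: Counter = Counter()
    stack = [tree]
    while stack:
        node = stack.pop()
        children = node[1:]
        hist[len(children)] += 1  # arity is the number of children
        stack.extend(children)
    return hist


def program_length(hist: Counter) -> int:
    """Program length is the total number of nodes, internal nodes plus leaves."""
    return sum(hist.values())


if __name__ == "__main__":
    example = ("if", ("x",), ("+", ("x",), ("y",)), ("neg", ("y",)))
    hist = arity_histogram(example)
    print(dict(hist), program_length(hist))
```

For trees whose internal nodes all have the same arity a, the histogram reduces to two entries and the length is a·n + 1 for n internal nodes, which is why the single-arity case is simpler to analyse.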

Parallel Problem Solving from Nature | 2008

Sub-tree Swapping Crossover, Allele Diffusion and GP Convergence

Stephen Dignum; Riccardo Poli

We provide strong evidence that sub-tree swapping crossover, when applied to tree-based representations, will cause alleles (node labels) to diffuse within length classes. For a-ary trees we provide further confirmation that all programs are equally likely to be sampled within any length class when sub-tree swapping crossover is applied in the absence of selection and mutation. Therefore, we propose that this form of search is unbiased - within length classes - for a-ary trees. Unexpectedly, however, for mixed-arity trees this is not found and a more complicated form of search is taking place where certain tree shapes, hence programs, are more likely to be sampled than others within each class. We examine the reasons for such shape bias in mixed-arity representations and provide the practitioner with a thorough examination of sub-tree swapping crossover bias. The results of this, when combined with crossover length bias research, explain Genetic Programming's lack of structural convergence during later stages of an experimental run. Several operators are discussed where a broader form of convergence may be detected in a similar way to that found in Genetic Algorithm experimentation.
