Julia Handl
University of Manchester
Publications
Featured research published by Julia Handl.
IEEE Transactions on Evolutionary Computation | 2007
Julia Handl; Joshua D. Knowles
The framework of multiobjective optimization is used to tackle the unsupervised learning problem, data clustering, following a formulation first proposed in the statistics literature. The conceptual advantages of the multiobjective formulation are discussed and an evolutionary approach to the problem is developed. The resulting algorithm, multiobjective clustering with automatic k-determination (MOCK), is compared with a number of well-established single-objective clustering algorithms, a modern ensemble technique, and two methods of model selection. The experiments demonstrate that the conceptual advantages of multiobjective clustering translate into practical and scalable performance benefits.
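The abstract summarises MOCK at a high level only. The sketch below is a minimal, generic illustration of the core idea: evolving candidate partitions and retaining the nondominated set under two complementary objectives. The representation, operators, and objective functions here are simplified assumptions (plain label vectors, a single point mutation), not the published MOCK algorithm.

```python
import numpy as np

def compactness(X, labels):
    """Sum of distances of points to their cluster centroid (minimise)."""
    return sum(np.linalg.norm(X[labels == k] - X[labels == k].mean(axis=0), axis=1).sum()
               for k in np.unique(labels))

def connectivity(X, labels, n_neighbours=5):
    """Penalty whenever a point's nearest neighbours sit in other clusters (minimise)."""
    dists = np.linalg.norm(X[:, None] - X[None, :], axis=2)
    order = np.argsort(dists, axis=1)[:, 1:n_neighbours + 1]
    penalties = 1.0 / np.arange(1, n_neighbours + 1)
    return sum(penalties[j] for i in range(len(X))
               for j, nb in enumerate(order[i]) if labels[nb] != labels[i])

def dominates(a, b):
    """True if score vector a Pareto-dominates b (both objectives minimised)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def evolve(X, max_k=10, pop_size=30, generations=50, seed=0):
    """Evolve partitions and keep the nondominated set under the two objectives."""
    rng = np.random.default_rng(seed)
    n = len(X)
    pop = [rng.integers(0, rng.integers(2, max_k + 1), n) for _ in range(pop_size)]
    for _ in range(generations):
        for _ in range(5):                                   # a few mutated offspring
            child = pop[rng.integers(len(pop))].copy()
            child[rng.integers(n)] = rng.integers(0, max_k)  # point mutation
            pop.append(child)
        scores = [(compactness(X, p), connectivity(X, p)) for p in pop]
        keep = [i for i, s in enumerate(scores)
                if not any(dominates(t, s) for j, t in enumerate(scores) if j != i)]
        pop = [pop[i] for i in keep][:pop_size]              # crude environmental selection
    return pop                                               # approximate Pareto set

# Three well-separated Gaussian blobs; different Pareto-set members will
# typically correspond to different numbers of clusters.
X = np.random.default_rng(1).normal(scale=0.4, size=(60, 2)) \
    + np.repeat([[0, 0], [4, 4], [0, 4]], 20, axis=0)
print(len(evolve(X)), "trade-off partitions found")
```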
Artificial Life | 2006
Julia Handl; Joshua D. Knowles; Marco Dorigo
Ant-based clustering and sorting is a nature-inspired heuristic first introduced as a model for explaining two types of emergent behavior observed in real ant colonies. More recently, it has been applied in a data-mining context to perform both clustering and topographic mapping. Early work demonstrated some promising characteristics of the heuristic but did not extend to a rigorous investigation of its capabilities. We describe an improved version, called ATTA, incorporating adaptive, heterogeneous ants, a time-dependent transporting activity, and a method (for clustering applications) that transforms the spatial embedding produced by the algorithm into an explicit partitioning. ATTA is then subjected to the most rigorous experimental evaluation of an ant-based clustering and sorting algorithm undertaken to date: we compare its performance with standard techniques for clustering and topographic mapping using a set of analytical evaluation functions and a range of synthetic and real data collections. Our results demonstrate the ability of ant-based clustering and sorting to automatically identify the number of clusters inherent in a data collection, and to produce high quality solutions; indeed, we show that it is particularly robust for clusters of differing sizes and for overlapping clusters. The results obtained for topographic mapping are, however, disappointing. We provide evidence that the solutions generated by the ant algorithm are barely topology-preserving, and we explain in detail why results have, in spite of this, been misinterpreted (much more positively) in previous research.
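As a rough illustration of the basic pick-up/drop mechanism that ant-based clustering builds on, the sketch below follows the classic Lumer–Faieta-style probabilities driven by an item's similarity to its grid neighbourhood. It is a generic sketch, not the ATTA algorithm described above (whose adaptive ants and time-dependent activity are omitted), and the constants are illustrative assumptions.

```python
import numpy as np

def neighbourhood_similarity(item, neighbours, alpha=0.5):
    """Average similarity of an item to the data items in its local grid patch."""
    if len(neighbours) == 0:
        return 0.0
    d = np.linalg.norm(neighbours - item, axis=1)
    return max(0.0, float(np.mean(1.0 - d / alpha)))

def pick_probability(f, k1=0.1):
    """An isolated item (low neighbourhood similarity f) is picked up readily."""
    return (k1 / (k1 + f)) ** 2

def drop_probability(f, k2=0.15):
    """An item is dropped where it is surrounded by similar items (high f)."""
    return 2.0 * f if f < k2 else 1.0

# One decision step for a single (hypothetical) ant and grid patch.
rng = np.random.default_rng(0)
item = rng.normal(size=2)
patch = rng.normal(size=(8, 2))          # items on the neighbouring grid cells
f = neighbourhood_similarity(item, patch)
print(round(pick_probability(f), 3), round(drop_probability(f), 3))
```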
Swarm Intelligence | 2007
Julia Handl; Bernd Meyer
Clustering with swarm-based algorithms is emerging as an alternative to more conventional clustering methods, such as hierarchical clustering and k-means. Ant-based clustering stands out as the most widely used group of swarm-based clustering algorithms. Broadly speaking, there are two main types of ant-based clustering: the first group of methods directly mimics the clustering behavior observed in real ant colonies. The second group is less directly inspired by nature: the clustering task is reformulated as an optimization task and general purpose ant-based optimization heuristics are utilized to find good or near-optimal clusterings. This paper reviews both approaches and places these methods in the wider context of general swarm-based clustering approaches.
Metabolomics | 2005
Marie Brown; Warwick B. Dunn; David I. Ellis; Royston Goodacre; Julia Handl; Joshua D. Knowles; Steve O'Hagan; Irena Spasic; Douglas B. Kell
Metabolomics, like other omics methods, produces huge datasets of biological variables, often accompanied by the necessary metadata. However, regardless of the form in which these are produced they are merely the ground substance for assisting us in answering biological questions. In this short tutorial review and position paper we seek to set out some of the elements of “best practice” in the optimal acquisition of such data, and in the means by which they may be turned into reliable knowledge. Many of these steps involve the solution of what amount to combinatorial optimization problems, and methods developed for these, especially those based on evolutionary computing, are proving valuable. This is done in terms of a “pipeline” that goes from the design of good experiments, through instrumental optimization, data storage and manipulation, the chemometric data processing methods in common use, and the necessary means of validation and cross-validation for giving conclusions that are credible and likely to be robust when applied in comparable circumstances to samples not used in their generation.
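To make the validation step of such a pipeline concrete, the sketch below shows nested cross-validation on a stand-in data matrix using scikit-learn (an assumed dependency). It is a generic illustration of keeping model selection inside the training folds so that the reported performance estimate is not biased by the test samples; it is not a method taken from the paper.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

# Stand-in for a metabolomics data matrix: samples x metabolite intensities.
X, y = make_classification(n_samples=120, n_features=200, n_informative=10,
                           random_state=0)

# Model selection (inner loop) is wrapped inside the performance estimate
# (outer loop), so test samples never influence the chosen hyperparameters.
model = GridSearchCV(
    make_pipeline(StandardScaler(), LogisticRegression(max_iter=5000)),
    param_grid={"logisticregression__C": [0.01, 0.1, 1.0, 10.0]},
    cv=5,
)
outer_scores = cross_val_score(model, X, y, cv=5)
print("cross-validated accuracy: %.2f +/- %.2f"
      % (outer_scores.mean(), outer_scores.std()))
```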
Parallel Problem Solving from Nature | 2004
Julia Handl; Joshua D. Knowles
A new approach to data clustering is proposed, in which two or more measures of cluster quality are simultaneously optimized using a multiobjective evolutionary algorithm (EA). For this purpose, the PESA-II EA is adapted for the clustering problem by the incorporation of specialized mutation and initialization procedures, described herein. Two conceptually orthogonal measures of cluster quality are selected for optimization, enabling, for the first time, a clustering algorithm to explore and improve different compromise solutions during the clustering process. Our results, on a diverse suite of 15 real and synthetic data sets – where the correct classes are known – demonstrate a clear advantage to the multiobjective approach: solutions in the discovered Pareto set are objectively better than those obtained when the same EA is applied to optimize just one measure. Moreover, the multiobjective EA exhibits a far more robust level of performance than both the classic k-means and average-link agglomerative clustering algorithms, outperforming them substantially on aggregate.
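The specialized operators of the adapted PESA-II are only named above. As a hedged sketch of one representation used in this line of work (the decoding below is a simplified assumption rather than a description of the paper's encoding), a locus-based adjacency genotype links each data item to one other item, and the connected components of the resulting graph define the clusters.

```python
import numpy as np

def decode(genotype):
    """Decode a locus-based adjacency genotype into cluster labels.

    genotype[i] = j means item i is linked to item j; connected components
    of the undirected link graph form the clusters.
    """
    n = len(genotype)
    parent = list(range(n))

    def find(i):                      # union-find with path compression
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in enumerate(genotype):
        ri, rj = find(i), find(int(j))
        if ri != rj:
            parent[ri] = rj

    roots = [find(i) for i in range(n)]
    _, labels = np.unique(roots, return_inverse=True)
    return labels

# A genotype over six items: two connected components -> two clusters.
print(decode(np.array([1, 2, 0, 4, 5, 3])))   # e.g. [0 0 0 1 1 1]
```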
International Conference on Evolutionary Multi-Criterion Optimization | 2005
Julia Handl; Joshua D. Knowles
In previous work, we have proposed a novel approach to data clustering based on the explicit optimization of a partitioning with respect to two complementary clustering objectives [6]. Here, we extend this idea by describing an advanced multiobjective clustering algorithm, MOCK, with the capacity to identify good solutions from the Pareto front, and to automatically determine the number of clusters in a data set. The algorithm has been subject to a thorough comparison with alternative clustering techniques and we briefly summarize these results. We then present investigations into the mechanisms at the heart of MOCK: we discuss a simple example demonstrating the synergistic effects at work in multiobjective clustering, which explain its superiority to single-objective clustering techniques, and we analyse how MOCK's Pareto fronts compare to the performance curves obtained by single-objective algorithms run with a range of different numbers of clusters specified.
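MOCK's published strategy for identifying good solutions compares the Pareto front against fronts obtained on control data. As a much simpler stand-in for illustration only (an assumption, not the method from the paper), the sketch below picks the "knee" of a two-objective front as the point furthest from the line joining its extremes.

```python
import numpy as np

def knee_solution(front):
    """Index of the 'knee' of a 2-D Pareto front (both objectives minimised):
    the point furthest from the straight line joining the front's extremes.
    `front` is an (m, 2) array-like sorted by the first objective."""
    f = np.asarray(front, dtype=float)
    f = (f - f.min(axis=0)) / np.ptp(f, axis=0)   # put both objectives on [0, 1]
    a, b = f[0], f[-1]                            # extreme solutions of the front
    ab = b - a
    # perpendicular distance of every point from the line through a and b
    dists = np.abs(ab[0] * (f[:, 1] - a[1])
                   - ab[1] * (f[:, 0] - a[0])) / np.linalg.norm(ab)
    return int(np.argmax(dists))

# Toy front: the third solution sits at a pronounced knee.
front = [(0.0, 10.0), (1.0, 6.0), (2.0, 1.0), (6.0, 0.5), (10.0, 0.0)]
print(knee_solution(front))   # -> 2
```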
Congress on Evolutionary Computation | 2005
Julia Handl; Joshua D. Knowles
In previous work, the authors have introduced a novel and highly effective approach to data clustering, based on the explicit optimization of a partitioning with respect to two complementary clustering objectives (Handl et al., 2004, 2005). In this paper, three modifications were made to the algorithm that improved its scalability to large data sets with high dimensionality and large numbers of clusters. Specifically, new initialization and mutation schemes that enable a more efficient exploration of the search space were introduced, and the null data model that is used as a basis for selecting the most significant solution from the Pareto front was modified. The high performance of the resulting algorithm is demonstrated on a newly developed clustering test suite.
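The abstract names new initialization and mutation schemes without detailing them. As a hedged sketch of one plausible initialization in this spirit (an assumption, not the paper's operator), the code below derives initial partitions from a minimum spanning tree of the data by cutting its longest edges, so that every initial individual is already a reasonable clustering.

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree, connected_components
from scipy.spatial.distance import squareform, pdist

def mst_initial_partitions(X, max_k=5):
    """Build initial partitions by cutting the longest MST edges.

    Returns one label vector per k in 1..max_k; this is a hypothetical
    initialization scheme, not the operators from the paper.
    """
    dist = squareform(pdist(X))
    mst = minimum_spanning_tree(dist).tocoo()     # n-1 undirected edges
    order = np.argsort(mst.data)[::-1]            # longest edges first
    partitions = []
    for k in range(1, max_k + 1):
        keep = np.ones(len(mst.data), dtype=bool)
        keep[order[:k - 1]] = False               # remove the k-1 longest edges
        graph = np.zeros_like(dist)
        graph[mst.row[keep], mst.col[keep]] = mst.data[keep]
        _, labels = connected_components(graph, directed=False)
        partitions.append(labels)
    return partitions

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(c, 0.2, size=(20, 2)) for c in (0.0, 3.0, 6.0)])
for labels in mst_initial_partitions(X, max_k=3):
    print(np.bincount(labels))                    # cluster sizes per partition
```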
Congress on Evolutionary Computation | 2005
Julia Handl; Joshua D. Knowles
The large majority of existing clustering algorithms are centered around the notion of a feature, that is, individual data items are represented by their intrinsic properties, which are summarized by (usually numeric) feature vectors. However, certain applications require the clustering of data items that are defined by exclusively extrinsic properties: only the relationships between individual data items are known (that is, their similarities or dissimilarities). This paper develops a straightforward and efficient adaptation of our existing multiobjective clustering algorithm to such a scenario. The resulting algorithm is demonstrated on a range of data sets, including a dissimilarity matrix derived from real, non-feature-based data.
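Working from pairwise dissimilarities alone changes how a partition can be evaluated. The sketch below (a generic illustration, not the published adaptation) scores a partition using only a dissimilarity matrix: a medoid-based deviation term replaces centroid distances, and a connectivity term penalises nearest neighbours placed in different clusters.

```python
import numpy as np

def medoid_deviation(D, labels):
    """Sum of dissimilarities of items to their cluster medoid (minimise)."""
    total = 0.0
    for k in np.unique(labels):
        idx = np.where(labels == k)[0]
        sub = D[np.ix_(idx, idx)]
        medoid = idx[np.argmin(sub.sum(axis=1))]   # most central cluster member
        total += D[idx, medoid].sum()
    return total

def connectivity(D, labels, n_neighbours=5):
    """Penalty for nearest neighbours (by dissimilarity) in other clusters."""
    order = np.argsort(D, axis=1)[:, 1:n_neighbours + 1]
    penalties = 1.0 / np.arange(1, n_neighbours + 1)
    return sum(penalties[j]
               for i in range(len(D))
               for j, nb in enumerate(order[i]) if labels[nb] != labels[i])

# Toy dissimilarity matrix derived from two well-separated groups of items.
rng = np.random.default_rng(1)
points = np.vstack([rng.normal(0, 0.3, (10, 2)), rng.normal(5, 0.3, (10, 2))])
D = np.linalg.norm(points[:, None] - points[None, :], axis=2)
labels = np.repeat([0, 1], 10)
print(medoid_deviation(D, labels), connectivity(D, labels))
```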
Proteins | 2012
Julia Handl; Joshua D. Knowles; Robert B. Vernon; David Baker; Simon C. Lovell
In fragment-assembly techniques for protein structure prediction, models of protein structure are assembled from fragments of known protein structures. This process is typically guided by a knowledge-based energy function and uses a heuristic optimization method. The fragments play two important roles in this process: they define the set of structural parameters available, and they also assume the role of the main variation operators that are used by the optimiser. Previous analysis has typically focused on the first of these roles. In particular, the relationship between local amino acid sequence and local protein structure has been studied by a range of authors. The correlation between the two has been shown to vary with the window length considered, and the results of these analyses have informed directly the choice of fragment length in state-of-the-art prediction techniques. Here, we focus on the second role of fragments and aim to determine the effect of fragment length from an optimization perspective. We use theoretical analyses to reveal how the size and structure of the search space changes as a function of insertion length. Furthermore, empirical analyses are used to explore additional ways in which the size of the fragment insertion influences the search both in a simulation model and for the fragment-assembly technique, Rosetta.
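To illustrate fragment insertion as a variation operator, the sketch below is a deliberately simplified analogue of a fragment-assembly search, not Rosetta and not the paper's simulation model: a conformation is a vector of torsion angles, a move overwrites a window of length L with a fragment drawn from a small library, and moves are accepted under a Metropolis criterion with an arbitrary stand-in energy function. All constants are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
N_RES, FRAG_LEN, N_FRAGS = 40, 9, 25

# Arbitrary stand-in energy: favours angles close to a fixed 'native' vector.
native = rng.uniform(-np.pi, np.pi, N_RES)
def energy(angles):
    return np.sum(1.0 - np.cos(angles - native))

# Toy fragment library: per window position, N_FRAGS candidate angle windows.
library = rng.uniform(-np.pi, np.pi, (N_RES - FRAG_LEN + 1, N_FRAGS, FRAG_LEN))

def fragment_move(angles, temperature=1.0):
    """Insert a random fragment at a random window; Metropolis acceptance."""
    pos = rng.integers(0, N_RES - FRAG_LEN + 1)
    trial = angles.copy()
    trial[pos:pos + FRAG_LEN] = library[pos, rng.integers(N_FRAGS)]
    delta = energy(trial) - energy(angles)
    if delta <= 0 or rng.random() < np.exp(-delta / temperature):
        return trial
    return angles

conformation = rng.uniform(-np.pi, np.pi, N_RES)
for step in range(2000):
    conformation = fragment_move(conformation, temperature=0.5)
print("final energy:", round(float(energy(conformation)), 2))
```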
Genetic and Evolutionary Computation Conference | 2006
Julia Handl; Joshua D. Knowles
Semi-supervised classification uses aspects of both unsupervised and supervised learning to improve upon the performance of traditional classification methods. Semi-supervised clustering, in particular, explicitly integrates both information about the data distribution and information about class memberships into the clustering process. In this paper, the potential of a multiobjective formulation of the semi-supervised clustering problem is explored, and two evolutionary multiobjective approaches to the problem are outlined. Experimental results demonstrate practical performance benefits of this methodology, including improved classification performance and increased robustness to annotation errors.
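A multiobjective formulation of semi-supervised clustering pairs at least one unsupervised criterion with one supervised criterion. The sketch below shows one such pairing (a generic choice, not necessarily the objectives used in the paper): within-cluster compactness on all data, and Adjusted Rand agreement with the known labels on the annotated subset.

```python
import numpy as np
from sklearn.metrics import adjusted_rand_score

def compactness(X, labels):
    """Unsupervised objective: distance of points to their centroid (minimise)."""
    return sum(np.linalg.norm(X[labels == k] - X[labels == k].mean(axis=0),
                              axis=1).sum()
               for k in np.unique(labels))

def label_agreement(labels, known_labels, known_idx):
    """Supervised objective: agreement with annotations on the labelled subset
    (maximise; the Adjusted Rand Index is 1 for a perfect match)."""
    return adjusted_rand_score(known_labels, labels[known_idx])

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(c, 0.5, (30, 2)) for c in (0.0, 4.0)])
truth = np.repeat([0, 1], 30)
known_idx = rng.choice(len(X), size=6, replace=False)   # a few annotated items

candidate = (X[:, 0] > 2.0).astype(int)                 # a candidate partition
print(compactness(X, candidate),
      label_agreement(candidate, truth[known_idx], known_idx))
```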