Marcin Budka
Bournemouth University
Publications
Featured research published by Marcin Budka.
Artificial Intelligence Review | 2015
Christiane Lemke; Marcin Budka; Bogdan Gabrys
Metalearning has attracted considerable interest in the machine learning community in recent years. Yet some disagreement remains on what does and does not constitute a metalearning problem and in which contexts the term is used. This survey aims to give an all-encompassing overview of the research directions pursued under the umbrella of metalearning, reconciling different definitions given in the scientific literature, listing the choices involved when designing a metalearning system and identifying some of the future research challenges in this domain.
Privacy, Security, Risk and Trust | 2011
Krzysztof Juszczyszyn; Katarzyna Musial; Marcin Budka
We propose a new method for characterizing the dynamics of complex networks, with an application to the link prediction problem. Our approach is based on the discovery of network subgraphs (in this study: triads of nodes) and on measuring their transitions during network evolution. We define the Triad Transition Matrix (TTM), containing the probabilities of transitions between triads found in the network, and then show how it can help to discover and quantify the dynamic patterns of network evolution. We also propose the application of the TTM to link prediction with an algorithm (called TTM-predictor) which shows good performance, especially for sparse networks analyzed on short time scales. Future applications and research directions of our approach are also proposed and discussed.
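The core idea can be sketched with a toy example. Here a triad's state is simply the number of edges among a node triple (0 to 3), and the TTM counts state transitions between two consecutive snapshots; the paper's actual triad taxonomy is richer, so this is only an illustrative simplification:

```python
from itertools import combinations

def triad_state(triple, edges):
    # state of a triple = number of edges among its three nodes (0..3)
    return sum(1 for pair in combinations(triple, 2)
               if pair in edges or pair[::-1] in edges)

def triad_transition_matrix(nodes, edges_t0, edges_t1):
    # 4x4 matrix of probabilities P(state at t1 | state at t0)
    counts = [[0] * 4 for _ in range(4)]
    for triple in combinations(sorted(nodes), 3):
        counts[triad_state(triple, edges_t0)][triad_state(triple, edges_t1)] += 1
    ttm = []
    for row in counts:
        total = sum(row)
        ttm.append([c / total if total else 0.0 for c in row])
    return ttm

nodes = {1, 2, 3, 4}
snapshot_0 = {(1, 2), (2, 3)}
snapshot_1 = {(1, 2), (2, 3), (1, 3)}  # the triad {1,2,3} closes
ttm = triad_transition_matrix(nodes, snapshot_0, snapshot_1)
```

In this toy evolution, every open triad with two edges closes, so the entry for the 2-to-3 transition is 1.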
Scientific Reports | 2016
Matthew R. Bennett; Sally C. Reynolds; Sarita A. Morse; Marcin Budka
The Laetoli site (Tanzania) contains the oldest known hominin footprints, and their interpretation remains open to debate, despite over 35 years of research. The two hominin trackways present are parallel to one another, one of which is a composite formed by at least two individuals walking in single file. Most researchers have focused on the single, clearly discernible G1 trackway while the G2/3 trackway has been largely dismissed due to its composite nature. Here we report the use of a new technique that allows us to decouple the G2 and G3 tracks for the first time. In so doing we are able to quantify the mean footprint topology of the G3 trackway and render it useable for subsequent data analyses. By restoring the effectively ‘lost’ G3 track, we have doubled the available data on some of the rarest traces directly associated with our Pliocene ancestors.
Neurocomputing | 2015
Indre Žliobaite; Marcin Budka; Frederic T. Stahl
Our digital universe is rapidly expanding: more and more daily activities are digitally recorded, and data arrives in streams that need to be analyzed in real time and may evolve over time. In the last decade many adaptive learning algorithms and prediction systems, which can automatically update themselves with new incoming data, have been developed. The majority of those algorithms focus on improving predictive performance and assume that a model update is always desired, as soon and as frequently as possible. In this study we consider a potential model update as an investment decision which, as in the financial markets, should be taken only if a certain return on investment is expected. We introduce and motivate a new research problem for data streams - cost-sensitive adaptation. We propose a reference framework for analyzing adaptation strategies in terms of costs and benefits. Our framework allows us to characterize and decompose the costs of model updates, and to assess and interpret the gains in performance due to model adaptation for a given learning algorithm on a given prediction task. Our proof-of-concept experiment demonstrates how the framework can aid in analyzing and managing adaptation decisions in the chemical industry.
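The investment view of adaptation can be illustrated with a minimal decision rule; the `should_adapt` helper and the cost figures below are hypothetical stand-ins, not the paper's actual framework:

```python
def should_adapt(error_current, error_after_update, error_unit_cost, update_cost):
    # Treat a model update as an investment: adapt only when the expected
    # benefit (reduction in error, priced per unit) exceeds the update cost.
    expected_benefit = (error_current - error_after_update) * error_unit_cost
    return expected_benefit > update_cost

# Hypothetical figures: each point of error costs 100 units, retraining costs 5
big_gain = should_adapt(0.20, 0.10, 100, 5)    # benefit 10 > cost 5: adapt
small_gain = should_adapt(0.20, 0.18, 100, 5)  # benefit 2 < cost 5: skip
```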
Entropy | 2011
Marcin Budka; Bogdan Gabrys; Katarzyna Musial
Generalisation error estimation is an important issue in machine learning. Cross-validation, traditionally used for this purpose, requires building multiple models and repeating the whole procedure many times in order to produce reliable error estimates. It is, however, possible to accurately estimate the error using only a single model, if the training and test data are chosen appropriately. This paper investigates the possibility of using various probability density function (PDF) divergence measures for the purpose of representative data sampling. As it turns out, the first difficulty one needs to deal with is estimation of the divergence itself. In contrast to other publications on this subject, the experimental results provided in this study show that in many cases this is not possible unless samples consisting of thousands of instances are used. Exhaustive experiments on divergence-guided representative data sampling have been performed using 26 publicly available benchmark datasets and 70 PDF divergence estimators, and their results have been analysed and discussed.
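As a toy illustration of divergence-guided sampling, the sketch below estimates the Jensen-Shannon divergence (one of many PDF divergence measures) from simple histograms to compare how representative two candidate samples of a dataset are; the data and helper names are invented for the example:

```python
import math

def js_divergence(p, q):
    # Jensen-Shannon divergence between two discrete distributions
    m = [(a + b) / 2 for a, b in zip(p, q)]
    def kl(a, b):
        return sum(x * math.log(x / y) for x, y in zip(a, b) if x > 0)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def histogram_pdf(data, bins, lo, hi):
    # crude density estimate: normalised histogram over [lo, hi)
    width = (hi - lo) / bins
    counts = [0] * bins
    for x in data:
        counts[min(int((x - lo) / width), bins - 1)] += 1
    return [c / len(data) for c in counts]

full = list(range(100))            # roughly uniform on [0, 100)
good_sample = full[::2]            # evenly spread half of the data
bad_sample = full[:50]             # only the lower half of the range
p_full = histogram_pdf(full, 10, 0, 100)
d_good = js_divergence(histogram_pdf(good_sample, 10, 0, 100), p_full)
d_bad = js_divergence(histogram_pdf(bad_sample, 10, 0, 100), p_full)
```

The evenly spread sample has near-zero divergence from the full data, while the skewed sample shows a clearly positive divergence, which is exactly the signal a divergence-guided sampler would use.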
World Wide Web | 2013
Katarzyna Musial; Marcin Budka; Krzysztof Juszczyszyn
Social networks are an example of complex systems consisting of nodes that can interact with each other, and based on these activities social relations are defined. The dynamics and evolution of social networks are very interesting, but at the same time very challenging, areas of research. In this paper the formation and growth of one such structure, extracted from data about human activities within an online social networking system, is investigated. The dynamics of both local and global characteristics are studied. Analysis of the dynamics of the network growth showed that it changes over time, from a random process to power-law growth, with a clearly visible phase transition between the two. In general, the node degree distribution can be described as scale-free, but it does not emerge straight from the beginning. Social networks are known to feature a high clustering coefficient and the friend-of-a-friend phenomenon. This research has revealed that in the online social network, although the clustering coefficient grows over time, it is lower than expected, and the friend-of-a-friend phenomenon is missing. On the other hand, the length of the shortest paths is small from the beginning of the network's existence, so the small-world phenomenon is present. The unique element of the presented study is that the data from which the online social network was extracted represents interactions between users from the beginning of the social networking site's existence. The system from which the data was obtained enables users to interact using different communication channels, which gives an additional opportunity to investigate the multi-relational character of human relations.
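The local clustering coefficient tracked in this kind of analysis can be computed as follows; the adjacency structure here is a toy example, not the studied network:

```python
def local_clustering(adj, node):
    # fraction of a node's neighbour pairs that are themselves connected
    nbrs = adj[node]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for a in nbrs for b in nbrs if a < b and b in adj[a])
    return 2 * links / (k * (k - 1))

# toy undirected network: node 1 connects to 2, 3 and 4; only the pair 2-3 is closed
adj = {1: {2, 3, 4}, 2: {1, 3}, 3: {1, 2}, 4: {1}}
cc = local_clustering(adj, 1)  # one closed pair out of three possible -> 1/3
```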
IEEE Transactions on Neural Networks | 2013
Marcin Budka; Bogdan Gabrys
Estimation of the generalization ability of a classification or regression model is an important issue, as it indicates the expected performance on previously unseen data and is also used for model selection. Currently used generalization error estimation procedures, such as cross-validation (CV) or bootstrap, are stochastic and, thus, require multiple repetitions in order to produce reliable results, which can be computationally expensive, if not prohibitive. The correntropy-inspired density-preserving sampling (DPS) procedure proposed in this paper eliminates the need for repeating the error estimation procedure by dividing the available data into subsets that are guaranteed to be representative of the input dataset. This allows the production of low-variance error estimates with an accuracy comparable to 10 times repeated CV at a fraction of the computations required by CV. This method can also be used for model ranking and selection. This paper derives the DPS procedure and investigates its usability and performance using a set of public benchmark datasets and standard classifiers.
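A crude stand-in for the idea of representative splitting (not the correntropy-inspired DPS criterion itself): sorting instances by a one-dimensional feature and dealing them round-robin into folds keeps each fold's marginal distribution close to that of the full data, so a single fold can stand in for repeated resampling:

```python
def round_robin_folds(values, k):
    # crude density-preserving split: sort by value, then deal into k folds
    order = sorted(range(len(values)), key=lambda i: values[i])
    folds = [[] for _ in range(k)]
    for rank, idx in enumerate(order):
        folds[rank % k].append(idx)
    return folds

values = [0.1, 0.9, 0.5, 0.3, 0.7, 0.2, 0.8, 0.4]
folds = round_robin_folds(values, 2)
mean = sum(values) / len(values)
fold_means = [sum(values[i] for i in f) / len(f) for f in folds]
```

Both folds partition the data and their means stay close to the global mean, which is the representativeness property DPS guarantees in a far more principled way.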
International Conference on Conceptual Structures | 2010
Marcin Budka; Bogdan Gabrys
Traditional methods of assessing the chemical toxicity of various compounds require tests on animals, which raises ethical concerns and is expensive. Current legislation may lead to a further increase in demand for laboratory animals in the coming years. As a result, automatically generated predictions using Quantitative Structure-Activity Relationship (QSAR) modelling approaches appear as an attractive alternative. Due to the sparsity of the chemical space, making this kind of prediction is, however, a difficult task. In this paper we propose a purely data-driven, rigorous and universal methodology of QSAR modelling, based on an ensemble of relatively simple ridge regressors trained in various subspaces of the chemical space, selected using an iterative optimization procedure. The model described has been developed without using any domain knowledge and has been evaluated within the Environmental Toxicity Prediction Challenge CADASTER 2009, which attracted over 100 participants from 25 countries. The presented approach was chosen as one of the First-Pass Winners, with predictive power not significantly different from that of the highest-ranked method, developed by experts in the area of QSAR modelling and toxicology.
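The ensemble-of-subspace-ridge-regressors idea can be sketched as follows; the synthetic data and hand-picked feature subsets stand in for the chemical descriptors and the paper's iterative subspace selection:

```python
import numpy as np

def fit_ridge(X, y, alpha):
    # closed-form ridge regression: w = (X^T X + alpha I)^{-1} X^T y
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(d), X.T @ y)

def subspace_ensemble_predict(X_train, y, X_test, subspaces, alpha=1.0):
    # average the predictions of ridge models fitted on feature subsets
    preds = []
    for cols in subspaces:
        w = fit_ridge(X_train[:, cols], y, alpha)
        preds.append(X_test[:, cols] @ w)
    return np.mean(preds, axis=0)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = X @ np.array([1.0, -2.0, 0.5, 0.0])          # synthetic linear target
subspaces = [[0, 1], [1, 2], [0, 2, 3]]          # hand-picked subspaces
pred = subspace_ensemble_predict(X[:150], y[:150], X[150:], subspaces)
```

Each individual model sees only part of the signal, but averaging across subspaces recovers predictions strongly correlated with the target.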
International Symposium on Neural Networks | 2010
Marcin Budka; Bogdan Gabrys
Estimation of the generalization ability of a predictive model is an important issue, as it indicates expected performance on previously unseen data and is also used for model selection. Currently used generalization error estimation procedures like cross-validation (CV) or bootstrap are stochastic and thus require multiple repetitions in order to produce reliable results, which can be computationally expensive if not prohibitive. The correntropy-based Density Preserving Sampling (DPS) procedure proposed in this paper eliminates the need for repeating the error estimation procedure by dividing the available data into subsets which are guaranteed to be representative of the input dataset. This makes it possible to produce low-variance error estimates with accuracy comparable to 10 times repeated cross-validation at a fraction of the computations required by CV, which has been investigated using a set of publicly available benchmark datasets and standard classifiers.
Privacy, Security, Risk and Trust | 2012
Marcin Budka; Katarzyna Musial; Krzysztof Juszczyszyn
This study investigates the data preparation process for predictive modelling of the evolution of complex networked systems, using an e-mail based social network as an example. In particular, we focus on the selection of the optimal time window size for building a time series of network snapshots, which forms the input of the chosen predictive models. We formulate this issue as a constrained multi-objective optimization problem, where the constraints are specific to the particular application and predictive algorithm used. The optimization process is guided by the proposed Windows Incoherence Measures, defined as averaged Jensen-Shannon divergences between distributions of a range of network characteristics for the individual time windows and for the network covering the whole considered period of time. The experiments demonstrate that an informed choice of window size according to the proposed approach boosts the prediction accuracy of all examined prediction algorithms, and can also be used to optimally define the prediction problems if some flexibility in their definition is allowed.
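A minimal sketch of an incoherence measure of this kind, using only the degree distribution as the network characteristic and the Jensen-Shannon divergence between each window and the whole period (the paper averages over a range of characteristics, so this is a simplified illustration):

```python
import math

def js_divergence(p, q):
    # Jensen-Shannon divergence between two discrete distributions
    m = [(a + b) / 2 for a, b in zip(p, q)]
    def kl(a, b):
        return sum(x * math.log(x / y) for x, y in zip(a, b) if x > 0)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def degree_distribution(edges, max_degree):
    # normalised histogram of node degrees, capped at max_degree
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    counts = [0] * (max_degree + 1)
    for d in deg.values():
        counts[min(d, max_degree)] += 1
    return [c / len(deg) for c in counts]

def window_incoherence(windows, whole, max_degree=5):
    # mean JS divergence between each window's degree distribution
    # and the distribution over the whole period
    p_whole = degree_distribution(whole, max_degree)
    divs = [js_divergence(degree_distribution(w, max_degree), p_whole)
            for w in windows]
    return sum(divs) / len(divs)

# toy edge stream over four nodes; two time windows of three edges each
whole = [(1, 2), (2, 3), (3, 4), (1, 3), (2, 4), (1, 4)]
coarse = [whole[:3], whole[3:]]
```

Lower incoherence means the windows look more like the whole period, which is the criterion guiding the window-size choice.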