Andrés R. Masegosa
Norwegian University of Science and Technology
Publications
Featured research published by Andrés R. Masegosa.
Intelligent Data Analysis | 2015
Hanen Borchani; Ana M. Martinez; Andrés R. Masegosa; Helge Langseth; Thomas Dyhre Nielsen; Antonio Salmerón; Antonio Fernández; Anders L. Madsen; Ramón Sáez
A common approach to detecting and adapting to concept drift in classification is to treat the data as i.i.d. and use changes in classification accuracy as an indication of concept drift. In this paper, we take a different perspective and propose a framework, based on probabilistic graphical models, that explicitly represents concept drift using latent variables. To ensure efficient inference and learning, we resort to a variational Bayes inference scheme. As a proof of concept, we demonstrate and analyze the proposed framework on synthetic data sets as well as a real financial data set from a Spanish bank.
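The modelling idea, stripped to its core, is that a latent variable tracks the drifting quantity over time. Below is a minimal plain-Java sketch under strong simplifying assumptions: a single Gaussian random-walk drift variable tracked with closed-form Kalman-style updates rather than the paper's full variational Bayes scheme, with illustrative noise variances not taken from the paper.

```java
/** Minimal sketch: a latent drift variable modelled as a Gaussian random walk,
 *  tracked with closed-form Kalman-style updates. The paper uses a richer PGM
 *  with variational Bayes; this only illustrates the modelling idea. */
public final class DriftTracker {
    private double mean = 0.0, variance = 1.0;   // posterior over the drift variable
    private final double q = 0.01;               // random-walk (transition) variance
    private final double r = 1.0;                // observation noise variance

    /** Absorb one observation and return the updated drift estimate. */
    public double update(double observation) {
        variance += q;                           // predict: the drift may have moved
        double gain = variance / (variance + r); // how much to trust the new point
        mean += gain * (observation - mean);     // correct towards the observation
        variance *= (1.0 - gain);
        return mean;
    }

    public static void main(String[] args) {
        DriftTracker tracker = new DriftTracker();
        double[] stream = {0.1, 0.0, 0.2, 1.9, 2.1, 2.0};  // abrupt shift mid-stream
        for (double y : stream)
            System.out.printf("y=%.1f  drift estimate=%.3f%n", y, tracker.update(y));
    }
}
```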
IEEE Computational Intelligence Magazine | 2016
Andrés R. Masegosa; Ana M. Martinez; Hanen Borchani
In this paper, we discuss software design issues related to the development of parallel computational intelligence algorithms on multi-core CPUs using the new Java 8 functional programming features. In particular, we focus on probabilistic graphical models (PGMs) and present the parallelization of a collection of algorithms for inference and learning of PGMs from data, namely maximum likelihood estimation, importance sampling, and greedy search for solving combinatorial optimization problems. Through these concrete examples, we tackle the problem of defining efficient data structures for PGMs and of parallel processing of same-size batches of data using Java 8 features. We also provide straightforward techniques for coding parallel algorithms that seamlessly exploit multi-core processors. The experimental analysis, carried out using our open-source AMIDST (Analysis of MassIve Data STreams) Java toolbox, shows the merits of the proposed solutions.
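As an illustration of the Java 8 style the paper advocates, the sketch below estimates the parameters of a single discrete variable by maximum likelihood, with the counting done through a parallel stream. The data and variable are made up; this is not code from the AMIDST toolbox.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

/** Minimal sketch of Java 8 parallel maximum likelihood estimation for a
 *  discrete (multinomial) variable: count states in parallel, then normalize. */
public final class ParallelMLE {
    public static void main(String[] args) {
        Integer[] observations = {0, 1, 1, 2, 0, 1, 2, 2, 2, 1};  // toy categorical data

        // Parallel counting; groupingByConcurrent is safe under parallelism.
        Map<Integer, Long> counts = Arrays.stream(observations).parallel()
                .collect(Collectors.groupingByConcurrent(Function.identity(),
                                                         Collectors.counting()));

        double n = observations.length;
        counts.forEach((state, count) ->
                System.out.printf("P(X=%d) = %.2f%n", state, count / n));
    }
}
```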
Scandinavian Conference on AI | 2015
Hanen Borchani; Ana M. Martinez; Andrés R. Masegosa; Helge Langseth; Thomas Dyhre Nielsen; Antonio Salmerón; Antonio Fernández; Anders L. Madsen; Ramón Sáez
Affiliations: Department of Computer Science, Aalborg University, Denmark; Department of Computer and Information Science, Norwegian University of Science and Technology, Norway; Department of Mathematics, University of Almería, Spain; Banco de Crédito Cooperativo, Spain; HUGIN EXPERT A/S, Aalborg, Denmark.
International Journal of Approximate Reasoning | 2016
Andrés R. Masegosa; Ad Feelders; Linda C. van der Gaag
Domain experts can often quite reliably specify the sign of influences between variables in a Bayesian network. If we exploit this prior knowledge in estimating the probabilities of the network, it is more likely to be accepted by its users and may in fact be better calibrated with reality. We present two algorithms that exploit prior knowledge of qualitative influences in learning the parameters of a Bayesian network from incomplete data. The isotonic regression EM, or irEM, algorithm adds an isotonic regression step to standard EM in each iteration, to obtain parameter estimates that satisfy the given qualitative influences. In an attempt to reduce the computational burden involved, we further define the qirEM algorithm, which enforces the constraints imposed by the qualitative influences only once, after convergence of standard EM. We evaluate the performance of both algorithms through experiments. Our results demonstrate that exploiting the qualitative influences improves the parameter estimates over standard EM, and more so if the proportion of missing data is relatively large. The results also show that the qirEM algorithm performs just as well as its computationally more expensive counterpart irEM. Highlights: learning Bayesian networks with qualitative influences from incomplete data; domain knowledge specified in terms of signs of influences between variables; prior knowledge exploited in combination with partially observed data; two EM-like procedures theoretically and empirically evaluated.
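The key computational ingredient of irEM is the isotonic projection of raw parameter estimates onto the ordering implied by the qualitative influences. A minimal sketch of that step, using the standard pool-adjacent-violators algorithm with data counts as weights, is shown below; the surrounding EM loop is omitted and the numbers are illustrative.

```java
/** Minimal sketch of the isotonic projection an irEM step relies on: the
 *  pool-adjacent-violators algorithm maps raw (weighted) estimates onto the
 *  closest non-decreasing sequence. The EM wrapper itself is omitted. */
public final class PoolAdjacentViolators {
    /** Returns the weighted least-squares non-decreasing fit of y. */
    static double[] fit(double[] y, double[] w) {
        int n = y.length;
        double[] level = new double[n], weight = new double[n];
        int[] size = new int[n];
        int blocks = 0;
        for (int i = 0; i < n; i++) {
            level[blocks] = y[i]; weight[blocks] = w[i]; size[blocks] = 1; blocks++;
            // Merge backwards while monotonicity is violated.
            while (blocks > 1 && level[blocks - 2] > level[blocks - 1]) {
                double wSum = weight[blocks - 2] + weight[blocks - 1];
                level[blocks - 2] = (weight[blocks - 2] * level[blocks - 2]
                                   + weight[blocks - 1] * level[blocks - 1]) / wSum;
                weight[blocks - 2] = wSum;
                size[blocks - 2] += size[blocks - 1];
                blocks--;
            }
        }
        double[] fitted = new double[n];
        for (int b = 0, i = 0; b < blocks; b++)
            for (int k = 0; k < size[b]; k++) fitted[i++] = level[b];
        return fitted;
    }

    public static void main(String[] args) {
        double[] estimates = {0.2, 0.5, 0.4, 0.7};   // violates p1 <= p2 <= p3 <= p4
        double[] counts    = {10, 10, 10, 10};       // data counts as weights
        System.out.println(java.util.Arrays.toString(fit(estimates, counts)));
        // prints [0.2, 0.45, 0.45, 0.7]
    }
}
```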
Proceedings of the 16th Conference of the Spanish Association for Artificial Intelligence on Advances in Artificial Intelligence - Volume 9422 | 2015
Antonio Salmerón; Darío Ramos-López; Hanen Borchani; Ana M. Martinez; Andrés R. Masegosa; Antonio Fernández; Helge Langseth; Anders L. Madsen; Thomas Dyhre Nielsen
In this paper we analyse the problem of probabilistic inference in conditional linear Gaussian (CLG) networks when evidence comes in streams. In such settings, fast and scalable algorithms that provide accurate responses in a short time are required. We consider the instantiation of variational inference and importance sampling, two well-known tools for probabilistic inference, to the CLG case. The experimental results over synthetic networks show that a parallel version of importance sampling, and more precisely of evidence weighting, is a promising scheme, as it is accurate and scales with the available computing resources.
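A minimal sketch of evidence weighting in the simplest possible CLG network, a binary class with one conditionally Gaussian child, follows; the sampling loop is parallelized with Java 8 streams. All model parameters are illustrative, not drawn from the paper's networks.

```java
import java.util.Random;
import java.util.stream.IntStream;

/** Minimal sketch of evidence weighting in a toy CLG network C -> X, where C is
 *  binary and X | C=c is Gaussian. We estimate P(C=1 | X=x) by sampling C from
 *  its prior and weighting each sample by the Gaussian likelihood of the
 *  evidence; the sampling loop runs as a parallel stream. */
public final class ParallelEvidenceWeighting {
    static double gaussian(double x, double mu, double sigma) {
        double z = (x - mu) / sigma;
        return Math.exp(-0.5 * z * z) / (sigma * Math.sqrt(2 * Math.PI));
    }

    public static void main(String[] args) {
        double priorC1 = 0.3;                          // P(C=1)
        double[] mu = {0.0, 2.0}, sigma = {1.0, 1.0};  // X | C=c ~ N(mu[c], sigma[c])
        double evidence = 1.8;                         // observed value of X

        int samples = 1_000_000;
        double[] sums = IntStream.range(0, samples).parallel()
                .mapToObj(i -> {
                    Random rnd = java.util.concurrent.ThreadLocalRandom.current();
                    int c = rnd.nextDouble() < priorC1 ? 1 : 0;      // sample C ~ prior
                    double w = gaussian(evidence, mu[c], sigma[c]);  // weight by evidence
                    return new double[]{c == 1 ? w : 0.0, w};
                })
                .reduce(new double[]{0, 0},
                        (a, b) -> new double[]{a[0] + b[0], a[1] + b[1]});

        System.out.printf("P(C=1 | X=%.1f) ~ %.3f%n", evidence, sums[0] / sums[1]);
    }
}
```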
Knowledge-Based Systems | 2018
Andrés R. Masegosa; Ana M. Martinez; Darío Ramos-López; Rafael Cabañas; Antonio Salmerón; Helge Langseth; Thomas Dyhre Nielsen; Anders L. Madsen
The AMIDST Toolbox is open-source Java software for scalable probabilistic machine learning with a special focus on (massive) streaming data. The toolbox supports a flexible modelling language based on probabilistic graphical models with latent variables. AMIDST provides parallel and distributed implementations of scalable algorithms for probabilistic inference and Bayesian parameter learning in the specified models. These algorithms are based on a flexible variational message passing scheme, which supports discrete and continuous variables from a wide range of probability distributions.
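For a fully observed variable in the conjugate exponential family, variational updates of this kind reduce to adding sufficient statistics to the natural parameters of the posterior. The sketch below illustrates that batch-wise updating for a Beta-Bernoulli pair; it is a stand-alone illustration of the principle, not the AMIDST API.

```java
/** Minimal sketch of conjugate-exponential updating: a Beta posterior over a
 *  Bernoulli parameter absorbs each data batch by adding sufficient statistics
 *  to its natural parameters. Illustration only, not the AMIDST API. */
public final class StreamingBetaBernoulli {
    private double alpha = 1.0, beta = 1.0;   // Beta(1,1) prior

    /** Absorb one batch of 0/1 observations. */
    public void updateModel(int[] batch) {
        for (int x : batch) { if (x == 1) alpha++; else beta++; }
    }

    public double posteriorMean() { return alpha / (alpha + beta); }

    public static void main(String[] args) {
        StreamingBetaBernoulli model = new StreamingBetaBernoulli();
        int[][] stream = {{1, 1, 0}, {1, 0, 1}, {1, 1, 1}};
        for (int[] batch : stream) {
            model.updateModel(batch);
            System.out.printf("P(X=1) estimate: %.3f%n", model.posteriorMean());
        }
    }
}
```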
International Conference on Data Mining | 2016
Rafael Cabañas; Ana M. Martinez; Andrés R. Masegosa; Darío Ramos-López; Antonio Salmerón; Thomas Dyhre Nielsen; Helge Langseth; Anders L. Madsen
The AMIDST Toolbox is an open-source Java 8 library for scalable learning of probabilistic graphical models (PGMs) from both batch and streaming data. An important application domain with streaming data characteristics is the banking sector, where we may want to monitor individual customers (based on their financial situation and behavior) as well as the general economic climate. Using a real financial data set from a Spanish bank, we have previously proposed and demonstrated a novel PGM framework for performing this type of data analysis with particular focus on concept drift. The framework is implemented in the AMIDST Toolbox, which was also used to conduct the reported analyses. In this paper, we provide an overview of the toolbox and illustrate with code examples how it can be used for setting up and performing analyses of this particular type.
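The workflow illustrated in the paper follows a set-up-then-stream pattern: define a model, fold in data batches as they arrive, and read off a drift indicator after each batch. The plain-Java sketch below mirrors only that pattern; the interfaces are stand-ins, not the toolbox's actual classes.

```java
import java.util.Arrays;
import java.util.List;

/** Plain-Java sketch of the set-up-then-stream workflow: a model absorbs one
 *  batch at a time and exposes a drift indicator. The AMIDST Toolbox has its
 *  own API (DAGs, variational learners); these interfaces are stand-ins. */
interface StreamModel {
    void updateModel(double[] batch);   // absorb one batch of observations
    double driftIndicator();            // current value of the monitored quantity
}

final class RunningMeanModel implements StreamModel {
    private double mean = 0; private long n = 0;
    public void updateModel(double[] batch) {
        for (double x : batch) { n++; mean += (x - mean) / n; }
    }
    public double driftIndicator() { return mean; }
}

public final class StreamAnalysis {
    public static void main(String[] args) {
        StreamModel model = new RunningMeanModel();
        List<double[]> monthlyBatches = Arrays.asList(
                new double[]{0.1, 0.2, 0.0},
                new double[]{0.1, 0.3, 0.2},
                new double[]{1.9, 2.1, 2.0});   // drift in the last month
        for (double[] batch : monthlyBatches) {
            model.updateModel(batch);
            System.out.printf("drift indicator = %.3f%n", model.driftIndicator());
        }
    }
}
```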
International Journal of Approximate Reasoning | 2017
Andrés R. Masegosa; Ana M. Martinez; Helge Langseth; Thomas Dyhre Nielsen; Antonio Salmerón; Darío Ramos-López; Anders L. Madsen
In this paper we present an approach for scaling up Bayesian learning using variational methods by exploiting distributed computing clusters managed by modern big data processing tools like Apache Spark or Apache Flink, which efficiently support iterative map-reduce operations. Our approach is defined as a distributed projected natural gradient ascent algorithm, has excellent convergence properties, and covers a wide range of conjugate exponential family models. We evaluate the proposed algorithm on three real-world datasets from different domains (the PubMed abstracts dataset, a GPS trajectory dataset, and a financial dataset) and using several models (LDA, factor analysis, mixture of Gaussians, and linear regression models). Our approach compares favorably to stochastic variational inference and streaming variational Bayes, two of the main current proposals for scaling up variational methods. For the scalability analysis, we evaluate our approach over a network with more than one billion nodes, approximately 75% of them latent variables, using a computer cluster with 128 processing units (AWS). The proposed methods are released as part of an open-source toolbox for scalable probabilistic machine learning (http://www.amidsttoolbox.com) (Masegosa et al., 2017).
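The core computational pattern is a map-reduce over the data partitions to gather sufficient statistics, followed by a gradient step on the variational parameters at the driver. The sketch below shows that pattern with Spark's Java API for the simplest conjugate case, a Gaussian mean with known unit variances; the step size, prior, and data are illustrative, and the paper's algorithm covers far more general models.

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import java.util.Arrays;

/** Minimal sketch of the map-reduce pattern behind distributed variational
 *  learning: partitions contribute sufficient statistics, the driver reduces
 *  them and takes a (natural) gradient step on the variational parameter. */
public final class DistributedVBStep {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("d-vb-sketch").setMaster("local[*]");
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            JavaRDD<Double> data = sc.parallelize(Arrays.asList(1.8, 2.1, 2.4, 1.9, 2.2));

            // Map-reduce of sufficient statistics: sum of x and count.
            double sum = data.reduce(Double::sum);
            long n = data.count();

            // One gradient step towards the conjugate posterior mean
            // (prior N(0,1), unit observation noise => target = sum / (n + 1)).
            double stepSize = 0.5, current = 0.0;
            double target = sum / (n + 1.0);
            double updated = current + stepSize * (target - current);
            System.out.printf("posterior mean after one step: %.3f%n", updated);
        }
    }
}
```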
Progress in Artificial Intelligence | 2017
Darío Ramos-López; Andrés R. Masegosa; Ana M. Martinez; Antonio Salmerón; Thomas Dyhre Nielsen; Helge Langseth; Anders L. Madsen
In this paper, we study the maximum a posteriori (MAP) problem in dynamic hybrid Bayesian networks. We are interested in finding the sequence of values of a class variable that maximizes the posterior probability given evidence. We propose an approximate solution based on transforming the MAP problem into a simpler belief update problem. The proposed solution constructs a set of auxiliary networks by grouping consecutive instantiations of the variable of interest, thus capturing some of the potential temporal dependences between these variables while ignoring others. Belief update is carried out independently in the auxiliary models, after which the results are combined, producing a configuration of values for the class variable along the entire time sequence. Experiments have been carried out to analyze the behavior of the approach. The algorithm has been implemented using Java 8 streams, and its scalability has been evaluated.
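The structural trick is easy to state: cut the time sequence into blocks of consecutive slices, solve each block independently, and concatenate the answers. The sketch below does exactly that for a binary class variable, brute-forcing each block and running blocks in parallel with Java 8 streams; the paper instead solves each block by belief update in an auxiliary network, and its grouping retains cross-slice dependences that this toy version ignores. All scores are illustrative.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

/** Minimal sketch of grouped MAP: the sequence is cut into blocks of
 *  consecutive slices, each block's best binary configuration is found by
 *  brute force, and blocks are processed in parallel and concatenated. */
public final class GroupedMAP {
    static int[] solveBlock(double[][] emission, double stayBonus) {
        int len = emission.length, best = 0;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (int mask = 0; mask < (1 << len); mask++) {   // all binary configurations
            double score = 0;
            for (int t = 0; t < len; t++) {
                int c = (mask >> t) & 1;
                score += emission[t][c];                  // per-slice evidence score
                if (t > 0 && c == ((mask >> (t - 1)) & 1)) score += stayBonus;
            }
            if (score > bestScore) { bestScore = score; best = mask; }
        }
        int[] states = new int[len];
        for (int t = 0; t < len; t++) states[t] = (best >> t) & 1;
        return states;
    }

    public static void main(String[] args) {
        double[][] emission = {{2, 0}, {1.5, 1}, {0, 2}, {0, 2}, {1, 1.2}, {2, 0}};
        int blockSize = 3;
        int blocks = (emission.length + blockSize - 1) / blockSize;

        int[] map = IntStream.range(0, blocks).parallel()
                .mapToObj(b -> solveBlock(Arrays.copyOfRange(emission,
                        b * blockSize, Math.min((b + 1) * blockSize, emission.length)), 0.5))
                .flatMapToInt(Arrays::stream)
                .toArray();

        System.out.println("MAP sequence: " + Arrays.toString(map));
    }
}
```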
European Conference on Artificial Intelligence | 2016
Antonio Salmerón; Anders L. Madsen; Frank Jensen; Helge Langseth; Thomas Dyhre Nielsen; Darío Ramos-López; Ana M. Martinez; Andrés R. Masegosa
In this paper we propose a method for scaling up filter-based feature selection in classification problems. We use the conditional mutual information as the filter measure and show how the required statistics can be computed in parallel while avoiding unnecessary calculations. The distribution of the calculations among the available computing units is determined based on balanced incomplete block designs, a strategy first developed within the area of statistical design of experiments. We show the scalability of our method through a series of experiments on synthetic and real-world datasets.
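Because each candidate feature's statistics can be computed independently, the scoring step parallelizes naturally. The sketch below scores features against the class with plain mutual information over a parallel stream; the paper's measure is conditional mutual information and the work is distributed according to balanced incomplete block designs, neither of which this toy version reproduces. Data is illustrative.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

/** Minimal sketch of parallel filter scoring: each feature's relevance to the
 *  class is computed independently inside one parallel stream. */
public final class ParallelFilterSelection {
    /** Empirical mutual information between two discrete arrays (natural log). */
    static double mutualInformation(int[] x, int[] y, int xStates, int yStates) {
        int n = x.length;
        double[][] joint = new double[xStates][yStates];
        double[] px = new double[xStates], py = new double[yStates];
        for (int i = 0; i < n; i++) {
            joint[x[i]][y[i]] += 1.0 / n; px[x[i]] += 1.0 / n; py[y[i]] += 1.0 / n;
        }
        double mi = 0;
        for (int a = 0; a < xStates; a++)
            for (int b = 0; b < yStates; b++)
                if (joint[a][b] > 0)
                    mi += joint[a][b] * Math.log(joint[a][b] / (px[a] * py[b]));
        return mi;
    }

    public static void main(String[] args) {
        int[][] features = {{0, 0, 1, 1, 0, 1},   // feature 0: mirrors the class
                            {0, 1, 0, 1, 0, 1},   // feature 1: mostly noise
                            {1, 1, 0, 0, 1, 0}};  // feature 2: inverted class
        int[] classVar =    {0, 0, 1, 1, 0, 1};

        double[] scores = IntStream.range(0, features.length).parallel()
                .mapToDouble(i -> mutualInformation(features[i], classVar, 2, 2))
                .toArray();
        System.out.println("feature scores: " + Arrays.toString(scores));
    }
}
```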