Ana M. Martinez | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Ana M. Martinez is active.

Explore More

Publication

Featured researches published by Ana M. Martinez.

intelligent data analysis | 2015

Modeling concept drift: A probabilistic graphical model based approach

Hanen Borchani; Ana M. Martinez; Andrés R. Masegosa; Helge Langseth; Thomas Dyhre Nielsen; Antonio Salmerón; Antonio Fernández; Anders L. Madsen; Ramón Sáez

An often used approach for detecting and adapting to concept drift when doing classification is to treat the data as i.i.d. and use changes in classification accuracy as an indication of concept drift. In this paper, we take a different perspective and propose a framework, based on probabilistic graphical models, that explicitly represents concept drift using latent variables. To ensure efficient inference and learning, we resort to a variational Bayes inference scheme. As a proof of concept, we demonstrate and analyze the proposed framework using synthetic data sets as well as a real financial data set from a Spanish bank.

IEEE Computational Intelligence Magazine | 2016

Probabilistic Graphical Models on Multi-Core CPUs Using Java 8

Andrés R. Masegosa; Ana M. Martinez; Hanen Borchani

In this paper, we discuss software design issues related to the development of parallel computational intelligence algorithms on multi-core CPUs, using the new Java 8 functional programming features. In particular, we focus on probabilistic graphical models (PGMs) and present the parallelization of a collection of algorithms that deal with inference and learning of PGMs from data. Namely, maximum likelihood estimation, importance sampling, and greedy search for solving combinatorial optimization problems. Through these concrete examples, we tackle the problem of defining efficient data structures for PGMs and parallel processing of same-size batches of data sets using Java 8 features. We also provide straightforward techniques to code parallel algorithms that seamlessly exploit multicore processors. The experimental analysis, carried out using our open source AMIDST (Analysis of MassIve Data STreams) Java toolbox, shows the merits of the proposed solutions.

scandinavian conference on ai | 2015

Dynamic Bayesian modeling for risk prediction in credit operations

Hanen Borchani; Ana M. Martinez; Andrés R. Masegosa; Helge Langseth; Thomas Dyhre Nielsen; Antonio Salmerón; Antonio Fernández; Anders L. Madsen; Ramón Sáez

Hanen BORCHANI a,1, Ana M. MARTINEZ a,2,1, Andres R. MASEGOSA b,1, Helge LANGSETH b, Thomas D. NIELSEN a, Antonio SALMERON c, Antonio FERNANDEZ d, Anders L. MADSEN a,e and Ramon SAEZ d aDepartment of Computer Science, Aalborg University, Denmark bDepartment of Computer and Information Science, The Norwegian University of Science and Technology, Norway cDepartment of Mathematics, University of Almeŕia, Spain dBanco de Credito Cooperativo, Spain eHUGIN EXPERT A/S, Aalborg, Denmark

Proceedings of the 16th Conference of the Spanish Association for Artificial Intelligence on Advances in Artificial Intelligence - Volume 9422 | 2015

Parallel Importance Sampling in Conditional Linear Gaussian Networks

Antonio Salmerón; Darío Ramos-López; Hanen Borchani; Ana M. Martinez; Andrés R. Masegosa; Antonio Fernández; Helge Langseth; Anders L. Madsen; Thomas Dyhre Nielsen

In this paper we analyse the problem of probabilistic inference in CLG networks when evidence comes in streams. In such situations, fast and scalable algorithms, able to provide accurate responses in a short time are required. We consider the instantiation of variational inference and importance sampling, two well known tools for probabilistic inference, to the CLG case. The experimental results over synthetic networks show how a parallel version importance sampling, and more precisely evidence weighting, is a promising scheme, as it is accurate and scales up with respect to available computing resources.

pacific-asia conference on knowledge discovery and data mining | 2014

Highly scalable attribute selection for averaged one-dependence estimators

Shenglei Chen; Ana M. Martinez; Geoffrey I. Webb

Averaged One-Dependence Estimators (AODE) is a popular and effective approach to Bayesian learning. In this paper, a new attribute selection approach is proposed for AODE. It can search in a large model space, while it requires only a single extra pass through the training data, resulting in a computationally efficient two-pass learning algorithm. The experimental results indicate that the new technique significantly reduces AODE’s bias at the cost of a modest increase in training time. Its low bias and computational efficiency make it an attractive algorithm for learning from big data.

Knowledge Based Systems | 2018

AMIDST: A Java toolbox for scalable probabilistic machine learning

Andrés R. Masegosa; Ana M. Martinez; Darío Ramos-López; Rafael Cabañas; Antonio Salmerón; Helge Langseth; Thomas Dyhre Nielsen; Anders L. Madsen

Abstract The AMIDST Toolbox is an open source Java software for scalable probabilistic machine learning with a special focus on (massive) streaming data. The toolbox supports a flexible modelling language based on probabilistic graphical models with latent variables. AMIDST provides parallel and distributed implementations of scalable algorithms for doing probabilistic inference and Bayesian parameter learning in the specified models. These algorithms are based on a flexible variational message passing scheme, which supports discrete and continuous variables from a wide range of probability distributions.

international conference on data mining | 2016

Financial Data Analysis with PGMs Using AMIDST

Rafael Cabañas; Ana M. Martinez; Andrés R. Masegosa; Darío Ramos-López; Antonio Sameron; Thomas Dyhre Nielsen; Helge Langseth; Anders L. Madsen

The AMIDST Toolbox an open source Java 8 library for scalable learning of probabilistic graphical models (PGMs) based on both batch and streaming data. An important application domain with streaming data characteristics is the banking sector, where we may want to monitor individual customers (based on their financial situation and behavior) as well as the general economic climate. Using a real financial data set from a Spanish bank, we have previously proposed and demonstrated a novel PGM framework for performing this type of data analysis with particular focus on concept drift. The framework is implemented in the AMIDST Toolbox, which was also used to conduct the reported analyses. In this paper, we provide an overview of the toolbox and illustrate with code examples how the toolbox can be used for setting up and performing analyses of this particular type.

International Journal of Approximate Reasoning | 2017

Scaling up Bayesian variational inference using distributed computing clusters

Andrés R. Masegosa; Ana M. Martinez; Helge Langseth; Thomas Dyhre Nielsen; Antonio Salmerón; Darío Ramos-López; Anders L. Madsen

Abstract In this paper we present an approach for scaling up Bayesian learning using variational methods by exploiting distributed computing clusters managed by modern big data processing tools like Apache Spark or Apache Flink, which efficiently support iterative map-reduce operations. Our approach is defined as a distributed projected natural gradient ascent algorithm, has excellent convergence properties, and covers a wide range of conjugate exponential family models. We evaluate the proposed algorithm on three real-world datasets from different domains (the Pubmed abstracts dataset, a GPS trajectory dataset, and a financial dataset) and using several models (LDA, factor analysis, mixture of Gaussians and linear regression models). Our approach compares favorably to stochastic variational inference and streaming variational Bayes, two of the main current proposals for scaling up variational methods. For the scalability analysis, we evaluate our approach over a network with more than one billion nodes and approx. 75 % latent variables using a computer cluster with 128 processing units (AWS). The proposed methods are released as part of an open-source toolbox for scalable probabilistic machine learning ( http://www.amidsttoolbox.com ) Masegosa et al. (2017) [29] .

IEEE Transactions on Knowledge and Data Engineering | 2017

Sample-Based Attribute Selective A

Shenglei Chen; Ana M. Martinez; Geoffrey I. Webb; Limin Wang

More and more applications have come with large data sets in the past decade. However, existing algorithms cannot guarantee to scale well on large data. Averaged n-Dependence Estimators (AnDE) allows for flexible learning from out-of-core data, by varying the value of n (number of super parents). Hence, AnDE is especially appropriate for large data learning. In this paper, we propose a sample-based attribute selection technique for AnDE. It needs one more pass through the training data, in which a multitude of approximate AnDE models are built and efficiently assessed by leave-one-out cross validation. The use of a sample reduces the training time. Experiments on 15 large data sets demonstrate that the proposed technique significantly reduces AnDEs error at the cost of a modest increase in training time. This efficient and scalable out-of-core approach delivers superior or comparable performance to typical in-core Bayesian network classifiers.

Progress in Artificial Intelligence | 2017

n

Darío Ramos-López; Andrés R. Masegosa; Ana M. Martinez; Antonio Salmerón; Thomas Dyhre Nielsen; Helge Langseth; Anders L. Madsen

In this paper, we study the maximum a posteriori (MAP) problem in dynamic hybrid Bayesian networks. We are interested in finding the sequence of values of a class variable that maximizes the posterior probability given evidence. We propose an approximate solution based on transforming the MAP problem into a simpler belief update problem. The proposed solution constructs a set of auxiliary networks by grouping consecutive instantiations of the variable of interest, thus capturing some of the potential temporal dependences between these variables while ignoring others. Belief update is carried out independently in the auxiliary models, after which the results are combined, producing a configuration of values for the class variable along the entire time sequence. Experiments have been carried out to analyze the behavior of the approach. The algorithm has been implemented using Java 8 streams, and its scalability has been evaluated.

Explore More