Javier Muguerza | Researchain

Publication

Featured research published by Javier Muguerza.

Pattern Recognition | 2013

An extensive comparative study of cluster validity indices

Olatz Arbelaitz; Ibai Gurrutxaga; Javier Muguerza; Jesús M. Pérez; Iñigo Perona

The validation of the results obtained by clustering algorithms is a fundamental part of the clustering process. The most used approaches for cluster validation are based on internal cluster validity indices. Although many indices have been proposed, there is no recent extensive comparative study of their performance. In this paper we show the results of an experimental work that compares 30 cluster validity indices in many different environments with different characteristics. These results can serve as a guideline for selecting the most suitable index for each possible application and provide a deep insight into the performance differences between the currently available indices.
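As a concrete illustration of what an internal cluster validity index computes, here is a minimal sketch of the silhouette index in plain Python. The abstract does not list the 30 indices compared, so silhouette is chosen here only as a familiar example; the function name and the toy data are assumptions made for illustration.

```python
import math

def silhouette(points, labels):
    """Mean silhouette width of a partition (higher means better separated)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # singleton clusters contribute 0 by convention
        # a: mean distance from p to the other members of its own cluster
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance from p to the members of any other cluster
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(points)

# Hypothetical toy data: two tight groups far apart.
points = [(0, 0), (0, 1), (10, 0), (10, 1)]
good_partition = [0, 0, 1, 1]   # matches the visible structure
bad_partition = [0, 1, 0, 1]    # splits each group in half
```

An index like this needs only the data and the partition, which is what makes it "internal": no ground-truth labels are involved, so it can rank `good_partition` above `bad_partition` on its own.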

Journal of Physics: Condensed Matter | 2012

Time-dependent density-functional theory in massively parallel computer architectures: the octopus project

Xavier Andrade; Joseba Alberdi-Rodriguez; David A. Strubbe; Micael J. T. Oliveira; Fernando Nogueira; Alberto Castro; Javier Muguerza; Agustin Arruabarrena; Steven G. Louie; Alán Aspuru-Guzik; Angel Rubio; Miguel A. L. Marques

Octopus is a general-purpose density-functional theory (DFT) code, with a particular emphasis on the time-dependent version of DFT (TDDFT). In this paper we present the ongoing efforts to achieve the parallelization of octopus. We focus on the real-time variant of TDDFT, where the time-dependent Kohn-Sham equations are directly propagated in time. This approach has great potential for execution in massively parallel systems such as modern supercomputers with thousands of processors and graphics processing units (GPUs). For harvesting the potential of conventional supercomputers, the main strategy is a multi-level parallelization scheme that combines the inherent scalability of real-time TDDFT with a real-space grid domain-partitioning approach. A scalable Poisson solver is critical for the efficiency of this scheme. For GPUs, we show how using blocks of Kohn-Sham states provides the required level of data parallelism and that this strategy is also applicable for code optimization on standard processors. Our results show that real-time TDDFT, as implemented in octopus, can be the method of choice for studying the excited states of large molecular systems in modern parallel architectures.

Pattern Recognition Letters | 2011

Towards a standard methodology to evaluate internal cluster validity indices

Ibai Gurrutxaga; Javier Muguerza; Olatz Arbelaitz; Jesús M. Pérez; José Ignacio Martín

The evaluation and comparison of internal cluster validity indices is a critical problem in the clustering area. The methodology used in most of the evaluations assumes that the clustering algorithms work correctly. We propose an alternative methodology that does not make this often false assumption. We compared 7 internal cluster validity indices with both methodologies and concluded that the results obtained with the proposed methodology are more representative of the actual capabilities of the compared indices.

Pattern Recognition | 2010

SEP/COP: An efficient method to find the best partition in hierarchical clustering based on a new cluster validity index

Ibai Gurrutxaga; Iñaki Albisua; Olatz Arbelaitz; José Ignacio Martín; Javier Muguerza; Jesús M. Pérez; Iñigo Perona

Hierarchical clustering algorithms provide a set of nested partitions called a cluster hierarchy. Since the hierarchy is usually too complex, it is reduced to a single partition by using cluster validity indices. We show that the classical method is often not useful and we propose SEP, a new method that efficiently searches in an extended partition set. Furthermore, we propose a new cluster validity index, COP, since many of the commonly used indices cannot be used with SEP. Experiments performed with 80 synthetic and 7 real datasets confirm that SEP/COP is superior to the method currently used and, furthermore, that it is less sensitive to noise.

Pattern Recognition Letters | 2007

Combining multiple class distribution modified subsamples in a single tree

Jesús M. Pérez; Javier Muguerza; Olatz Arbelaitz; Ibai Gurrutxaga; José Ignacio Martín

This work describes the Consolidated Tree Construction (CTC) algorithm: a single tree is built based on a set of subsamples. This way, the explaining capacity of the classifier is not lost even if many subsamples are used. We show how the CTC algorithm can use undersampling to change the class distribution without loss of information, building more accurate classifiers than C4.5.
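The subsampling step mentioned in this abstract can be sketched as follows. This is only an illustration of undersampling into several subsamples, not the CTC algorithm itself: the consolidation of the subsamples into a single tree is not shown, and the 50/50 target distribution, the function name and the toy data are all assumptions.

```python
import random

def balanced_subsamples(examples, labels, n_subsamples, seed=0):
    """Draw several class-balanced subsamples by undersampling larger classes."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(examples, labels):
        by_class.setdefault(y, []).append(x)
    # Every class is cut down to the size of the smallest class.
    size = min(len(items) for items in by_class.values())
    subsamples = []
    for _ in range(n_subsamples):
        sample = []
        for y, items in by_class.items():
            # rng.sample draws without replacement within one subsample.
            sample.extend((x, y) for x in rng.sample(items, size))
        subsamples.append(sample)
    return subsamples

# Hypothetical imbalanced dataset: 10 positives, 90 negatives.
examples = list(range(100))
labels = ["pos"] * 10 + ["neg"] * 90
subs = balanced_subsamples(examples, labels, n_subsamples=5)
```

Because each subsample draws a different part of the majority class, using several of them covers far more of the original data than a single undersampled set would.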

Expert Systems With Applications | 2013

Web usage and content mining to extract knowledge for modelling the users of the Bidasoa Turismo website and to adapt it

Olatz Arbelaitz; Ibai Gurrutxaga; Aizea Lojo; Javier Muguerza; Jesús M. Pérez; Iñigo Perona

The tourism industry has experienced a shift from offline to online travellers and this has made the use of intelligent systems in the tourism sector crucial. These information systems should provide tourism consumers and service providers with the most relevant information, more decision support, greater mobility and the most enjoyable travel experiences. As a consequence, Destination Marketing Organizations (DMOs) not only have to respond by adopting new technologies, but also by interpreting and using the knowledge created by the use of these techniques. This work presents the design of a general and non-invasive web mining system, built using the minimum information stored in a web server (the content of the website and the information from the log files stored in Common Log Format (CLF)) and its application to the Bidasoa Turismo (BTw) website. The proposed system combines web usage and content mining techniques with the following three main objectives: generating user navigation profiles to be used for link prediction; enriching the profiles with semantic information to diversify them, which provides the DMO with a tool to introduce links that will match the users' tastes; and obtaining global and language-dependent user interest profiles, which provide the DMO staff with important information for future web designs and allow them to design future marketing campaigns for specific targets. The system performed successfully, obtaining profiles which fit the real user navigation sequences in more than 60% of cases and the user interests in more than 90% of cases. Moreover, the automatically extracted semantic structure of the website and the interest profiles were validated by the BTw DMO staff, who found the knowledge provided to be very useful for the future.
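To make the starting point of such a system concrete, the sketch below parses Common Log Format lines and groups the requested paths into per-host navigation sequences. It is an illustrative assumption rather than the paper's pipeline: treating each host as one user, keeping only 2xx responses, and the sample log lines are all hypothetical choices.

```python
import re
from collections import defaultdict
from datetime import datetime

# Common Log Format: host ident authuser [date] "request" status bytes
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)

def parse_clf_line(line):
    """Return a dict for one CLF line, or None if the line does not match."""
    m = CLF_PATTERN.match(line)
    if m is None:
        return None
    rec = m.groupdict()
    rec["time"] = datetime.strptime(rec["time"], "%d/%b/%Y:%H:%M:%S %z")
    parts = rec["request"].split()
    rec["path"] = parts[1] if len(parts) >= 2 else None
    rec["status"] = int(rec["status"])
    return rec

def navigation_sequences(lines):
    """Group successfully served paths by host, ordered by request time."""
    seqs = defaultdict(list)
    for line in lines:
        rec = parse_clf_line(line)
        if rec and rec["path"] and 200 <= rec["status"] < 300:
            seqs[rec["host"]].append((rec["time"], rec["path"]))
    return {host: [p for _, p in sorted(v)] for host, v in seqs.items()}

# Hypothetical log lines, not taken from the BTw server.
sample_log = [
    '10.0.0.1 - - [10/Oct/2013:13:56:01 +0200] "GET /en/beaches HTTP/1.0" 200 512',
    '10.0.0.1 - - [10/Oct/2013:13:55:36 +0200] "GET /en/home HTTP/1.0" 200 2326',
    '10.0.0.2 - - [10/Oct/2013:13:57:00 +0200] "GET /missing HTTP/1.0" 404 -',
]
sequences = navigation_sequences(sample_log)
```

Sequences like these are the raw material from which navigation profiles could be built; a real system would also need session splitting and filtering of robots and static resources, which this sketch leaves out.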

International Conference on Pattern Recognition | 1998

A two-stage classifier for broken and blurred digits in forms

Clemente Rodríguez; Javier Muguerza; Marisa Navarro; A. Zárate; José Ignacio Martín; Jesús M. Pérez

A classifier for an automatic system that recognizes multifont typewritten digits, often broken and blurred, in forms is presented. The classification, which is based on the utilization of a global feature, is applied in two phases. Firstly, a minimum distance method (1-NN) is applied in a multifont classifier to provide a global classification of the patterns in a form. A problem associated with multifont classifiers is the interference among classes in different fonts. An interesting aspect of this particular application is that it is highly probable that a form includes just one font. Then, in the second phase, a specialized classifier, oriented to a single form, uses the patterns in the form previously classified to validate, or reject and reclassify them, on the basis of the mean distance to the predefined classes. This specialized classifier affords significant improvement in performance. A classification accuracy rate of 99.42% has been achieved.
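The two-phase idea can be sketched loosely as follows. This is one possible reading of the abstract, not the authors' implementation: the one-dimensional feature, the prototype values, and the rule of relabelling each pattern to the class with the smallest mean distance among the other patterns of the same form are all assumptions, and the rejection step is omitted.

```python
import math

def first_stage(patterns, prototypes):
    """1-NN: label each pattern with the class of its nearest prototype."""
    return [
        min(prototypes, key=lambda proto: math.dist(p, proto[0]))[1]
        for p in patterns
    ]

def second_stage(patterns, labels):
    """Relabel each pattern using the mean distance to the form's own classes."""
    by_class = {}
    for i, lab in enumerate(labels):
        by_class.setdefault(lab, []).append(i)
    new_labels = []
    for i, p in enumerate(patterns):
        best, best_d = labels[i], math.inf
        for lab, members in by_class.items():
            others = [j for j in members if j != i]  # never compare p with itself
            if not others:
                continue
            d = sum(math.dist(p, patterns[j]) for j in others) / len(others)
            if d < best_d:
                best, best_d = lab, d
        new_labels.append(best)
    return new_labels

# Hypothetical multifont prototypes (feature vector, digit).
prototypes = [((0.0,), "0"), ((10.0,), "1")]
# Hypothetical form in a single font; the fourth pattern is a blurred "1".
form = [(1.0,), (2.0,), (3.0,), (4.8,), (6.0,), (6.5,), (7.0,)]

stage1 = first_stage(form, prototypes)
stage2 = second_stage(form, stage1)
```

In this toy form the blurred pattern at 4.8 falls on the wrong side of the multifont boundary in the first phase, but it sits closer on average to the form's own "1" patterns, so the second phase moves it.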

Journal of Computational Chemistry | 2014

A survey of the parallel performance and accuracy of Poisson solvers for electronic structure calculations

Pablo García-Risueño; Joseba Alberdi-Rodriguez; Micael J. T. Oliveira; Xavier Andrade; Michael Pippig; Javier Muguerza; Agustin Arruabarrena; Angel Rubio

We present an analysis of different methods to calculate the classical electrostatic Hartree potential created by charge distributions. Our goal is to provide the reader with an estimate of the performance (in terms of both numerical complexity and accuracy) of popular Poisson solvers, and to give an intuitive idea of the way these solvers operate. Highly parallelizable routines have been implemented in a first-principles simulation code (Octopus) to be used in our tests, so that reliable conclusions about the capability of methods to tackle large systems in cluster computing can be obtained from our work.

International Conference on Advances in Pattern Recognition | 2005

Consolidated tree classifier learning in a car insurance fraud detection domain with class imbalance

Jesús M. Pérez; Javier Muguerza; Olatz Arbelaitz; Ibai Gurrutxaga; José Ignacio Martín

This paper presents an analysis of the behaviour of Consolidated Trees, CT (classification trees induced from multiple subsamples but without loss of explaining capacity). We analyse how CT trees behave when used to solve a fraud detection problem in a car insurance company. This domain has two important characteristics: the explanation given for the classification made is critical to help investigate the received reports or claims, and it is a typical example of a class imbalance problem due to its skewed class distribution. In the results presented in the paper, CT and C4.5 trees have been compared from the point of view of accuracy and structural stability (explaining capacity), and, for both algorithms, the best class distribution has been searched for. Due to the different costs associated with different error types (costs of investigating suspicious reports, etc.), a wider analysis of the error has also been done: precision/recall, ROC curve, etc.

International Journal of Electrical Power & Energy Systems | 1996

Fault analysis with modular neural networks

Clemente Rodríguez; S. Rementería; José Ignacio Martín; A. Lafuente; Javier Muguerza; J. Pérez

Automatic fault diagnosis in power systems presents real challenges to computing technologies. As an alternative approach to expert systems, several neural network solutions have been proposed recently. In this paper a modular, neural network-based solution to power systems alarm handling and fault diagnosis is described that overcomes the limitations of ‘toy’ alternatives constrained to small and fixed-topology electrical networks. In contrast to monolithic diagnosis systems, the neural network-based approach presented here fulfills the scalability and dynamic adaptability requirements of the application. Mapping the power grid onto a set of interconnected modules that model the functional behaviour of electrical equipment provides the flexibility and speed demanded by the problem. The way in which the neural system is conceived allows full scalability to real-size power systems.
