Ibai Gurrutxaga
University of the Basque Country
Publication
Featured research published by Ibai Gurrutxaga.
Pattern Recognition | 2013
Olatz Arbelaitz; Ibai Gurrutxaga; Javier Muguerza; Jesús M. Pérez; Iñigo Perona
The validation of the results obtained by clustering algorithms is a fundamental part of the clustering process. The most used approaches for cluster validation are based on internal cluster validity indices. Although many indices have been proposed, there is no recent extensive comparative study of their performance. In this paper we show the results of an experimental work that compares 30 cluster validity indices in many different environments with different characteristics. These results can serve as a guideline for selecting the most suitable index for each possible application and provide a deep insight into the performance differences between the currently available indices.
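As an illustration of what an internal cluster validity index computes, the sketch below implements the silhouette width, one widely used index of this kind. The abstract does not say which 30 indices were compared, so the choice of silhouette here is an assumption, and the function name `silhouette` is hypothetical.

```python
from math import dist  # Euclidean distance, Python 3.8+


def silhouette(points, labels):
    """Mean silhouette width of a partition; higher means better separated."""
    # Group the points by cluster label.
    clusters = {}
    for p, label in zip(points, labels):
        clusters.setdefault(label, []).append(p)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    total = 0.0
    for p, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            continue  # singleton clusters contribute 0 by convention
        # a: mean distance to the other members of the point's own cluster.
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(dist(p, q) for q in other) / len(other)
            for key, other in clusters.items()
            if key != label
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)
```

An internal index like this uses only the data and the partition, so it can rank candidate partitions without ground-truth labels: a partition that matches two well-separated groups scores higher than one that mixes them.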
Pattern Recognition Letters | 2011
Ibai Gurrutxaga; Javier Muguerza; Olatz Arbelaitz; Jesús M. Pérez; José Ignacio Martín
The evaluation and comparison of internal cluster validity indices is a critical problem in the clustering area. The methodology used in most of the evaluations assumes that the clustering algorithms work correctly. We propose an alternative methodology that does not make this often false assumption. We compared 7 internal cluster validity indices with both methodologies and concluded that the results obtained with the proposed methodology are more representative of the actual capabilities of the compared indices.
Pattern Recognition | 2010
Ibai Gurrutxaga; Iñaki Albisua; Olatz Arbelaitz; José Ignacio Martín; Javier Muguerza; Jesús M. Pérez; Iñigo Perona
Hierarchical clustering algorithms provide a set of nested partitions called a cluster hierarchy. Since the hierarchy is usually too complex, it is reduced to a single partition by using cluster validity indices. We show that the classical method is often not useful and we propose SEP, a new method that efficiently searches in an extended partition set. Furthermore, we propose a new cluster validity index, COP, since many of the commonly used indices cannot be used with SEP. Experiments performed with 80 synthetic and 7 real datasets confirm that SEP/COP is superior to the method currently used and, furthermore, that it is less sensitive to noise.
Pattern Recognition Letters | 2007
Jesús M. Pérez; Javier Muguerza; Olatz Arbelaitz; Ibai Gurrutxaga; José Ignacio Martín
This work describes the Consolidated Tree Construction (CTC) algorithm: a single tree is built based on a set of subsamples. This way the explaining capacity of the classifier is not lost even if many subsamples are used. We show how the CTC algorithm can use undersampling to change the class distribution without loss of information, building more accurate classifiers than C4.5.
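The subsampling step can be pictured with a minimal sketch of random undersampling to a target class distribution. This is not the CTC algorithm itself, whose tree-consolidation procedure is not described in the abstract; the function name `undersample` and its parameters are hypothetical.

```python
import random


def undersample(examples, labels, minority, minority_share=0.5, seed=0):
    """Keep every minority example and draw majority examples at random
    so that the minority class makes up `minority_share` of the subsample."""
    if not 0 < minority_share <= 1:
        raise ValueError("minority_share must be in (0, 1]")
    rng = random.Random(seed)
    minority_idx = [i for i, y in enumerate(labels) if y == minority]
    majority_idx = [i for i, y in enumerate(labels) if y != minority]
    # Number of majority examples needed to reach the requested share,
    # capped by how many majority examples actually exist.
    wanted = round(len(minority_idx) * (1 - minority_share) / minority_share)
    wanted = min(wanted, len(majority_idx))
    chosen = minority_idx + rng.sample(majority_idx, wanted)
    rng.shuffle(chosen)
    return [examples[i] for i in chosen], [labels[i] for i in chosen]
```

Drawing several subsamples with different seeds lets each one discard a different part of the majority class, so the set of subsamples taken together can still cover most of the original data.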
Expert Systems With Applications | 2013
Olatz Arbelaitz; Ibai Gurrutxaga; Aizea Lojo; Javier Muguerza; Jesús M. Pérez; Iñigo Perona
The tourism industry has experienced a shift from offline to online travellers and this has made the use of intelligent systems in the tourism sector crucial. These information systems should provide tourism consumers and service providers with the most relevant information, more decision support, greater mobility and the most enjoyable travel experiences. As a consequence, Destination Marketing Organizations (DMOs) not only have to respond by adopting new technologies, but also by interpreting and using the knowledge created by the use of these techniques. This work presents the design of a general and non-invasive web mining system, built using the minimum information stored in a web server (the content of the website and the information from the log files stored in Common Log Format (CLF)) and its application to the Bidasoa Turismo (BTw) website. The proposed system combines web usage and content mining techniques with the following three main objectives: generating user navigation profiles to be used for link prediction; enriching the profiles with semantic information to diversify them, which provides the DMO with a tool to introduce links that will match the users' tastes; and obtaining global and language-dependent user interest profiles, which provide the DMO staff with important information for future web designs and allow them to design future marketing campaigns for specific targets. The system performed successfully, obtaining profiles which fit in more than 60% of cases with the real user navigation sequences and in more than 90% of cases with the user interests. Moreover, the automatically extracted semantic structure of the website and the interest profiles were validated by the BTw DMO staff, who found the knowledge provided to be very useful for the future.
International Conference on Advances in Pattern Recognition | 2005
Jesús M. Pérez; Javier Muguerza; Olatz Arbelaitz; Ibai Gurrutxaga; José Ignacio Martín
This paper presents an analysis of the behaviour of Consolidated Trees, CT (classification trees induced from multiple subsamples but without loss of explaining capacity). We analyse how CT trees behave when used to solve a fraud detection problem in a car insurance company. This domain has two important characteristics: the explanation given for each classification is critical to help investigate the received reports or claims, and it is a typical example of a class imbalance problem due to its skewed class distribution. In the results presented in the paper, CT and C4.5 trees have been compared from the point of view of accuracy and structural stability (explaining capacity) and, for both algorithms, the best class distribution has been searched for. Due to the different costs associated with different error types (costs of investigating suspicious reports, etc.), a wider analysis of the error has also been done: precision/recall, ROC curve, etc.
International Conference on Enterprise Information Systems | 2009
Ibai Gurrutxaga; Olatz Arbelaitz; José Ignacio Martín; Javier Muguerza; Jesús M. Pérez; Iñigo Perona
SAHN is a widely used agglomerative hierarchical clustering method. Nevertheless, it is not an incremental algorithm and therefore it is not suitable for many real application areas where not all data are available at the beginning of the process. Some authors proposed incremental variants of SAHN. Their goal was to obtain the same results in incremental environments. This approach is not practical since it frequently requires rebuilding the hierarchy, or a big part of it, and often leads to completely different structures. We propose a novel algorithm, called SIHC, that updates SAHN hierarchies with minor changes in the previous structures. This property makes it suitable for real environments. Results on 11 synthetic and 6 real datasets show that SIHC builds high quality clustering hierarchies. This quality level is similar to, and sometimes better than, SAHN’s. Moreover, the computational complexity of SIHC is lower than SAHN’s.
Intelligent Information Systems | 2004
Jesús M. Pérez; Javier Muguerza; Olatz Arbelaitz; Ibai Gurrutxaga
This paper presents a new methodology for building decision trees, the Consolidated Trees Construction algorithm, that improves the behavior of C4.5. It reduces the error and the complexity of the induced trees, and the differences in complexity are statistically significant. The advantage of this methodology with respect to other techniques such as bagging, boosting, etc. is that the final classifier is based on a single tree and not on a set of trees, so the explaining capacity of the classification is not lost. The experimentation has been done with some databases of the UCI Repository and a real application of customer loyalty from an electrical appliance company.
International Journal of Parallel Programming | 2015
José Luis Jodrá; Ibai Gurrutxaga; Javier Muguerza
Matrix transposition is a basic operation for several computing tasks. Hence, transposing a matrix in a computer’s main memory has long been well studied. More recently, out-of-place matrix transposition has been performed efficiently on graphical processing units (GPUs), which are broadly used today for general purpose computing. However, due to the particular architecture of GPUs, the adaptation of the matrix transposition operation to 3D arrays is not straightforward. In this paper, we describe efficient GPU implementations of the 5 possible out-of-place 3D transpositions. Moreover, we also include the most basic in-place 3D transpositions. The results show that the achieved bandwidth is close to that of a simple array copy and is similar to that of the 2D transposition.
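The count of 5 follows from the 3! = 6 orderings of three axes, minus the identity. The sketch below is a plain CPU reference for those transpositions on a flat row-major array, useful only for seeing the index mapping; it is not the GPU implementation the paper describes, and the name `transpose3d` and the axis-permutation convention (output axis a takes input axis `perm[a]`) are assumptions.

```python
from itertools import permutations


def transpose3d(src, shape, perm):
    """Out-of-place transpose of a flat row-major 3D array.

    Output axis a corresponds to input axis perm[a].
    Returns the new flat list and its shape.
    """
    d0, d1, d2 = shape
    out_shape = tuple(shape[a] for a in perm)
    dst = [None] * len(src)
    for i in range(d0):
        for j in range(d1):
            for k in range(d2):
                idx = (i, j, k)
                # Position of this element in the permuted array.
                o = tuple(idx[a] for a in perm)
                flat_out = (o[0] * out_shape[1] + o[1]) * out_shape[2] + o[2]
                dst[flat_out] = src[(i * d1 + j) * d2 + k]
    return dst, out_shape


# The five non-identity axis orderings of a 3D array.
non_identity = [p for p in permutations(range(3)) if p != (0, 1, 2)]
```

On a GPU the difficulty is not this index arithmetic but arranging reads and writes so that both sides access memory contiguously, which is why the reference loop above says nothing about performance.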
CAEPIA'09: Current Topics in Artificial Intelligence, 13th Conference of the Spanish Association for Artificial Intelligence | 2009
Iñaki Albisua; Olatz Arbelaitz; Ibai Gurrutxaga; José Ignacio Martín; Javier Muguerza
When using machine learning to solve real world problems, the class distribution used in the training set is important, not only in highly unbalanced data sets but in every data set. Weiss and Provost suggested that each domain has an optimal class distribution to be used for training. The aim of this work was to analyze the validity of this hypothesis in the context of decision tree learners. With this aim we found the optimal class distribution for 30 databases and two decision tree learners, C4.5 and the Consolidated Tree Construction algorithm (CTC), taking into account pruned and unpruned trees and based on two measures for evaluating discriminating capacity: AUC and error. The results confirmed that changes in the class distribution of the training samples improve the performance (AUC and error) of the classifiers. Therefore, the experimentation showed that there is an optimal class distribution for each database and that this distribution depends on the learning algorithm used, whether the trees are pruned or not, and the evaluation criterion used. In addition, the results showed that the CTC algorithm combined with optimal class distribution samples achieves more accurate learners than any of the options of C4.5 and CTC with the original distribution, with statistically significant differences.