Victor Muntés-Mulero
CA Technologies
Publications
Featured research published by Victor Muntés-Mulero.
very large data bases | 2012
Tilmann Rabl; Sergio Gómez-Villamor; Mohammad Sadoghi; Victor Muntés-Mulero; Hans-Arno Jacobsen; Serge Mankovskii
As the complexity of enterprise systems increases, the need for monitoring and analyzing such systems also grows. A number of companies have built sophisticated monitoring tools that go far beyond simple resource utilization reports. For example, based on instrumentation and specialized APIs, it is now possible to monitor single method invocations and trace individual transactions across geographically distributed systems. This high level of detail enables more precise forms of analysis and prediction but comes at the price of high data rates (i.e., big data). To maximize the benefit of data monitoring, the data has to be stored for an extended period of time for later analysis. This new wave of big data analytics imposes new challenges, especially for application performance monitoring systems. The monitoring data has to be stored in a system that can sustain the high data rates and at the same time enable an up-to-date view of the underlying infrastructure. With the advent of modern key-value stores, a variety of data storage systems have emerged that are built with a focus on scalability and on the high data rates that predominate in this monitoring use case. In this work, we present our experience and a comprehensive performance evaluation of six modern (open-source) data stores in the context of application performance monitoring as part of a CA Technologies initiative. We evaluated these systems with data and workloads that can be found in application performance monitoring, as well as in on-line advertisement, power monitoring, and many other use cases. We present our insights not only as performance results but also as lessons learned and our experience relating to the setup and configuration complexity of these data stores in an industry setting.
conference on information and knowledge management | 2007
Norbert Martinez-Bazan; Victor Muntés-Mulero; Sergio Gómez-Villamor; Jordi Nin; Mario-A. Sanchez-Martinez; Josep-Lluis Larriba-Pey
Link and graph analysis tools are important devices to boost the richness of information retrieval systems. The Internet and existing social networking portals are just two settings where the use of these tools would be beneficial and enriching for users and analysts. However, the need to integrate different data sources and, even more important, the need for high-performance generic tools, is at odds with the continuously growing size and number of data repositories. In this paper we propose and evaluate DEX, a high-performance graph database querying system that allows for the integration of multiple data sources. DEX makes graph querying possible in different flavors, including link analysis, social network analysis, pattern recognition and keyword search. The richness of DEX shows up in the experiments that we carried out on the Internet Movie Database (IMDb). Through a variety of these complex analytical queries, DEX proves to be a generic and efficient tool on large graph databases.
international database engineering and applications symposium | 2007
Jordi Nin; Victor Muntés-Mulero; Norbert Martinez-Bazan; Josep-Lluis Larriba-Pey
Record linkage (RL) is an important component of data cleansing and integration. For years, many efforts have focused on improving the performance of the RL process, either by reducing the number of record comparisons or by reducing the number of attribute comparisons, which reduces the computational time but very often decreases the quality of the results. However, the real bottleneck of RL is the post-process, where the results have to be reviewed by experts who decide which pairs or groups of records are real links and which are false hits. In this paper, we show that exploiting the relationships (e.g. foreign keys) established between one or more data sources makes it possible to find a new sort of semantic blocking method that improves the number of hits and reduces the amount of review effort.
international conference on management of data | 2006
Josep Aguilar-Saborit; Pedro Trancoso; Victor Muntés-Mulero; Josep-Lluis Larriba-Pey
Bloom filters are not able to handle deletes and inserts on multisets over time. This is important in many situations when streamed data evolve rapidly and change patterns frequently. Counting Bloom Filters (CBF) have been proposed to overcome this limitation and allow for the dynamic evolution of Bloom filters. The only dynamic approach to a compact and efficient representation of CBF is the Spectral Bloom Filter (SBF). In this paper we propose the Dynamic Count Filters (DCF) as a new dynamic and space-time efficient representation of CBF. Although DCF does not make a compact use of memory, it proves to be faster and more space efficient than any previous proposal. Results show that the proposed data structure is more efficient independently of the incoming data characteristics.
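To make the idea behind a counting Bloom filter concrete, here is a minimal Python sketch. It is not the DCF or SBF structure from the paper: it uses plain, unbounded integer counters, and the class name, parameters, and SHA-256-based hashing scheme are assumptions chosen only for illustration.

```python
import hashlib


class CountingBloomFilter:
    """Illustrative counting Bloom filter: an array of counters instead of bits.

    Hypothetical sketch; not the Dynamic Count Filter described in the abstract.
    """

    def __init__(self, num_counters=1024, num_hashes=3):
        self.num_counters = num_counters
        self.num_hashes = num_hashes
        self.counters = [0] * num_counters

    def _positions(self, item):
        # Derive counter positions by salting a SHA-256 digest with the hash
        # index. Duplicates are dropped so each counter is touched once per item.
        positions = set()
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            positions.add(int.from_bytes(digest[:8], "big") % self.num_counters)
        return positions

    def insert(self, item):
        for pos in self._positions(item):
            self.counters[pos] += 1

    def count(self, item):
        # The smallest counter is an upper bound on the item's multiplicity.
        return min(self.counters[pos] for pos in self._positions(item))

    def delete(self, item):
        # Refuse to delete items that look absent, so counters never go negative.
        if self.count(item) == 0:
            return False
        for pos in self._positions(item):
            self.counters[pos] -= 1
        return True
```

Because every counter is a full Python integer, this sketch ignores the compactness problem that SBF and DCF address, namely how to store counters of varying magnitude in little space while still updating them quickly.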
tpc technology conference | 2010
David Dominguez-Sal; Norbert Martinez-Bazan; Victor Muntés-Mulero; Pere Baleta; Josep Lluis Larriba-Pey
Graph Database Management systems (GDBs) are gaining popularity. They are used to analyze the huge graph datasets that arise naturally in many application areas to model interrelated data. The objective of this paper is to raise a new topic of discussion in the benchmarking community and to give practitioners a set of basic guidelines for GDB benchmarking. We strongly believe that GDBs will become an important player in the data analysis market, and with that, their performance and capabilities will also become important. For this reason, we discuss those aspects that are important from our perspective, i.e. the characteristics of the graphs to be included in the benchmark, the characteristics of the queries that are important in graph analysis applications, and the evaluation workbench.
international database engineering and applications symposium | 2012
Norbert Martinez-Bazan; M. Ángel Águila-Lorente; Victor Muntés-Mulero; David Dominguez-Sal; Sergio Gómez-Villamor; Josep-L. Larriba-Pey
The increasing amount of graph-like data from social networks, science and the web has spurred interest in analyzing the relationships between different entities. New specialized solutions in the form of graph databases, which are generic and able to adapt to any schema as an alternative to RDBMS, have appeared to manage attributed multigraphs efficiently. In this paper, we describe the internals of the DEX graph database, which is based on a representation of the graph and its attributes as maps and bitmap structures that can be loaded and unloaded efficiently from memory. We also present the internal operations used in DEX to manipulate these structures. We show that by using these structures, DEX scales to graphs with billions of vertices and edges with very limited memory requirements. Finally, we compare our graph-oriented approach to other approaches, showing that our system is better suited for typical out-of-core graph-like operations.
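The abstract gives no detail on DEX's actual data structures, so the following Python sketch only illustrates the general idea of pairing a map with bitmaps: each attribute value maps to a bitmap whose set bits are the identifiers of the objects holding that value. The class name, method names, and the use of Python integers as bitmaps are all assumptions for illustration, not DEX's API.

```python
class BitmapAttributeIndex:
    """Hypothetical attribute index: value -> bitmap of object ids (not DEX code)."""

    def __init__(self):
        self.value_to_bitmap = {}  # attribute value -> int used as a bitmap
        self.oid_to_value = {}     # object id -> current attribute value

    def set(self, oid, value):
        # Clear the object's bit under its previous value, if it had one.
        if oid in self.oid_to_value:
            old = self.oid_to_value[oid]
            self.value_to_bitmap[old] &= ~(1 << oid)
        self.oid_to_value[oid] = value
        self.value_to_bitmap[value] = self.value_to_bitmap.get(value, 0) | (1 << oid)

    def select(self, value):
        # Bitmap of all objects whose attribute equals `value`.
        return self.value_to_bitmap.get(value, 0)


def oids(bitmap):
    """List the object ids whose bits are set in `bitmap`, in ascending order."""
    result = []
    oid = 0
    while bitmap:
        if bitmap & 1:
            result.append(oid)
        bitmap >>= 1
        oid += 1
    return result


# Usage: combine two attribute selections with a bitwise AND.
kind = BitmapAttributeIndex()
year = BitmapAttributeIndex()
kind.set(0, "movie")
kind.set(1, "actor")
kind.set(2, "movie")
year.set(0, 1999)
year.set(2, 2005)
movies_from_1999 = oids(kind.select("movie") & year.select(1999))
```

The appeal of this layout is that selections on different attributes combine through cheap bitwise operations, which is one plausible reason a bitmap-based design can keep memory requirements low.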
International Journal of Information Security | 2012
Marc Solé; Victor Muntés-Mulero; Jordi Nin
The contradictory requirements of data privacy and data analysis have fostered the development of statistical disclosure control techniques. In this context, microaggregation is one of the most frequently used methods since it offers a good trade-off between simplicity and quality. Unfortunately, most of the currently available microaggregation algorithms have been devised to work with small datasets, while the size of current databases is constantly increasing. The usual way to tackle this problem is to partition large data volumes into smaller fragments that can be processed in reasonable time by available algorithms. This solution is applied at the cost of losing quality. In this paper, we revisit the computational needs of microaggregation, showing that it can be reduced to two steps: sorting the dataset with regard to a vantage point and a set of k-nearest neighbors searches. Considering this new point of view, we propose three new efficient quality-preserving microaggregation algorithms based on k-nearest neighbors search techniques. We present a comparison of our approaches with the most significant strategies presented in the literature using three very large real datasets. Experimental results show that our proposals outperform previous techniques by keeping a better balance between performance and the quality of the anonymized dataset.
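As a rough illustration of what microaggregation does, the Python sketch below sorts records by distance to a vantage point, cuts the sorted order into groups of at least k records, and replaces every record with its group centroid. This is a deliberately simplified fixed-size variant, not one of the three algorithms proposed in the paper: it skips the k-nearest neighbors searches entirely, and the function name and the default choice of vantage point are assumptions.

```python
import math


def microaggregate(records, k, vantage=None):
    """Simplified fixed-size microaggregation (illustrative sketch only).

    records: list of equal-length numeric tuples; k: minimum group size.
    Returns a list in which each record is replaced by its group's centroid.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    if not records:
        return []
    dims = len(records[0])
    if vantage is None:
        # Assumed default: the per-dimension minimum of the dataset.
        vantage = tuple(min(r[d] for r in records) for d in range(dims))

    # Step 1: sort record indices by distance to the vantage point.
    order = sorted(range(len(records)),
                   key=lambda i: math.dist(records[i], vantage))

    # Step 2: cut the order into consecutive groups of k records, folding a
    # short final group into the previous one so every group has >= k members.
    groups = [order[i:i + k] for i in range(0, len(order), k)]
    if len(groups) > 1 and len(groups[-1]) < k:
        tail = groups.pop()
        groups[-1].extend(tail)

    # Step 3: replace each record with the centroid of its group.
    result = [None] * len(records)
    for group in groups:
        centroid = tuple(sum(records[i][d] for i in group) / len(group)
                         for d in range(dims))
        for i in group:
            result[i] = centroid
    return result


data = [(1.0,), (2.0,), (10.0,), (11.0,), (12.0,)]
anonymized = microaggregate(data, k=2)
```

When the dataset has at least k records, every released value is shared by at least k records, which is the property that makes the output k-anonymous with respect to the aggregated attributes. The quality of the result depends on how homogeneous the groups are, which is where nearest-neighbor grouping improves on a plain sorted cut.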
conference on information and knowledge management | 2009
Victor Muntés-Mulero; Jordi Nin
With the increase of available public data sources and the interest in analyzing them, privacy issues are becoming the eye of the storm in many applications. The vast amount of data collected on human beings and organizations as a result of cyberinfrastructure advances, or that collected by statistical agencies, for instance, has made traditional ways of protecting social science data obsolete. This has given rise to different techniques aimed at tackling this problem and at the analysis of limitations in such environments, such as the seminal study by Aggarwal of anonymization techniques and their dependency on data dimensionality. The growing accessibility of high-capacity storage devices allows keeping more detailed information from many areas. While this enriches the information and conclusions extracted from this data, it poses a serious problem for most previous work on privacy, which has focused on quality and paid little attention to performance aspects. In this workshop, we want to gather researchers in the areas of data privacy and anonymization together with researchers in the area of high performance and very large data volumes management. We seek to collect the most recent advances in data privacy and anonymization (e.g. anonymization techniques, statistical disclosure techniques, privacy in machine learning algorithms, privacy in graphs or social networks, etc.) and those in High Performance and Data Management (e.g. algorithms and structures for efficient data management, parallel or distributed systems, etc.).
acm special interest group on data communication | 2017
Albert Mestres; Alberto Rodriguez-Natal; Josep Carner; Pere Barlet-Ros; Eduard Alarcón; Marc Solé; Victor Muntés-Mulero; David Meyer; Sharon Barkai; Mike J. Hibbett; Giovani Estrada; Khaldun Maruf; Florin Coras; Vina Ermagan; Hugo Latapie; Chris Cassar; John Evans; Fabio Maino; Jean Walrand; Albert Cabellos
The research community has considered in the past the application of Artificial Intelligence (AI) techniques to control and operate networks. A notable example is the Knowledge Plane proposed by D. Clark et al. However, such techniques have not been extensively prototyped or deployed in the field yet. In this paper, we explore the reasons for the lack of adoption and posit that the rise of two recent paradigms, Software-Defined Networking (SDN) and Network Analytics (NA), will facilitate the adoption of AI techniques in the context of network operation and control. We describe a new paradigm that accommodates and exploits SDN, NA and AI, and provide use-cases that illustrate its applicability and benefits. We also present simple experimental results that support, for some relevant use-cases, its feasibility. We refer to this new paradigm as Knowledge-Defined Networking (KDN).
international conference on data engineering | 2010
Arnau Padrol-Sureda; Guillem Perarnau-Llobet; Julian Pfeifle; Victor Muntés-Mulero
Finding decompositions of a graph into a family of clusters is crucial to understanding its underlying structure. While most existing approaches focus on partitioning the nodes, real-world datasets suggest the presence of overlapping communities. We present OCA, a novel algorithm to detect overlapped communities in large data graphs. It outperforms previous proposals in terms of execution time, and efficiently handles large graphs containing more than 10^8 nodes and edges.