Publication


Featured research published by Jordi Nin.


Conference on Information and Knowledge Management | 2007

Dex: high-performance exploration on large graphs for information retrieval

Norbert Martinez-Bazan; Victor Muntés-Mulero; Sergio Gómez-Villamor; Jordi Nin; Mario-A. Sanchez-Martinez; Josep-Lluis Larriba-Pey

Link and graph analysis tools are important devices to boost the richness of information retrieval systems. The Internet and existing social networking portals are just two settings where the use of these tools would be beneficial and enriching for users and analysts. However, the need to integrate different data sources and, even more importantly, the need for high-performance generic tools are at odds with the continuously growing size and number of data repositories. In this paper we propose and evaluate DEX, a high-performance graph database querying system that allows for the integration of multiple data sources. DEX makes graph querying possible in different flavors, including link analysis, social network analysis, pattern recognition and keyword search. The richness of DEX shows in the experiments that we carried out on the Internet Movie Database (IMDb). Through a variety of these complex analytical queries, DEX proves to be a generic and efficient tool for large graph databases.


Data and Knowledge Engineering | 2008

Rethinking rank swapping to decrease disclosure risk

Jordi Nin; Javier Herranz; Vicenç Torra

Nowadays, the need for privacy motivates the use of methods that protect a microdata file while both minimizing the disclosure risk and preserving data utility. A very popular microdata protection method is rank swapping. Record linkage is the standard mechanism used to measure the disclosure risk of a microdata protection method. In this paper we present a new record linkage method, specific to rank swapping, which obtains more links than standard ones. The consequence is that rank swapping has a higher disclosure risk than previously believed. Motivated by this, we present two new variants of the rank swapping method against which the new record linkage technique is ineffective. Therefore, the real disclosure risk of these new methods is lower than that of standard rank swapping.
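As a rough illustration of the baseline method this abstract discusses, the sketch below applies a simple form of rank swapping to one numerical attribute: values are ranked, and each is swapped with a randomly chosen, not yet swapped value at most p percent of the ranks away. The function name `rank_swap`, the parameter names, and the pairing strategy are assumptions made for illustration; the code reproduces neither the new variants proposed in the paper nor its record linkage method.

```python
import random


def rank_swap(values, p, seed=None):
    """Swap each value with a partner at most p percent of the ranks away.

    Illustrative sketch of plain rank swapping on one attribute.
    """
    rng = random.Random(seed)
    n = len(values)
    # Maximum rank distance allowed between swapped values (at least 1).
    window = max(1, int(n * p / 100))
    # Record indices listed in increasing order of their value (rank order).
    order = sorted(range(n), key=lambda i: values[i])
    swapped = list(values)
    used = [False] * n  # ranks that have already taken part in a swap
    for r in range(n):
        if used[r]:
            continue
        # Candidate partners: unswapped ranks inside the window above r.
        upper = min(n, r + window + 1)
        candidates = [s for s in range(r + 1, upper) if not used[s]]
        if not candidates:
            continue  # no partner available; the value stays in place
        s = rng.choice(candidates)
        i, j = order[r], order[s]
        swapped[i], swapped[j] = values[j], values[i]
        used[r] = used[s] = True
    return swapped


if __name__ == "__main__":
    incomes = [10, 3, 7, 1, 9, 4, 8, 2, 6, 5]
    print(rank_swap(incomes, p=20, seed=0))
```

Because values are only exchanged between records, the marginal distribution of the attribute is preserved exactly; a smaller `p` keeps swapped values closer to the originals, which favors utility over protection.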


Data and Knowledge Engineering | 2008

On the disclosure risk of multivariate microaggregation

Jordi Nin; Javier Herranz; Vicenç Torra

The aim of data protection methods is to protect a microdata file while both minimizing the disclosure risk and preserving data utility. Microaggregation is one of the most popular such methods among statistical agencies. Record linkage is the standard mechanism used to measure the disclosure risk of a microdata protection method. However, only standard, and quite generic, record linkage methods are usually considered, whereas more specific record linkage techniques can be more appropriate for evaluating the disclosure risk of some protection methods. In this paper we present a new record linkage technique, specific to microaggregation, which obtains more correct links than standard techniques. We have tested the new technique with MDAV microaggregation and two other microaggregation methods, based on projections, that we propose here for the first time. The direct consequence is that these microaggregation methods have a higher disclosure risk than previously believed.
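To make the notion of record linkage as a risk measure concrete, here is a minimal sketch of generic distance-based record linkage: each protected record is linked to its nearest original record, and the fraction of correct links serves as a disclosure risk estimate. It assumes that record i of the protected file corresponds to record i of the original file and uses squared Euclidean distance; the names `sq_dist` and `linkage_risk` are hypothetical. This is the standard, generic kind of linkage, not the microaggregation-specific technique proposed in the paper.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two numerical records."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def linkage_risk(original, protected):
    """Fraction of protected records whose nearest original record is the true one.

    Assumes protected[i] was derived from original[i].
    """
    correct = 0
    for idx, prot in enumerate(protected):
        # Link the protected record to the closest original record.
        nearest = min(range(len(original)),
                      key=lambda j: sq_dist(prot, original[j]))
        if nearest == idx:
            correct += 1
    return correct / len(protected)


if __name__ == "__main__":
    original = [(0.0,), (1.0,), (10.0,)]
    # All records replaced by one common centroid: few links are correct.
    centroid = (sum(r[0] for r in original) / 3,)
    print(linkage_risk(original, [centroid] * 3))
```

A value close to 1 means an intruder holding the original attributes could re-identify almost every record, so lower values indicate better protection.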


International Database Engineering and Applications Symposium | 2007

On the Use of Semantic Blocking Techniques for Data Cleansing and Integration

Jordi Nin; Victor Muntés-Mulero; Norbert Martinez-Bazan; Josep-Lluis Larriba-Pey

Record linkage (RL) is an important component of data cleansing and integration. For years, many efforts have focused on improving the performance of the RL process, either by reducing the number of record comparisons or by reducing the number of attribute comparisons, which reduces the computational time but very often decreases the quality of the results. However, the real bottleneck of RL is the post-process, where the results have to be reviewed by experts who decide which pairs or groups of records are real links and which are false hits. In this paper, we show that exploiting the relationships (e.g., foreign keys) established between one or more data sources makes it possible to find a new sort of semantic blocking method that improves the number of hits and reduces the amount of review effort.


Information Sciences | 2009

Towards the evaluation of time series protection methods

Jordi Nin; Vicenç Torra

The goal of statistical disclosure control (SDC) is to modify statistical data so that it can be published without releasing confidential information that may be linked to specific respondents. The challenge for SDC is to achieve this modification with minimum loss of the detail and accuracy sought by final users. There are many approaches for evaluating the quality of a protection method. However, all these measures are only applicable to numerical or categorical attributes. In this paper, we present some recent results on time series protection and re-identification. We propose a complete framework to evaluate time series protection methods. We also present some empirical results to show how our framework works.


Computer Software and Applications Conference | 2009

Computing Reputation for Collaborative Private Networks

Jordi Nin; Barbara Carminati; Elena Ferrari; Vicenç Torra

The use of collaborative network services is increasing; therefore, the protection of the resources and relations shared by network participants is becoming crucial. One of the main issues in such networks is the evaluation of participant reputation, since access to network resources may or may not be granted on the basis of the reputation of the requesting node. Therefore, the calculation of the reputation of the nodes becomes a very important issue. There are several reputation models presented in the literature. Some of these models (e.g., eBay or Sporas) are very simple, and participants cannot express their preferences in the reputation computation process. In contrast, other reputation models (e.g., ReGreT or FIRE) are too complex to be applied when privacy is a primary concern. In this paper, we propose a new reputation model based on OWA and WOWA operators. The key characteristics of our proposal are that reputation is computed in a private way using the homomorphic properties of the ElGamal cryptosystem and that user preferences can be introduced into the reputation computation. We show the feasibility of this new reputation model by considering a Web-based Social Network scenario.
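For readers unfamiliar with the OWA (ordered weighted averaging) operator named in this abstract, the sketch below shows its standard definition: the inputs are sorted in decreasing order and combined with a weight vector that sums to one, so the weights attach to positions in the ordering rather than to particular sources. The function name `owa` and the example ratings are assumptions; the WOWA operator and the private computation over ElGamal ciphertexts described in the paper are not shown.

```python
def owa(values, weights):
    """Ordered weighted average: weights apply to the sorted values."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    # The i-th weight multiplies the i-th largest value.
    ordered = sorted(values, reverse=True)
    return sum(w * v for w, v in zip(weights, ordered))


if __name__ == "__main__":
    ratings = [0.2, 0.9, 0.5]  # hypothetical reputation ratings for one node
    print(owa(ratings, [1.0, 0.0, 0.0]))  # all weight on the largest rating
    print(owa(ratings, [0.0, 0.0, 1.0]))  # all weight on the smallest rating
    print(owa(ratings, [0.2, 0.6, 0.2]))  # emphasis on the middle rating
```

Choosing the weight vector is how a participant expresses a preference: weight concentrated on the first positions gives an optimistic aggregate, while weight on the last positions gives a cautious one.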


International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems | 2008

How to Group Attributes in Multivariate Microaggregation

Jordi Nin; Javier Herranz; Vicenç Torra

Microaggregation is one of the most widely used microdata protection methods. It builds clusters of at least k original records, and then replaces these records with the centroid of the cluster. When the number of attributes of the dataset is large, one usually splits the dataset into smaller blocks of attributes, and then applies microaggregation to each block, successively and independently. In this way, the effect of the noise introduced by microaggregation is reduced, at the cost of losing the k-anonymity property. In this work we show that, besides the specific microaggregation method, the value of the parameter k and the number of blocks into which the dataset is split, there exists another factor which influences the quality of the microaggregation: the way in which the attributes are grouped to form the blocks. When correlated attributes are grouped in the same block, the statistical utility of the protected dataset is higher. In contrast, when correlated attributes are dispersed into different blocks, the achieved anonymity is higher, and so the disclosure risk is lower. We present quantitative evaluations of these statements based on different experiments on real datasets.
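The basic mechanism described in this abstract, building clusters of at least k records and replacing each record with its cluster centroid, can be sketched for a single numerical attribute as follows. This is a simplified fixed-size heuristic chosen for illustration (sort, cut into consecutive groups of k, let the last group absorb any remainder); the function name `microaggregate` is an assumption, and the code is neither MDAV nor the attribute-grouping strategies studied in the paper.

```python
def microaggregate(values, k):
    """Replace each value with the mean of its group of at least k neighbors."""
    n = len(values)
    if k < 1 or n < k:
        raise ValueError("need at least k records and k >= 1")
    # Record indices in increasing order of value, so groups are homogeneous.
    order = sorted(range(n), key=lambda i: values[i])
    result = [0.0] * n
    start = 0
    while start < n:
        end = start + k
        # If fewer than k records would remain, merge them into this group.
        if n - end < k:
            end = n
        group = order[start:end]
        centroid = sum(values[i] for i in group) / len(group)
        for i in group:
            result[i] = centroid
        start = end
    return result


if __name__ == "__main__":
    ages = [1, 2, 3, 10, 11, 12, 13]
    print(microaggregate(ages, k=3))
```

Every published value is shared by at least k records, which is what yields k-anonymity on that attribute; applying the procedure independently to separate blocks of attributes gives up this guarantee for the full record, as the abstract notes.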


International Journal of Information Security | 2012

Efficient microaggregation techniques for large numerical data volumes

Marc Solé; Victor Muntés-Mulero; Jordi Nin

The contradictory requirements of data privacy and data analysis have fostered the development of statistical disclosure control techniques. In this context, microaggregation is one of the most frequently used methods since it offers a good trade-off between simplicity and quality. Unfortunately, most of the currently available microaggregation algorithms have been devised to work with small datasets, while the size of current databases is constantly increasing. The usual way to tackle this problem is to partition large data volumes into smaller fragments that can be processed in reasonable time by available algorithms. This solution is applied at the cost of losing quality. In this paper, we revisit the computational needs of microaggregation, showing that it can be reduced to two steps: sorting the dataset with regard to a vantage point and a set of k-nearest neighbors searches. Considering this new point of view, we propose three new efficient quality-preserving microaggregation algorithms based on k-nearest neighbors search techniques. We present a comparison of our approaches with the most significant strategies presented in the literature using three very large real datasets. Experimental results show that our proposals outperform previous techniques by keeping a better balance between performance and the quality of the anonymized dataset.


Conference on Information and Knowledge Management | 2009

Privacy and anonymization for very large datasets

Victor Muntés-Mulero; Jordi Nin

With the increase of available public data sources and the interest in analyzing them, privacy issues are becoming the eye of the storm in many applications. The vast amount of data collected on human beings and organizations as a result of cyberinfrastructure advances, or that collected by statistical agencies, for instance, has made traditional ways of protecting social science data obsolete. This has given rise to different techniques aimed at tackling this problem and to the analysis of limitations in such environments, such as the seminal study by Aggarwal of anonymization techniques and their dependency on data dimensionality. The growing accessibility of high-capacity storage devices allows keeping more detailed information from many areas. While this enriches the information and conclusions extracted from this data, it poses a serious problem for most of the previous work on privacy, which has focused on quality and paid little attention to performance aspects. In this workshop, we want to gather researchers in the areas of data privacy and anonymization together with researchers in the area of high performance and very large data volumes management. We seek to collect the most recent advances in data privacy and anonymization (e.g., anonymization techniques, statistical disclosure techniques, privacy in machine learning algorithms, privacy in graphs or social networks) and those in High Performance and Data Management (e.g., algorithms and structures for efficient data management, parallel or distributed systems).


IEEE Transactions on Knowledge and Data Engineering | 2011

Optimal Symbol Alignment Distance: A New Distance for Sequences of Symbols

Javier Herranz; Jordi Nin; Marc Solé

Comparison functions for sequences (of symbols) are important components of many applications, for example, clustering, data cleansing, and integration. For years, many efforts have been made to improve the performance of such comparison functions. Improvements have been made either at the cost of reducing the accuracy of the comparison, or by compromising certain basic characteristics of the functions, such as the triangle inequality. In this paper, we propose a new distance for sequences of symbols (or strings) called Optimal Symbol Alignment distance (OSA distance, for short). This distance has a very low cost in practice, which makes it a suitable candidate for computing distances in applications with large amounts of (very long) sequences. After providing a mathematical proof that the OSA distance is a real distance, we present some experiments for different scenarios (DNA sequences, record linkage, etc.), showing that the proposed distance outperforms, in terms of execution time and/or accuracy, other well-known comparison functions such as the Edit or Jaro-Winkler distances.

Collaboration


Dive into Jordi Nin's collaborations.

Top Co-Authors

Javier Herranz

Polytechnic University of Catalonia

Jordi Pont-Tuset

Polytechnic University of Catalonia

Josep-Lluis Larriba-Pey

Polytechnic University of Catalonia

Marc Solé

Polytechnic University of Catalonia

Josep Ll. Larriba-Pey

Polytechnic University of Catalonia

Maguelonne Teisseire

Centre national de la recherche scientifique

Pascal Poncelet

University of Montpellier
