
Arnau Prat-Pérez

Polytechnic University of Catalonia



Publication

Featured research published by Arnau Prat-Pérez.

International World Wide Web Conferences | 2014

High quality, scalable and parallel community detection for large real graphs

Arnau Prat-Pérez; David Dominguez-Sal; Josep-Lluis Larriba-Pey

Community detection has arisen as one of the most relevant topics in the field of graph mining, principally for its applications in domains such as social or biological network analysis. Different community detection algorithms have been proposed during the last decade, approaching the problem from different perspectives. However, existing algorithms are, in general, based on complex and expensive computations, making them unsuitable for large graphs with millions of vertices and edges such as those usually found in the real world. In this paper, we propose a novel disjoint community detection algorithm called Scalable Community Detection (SCD). By combining different strategies, SCD partitions the graph by maximizing the Weighted Community Clustering (WCC), a recently proposed community detection metric based on triangle analysis. Using real graphs with ground-truth overlapping communities, we show that SCD outperforms the current state-of-the-art proposals (even those aimed at finding overlapping communities) in terms of quality and performance. SCD provides the speed of the fastest algorithms and the quality, in terms of NMI and F1 score, of the most accurate state-of-the-art proposals. We show that SCD is able to run up to two orders of magnitude faster than practical existing solutions by exploiting the parallelism of current multi-core processors, enabling us to process graphs of unprecedented size in short execution times.
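The abstract reports quality in terms of NMI and F1 score. As an illustration of how an F1 score can be computed between a set of detected communities and a set of ground-truth communities, here is a minimal Python sketch. It assumes the common "average F1" convention (each community is matched to its best-scoring counterpart in the other set, and the two matching directions are averaged); the function names are hypothetical, and this is not necessarily the exact evaluation code used in the paper.

```python
def f1(a: set, b: set) -> float:
    """F1 score between one detected community `a` and one reference community `b`."""
    overlap = len(a & b)
    if overlap == 0:
        return 0.0
    precision = overlap / len(a)
    recall = overlap / len(b)
    return 2 * precision * recall / (precision + recall)


def best_match_mean(xs: list[set], ys: list[set]) -> float:
    """Mean, over the communities in `xs`, of the best F1 against any community in `ys`."""
    return sum(max(f1(x, y) for y in ys) for x in xs) / len(xs)


def average_f1(detected: list[set], truth: list[set]) -> float:
    """Symmetric average F1 (assumed convention): mean of both matching directions."""
    return 0.5 * (best_match_mean(detected, truth) + best_match_mean(truth, detected))
```

For example, with detected communities `[{1, 2, 3}, {4, 5, 6}]` and ground truth `[{1, 2, 3}, {4, 5}, {6}]`, `average_f1` returns 5/6 (about 0.833): the first community is matched perfectly, while the second only partially overlaps the two smaller reference communities.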

First International Workshop on Graph Data Management Experiences and Systems | 2013

Benchmarking database systems for social network applications

Renzo Angles; Arnau Prat-Pérez; David Dominguez-Sal; Josep-Lluis Larriba-Pey

Graphs have become an indispensable tool for the analysis of linked data. As with any data representation, the need for database management systems appears when graphs grow in size and complexity. Associated with those needs, benchmarks appear to assess the performance of such systems in specific scenarios that are representative of real use cases. In this paper we propose a microbenchmark based on social networks. It includes a data generator that synthetically creates social graphs, and a set of low-level atomic queries that model parts of the behavior of social network users. In order to understand how different data management paradigms are stressed, we execute the benchmark over five different database systems representing graph (Dex and Neo4j), RDF (RDF-3X) and relational (Virtuoso and PostgreSQL) data management. We conclude that reachability queries are the ones that cause the most difficulties for all the database systems, which justifies their inclusion and makes them good candidates for more complex benchmarks.
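Reachability queries ask whether one vertex can be reached from another, possibly within a bounded number of hops. The sketch below answers such a query with a breadth-first search over an in-memory adjacency list. The adjacency-dict representation, the `max_hops` bound and the function name are assumptions made for illustration; they are not the benchmark's actual query definitions.

```python
from collections import deque


def reachable_within(adj: dict[int, list[int]], src: int, dst: int, max_hops: int) -> bool:
    """Breadth-first search: is `dst` reachable from `src` in at most `max_hops` edges?"""
    if src == dst:
        return True
    seen = {src}
    frontier = deque([(src, 0)])  # (vertex, hops used to reach it)
    while frontier:
        v, hops = frontier.popleft()
        if hops == max_hops:
            continue  # cannot extend this path any further
        for w in adj.get(v, []):
            if w == dst:
                return True  # reached in hops + 1 <= max_hops edges
            if w not in seen:
                seen.add(w)
                frontier.append((w, hops + 1))
    return False
```

Each additional hop can multiply the number of vertices touched, which gives some intuition for why such queries can be costly for a database system compared with single-record lookups.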

International Conference on Management of Data | 2015

Graphalytics: A Big Data Benchmark for Graph-Processing Platforms

Mihai Capotă; Tim Hegeman; Alexandru Iosup; Arnau Prat-Pérez; Orri Erling; Peter A. Boncz

Graphs are increasingly used in industry, governance, and science. This has stimulated the appearance of many and diverse graph-processing platforms. Although platform diversity is beneficial, it also makes it very challenging to select the best platform for an application domain or one of its important applications, and to design new platforms and tune existing ones. Continuing a long tradition of using benchmarking to address such challenges, in this work we present our vision for Graphalytics, a big data benchmark for graph-processing platforms. We have already benchmarked with Graphalytics a variety of popular platforms, such as Giraph, GraphX, and Neo4j.

Conference on Information and Knowledge Management | 2012

Shaping communities out of triangles

Arnau Prat-Pérez; David Dominguez-Sal; Josep M. Brunat; Josep-Lluis Larriba-Pey

Community detection has arisen as one of the most relevant topics in the field of graph data mining due to its importance in many fields such as biology, social networks or network traffic analysis. The metrics proposed to shape communities are too lax and do not consider the internal layout of the edges in the community, which leads to undesirable results. We define a new community metric called WCC. The proposed metric meets a minimum set of basic properties that guarantee communities with structure and cohesion. Using real and synthetic datasets, we experimentally show that WCC correctly quantifies the quality of communities and community partitions, and we compare some of the most widely used state-of-the-art community detection algorithms.
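The argument that edge-based metrics ignore the internal layout of a community can be made concrete with a toy example. The Python sketch below (a hypothetical illustration, not code from the paper) compares two four-vertex groups that have the same number of internal edges: a 4-cycle, which contains no triangle, and a triangle with one pendant edge.

```python
from itertools import combinations


def edge_set(pairs):
    """Undirected edges stored as frozensets, so (u, v) and (v, u) are the same edge."""
    return {frozenset(p) for p in pairs}


def internal_edges(edges: set, community: set) -> int:
    """Number of edges with both endpoints inside the community."""
    return sum(1 for e in edges if e <= community)


def internal_triangles(edges: set, community: set) -> int:
    """Number of triangles whose three vertices all lie inside the community."""
    return sum(
        1
        for a, b, c in combinations(sorted(community), 3)
        if {frozenset((a, b)), frozenset((b, c)), frozenset((a, c))} <= edges
    )


community = {1, 2, 3, 4}
cycle = edge_set([(1, 2), (2, 3), (3, 4), (4, 1)])          # 4-cycle: no triangle
triangle_plus = edge_set([(1, 2), (2, 3), (1, 3), (3, 4)])  # triangle plus a pendant edge
```

Both groups have 4 internal edges, so a metric that only counts internal edges scores them identically, whereas `internal_triangles` returns 0 for the cycle and 1 for the other group.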

Very Large Data Bases | 2016

LDBC Graphalytics: A Benchmark for Large-Scale Graph Analysis on Parallel and Distributed Platforms

Alexandru Iosup; Tim Hegeman; Wing Lung Ngai; Stijn Heldens; Arnau Prat-Pérez; Thomas Manhardt; Hassan Chafi; Mihai Capotă; Narayanan Sundaram; Michael J. Anderson; Ilie Gabriel Tănase; Yinglong Xia; Lifeng Nai; Peter A. Boncz

In this paper we introduce LDBC Graphalytics, a new industrial-grade benchmark for graph analysis platforms. It consists of six deterministic algorithms, standard datasets, synthetic dataset generators, and reference output that enable the objective comparison of graph analysis platforms. Its test harness produces deep metrics that quantify multiple kinds of system scalability, such as horizontal/vertical and weak/strong, and of robustness, such as failures and performance variability. The benchmark comes with open-source software for generating data and monitoring performance. We describe and analyze six implementations of the benchmark (three from the community, three from industry), providing insights into the strengths and weaknesses of the platforms. Key to our contribution, vendors perform the tuning and benchmarking of their platforms.
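Determinism matters here because each platform's output must be comparable with a reference output. As an illustration (our own choice of example, not a claim about the benchmark's specification), the Python sketch below implements synchronous label propagation in which ties are always broken towards the smallest label, so the same graph and number of rounds always yield the same labels.

```python
from collections import Counter


def label_propagation(adj: dict[int, list[int]], iterations: int) -> dict[int, int]:
    """Synchronous label propagation with deterministic tie-breaking.

    Every vertex starts with its own ID as label. In each round, a vertex adopts the
    most frequent label among its neighbours; ties go to the smallest label, so the
    result depends only on the input graph and the number of rounds.
    """
    labels = {v: v for v in adj}
    for _ in range(iterations):
        new_labels = {}
        for v, neighbours in adj.items():
            if not neighbours:
                new_labels[v] = labels[v]  # isolated vertex keeps its label
                continue
            counts = Counter(labels[w] for w in neighbours)
            top = max(counts.values())
            new_labels[v] = min(label for label, c in counts.items() if c == top)
        labels = new_labels  # synchronous: every vertex read the previous round's labels
    return labels
```

Because no step depends on iteration order or randomness, two independent implementations of this rule can be checked against each other vertex by vertex, which is the property a reference output relies on.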

ACM Transactions on Knowledge Discovery From Data | 2016

Put Three and Three Together: Triangle-Driven Community Detection

Arnau Prat-Pérez; David Dominguez-Sal; Josep M. Brunat; Josep-Lluis Larriba-Pey

Community detection has arisen as one of the most relevant topics in the field of graph data mining due to its applications in many fields such as biology, social networks, or network traffic analysis. Although the existing metrics used to quantify the quality of a community work well in general, under some circumstances they fail to correctly capture this notion. The main reason is that these metrics consider the internal community edges as a set, but ignore how these edges actually connect the vertices of the community. We propose the Weighted Community Clustering (WCC), a new community metric that takes the triangle instead of the edge as the minimal structural motif indicating the presence of a strong relation in a graph. We theoretically analyse WCC in depth and formally prove, by means of a set of properties, that the maximization of WCC guarantees communities with cohesion and structure. In addition, we propose Scalable Community Detection (SCD), a community detection algorithm based on WCC that is designed to be fast and scalable on SMP machines. Using real datasets, we show experimentally that WCC correctly captures the concept of community in social networks. Finally, using ground-truth data, we show that SCD provides better quality than the best state-of-the-art disjoint community detection algorithms while performing faster.
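The abstract describes WCC only informally. The Python sketch below scores a given partition with WCC, following our reading of the definition in the published WCC papers: t(x, S) counts the triangles that x closes with two vertices of S, vt(x, S) counts the vertices of S that share at least one triangle with x, and WCC(x, C) = t(x, C)/t(x, V) · vt(x, V)/(|C minus x| + vt(x, V minus C)), or 0 when x is in no triangle. Treat the exact formula as an assumption to check against the paper; the sketch only evaluates a partition and does not implement SCD's search for one.

```python
from itertools import combinations


def triangles_at(adj: dict[int, set[int]], x: int) -> list[tuple[int, int]]:
    """Pairs (y, z) of adjacent neighbours of x, i.e. the triangles {x, y, z}."""
    return [(y, z) for y, z in combinations(sorted(adj[x]), 2) if z in adj[y]]


def wcc_vertex(adj: dict[int, set[int]], x: int, community: set[int]) -> float:
    """WCC(x, C) under the assumed definition; 0 if x belongs to no triangle."""
    tris = triangles_at(adj, x)
    if not tris:
        return 0.0
    t_in_c = sum(1 for y, z in tris if y in community and z in community)
    vt_all = {v for pair in tris for v in pair}  # vertices sharing a triangle with x
    vt_outside = {v for v in vt_all if v not in community}
    return (t_in_c / len(tris)) * len(vt_all) / (len(community - {x}) + len(vt_outside))


def wcc_community(adj: dict[int, set[int]], community: set[int]) -> float:
    """Average of WCC(x, C) over the members of C."""
    return sum(wcc_vertex(adj, x, community) for x in community) / len(community)


def wcc_partition(adj: dict[int, set[int]], partition: list[set[int]]) -> float:
    """Size-weighted average of community WCC over a partition of all vertices."""
    return sum(len(c) * wcc_community(adj, c) for c in partition) / len(adj)


k4 = {v: {1, 2, 3, 4} - {v} for v in range(1, 5)}  # complete graph on 4 vertices
```

On the complete graph `k4`, keeping all four vertices together gives a partition WCC of 1.0, whereas splitting off one vertex (`[{1, 2, 3}, {4}]`) drops it to 0.25, the kind of preference for cohesive, triangle-rich groups that the metric is meant to express.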

Database Systems for Advanced Applications | 2011

Social based layouts for the increase of locality in graph operations

Arnau Prat-Pérez; David Dominguez-Sal; Josep-Lluis Larriba-Pey

Graphs provide a natural data representation for analyzing the relationships among entities in many application areas. Since the analysis algorithms perform memory-intensive operations, it is important that the graph layout is adapted to take advantage of the memory hierarchy. Here, we propose layout strategies based on community detection to improve the in-memory data locality of generic graph algorithms. We conclude that the detection of communities in a graph provides a layout strategy that consistently improves the performance of graph algorithms over other state-of-the-art strategies.
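One simple way to turn a community assignment into a memory layout is to renumber the vertices so that each community occupies a contiguous range of IDs; if vertices are stored in ID order, neighbours inside the same community then tend to sit close together in memory. The Python sketch below assumes the communities are already known (from any detection algorithm) and uses hypothetical function names; it illustrates the idea and is not the specific layout strategies evaluated in the paper.

```python
def community_order(adj: dict[int, list[int]], community_of: dict[int, int]) -> dict[int, int]:
    """Map each old vertex ID to a new ID so that members of a community are contiguous."""
    order = sorted(adj, key=lambda v: (community_of[v], v))  # group by community, then by old ID
    return {old: new for new, old in enumerate(order)}


def relabel(adj: dict[int, list[int]], mapping: dict[int, int]) -> dict[int, list[int]]:
    """Rewrite the adjacency lists with the new IDs (neighbour lists kept sorted)."""
    return {mapping[v]: sorted(mapping[w] for w in ns) for v, ns in adj.items()}
```

For example, with `adj = {0: [2], 1: [3], 2: [0], 3: [1]}` and communities `{0: 0, 2: 0, 1: 1, 3: 1}`, vertices 0 and 2 become 0 and 1, and vertices 1 and 3 become 2 and 3, so every edge now connects adjacent IDs.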

International Conference on Management of Data | 2015

Understanding Graph Structure of Wikipedia for Query Expansion

Joan Guisado-Gámez; Arnau Prat-Pérez

Knowledge bases are very good sources for knowledge extraction, that is, the ability to create knowledge from structured and unstructured sources and use it to improve automatic processes such as query expansion. However, extracting knowledge from unstructured sources is still an open challenge [9]. In this respect, understanding the structure of knowledge bases can provide significant benefits for this purpose. In particular, Wikipedia has become a very popular knowledge base in recent years because it is a general encyclopedia that contains a large amount of information and thus covers a wide range of topics. In this work, we analyze how articles and categories of Wikipedia relate to each other and how these relationships can support a query expansion technique. In particular, we show that structures in the form of dense cycles with a minimum number of categories tend to identify the most relevant information.

Proceedings of Workshop on GRAph Data management Experiences and Systems | 2014

How community-like is the structure of synthetically generated graphs?

Arnau Prat-Pérez; David Dominguez-Sal

Social-like graph generators have become an indispensable tool when designing proper evaluation methodologies for social graph applications, algorithms and systems. Existing synthetic generators have been designed to produce data with characteristics similar to those found in real graphs, such as power-law degree distributions, a large clustering coefficient or a small diameter. However, real social networks are organized into higher-level structures, called communities, that are not explicitly considered by these generators. In this paper, we study the statistical features of the community structure found in real social networks, and compare them to those generated by the LFR and LDBC-DG generators. We find that communities show multimodal features and are thus hard to generate with simple community models. According to our results, LDBC-DG draws realistic community distributions, even reproducing the observed multimodality.

Privacy in Statistical Databases | 2008

Parallelizing Record Linkage for Disclosure Risk Assessment

Joan Guisado-Gámez; Arnau Prat-Pérez; Jordi Nin; Victor Muntés-Mulero; Josep-Lluis Larriba-Pey

Handling very large volumes of confidential data is becoming a common practice in many organizations such as statistical agencies. This calls for the use of protection methods that have to be validated in terms of the quality they provide. With Record Linkage (RL) it is possible to compute the disclosure risk, which gives a measure of the quality of a data protection method. However, the RL methods proposed in the literature are computationally costly, which poses difficulties when frequent RL processes have to be executed on large data. Here, we propose a distributed computing technique to improve the performance of an RL process. We show that our technique not only improves the computing time of an RL process significantly, but is also scalable in a distributed environment. We also show that distributed computation can be complemented with SMP-based parallelization in each node, increasing the final speedup.
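As a rough illustration of why RL parallelizes well, the Python sketch below uses distance-based record linkage (one common RL variant, chosen here as an assumption): each protected record is linked to its nearest original record, and the disclosure risk is the fraction of protected records linked back to their true source. It also assumes that `protected[j]` was produced from `original[j]`. Records are split into independent chunks; threads are used only to keep the sketch self-contained, whereas the paper distributes work across nodes, so this is not the authors' technique.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def nearest(original: list[tuple[float, ...]], record: tuple[float, ...]) -> int:
    """Index of the original record closest (Euclidean distance) to `record`."""
    return min(range(len(original)), key=lambda i: math.dist(original[i], record))


def count_links(original, protected, start: int, stop: int) -> int:
    """Correct re-identifications among protected[start:stop]."""
    return sum(1 for j in range(start, stop) if nearest(original, protected[j]) == j)


def disclosure_risk(original, protected, workers: int = 4) -> float:
    """Fraction of protected records linked back to their source, computed chunk by chunk."""
    n = len(protected)
    if n == 0:
        return 0.0
    step = math.ceil(n / workers)
    chunks = [(s, min(s + step, n)) for s in range(0, n, step)]
    # Chunks are independent, so they could be sent to different nodes or processes.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(lambda c: count_links(original, protected, *c), chunks)) / n
```

Because each chunk only needs read access to the original records, the same split works whether chunks go to threads, processes, or machines; in pure Python, threads give little CPU speedup because of the global interpreter lock.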
