Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Josep-Lluís Larriba-Pey is active.

Publication


Featured research published by Josep-Lluís Larriba-Pey.


International Symposium on High-Performance Computer Architecture (HPCA) | 1997

The memory performance of DSS commercial workloads in shared-memory multiprocessors

Pedro Trancoso; Josep-Lluís Larriba-Pey; Zheng Zhang; Josep Torrellas

Although cache-coherent shared-memory multiprocessors are often used to run commercial workloads, little work has been done to characterize how well these machines support such workloads. In particular, we do not have much insight into the demands of commercial workloads on the memory subsystem of these machines. In this paper, we analyze in detail the memory access patterns of several queries that are representative of Decision Support System (DSS) databases. Our analysis shows that the memory use of queries differs greatly depending on how the queries access the database data, namely via indices or by sequentially scanning the records. The former queries, which we call Index queries, suffer most of their shared-data misses on indices and on lock-related metadata structures. The latter queries, which we call Sequential queries, suffer most of their shared-data misses on the database records as they are scanned. An analysis of the data locality in the queries shows that both Index and Sequential queries exhibit spatial locality and, therefore, can benefit from relatively long cache lines. Interestingly, shared data is reused very little inside queries. However, there is data reuse across Sequential queries. Finally, we show that the performance of Sequential queries can be improved moderately with data prefetching.
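
To make the spatial-locality finding concrete, here is a minimal sketch (not from the paper) that counts distinct cache-line fetches for a sequential scan versus random index-style probes; record size, line sizes, and counts are made-up parameters for illustration.

```python
# Minimal sketch: sequential scans benefit from longer cache lines
# (more records per fetched line), while random index probes do not.
import random

RECORD_BYTES = 128          # hypothetical record size
NUM_RECORDS = 10_000

def lines_touched(addresses, line_bytes):
    """Number of distinct cache lines touched by a list of byte addresses."""
    return len({addr // line_bytes for addr in addresses})

# Sequential scan: records visited in storage order.
scan = [i * RECORD_BYTES for i in range(NUM_RECORDS)]
# Index-style access: a random subset of records probed in random order.
index = [i * RECORD_BYTES for i in random.sample(range(NUM_RECORDS), 1000)]

for line in (32, 64, 128, 256):
    print(f"line={line:>3}B  scan lines={lines_touched(scan, line):>6}  "
          f"index lines={lines_touched(index, line):>5}")
```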


Conference on Information and Knowledge Management (CIKM) | 2007

DEX: high-performance exploration on large graphs for information retrieval

Norbert Martinez-Bazan; Victor Muntés-Mulero; Sergio Gómez-Villamor; Jordi Nin; Mario-A. Sanchez-Martinez; Josep-Lluís Larriba-Pey

Link and graph analysis tools are important instruments for boosting the richness of information retrieval systems. The Internet and existing social networking portals are just two settings where these tools would be beneficial and enriching for users and analysts. However, the need to integrate different data sources and, even more important, the need for high-performance generic tools are at odds with the continuously growing size and number of data repositories. In this paper we propose and evaluate DEX, a high-performance graph database querying system that allows for the integration of multiple data sources. DEX makes graph querying possible in different flavors, including link analysis, social network analysis, pattern recognition, and keyword search. The richness of DEX shows in the experiments we carried out on the Internet Movie Database (IMDb). Through a variety of complex analytical queries, DEX proves to be a generic and efficient tool for large graph databases.
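
As a toy illustration of the kind of neighborhood exploration such a system supports, here is a sketch over a labeled in-memory graph; the data model and function names are hypothetical, not the DEX API.

```python
# Illustrative sketch only: a toy labeled graph with a k-hop exploration
# query, loosely in the spirit of the graph queries described above.
from collections import defaultdict

class LabeledGraph:
    def __init__(self):
        self.labels = {}                 # node id -> label
        self.adj = defaultdict(set)      # node id -> neighbor ids

    def add_node(self, nid, label):
        self.labels[nid] = label

    def add_edge(self, a, b):
        self.adj[a].add(b)
        self.adj[b].add(a)

    def explore(self, start, hops):
        """Return all nodes reachable from `start` within `hops` edges."""
        frontier, seen = {start}, {start}
        for _ in range(hops):
            frontier = {n for f in frontier for n in self.adj[f]} - seen
            seen |= frontier
        return seen

# Tiny IMDb-flavored example: actors linked through a shared movie.
g = LabeledGraph()
g.add_node("Actor:A", "PERSON"); g.add_node("Movie:M", "MOVIE")
g.add_node("Actor:B", "PERSON")
g.add_edge("Actor:A", "Movie:M"); g.add_edge("Movie:M", "Actor:B")
print(g.explore("Actor:A", 2))   # -> {'Actor:A', 'Movie:M', 'Actor:B'}
```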


Web-Age Information Management (WAIM) | 2010

Survey of graph database performance on the HPC scalable graph analysis benchmark

David Dominguez-Sal; P. Urbon-Bayes; Aleix Gimenez-Vano; Sergio Gómez-Villamor; Norbert Martinez-Bazan; Josep-Lluís Larriba-Pey

The analysis of the relationships among data entities has led to modeling them as graphs. Since the size of datasets has grown significantly in recent years, it has become necessary to implement efficient graph databases that can load and manage these huge datasets. In this paper, we evaluate the performance of four of the most scalable native graph database projects (Neo4j, Jena, HypergraphDB and DEX). We implement the full HPC Scalable Graph Analysis Benchmark and test the performance of each database for different typical graph operations and graph sizes, showing that, in their current development status, DEX and Neo4j are the most efficient graph databases.
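
The sketch below illustrates the measurement idea behind one such kernel, timing a k-hop subgraph extraction over a random graph; it is not the benchmark's reference implementation, and the graph size and k are made-up parameters.

```python
# Hedged sketch: time a k-hop subgraph extraction, one of the typical
# graph operations exercised by the HPC Scalable Graph Analysis Benchmark.
import random, time
from collections import defaultdict

def random_graph(n, m):
    adj = defaultdict(list)
    for _ in range(m):
        a, b = random.randrange(n), random.randrange(n)
        adj[a].append(b)
        adj[b].append(a)
    return adj

def k_hop_subgraph(adj, source, k):
    """Breadth-first expansion up to k hops from source."""
    seen, frontier = {source}, {source}
    for _ in range(k):
        frontier = {v for u in frontier for v in adj[u]} - seen
        seen |= frontier
    return seen

adj = random_graph(100_000, 400_000)
t0 = time.perf_counter()
sub = k_hop_subgraph(adj, source=0, k=3)
print(f"{len(sub)} vertices reached in {time.perf_counter() - t0:.3f}s")
```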


International Conference on Management of Data (SIGMOD) | 2015

The LDBC Social Network Benchmark: Interactive Workload

Orri Erling; Alex Averbuch; Josep-Lluís Larriba-Pey; Hassan Chafi; Andrey Gubichev; Arnau Prat; Minh-Duc Pham; Peter A. Boncz

The Linked Data Benchmark Council (LDBC) is now two years underway and has gathered strong industrial participation for its mission to establish benchmarks and benchmarking practices for evaluating graph data management systems. The LDBC introduced a new choke-point-driven methodology for developing benchmark workloads, combining user input with input from expert systems architects; we outline this methodology. This paper describes the LDBC Social Network Benchmark (SNB) and presents database benchmarking innovation in terms of the graph query functionality tested, correlated graph generation techniques, and a scalable benchmark driver for a workload with complex graph dependencies. SNB has three query workloads under development: Interactive, Business Intelligence, and Graph Algorithms. We describe the SNB Interactive Workload in detail and illustrate it with some early results, as well as the goals for the two other workloads.
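
For flavor, here is an illustrative sketch (not LDBC reference code) of a complex read in the spirit of the Interactive workload: find a person's friends-of-friends, filtered by city and ranked. The schema and data are hypothetical.

```python
# Hedged sketch of an SNB-Interactive-style complex read over a toy graph.
from collections import defaultdict

knows = defaultdict(set)        # person -> people they know
city = {}                       # person -> city of residence

def add_knows(a, b):
    knows[a].add(b); knows[b].add(a)

def friends_of_friends_in(person, target_city, limit=10):
    """Distance-2 friends living in target_city, a typical SNB-style read."""
    fof = {f2 for f1 in knows[person] for f2 in knows[f1]}
    fof -= knows[person] | {person}                 # strict distance 2
    hits = sorted(p for p in fof if city.get(p) == target_city)
    return hits[:limit]

city.update({"alice": "Barcelona", "bob": "Madrid", "carol": "Barcelona"})
add_knows("alice", "bob"); add_knows("bob", "carol")
print(friends_of_friends_in("alice", "Barcelona"))   # -> ['carol']
```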


IEEE Transactions on Computers | 2005

Software Trace Cache

Alex Ramirez; Josep-Lluís Larriba-Pey; Mateo Valero

We explore the use of compiler optimizations that optimize the layout of instructions in memory. The target is to enable the code to make better use of the underlying hardware resources, regardless of the specific details of the processor/architecture, in order to increase fetch performance. The Software Trace Cache (STC) is a code layout algorithm with a broader target than previous layout optimizations: we target not only an improvement in the instruction cache hit rate, but also an increase in the effective fetch width of the fetch engine. The STC algorithm organizes basic blocks into chains, trying to make sequentially executed basic blocks reside in consecutive memory positions, then maps the basic block chains in memory to minimize conflict misses in the important sections of the program. We evaluate and analyze in detail the impact of the STC, and of code layout optimizations in general, on the three main aspects of fetch performance: the instruction cache hit rate, the effective fetch width, and the branch prediction accuracy. Our results show that layout-optimized codes have some special characteristics that make them more amenable to high-performance instruction fetch: they have a very high rate of not-taken branches and execute long chains of sequential instructions; they also make very effective use of instruction cache lines, mapping only useful instructions that will execute close in time, increasing both spatial and temporal locality.
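
The chain-building step can be sketched as a greedy pass over profiled control-flow edges; this is a simplified illustration, not the full published STC algorithm (which also maps chains in memory to avoid cache conflicts).

```python
# Simplified sketch of greedy basic-block chaining from an edge profile:
# link the hottest sequential successor pairs so they land consecutively.
# Unlinked blocks would become singleton chains; cycle handling is omitted.
def build_chains(edges):
    """edges: {(src_block, dst_block): execution_count} -> list of chains."""
    succ_of, pred_of, chains = {}, {}, []
    # Visit edges from hottest to coldest, linking blocks into chains.
    for (src, dst), _count in sorted(edges.items(), key=lambda e: -e[1]):
        if src not in succ_of and dst not in pred_of and src != dst:
            succ_of[src], pred_of[dst] = dst, src
    heads = (set(succ_of) | set(pred_of)) - set(pred_of)   # chain starts
    for head in heads:
        chain = [head]
        while chain[-1] in succ_of:
            chain.append(succ_of[chain[-1]])
        chains.append(chain)
    return chains

# Toy profile: B0->B1 is hot, so those blocks are laid out consecutively.
profile = {("B0", "B1"): 900, ("B0", "B2"): 100, ("B1", "B3"): 800}
print(build_chains(profile))    # -> [['B0', 'B1', 'B3']]
```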


International World Wide Web Conference (WWW) | 2014

High quality, scalable and parallel community detection for large real graphs

Arnau Prat-Pérez; David Dominguez-Sal; Josep-Lluís Larriba-Pey

Community detection has arisen as one of the most relevant topics in the field of graph mining, principally for its applications in domains such as social or biological network analysis. Different community detection algorithms have been proposed during the last decade, approaching the problem from different perspectives. However, existing algorithms are, in general, based on complex and expensive computations, making them unsuitable for large graphs with millions of vertices and edges, such as those usually found in the real world. In this paper, we propose a novel disjoint community detection algorithm called Scalable Community Detection (SCD). By combining different strategies, SCD partitions the graph by maximizing the Weighted Community Clustering (WCC), a recently proposed community detection metric based on triangle analysis. Using real graphs with ground-truth overlapping communities, we show that SCD outperforms the current state-of-the-art proposals (even those aimed at finding overlapping communities) in terms of quality and performance. SCD provides the speed of the fastest algorithms and the quality, in terms of NMI and F1 score, of the most accurate state-of-the-art proposals. We show that SCD is able to run up to two orders of magnitude faster than practical existing solutions by exploiting the parallelism of current multi-core processors, enabling us to process graphs of unprecedented size in short execution times.
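
To show what "triangle-based" means here, the sketch below computes, per vertex, the share of its triangles that close inside its own community. This is only in the spirit of WCC; the published WCC definition includes additional normalization terms.

```python
# Hedged illustration of a simplified triangle-based clustering score.
from itertools import combinations

def triangles(adj, v, candidates):
    """Triangles through v whose other two endpoints lie in `candidates`."""
    nbrs = adj[v] & candidates
    return sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])

def community_triangle_share(adj, communities):
    score = {}
    for comm in communities:
        for v in comm:
            total = triangles(adj, v, set(adj) - {v})
            inside = triangles(adj, v, comm - {v})
            score[v] = inside / total if total else 0.0
    return score

adj = {"a": {"b", "c", "d"}, "b": {"a", "c"},
       "c": {"a", "b", "d"}, "d": {"a", "c"}}
print(community_triangle_share(adj, [{"a", "b", "c"}, {"d"}]))
# -> {'a': 0.5, 'b': 1.0, 'c': 0.5, 'd': 0.0}
```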


International Conference on Parallel Architectures and Compilation Techniques (PACT) | 2000

The Effect of Code Reordering on Branch Prediction

Alex Ramirez; Josep-Lluís Larriba-Pey; Mateo Valero

Branch prediction accuracy is a very important factor for superscalar processor performance. The ability to predict the outcome of a branch allows the processor to effectively use a large instruction window and extract a larger amount of Instruction Level Parallelism (ILP). In this paper we examine the effect of code layout optimizations on branch prediction accuracy and final processor performance. These code reordering techniques align branches so that they tend to be not taken, achieving better instruction cache performance and increasing the fetch bandwidth. Here we focus on how these optimizations affect both static and dynamic branch prediction. Code reordering mainly increases the number of not-taken branches, which benefits simple static predictors: they reach over 80% prediction accuracy with optimized codes. This change in branch direction has mixed effects on dynamic branch prediction: on the positive side, it trades negative interference for neutral or positive interference in the prediction tables; on the negative side, it causes a worse distribution of the Branch History Register (BHR) values, leaving many possible history values unused. Our results show that code reordering reduces negative Pattern History Table (PHT) interference, increasing branch prediction accuracy on small branch predictors. For example, a 0.5 KB gshare improves from 91.4% to 93.6%, and a 0.4 KB gskew predictor from 93.5% to 94.4%. For larger history lengths, the large number of not-taken branches can degrade predictor performance on dealiased schemes, like the 16 KB agree predictor, which goes from 96.2% to 95.8%. But processor performance does not depend on branch prediction accuracy alone: layout-optimized codes also have much better instruction cache performance and wider fetch bandwidth. Our results show that when all three factors are considered together, code reordering techniques always improve processor performance. For example, performance still increases by 8% with an agree predictor, which loses prediction accuracy, and it increases by 9% with a gshare predictor, which gains prediction accuracy.
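
For readers unfamiliar with gshare, here is a minimal model of the mechanism discussed above: a pattern history table (PHT) of 2-bit saturating counters indexed by the branch PC XORed with the global history. A 0.5 KB gshare, as in the paper's figures, holds 2048 two-bit counters, i.e. an 11-bit index; the branch trace below is made up for illustration.

```python
# Minimal gshare branch predictor model (sketch, not the paper's simulator).
class Gshare:
    def __init__(self, index_bits=11):
        self.mask = (1 << index_bits) - 1
        self.pht = [1] * (1 << index_bits)   # 2-bit counters, weakly not-taken
        self.history = 0                      # global branch history register

    def predict_and_update(self, pc, taken):
        idx = (pc ^ self.history) & self.mask
        prediction = self.pht[idx] >= 2       # counter >= 2 means "taken"
        # Update the saturating counter and shift in the actual outcome.
        self.pht[idx] = min(3, self.pht[idx] + 1) if taken \
                        else max(0, self.pht[idx] - 1)
        self.history = ((self.history << 1) | taken) & self.mask
        return prediction == taken

bp = Gshare()
trace = [(0x400, 1), (0x404, 0), (0x400, 1)] * 1000   # toy branch trace
correct = sum(bp.predict_and_update(pc, t) for pc, t in trace)
print(f"accuracy: {correct / len(trace):.1%}")
```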


International Symposium on Microarchitecture (MICRO) | 2002

Fetching instruction streams

Alex Ramirez; Oliverio J. Santana; Josep-Lluís Larriba-Pey; Mateo Valero

Fetch performance is a very important factor because it effectively limits overall processor performance. However, there is little performance advantage in increasing front-end performance beyond what the back-end can consume. For each processor design, the target is to build the best possible fetch engine for the required performance level: a fetch engine is better if it provides higher performance, but also if it takes fewer resources, requires less chip area, or consumes less power. In this paper we propose a novel fetch architecture based on the execution of long streams of sequential instructions, taking maximum advantage of code layout optimizations. We describe our architecture in detail and show that it requires less complexity and fewer resources than other high-performance fetch architectures, like the trace cache, while providing fetch performance suitable for wide-issue superscalar processors. Our results show that our fetch architecture combined with code layout optimizations achieves 10% higher performance than the EV8 fetch architecture and 4% higher than the FTB architecture using state-of-the-art branch predictors, while being only 1.5% slower than the trace cache. Even in the absence of code layout optimizations, fetching instruction streams is still 10% faster than the EV8 and only 4% slower than the trace cache. Fetching instruction streams effectively exploits the special characteristics of layout-optimized codes to provide high fetch performance, close to that of a trace cache, at a much lower cost and complexity, similar to that of a basic block architecture.
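
A "stream" in the sense used above is a run of sequential instructions ending at a taken branch. The toy sketch below splits a dynamic trace into streams and reports their average length, illustrating why layout-optimized code, with few taken branches, yields long streams; the trace format is hypothetical.

```python
# Hedged sketch: segment a dynamic trace into instruction streams.
def stream_lengths(trace):
    """trace: list of (instructions_until_branch, branch_taken) tuples."""
    lengths, current = [], 0
    for block_len, taken in trace:
        current += block_len
        if taken:               # a taken branch ends the current stream
            lengths.append(current)
            current = 0
    if current:
        lengths.append(current)
    return lengths

# Optimized code has mostly not-taken branches, hence long streams.
optimized = [(5, 0), (4, 0), (6, 0), (5, 1)] * 100
baseline = [(5, 1), (4, 1), (6, 1), (5, 1)] * 100
for name, t in (("optimized", optimized), ("baseline", baseline)):
    ls = stream_lengths(t)
    print(f"{name}: avg stream length {sum(ls) / len(ls):.1f} instructions")
```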


First International Workshop on Graph Data Management Experiences and Systems | 2013

Benchmarking database systems for social network applications

Renzo Angles; Arnau Prat-Pérez; David Dominguez-Sal; Josep-Lluís Larriba-Pey

Graphs have become an indispensable tool for the analysis of linked data. As with any data representation, the need for database management systems arises as the data grow in size and complexity. Associated with those needs, benchmarks appear to assess the performance of such systems in specific scenarios representative of real use cases. In this paper we propose a microbenchmark based on social networks. It includes a data generator that synthetically creates social graphs, and a set of low-level atomic queries that model parts of the behavior of social network users. In order to understand how different data management paradigms are stressed, we execute the benchmark over five different database systems representing graph (DEX and Neo4j), RDF (RDF-3X) and relational (Virtuoso and PostgreSQL) data management. We conclude that reachability queries are the ones that pose the greatest difficulty for all the database systems, which justifies their inclusion and makes them good candidates for more complex benchmarks.
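
As a plain sketch of the operation the paper found hardest, a reachability query asks whether any path connects two nodes; below it is written as a BFS over an in-memory adjacency list (illustrative only; the benchmark runs such queries inside each database system).

```python
# Reachability query sketch: is there a path from src to dst?
from collections import deque

def reachable(adj, src, dst):
    seen, queue = {src}, deque([src])
    while queue:
        v = queue.popleft()
        if v == dst:
            return True
        for w in adj.get(v, ()):
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return False

follows = {"u1": ["u2"], "u2": ["u3"], "u3": [], "u4": ["u1"]}
print(reachable(follows, "u1", "u3"))   # True
print(reachable(follows, "u3", "u1"))   # False
```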


International Database Engineering and Applications Symposium (IDEAS) | 2007

On the Use of Semantic Blocking Techniques for Data Cleansing and Integration

Jordi Nin; Victor Muntés-Mulero; Norbert Martinez-Bazan; Josep-Lluís Larriba-Pey

Record linkage (RL) is an important component of data cleansing and integration. For years, many efforts have focused on improving the performance of the RL process, either by reducing the number of record comparisons or by reducing the number of attribute comparisons; this reduces the computational time but very often decreases the quality of the results. However, the real bottleneck of RL is the post-process, where the results have to be reviewed by experts who decide which pairs or groups of records are real links and which are false hits. In this paper, we show that exploiting the relationships (e.g., foreign keys) established between one or more data sources makes it possible to define a new kind of semantic blocking method that improves the number of hits and reduces the review effort.
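
The core idea can be sketched as follows: instead of comparing every record pair, only records that share a related entity through a foreign-key-style relationship become candidate pairs. The schema below (authors linked to papers) is hypothetical, not the paper's datasets.

```python
# Hedged sketch of semantic blocking via shared related entities.
from collections import defaultdict
from itertools import combinations

def semantic_blocks(links):
    """links: (record_id, related_entity_id) pairs -> candidate record pairs."""
    by_entity = defaultdict(set)
    for record, entity in links:
        by_entity[entity].add(record)
    candidates = set()
    for records in by_entity.values():
        candidates.update(combinations(sorted(records), 2))
    return candidates

# Toy data: author records connected to papers they appear on.
author_paper = [("a1", "p1"), ("a2", "p1"), ("a3", "p2"),
                ("a4", "p2"), ("a1", "p2")]
print(semantic_blocks(author_paper))
# Only pairs sharing a paper are compared, e.g. ('a1', 'a2'), ('a3', 'a4').
```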

Collaboration


Dive into Josep-Lluís Larriba-Pey's collaborations.

Top Co-Authors

Juan J. Navarro (Polytechnic University of Catalonia)
David Dominguez-Sal (Polytechnic University of Catalonia)
Alex Ramirez (Polytechnic University of Catalonia)
Mateo Valero (Polytechnic University of Catalonia)
Arnau Prat-Pérez (Polytechnic University of Catalonia)
Jordi Nin (Polytechnic University of Catalonia)