Miguel A. Martínez-Prieto

Publication

Featured research published by Miguel A. Martínez-Prieto.

Journal of Web Semantics | 2013

Binary RDF representation for publication and exchange (HDT)

Javier D. Fernández; Miguel A. Martínez-Prieto; Claudio Gutierrez; Axel Polleres; Mario Arias

The current Web of Data is producing increasingly large RDF datasets. Massive publication efforts of RDF data driven by initiatives like the Linked Open Data movement, and the need to exchange large datasets, have unveiled the drawbacks of traditional RDF representations, inspired and designed by a document-centric and human-readable Web. Among the main problems are high levels of verbosity/redundancy and weak machine-processable capabilities in the description of these datasets. This scenario calls for efficient formats for publication and exchange. This article presents a binary RDF representation addressing these issues. Based on a set of metrics that characterizes the skewed structure of real-world RDF data, we develop a proposal of an RDF representation that modularly partitions and efficiently represents three components of RDF datasets: Header information, a Dictionary, and the actual Triples structure (thus called HDT). Our experimental evaluation shows that datasets in HDT format can be compacted by more than fifteen times compared with current naive representations, improving both parsing and processing while keeping a consistent publication scheme. Specific compression techniques over HDT further improve these compression rates and prove to outperform existing compression solutions for efficient RDF exchange.
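The Header/Dictionary/Triples split described in the abstract above can be illustrated with a minimal sketch. This is not the HDT binary format or the authors' implementation: it assumes a single shared dictionary over all terms and plain Python lists, and the names `encode`, `decode`, and the sample `ex:`/`foaf:` terms are hypothetical, chosen only to show why replacing repeated long terms with integer IDs reduces redundancy.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]
IdTriple = Tuple[int, int, int]


def encode(triples: List[Triple]) -> Tuple[Dict[str, int], List[str], List[IdTriple]]:
    """Split triples into a toy header, a term dictionary, and ID triples."""
    # Dictionary: every distinct term, sorted; a term's ID is its position + 1.
    terms = sorted({term for triple in triples for term in triple})
    term_to_id = {term: i + 1 for i, term in enumerate(terms)}
    # Triples: the graph structure expressed only with integer IDs.
    id_triples = sorted(
        (term_to_id[s], term_to_id[p], term_to_id[o]) for s, p, o in triples
    )
    # Header: simple metadata about the dataset (assumed fields).
    header = {"triples": len(id_triples), "terms": len(terms)}
    return header, terms, id_triples


def decode(terms: List[str], id_triples: List[IdTriple]) -> List[Triple]:
    """Recover the original string triples from the dictionary and ID triples."""
    return [(terms[s - 1], terms[p - 1], terms[o - 1]) for s, p, o in id_triples]


data = [
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:bob", "foaf:knows", "ex:alice"),
    ("ex:alice", "foaf:name", '"Alice"'),
]
header, terms, id_triples = encode(data)
```

Each term is stored once in the dictionary regardless of how often it occurs, so the triples component shrinks to small integers; the actual format applies further compact encodings to both components.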

Bioinformatics and Bioengineering | 2010

Compressed q-Gram Indexing for Highly Repetitive Biological Sequences

Francisco Claude; Antonio Fariña; Miguel A. Martínez-Prieto; Gonzalo Navarro

The study of compressed storage schemes for highly repetitive sequence collections has been recently boosted by the availability of cheaper sequencing technologies and the flood of data they promise to generate. Such a storage scheme may range from the simple goal of retrieving whole individual sequences to the more advanced one of providing fast searches in the collection. In this paper we study alternatives to implement a particularly popular index, namely, the one able to find all the positions in the collection of substrings of fixed length (q-grams). We introduce two novel techniques and show they constitute practical alternatives to handle this scenario. They excel particularly in two cases: when q is small (up to 6), and when the collection is extremely repetitive (less than 0.01% mutations).

International Semantic Web Conference | 2010

Compact representation of large RDF data sets for publishing and exchange

Javier D. Fernández; Miguel A. Martínez-Prieto; Claudio Gutierrez

Symposium on Experimental and Efficient Algorithms | 2011

Compressed string dictionaries

Nieves R. Brisaboa; Rodrigo Cánovas; Francisco Claude; Miguel A. Martínez-Prieto; Gonzalo Navarro

International World Wide Web Conferences | 2010

RDF compression: basic approaches

Javier D. Fernández; Claudio Gutierrez; Miguel A. Martínez-Prieto

Future Generation Computer Systems | 2015

The Solid architecture for real-time management of big semantic data

Miguel A. Martínez-Prieto; Carlos E. Cuesta; Mario Arias; Javier D. Fernández

Knowledge and Information Systems | 2015

Compressed vertical partitioning for efficient RDF management

Sandra Álvarez-García; Nieves R. Brisaboa; Javier D. Fernández; Miguel A. Martínez-Prieto; Gonzalo Navarro

Increasingly huge RDF data sets are being published on the Web. Currently, they use different syntaxes of RDF, contain high levels of redundancy and have a plain indivisible structure. All this leads to fuzzy publications, inefficient management, complex processing and lack of scalability. This paper presents a novel RDF representation (HDT) which takes advantage of the structural properties of RDF graphs to split and efficiently represent three components of RDF data: Header, Dictionary and Triples structure. On-demand management operations can be implemented on top of the HDT representation. Experiments show that data sets can be compacted in HDT by more than fifteen times compared with the current naive representation, improving parsing and processing while keeping a consistent publication scheme. For exchange, specific compression techniques over HDT improve on current compression solutions.

ACM Sigapp Applied Computing Review | 2012

Querying RDF dictionaries in compressed space

Miguel A. Martínez-Prieto; Javier D. Fernández; Rodrigo Cánovas

The problem of storing a set of strings - a string dictionary - in compact form appears naturally in many cases. While classically it has represented a small part of the whole data to be processed (e.g., for natural language processing or for indexing text collections), recent applications in Web engines, RDF graphs, bioinformatics, and many others handle very large string dictionaries, whose size is a significant fraction of the whole data. Thus efficient approaches to compress them are necessary. In this paper we empirically compare time and space performance of some existing alternatives, as well as new ones we propose. We show that space reductions down to 20% of the original size of the strings are possible while supporting dictionary searches within a few microseconds, and down to 10% within a few tens or hundreds of microseconds.
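As a rough illustration of how a sorted string dictionary can be stored compactly, the sketch below uses front coding, a classic technique in which each string is stored as the length of the prefix it shares with its predecessor plus the remaining suffix. The abstract above does not name the techniques compared, so front coding is an assumption chosen for illustration, and the function names are hypothetical.

```python
from typing import List, Tuple


def common_prefix_len(a: str, b: str) -> int:
    """Length of the longest common prefix of a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def front_encode(sorted_strings: List[str]) -> List[Tuple[int, str]]:
    """Encode each string as (shared prefix length with predecessor, suffix)."""
    encoded = []
    prev = ""
    for s in sorted_strings:
        k = common_prefix_len(prev, s)
        encoded.append((k, s[k:]))
        prev = s
    return encoded


def front_decode(encoded: List[Tuple[int, str]]) -> List[str]:
    """Rebuild the original strings by reusing the predecessor's prefix."""
    decoded = []
    prev = ""
    for k, suffix in encoded:
        s = prev[:k] + suffix
        decoded.append(s)
        prev = s
    return decoded


uris = sorted([
    "http://example.org/person/alice",
    "http://example.org/person/bob",
    "http://example.org/place/madrid",
])
packed = front_encode(uris)
```

Because sorted URIs share long prefixes, most entries reduce to a small integer and a short suffix. Practical schemes typically restart the encoding at regular intervals so that a lookup does not need to decode the whole list.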

Conference on Information and Knowledge Management | 2011

Indexes for highly repetitive document collections

Francisco Claude; Antonio Fariña; Miguel A. Martínez-Prieto; Gonzalo Navarro

This paper studies the compressibility of RDF data sets. We show that big RDF data sets are highly compressible due to the structure of RDF graphs (power law), organization of URIs and RDF syntax verbosity. We present basic approaches to compress RDF data and test them with three well-known, real-world RDF data sets.

European Conference on Software Architecture | 2013

Towards an architecture for managing big semantic data in real-time

Carlos E. Cuesta; Miguel A. Martínez-Prieto; Javier D. Fernández

Big Data management has become a critical task in many application systems, which usually rely on heavyweight batch processes to manage such large amounts of data. However, batch architectures are not an adequate choice for designing real-time systems in which data updates and reads must be satisfied with very low latency. Thus, gathering and consuming high volumes of data at high velocities is an emerging challenge which we specifically address in the scope of innovative scenarios based on semantic data (RDF) management. The Linked Open Data initiative or emergent projects in the Internet of Things are examples of such scenarios. This paper describes a new architecture (referred to as Solid) which separates the complexities of Big Semantic Data storage and indexing from real-time data acquisition and consumption. This decision relies on the use of two optimized datastores which respectively store historical (big) data and run-time data. It ensures efficient volume management and high processing velocity, but adds the need to coordinate both datastores. Solid proposes a 3-tiered architecture in which each responsibility is specifically addressed. Besides its theoretical description, we also propose and evaluate a Solid prototype built on top of binary RDF and state-of-the-art triplestores. Our experimental numbers report that Solid achieves large savings in data storage (it uses up to 5 times less space than the compared triplestores), while providing efficient SPARQL resolution over the Big Semantic Data (in the order of 10-20 ms for the studied queries). These experiments also show that Solid ensures low-latency operations because the data effectively managed in real time remain small, and so do not suffer from Big Data issues.
We propose an architecture (Solid) for managing big semantic data in real time. Specific big data and real-time responsibilities are isolated in dedicated layers. A dynamic pipe-filter solution is introduced for addressing query responsibilities. Solid leverages RDF/HDT features to obtain the most compressed representations. The Solid prototype performs competitively with respect to the most prominent triplestores.
