Mario Arias | Researchain

Publication

Featured research published by Mario Arias.

Journal of Web Semantics | 2013

Binary RDF representation for publication and exchange (HDT)

Javier D. Fernández; Miguel A. Martínez-Prieto; Claudio Gutierrez; Axel Polleres; Mario Arias

The current Web of Data is producing increasingly large RDF datasets. Massive publication efforts of RDF data driven by initiatives like the Linked Open Data movement, and the need to exchange large datasets, have unveiled the drawbacks of traditional RDF representations, inspired and designed by a document-centric and human-readable Web. Among the main problems are high levels of verbosity/redundancy and weak machine-processable capabilities in the description of these datasets. This scenario calls for efficient formats for publication and exchange. This article presents a binary RDF representation addressing these issues. Based on a set of metrics that characterizes the skewed structure of real-world RDF data, we develop a proposal of an RDF representation that modularly partitions and efficiently represents three components of RDF datasets: Header information, a Dictionary, and the actual Triples structure (thus called HDT). Our experimental evaluation shows that datasets in HDT format can be compacted by more than fifteen times as compared to current naive representations, improving both parsing and processing while keeping a consistent publication scheme. Specific compression techniques over HDT further improve these compression rates and prove to outperform existing compression solutions for efficient RDF exchange.
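
The Header / Dictionary / Triples split described in this abstract can be illustrated with a minimal Python sketch. It is not the HDT format itself: the function names `encode` and `decode`, the single shared dictionary, and the plain sorted list of ID triples are simplifying assumptions made here for illustration, and the binary layout and compression techniques of the paper are not reproduced.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]
IdTriple = Tuple[int, int, int]


def encode(triples: List[Triple]) -> Tuple[Dict[str, int], List[str], List[IdTriple]]:
    """Split RDF triples into header metadata, a term dictionary and ID triples.

    Hypothetical helper: one shared dictionary for all terms (an assumption).
    """
    # Dictionary: every distinct term is stored once; its ID is its position + 1.
    terms = sorted({term for triple in triples for term in triple})
    ids = {term: i + 1 for i, term in enumerate(terms)}
    # Triples: the graph structure, expressed only with integer IDs.
    id_triples = sorted((ids[s], ids[p], ids[o]) for s, p, o in triples)
    # Header: metadata describing the dataset.
    header = {"triples": len(id_triples), "terms": len(terms)}
    return header, terms, id_triples


def decode(terms: List[str], id_triples: List[IdTriple]) -> List[Triple]:
    """Map ID triples back to their string form using the dictionary."""
    return [(terms[s - 1], terms[p - 1], terms[o - 1]) for s, p, o in id_triples]


data = [
    ("ex:a", "ex:knows", "ex:b"),
    ("ex:a", "ex:knows", "ex:c"),
    ("ex:b", "ex:knows", "ex:c"),
]
header, terms, id_triples = encode(data)
```

Each repeated term such as `ex:knows` is stored once in the dictionary and referenced by a small integer in the triples, which is the kind of redundancy the abstract identifies in verbose RDF serializations.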

Future Generation Computer Systems | 2015

The Solid architecture for real-time management of big semantic data

Miguel A. Martínez-Prieto; Carlos E. Cuesta; Mario Arias; Javier D. Fernández

Big Data management has become a critical task in many application systems, which usually rely on heavyweight batch processes to manage such large amounts of data. However, batch architectures are not an adequate choice for designing real-time systems in which data updates and reads must be satisfied with very low latency. Thus, gathering and consuming high volumes of data at high velocities is an emerging challenge which we specifically address in the scope of innovative scenarios based on semantic data (RDF) management. The Linked Open Data initiative or emergent projects in the Internet of Things are examples of such scenarios. This paper describes a new architecture (referred to as Solid) which separates the complexities of Big Semantic Data storage and indexing from real-time data acquisition and consumption. This decision relies on the use of two optimized datastores which respectively store historical (big) data and run-time data. It ensures efficient volume management and high processing velocity, but adds the need to coordinate both datastores. Solid proposes a 3-tiered architecture in which each responsibility is specifically addressed. Besides its theoretical description, we also propose and evaluate a Solid prototype built on top of binary RDF and state-of-the-art triplestores. Our experimental numbers report that Solid achieves large savings in data storage (it uses up to 5 times less space than the compared triplestores), while providing efficient SPARQL resolution over the Big Semantic Data (in the order of 10-20 ms for the studied queries). These experiments also show that Solid ensures low-latency operations because the data effectively managed in real-time remain small, and so do not suffer from Big Data issues.
We propose an architecture (Solid) for managing big semantic data in real-time. Specific big data and real-time responsibilities are isolated in dedicated layers. A dynamic pipe-filter solution is introduced for addressing query responsibilities. Solid leverages RDF/HDT features to obtain the most compressed representations. The Solid prototype performs competitively with respect to the most prominent triplestores.
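
The two-datastore idea in this abstract can be sketched roughly in Python as follows. The class `SolidSketch`, its method names, and the in-memory sets are hypothetical stand-ins chosen here; the prototype in the paper is built on binary RDF and triplestores, and the `merge` step is only one assumed way of coordinating the two stores.

```python
from typing import Iterable, List, Optional, Tuple

Triple = Tuple[str, str, str]


class SolidSketch:
    """Toy model: a read-only historical store plus a small run-time store."""

    def __init__(self, historical: Iterable[Triple]) -> None:
        self.data_store = frozenset(historical)  # historical (big) data, read-only
        self.online_store = set()                # run-time data, cheap to update

    def insert(self, triple: Triple) -> None:
        # Updates only touch the small run-time store.
        self.online_store.add(triple)

    def query(self, s: Optional[str] = None, p: Optional[str] = None,
              o: Optional[str] = None) -> List[Triple]:
        # A triple pattern is resolved against both stores; None is a wildcard.
        pattern = (s, p, o)

        def matches(triple: Triple) -> bool:
            return all(want is None or want == got
                       for want, got in zip(pattern, triple))

        return sorted(t for t in self.data_store | self.online_store if matches(t))

    def merge(self) -> None:
        # Assumed coordination step: fold run-time data into the historical store.
        self.data_store = frozenset(self.data_store | self.online_store)
        self.online_store = set()


store = SolidSketch([("ex:a", "ex:p", "ex:b")])
store.insert(("ex:a", "ex:p", "ex:c"))
before = store.query(s="ex:a")
store.merge()
after = store.query(s="ex:a")
```

Queries see the same results before and after `merge`, while the run-time store stays small, which mirrors the low-latency argument made in the abstract.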

European Conference on Information Retrieval | 2009

Extracting Geographic Context from the Web: GeoReferencing in MyMoSe

Álvaro Zubizarreta; Pablo de la Fuente; José Manuel Cantera; Mario Arias; Jorge Cabrero; Guido García; César Llamas; Jesús Vegas

Many Web pages are clearly related to specific locations. Identifying this geographic focus is the cornerstone of the next generation of geographic context aware search services. This paper shows a multistage method for assigning a geographic focus to Web pages (GeoReferencing), using several heuristics for toponym disambiguation and a scoring function for focus determination. We provide an experimental methodology for evaluating the accuracy of the system with Web pages in English and Spanish. Finally, we have obtained promising results, reaching an accuracy of over 70% with a town-level resolution.

Conference on Information and Knowledge Management | 2008

A georeferencing multistage method for locating geographic context in web search

Álvaro Zubizarreta; Pablo de la Fuente; José Manuel Cantera; Mario Arias; Jorge Cabrero; Guido García; César Llamas; Jesús Vegas

The geographic scope of Web pages is becoming an essential dimension of Web search, especially for mobile users. This paper shows a multistage method for assigning a geographic focus to Web pages (GeoReferencing) according to their text contents. We suggest several heuristics for the disambiguation of toponyms and a scoring procedure for focus determination. Furthermore, we provide an experimental methodology for evaluating the accuracy. Finally, we obtained promising results of over 70% accuracy with a city-level resolution.

CAEPIA'11 Proceedings of the 14th International Conference on Advances in Artificial Intelligence: Spanish Association for Artificial Intelligence | 2011

Lightweighting the web of data through compact RDF/HDT

Javier D. Fernández; Miguel A. Martínez-Prieto; Mario Arias; Claudio Gutierrez; Sandra Álvarez-García; Nieves R. Brisaboa

The Web of Data is producing large RDF datasets from diverse fields. The increasing size of the data being published threatens to make these datasets hard to exchange, index and consume. This scalability problem greatly diminishes the potential of interconnected RDF graphs. The HDT format addresses these problems through a compact RDF representation that partitions and efficiently represents three components: Header (metadata), Dictionary (strings occurring in the dataset), and Triples (graph structure). This paper revisits the format and exploits the latest findings in triples indexing for querying, exchanging and visualizing RDF information at large scale.

arXiv: Information Retrieval | 2011