Jesús Camacho-Rodríguez
University of Paris-Sud
Network
Latest external collaboration on country level. Dive into details by clicking on the dots.
Publication
Featured researches published by Jesús Camacho-Rodríguez.
conference on information and knowledge management | 2012
Andrés Aranda-Andújar; Francesca Bugiotti; Jesús Camacho-Rodríguez; Dario Colazzo; François Goasdoué; Zoi Kaoudi; Ioana Manolescu
We present AMADA, a platform for storing Web data (in particular, XML documents and RDF graphs) based on the Amazon Web Services (AWS) cloud infrastructure. AMADA operates in a Software as a Service (SaaS) approach, allowing users to upload, index, store, and query large volumes of Web data. The demonstration shows (i) the step-by-step procedure for building and exploiting the warehouse (storing, indexing, querying) and (ii) the monitoring tools enabling one to control the expenses (monetary costs) charged by AWS for the operations involved while running AMADA.
international conference on data engineering | 2012
Jesús Camacho-Rodríguez; Dario Colazzo; Ioana Manolescu
It has been by now widely accepted that an increasing part of the worlds interesting data is either shared through the Web or directly produced through and for Web platforms using formats like XML (structured documents). We present a scalable store for managing a large corpora of XML documents built on top of off-the-shelf cloud infrastructure. We implement different indexing strategies to evaluate a query workload over the stored documents in the cloud. Moreover, each strategy presents different trade-offs between efficiency in query answering and cost for storing the index.
international conference on management of data | 2014
Jesús Camacho-Rodríguez; Dario Colazzo; Ioana Manolescu
We present a novel approach for parallelizing the execution of queries over XML documents, implemented within our system PAXQuery. We compile a rich subset of XQuery into plans expressed in the PArallelization ConTracts (PACT) programming model. These plans are then optimized and executed in parallel by the Stratosphere system. We demonstrate the efficiency and scalability of our approach through experiments on hundreds of GB of XML data.
extending database technology | 2013
Jesús Camacho-Rodríguez; Dario Colazzo; Ioana Manolescu
An increasing part of the worlds data is either shared through the Web or directly produced through and for Web platforms, in particular using structured formats like XML or JSON. Cloud platforms are interesting candidates to handle large data repositories, due to their elastic scaling properties. Popular commercial clouds provide a variety of sub-systems and primitives for storing data in specific formats (files, key-value pairs etc.) as well as dedicated sub-systems for running and coordinating execution within the cloud. We propose an architecture for warehousing large-scale Web data, in particular XML, in a commercial cloud platform, specifically, Amazon Web Services. Since cloud users support monetary costs directly connected to their consumption of cloud resources, we focus on indexing content in the cloud. We study the applicability of several indexing strategies, and show that they lead not only to reducing query evaluation time, but also, importantly, to reducing the monetary costs associated with the exploitation of the cloud-based warehouse. Our architecture can be easily adapted to similar cloud-based complex data warehousing settings, carrying over the benefits of access path selection in the cloud.
conference on information and knowledge management | 2016
Jesús Camacho-Rodríguez; Dario Colazzo; Melanie Herschel; Ioana Manolescu; Soudip Roy Chowdhury
Pig Latin is a popular language which is widely used for parallel processing of massive data sets. Currently, subexpressions occurring repeatedly in Pig Latin scripts are executed as many times as they appear, and the current Pig Latin optimizer does not identify reuse opportunities. We present a novel optimization approach aiming at identifying and reusing repeated subexpressions in Pig Latin scripts. Our optimization algorithm, named PigReuse, identifies subexpression merging opportunities, selects the best ones to execute based on a cost function, and reuses their results as needed in order to compute exactly the same output as the original scripts. Our experiments demonstrate the effectiveness of our approach.
international conference on management of data | 2018
Edmon Begoli; Jesús Camacho-Rodríguez; Julian Hyde; Michael J. Mior; Daniel Lemire
Apache Calcite is a foundational software framework that provides query processing, optimization, and query language support to many popular open-source data processing systems such as Apache Hive, Apache Storm, Apache Flink, Druid, and MapD. The goal of this paper is to formally introduce Calcite to the broader research community, brie y present its history, and describe its architecture, features, functionality, and patterns for adoption. Calcites architecture consists of a modular and extensible query optimizer with hundreds of built-in optimization rules, a query processor capable of processing a variety of query languages, an adapter architecture designed for extensibility, and support for heterogeneous data models and stores (relational, semi-structured, streaming, and geospatial). This exible, embeddable, and extensible architecture is what makes Calcite an attractive choice for adoption in big-data frameworks. It is an active project that continues to introduce support for the new types of data sources, query languages, and approaches to query processing and optimization.
international conference on management of data | 2015
Jesús Camacho-Rodríguez; Dario Colazzo; Ioana Manolescu; Juan A.M. Naranjo
XQuery is a general-purpose programming language for processing semi-structured data, and as such, it is very expressive. As a consequence, optimizing and parallelizing complex analytics XQuery queries is still an open, challenging problem. We demonstrate PAXQuery, a novel system that parallelizes the execution of XQuery queries over large collections of XML documents. PAXQuery compiles a rich subset of XQuery into plans expressed in the PArallelization ConTracts (PACT) programming model. Thanks to this translation, the resulting plans are optimized and executed in a massively parallel fashion by the Apache Flink system. The result is a scalable system capable of querying massive amounts of XML data very efficiently, as proved by the experimental results we outline.
Bases de Données Avancées | 2010
Jesús Camacho-Rodríguez; Konstantinos Karanasos; Ioana Manolescu; Alin Tilea; Spyros Zoupanos
Archive | 2016
Jesús Camacho-Rodríguez; Dario Colazzo; Melanie Herschel; Ioana Manolescu; Soudip Roy Chowdhury
Linked Data Management | 2014
Francesca Bugiotti; Jesús Camacho-Rodríguez; François Goasdoué; Zoi Kaoudi; Ioana Manolescu; Stamatis Zampetakis