Publications


Featured research published by María S. Pérez-Hernández.


Scientific Programming | 2015

Towards Reproducibility in Scientific Workflows: An Infrastructure-Based Approach

Idafen Santana-Perez; María S. Pérez-Hernández

It is commonly agreed that in silico scientific experiments should be executable and repeatable processes. Most of the current approaches for computational experiment conservation and reproducibility have focused so far on two of the main components of the experiment, namely, data and method. In this paper, we propose a new approach that addresses the third cornerstone of experimental reproducibility: the equipment. This work focuses on the equipment of a computational experiment, that is, the set of software and hardware components that are involved in the execution of a scientific workflow. In order to demonstrate the feasibility of our proposal, we describe a use case scenario in the Text Analytics domain and the application of our approach to it. From the original workflow, we document its execution environment by means of a set of semantic models and a catalogue of resources, and generate an equivalent infrastructure for re-executing it.
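
To illustrate the general idea of documenting an execution environment with semantic models, here is a minimal sketch using rdflib. The vocabulary (namespace, class and property names) is invented for illustration and is not the catalogue used in the paper.

# Sketch: describe a workflow's execution environment as RDF triples.
# The "env" vocabulary below is hypothetical.
from rdflib import Graph, Literal, Namespace, RDF

ENV = Namespace("http://example.org/execenv#")

g = Graph()
g.bind("env", ENV)

# Describe a compute node and the software stack deployed on it.
node = ENV["worker-node-1"]
g.add((node, RDF.type, ENV.ComputeNode))
g.add((node, ENV.cpuCores, Literal(8)))
g.add((node, ENV.memoryGB, Literal(16)))

stack = ENV["text-analytics-stack"]
g.add((stack, RDF.type, ENV.SoftwareStack))
g.add((stack, ENV.installsPackage, Literal("openjdk-8-jre")))
g.add((stack, ENV.installsPackage, Literal("python3-nltk")))
g.add((node, ENV.deploysStack, stack))

# A provisioning step could later read this description and recreate an
# equivalent infrastructure on a different platform.
print(g.serialize(format="turtle"))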


International Conference on Cluster Computing | 2016

Spark Versus Flink: Understanding Performance in Big Data Analytics Frameworks

Ovidiu-Cristian Marcu; Alexandru Costan; Gabriel Antoniu; María S. Pérez-Hernández

Big Data analytics has recently gained increasing popularity as a tool to process large amounts of data on demand. Spark and Flink are two Apache-hosted data analytics frameworks that facilitate the development of multi-step data pipelines using directed acyclic graph patterns. Making the most out of these frameworks is challenging because efficient execution strongly relies on complex parameter configurations and on an in-depth understanding of the underlying architectural choices. Although extensive research has been devoted to improving and evaluating the performance of such analytics frameworks, most studies benchmark the platforms against Hadoop as a baseline, a rather unfair comparison considering their fundamentally different design principles. This paper aims to redress this by directly evaluating the performance of Spark and Flink. Our goal is to identify and explain the impact of the different architectural choices and parameter configurations on the perceived end-to-end performance. To this end, we develop a methodology for correlating the parameter settings and the operators' execution plan with the resource usage. We use this methodology to dissect the performance of Spark and Flink with several representative batch and iterative workloads on up to 100 nodes. Our key finding is that neither framework outperforms the other for all data types, sizes and job patterns. This paper provides a fine-grained characterization of the cases in which each framework is superior, and highlights how this performance correlates to operators, to resource usage and to the specifics of the internal framework design.
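
As a rough sketch of the measurement loop such a methodology implies, the following Python fragment sweeps two placeholder parameters over a stand-in workload and records runtime together with coarse resource usage via psutil. The parameter names and the workload are invented and do not correspond to the configurations studied in the paper.

# Sketch: sweep framework parameter settings and correlate them with
# runtime and coarse resource usage. Parameters and workload are placeholders.
import time
import psutil

def run_workload(parallelism, buffer_mb):
    # Stand-in for submitting a Spark/Flink job with these settings.
    return sum(x % buffer_mb for x in range(1_000_000)) // parallelism

results = []
for parallelism in (2, 4, 8):
    for buffer_mb in (32, 64):
        psutil.cpu_percent(interval=None)          # reset the CPU counter
        start = time.perf_counter()
        run_workload(parallelism, buffer_mb)
        results.append({
            "parallelism": parallelism,
            "buffer_mb": buffer_mb,
            "runtime_s": round(time.perf_counter() - start, 3),
            "cpu_percent": psutil.cpu_percent(interval=None),
            "mem_percent": psutil.virtual_memory().percent,
        })

for row in results:
    print(row)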


Future Generation Computer Systems | 2017

Reproducibility of execution environments in computational science using Semantics and Clouds

Idafen Santana-Perez; Rafael Ferreira da Silva; Mats Rynge; Ewa Deelman; María S. Pérez-Hernández; Oscar Corcho

Over the past decades, one of the most common forms of addressing reproducibility in scientific workflow-based computational science has consisted of tracking the provenance of the produced and published results. Such provenance allows inspecting intermediate and final results, improves understanding, and permits replaying a workflow execution. Nevertheless, this approach does not provide any means for capturing and sharing the very valuable knowledge about the experimental equipment of a computational experiment, i.e., the execution environment in which the experiments are conducted. In this work, we propose a novel approach based on semantic vocabularies that describes the execution environment of scientific workflows, so as to conserve it. We define a process for documenting the workflow application and its related management system, as well as their dependencies. We then apply this approach to three different real workflow applications running in three distinct scenarios, using public, private, and local Cloud platforms. In particular, we study one astronomy workflow and two life science workflows for genomic information analysis. Experimental results show that our approach can reproduce an equivalent execution environment of a predefined virtual machine image on all evaluated computing platforms.
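
A toy sketch of the provisioning side of this idea: map an abstract environment requirement onto a concrete instance flavour of a given cloud and list the packages a configuration step would still have to install. The flavour catalogues, field names and package names are hypothetical.

# Sketch: turn an abstract environment description into a concrete
# provisioning request for a given cloud. All names are invented.
from dataclasses import dataclass
from typing import List

@dataclass
class EnvRequirement:
    vcpus: int
    memory_gb: int
    packages: List[str]

# Hypothetical per-cloud instance catalogues: (flavour name, vCPUs, memory GB).
FLAVOURS = {
    "public-cloud":  [("small", 2, 4), ("large", 8, 16)],
    "private-cloud": [("std-1", 4, 8), ("std-2", 16, 32)],
}

def plan(req, cloud):
    # Pick the smallest listed flavour that satisfies the requirement.
    for name, vcpus, mem in FLAVOURS[cloud]:
        if vcpus >= req.vcpus and mem >= req.memory_gb:
            return {"cloud": cloud, "flavour": name, "install": req.packages}
    raise ValueError(f"no flavour on {cloud} satisfies {req}")

req = EnvRequirement(vcpus=4, memory_gb=8, packages=["python3", "java-8-runtime"])
print(plan(req, "private-cloud"))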


European Conference on Parallel Processing | 2014

A Semantic-Based Approach to Attain Reproducibility of Computational Environments in Scientific Workflows: A Case Study

Idafen Santana-Perez; Rafael Ferreira da Silva; Mats Rynge; Ewa Deelman; María S. Pérez-Hernández; Oscar Corcho

Reproducible research in scientific workflows is often addressed by tracking the provenance of the produced results. While this approach allows inspecting intermediate and final results, improves understanding, and permits replaying a workflow execution, it does not ensure that the computational environment is available for subsequent executions to reproduce the experiment. In this work, we propose describing the resources involved in the execution of an experiment using a set of semantic vocabularies, so as to conserve the computational environment. We define a process for documenting the workflow application, management system, and their dependencies based on four domain ontologies. We then conduct an experimental evaluation using a real workflow application on an academic and a public Cloud platform. Results show that our approach can reproduce an equivalent execution environment of a predefined virtual machine image on both computing platforms.


International Conference on e-Science | 2006

Complex Data-Intensive Systems and Semantic Grid: Applications in Satellite Missions

Manuel Sánchez-Gestido; L. Blanco-abruna; María S. Pérez-Hernández; Rafael González-Cabero; Asunción Gómez-Pérez; Oscar Corcho

The use of a Semantic Grid architecture can ease the deployment of complex applications in which several organizations are involved and where resources of diverse nature (data and computing elements) are shared. This is the situation in the Space domain, with a strong demand for computational resources spread across an extensive and heterogeneous network of facilities and institutions. This paper presents the S-OGSA architecture, defined in the OntoGrid project, as applied to a scenario for the overall monitoring and data analysis in a Satellite Mission currently in nominal operations. Flexibility, scalability, interoperability and the use of a common framework for data sharing are the main advantages of a Semantic Grid implementation in complex and data-intensive systems.


IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing | 2017

Exploring Shared State in Key-Value Store for Window-Based Multi-Pattern Streaming Analytics

Ovidiu-Cristian Marcu; Radu Tudoran; Bogdan Nicolae; Alexandru Costan; Gabriel Antoniu; María S. Pérez-Hernández

We are now witnessing an unprecedented growth of data that needs to be processed at ever-increasing rates in order to extract valuable insights. Big Data streaming analytics tools have been developed to cope with the online dimension of data processing: they enable real-time handling of live data sources by means of stateful aggregations (operators). Current state-of-the-art frameworks (e.g., Apache Flink) enable each operator to work in isolation by creating data copies, at the expense of increased memory utilization. In this paper, we explore the feasibility of deduplication techniques to address the challenge of reducing the memory footprint of window-based stream processing without significant impact on performance. We design a deduplication method specifically for window-based operators that rely on key-value stores to hold a shared state. We experiment with a synthetically generated workload while considering several deduplication scenarios and, based on the results, we identify several potential areas of improvement. Our key finding is that more fine-grained interactions between streaming engines and (key-value) stores need to be designed in order to better respond to scenarios that have to overcome memory scarcity.
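
The following toy sketch conveys the deduplication idea (not Flink's actual mechanism): several window operators reference events held once in a shared key-value store, and reference counts decide when an event can finally be evicted.

# Sketch: window operators share one key-value store of events instead of
# each keeping a private copy; reference counts track how many open windows
# still need an event. Purely illustrative.
from collections import defaultdict

store = {}                      # shared key-value store: event_id -> payload
refcount = defaultdict(int)

class Window:
    def __init__(self, size):
        self.size = size
        self.event_ids = []

    def add(self, event_id):
        self.event_ids.append(event_id)
        refcount[event_id] += 1
        if len(self.event_ids) > self.size:
            self.evict(self.event_ids.pop(0))

    def evict(self, event_id):
        refcount[event_id] -= 1
        if refcount[event_id] == 0:     # no other window references it
            del store[event_id]

    def aggregate(self):
        return sum(store[eid] for eid in self.event_ids)

windows = [Window(size=3), Window(size=5)]      # two patterns, one store
for i, value in enumerate([4, 7, 1, 9, 3, 8, 2]):
    store[i] = value
    for w in windows:
        w.add(i)

print([w.aggregate() for w in windows], "events kept:", len(store))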


International Semantic Web Conference | 2007

A Semantic Data Grid for Satellite Mission Quality Analysis

Reuben Wright; Manuel Sánchez-Gestido; Asunción Gómez-Pérez; María S. Pérez-Hernández; Rafael González-Cabero; Oscar Corcho

The use of a Semantic Grid architecture eases the development of complex, flexible applications in which several organisations are involved and where resources of diverse nature (data and computing elements) are shared. This is the situation in the Space domain, with its extensive and heterogeneous network of facilities and institutions, where there is a strong need to share both data and computational resources for complex processing tasks. One such task is the monitoring and data analysis for Satellite Missions, and this paper presents the Satellite Mission Grid, built in the OntoGrid project as an alternative to the systems currently used. Flexibility, scalability, interoperability, extensibility and efficient development were the main advantages found in using a common framework for data sharing and creating a Semantic Data Grid.
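
As an illustration of the kind of cross-site metadata query a semantic data grid enables, the sketch below loads a few invented telemetry readings into an rdflib graph and runs a SPARQL query over them. The vocabulary, parameters and values are placeholders, not the mission's actual data model.

# Sketch: query telemetry metadata with SPARQL. The "tm" vocabulary and
# all readings are invented for illustration.
from rdflib import Graph, Literal, Namespace, RDF

TM = Namespace("http://example.org/telemetry#")
g = Graph()

for sat, param, value in [("sat-A", "battery_temp", 21.5),
                          ("sat-A", "battery_temp", 34.0),
                          ("sat-B", "battery_temp", 19.8)]:
    reading = TM[f"{sat}-{param}-{value}"]
    g.add((reading, RDF.type, TM.Reading))
    g.add((reading, TM.satellite, Literal(sat)))
    g.add((reading, TM.parameter, Literal(param)))
    g.add((reading, TM.value, Literal(value)))

# Find out-of-range battery temperatures across all contributing sites.
q = """
PREFIX tm: <http://example.org/telemetry#>
SELECT ?sat ?val WHERE {
    ?r a tm:Reading ;
       tm:satellite ?sat ;
       tm:parameter "battery_temp" ;
       tm:value ?val .
    FILTER (?val > 30.0)
}
"""
for sat, val in g.query(q):
    print(sat, val)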


International Conference on Service-Oriented Computing | 2005

ODESGS Framework: Knowledge-Based Annotation and Design of Grid Services

Carole A. Goble; Asunción Gómez-Pérez; Rafael González-Cabero; María S. Pérez-Hernández

The convergence of the Semantic Web and Grid technologies has resulted in the Semantic Grid. The great effort devoted by the Semantic Web community to achieving the semantic markup of Web services (what we call Semantic Web Services) has yielded many markup technologies and initiatives, from which Semantic Grid technology should benefit as, in recent years, it has become Web service-oriented. With this in mind, our first premise in this work is to reuse the ODESWS Framework for the knowledge-based markup of Grid services. ODESWS was initially developed to enable users to annotate, design, discover and compose Semantic Web Services at the knowledge level. To reuse it for annotating Grid services, however, we must first carry out a detailed study of the characteristics of Web services and Grid services to learn where they differ and why; only once this analysis is performed can we know how to extend our theoretical framework to describe Grid services. Finally, we present the ODESGS Framework, the result of applying the identified extensions to the aforementioned Semantic Web Services description framework.
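
A minimal sketch of the knowledge-level idea behind such annotation (not the ODESGS formalism itself): services are described by the ontology concepts they consume and produce, so discovery can match on concepts rather than on names. All concept URIs and service names below are invented.

# Sketch: annotate services with the concepts they consume and produce,
# then discover them by concept. Names and concepts are invented.
SERVICES = [
    {"name": "analyse-dataset",
     "inputs": {"onto:RawDataset"},
     "outputs": {"onto:AnalysisReport"},
     "kind": "grid"},     # stateful, long-running computation
    {"name": "fetch-dataset",
     "inputs": {"onto:DatasetIdentifier"},
     "outputs": {"onto:RawDataset"},
     "kind": "web"},      # stateless request/response service
]

def discover(needed_output, available_inputs):
    """Return services producing the needed concept from the inputs at hand."""
    return [s["name"] for s in SERVICES
            if needed_output in s["outputs"] and s["inputs"] <= available_inputs]

print(discover("onto:AnalysisReport", {"onto:RawDataset"}))   # ['analyse-dataset']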


International Conference on Knowledge Capture | 2005

ODESGS Framework: Knowledge-Based Markup for Semantic Grid Services

Carole A. Goble; Asunción Gómez-Pérez; Rafael González-Cabero; María S. Pérez-Hernández

The convergence of the Semantic Web and Grid technologies has resulted in the Semantic Grid. The Semantic Grid should be service-oriented, as the Grid is, so the formal description of Grid Services (GS) turns out to be a crucial issue. In this paper we present our approach to this issue: the ODESGS Framework, which enables the annotation of all the aspects of a GS and the design, discovery and composition of Semantic Grid Services (SGS).


European Conference on Parallel Processing | 2013

Topic 5: Parallel and Distributed Data Management

María S. Pérez-Hernández; André Brinkmann; Stergios V. Anastasiadis; Sandro Fiore; Adrien Lèbre; Kostas Magoutis

Nowadays we are facing an exponential growth of new data that is overwhelming the capabilities of companies, institutions and society in general to manage and use it properly. Ever-increasing investments in Big Data, cutting-edge technologies and the latest advances in both application development and underlying storage systems can help deal with data of such magnitude. Parallel and distributed approaches in particular will enable new data management solutions that operate effectively at large scale.

Collaboration


Dive into María S. Pérez-Hernández's collaborations.

Top Co-Authors

Oscar Corcho, Technical University of Madrid
Idafen Santana-Perez, Technical University of Madrid
Asunción Gómez-Pérez, Technical University of Madrid
Rafael González-Cabero, Technical University of Madrid
Ewa Deelman, University of Southern California
Mats Rynge, University of Southern California
Rafael Ferreira da Silva, University of Southern California