Idafen Santana-Perez | Researchain

Publication

Featured research published by Idafen Santana-Perez.

Scientific Programming | 2015

Towards Reproducibility in Scientific Workflows: An Infrastructure-Based Approach

Idafen Santana-Perez; María S. Pérez-Hernández

It is commonly agreed that in silico scientific experiments should be executable and repeatable processes. Most current approaches to computational experiment conservation and reproducibility have so far focused on two of the main components of the experiment, namely, data and method. In this paper, we propose a new approach that addresses the third cornerstone of experimental reproducibility: the equipment. This work focuses on the equipment of a computational experiment, that is, the set of software and hardware components involved in the execution of a scientific workflow. To demonstrate the feasibility of our proposal, we describe a use case scenario in the text analytics domain and the application of our approach to it. From the original workflow, we document its execution environment by means of a set of semantic models and a catalogue of resources, and generate an equivalent infrastructure for re-executing it.

Future Generation Computer Systems | 2017

Reproducibility of execution environments in computational science using Semantics and Clouds

Idafen Santana-Perez; Rafael Ferreira da Silva; Mats Rynge; Ewa Deelman; María S. Pérez-Hernández; Oscar Corcho

In the past decades, one of the most common ways of addressing reproducibility in scientific workflow-based computational science has been tracking the provenance of the produced and published results. Such provenance allows inspecting intermediate and final results, improves understanding, and permits replaying a workflow execution. Nevertheless, this approach does not provide any means for capturing and sharing the very valuable knowledge about the experimental equipment of a computational experiment, i.e., the execution environment in which the experiments are conducted. In this work, we propose a novel approach based on semantic vocabularies that describes the execution environment of scientific workflows, so as to conserve it. We define a process for documenting the workflow application and its related management system, as well as their dependencies. We then apply this approach to three different real workflow applications running in three distinct scenarios, using public, private, and local Cloud platforms. In particular, we study one astronomy workflow and two life science workflows for genomic information analysis. Experimental results show that our approach can reproduce an equivalent execution environment of a predefined virtual machine image on all evaluated computing platforms.
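The idea of documenting an environment and then regenerating it can be illustrated with a minimal sketch. It assumes the execution environment is recorded as simple subject-predicate-object triples using a hypothetical `dependsOn`/`version` vocabulary (not the semantic vocabularies defined in the paper), and derives an installation order for an equivalent environment by topologically sorting the dependencies. The component names and the `deployment_plan` helper are illustrative only.

```python
from graphlib import TopologicalSorter

# Hypothetical environment description as (subject, predicate, object)
# triples. Component names and the "dependsOn"/"version" terms are
# illustrative placeholders, not the vocabularies used in the paper.
ENVIRONMENT = [
    ("workflow-app", "dependsOn", "workflow-engine"),
    ("workflow-engine", "dependsOn", "job-scheduler"),
    ("workflow-engine", "dependsOn", "runtime"),
    ("workflow-app", "version", "1.0"),
    ("workflow-engine", "version", "4.4"),
    ("job-scheduler", "version", "8.2"),
    ("runtime", "version", "1.7"),
]


def deployment_plan(triples):
    """Return (component, version) pairs ordered so that every component
    appears after all of the components it depends on."""
    requires = {}  # component -> set of components it depends on
    versions = {}
    for subject, predicate, obj in triples:
        requires.setdefault(subject, set())
        if predicate == "dependsOn":
            requires[subject].add(obj)
            requires.setdefault(obj, set())
        elif predicate == "version":
            versions[subject] = obj
    # TopologicalSorter treats each mapped set as the node's predecessors,
    # so dependencies are emitted before the components that need them.
    order = TopologicalSorter(requires).static_order()
    return [(name, versions.get(name, "unspecified")) for name in order]


for component, version in deployment_plan(ENVIRONMENT):
    print(f"install {component} {version}")
```

A real system would map each component to concrete installation steps on the target cloud platform; this sketch only produces an ordered list of components and versions.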

European Conference on Parallel Processing | 2014

A Semantic-Based Approach to Attain Reproducibility of Computational Environments in Scientific Workflows: A Case Study

Idafen Santana-Perez; Rafael Ferreira da Silva; Mats Rynge; Ewa Deelman; María S. Pérez-Hernández; Oscar Corcho

Reproducible research in scientific workflows is often addressed by tracking the provenance of the produced results. While this approach allows inspecting intermediate and final results, improves understanding, and permits replaying a workflow execution, it does not ensure that the computational environment is available for subsequent executions to reproduce the experiment. In this work, we propose describing the resources involved in the execution of an experiment using a set of semantic vocabularies, so as to conserve the computational environment. We define a process for documenting the workflow application, management system, and their dependencies based on 4 domain ontologies. We then conduct an experimental evaluation using a real workflow application on an academic and a public Cloud platform. Results show that our approach can reproduce an equivalent execution environment of a predefined virtual machine image on both computing platforms.

International Conference on Cloud Computing | 2012

A Semantic Scheduler Architecture for Federated Hybrid Clouds

Idafen Santana-Perez; María S. Pérez-Hernández

Cloud computing is one of the most relevant computing paradigms available today. Its adoption has increased in recent years due to large investments and research from businesses and academic institutions. Among all the services cloud providers usually offer, Infrastructure as a Service (IaaS) has gained momentum for solving HPC problems in a more dynamic way, without the need for expensive investments. Integrating a large number of providers is a major goal, as it improves the quality of the selected resources in terms of pricing, speed, redundancy, etc. In this paper, we propose a system architecture, based on semantic solutions, to build an interoperable scheduler for federated clouds that works with several IaaS providers in a uniform way. Based on this architecture, we implement a proof-of-concept prototype and test it with two different cloud solutions, providing experimental results on the viability of our approach.

International Conference on Speech and Computer | 2017

Spanish Corpus for Sentiment Analysis Towards Brands

María Navas-Loro; Víctor Rodríguez-Doncel; Idafen Santana-Perez; Alberto Sánchez

Posts published on social media are a good source of feedback for assessing the impact of advertising campaigns. Whereas most published corpora in the Sentiment Analysis domain tag posts with polarity labels, this paper presents a Spanish-language corpus tagged with 8 predefined emotions, grouped in four opposing pairs: love-hate, happiness-sadness, trust-fear, and satisfaction-dissatisfaction. In every post, extracted from Twitter, sentiments have been annotated towards each specific brand under study. The corpus is published as a collection of RDF resources with links to external entities. A vocabulary describing this emotion classification, along with other relevant aspects of customer opinion, is also provided.

European Semantic Web Conference | 2018

MAS: A Corpus of Tweets for Marketing in Spanish.

María Navas-Loro; Víctor Rodríguez-Doncel; Idafen Santana-Perez; Alba Fernández-Izquierdo; Alberto Sánchez

This paper presents a corpus of Spanish-language tweets that were manually tagged for marketing purposes. The tags describe three aspects of the text of each Twitter post. First, the emotions a brand caused in the author, drawn from a taxonomy of emotions designed by marketing experts. Second, whether the post mentions any element of the marketing mix (including relevant marketing concepts such as price or promotion). Finally, the position of the tweet's author with respect to the acquisition process (or purchase funnel). Each Twitter post is related to only one brand, which is also indicated in the corpus. The corpus is published in a machine-readable format as a collection of RDF documents with links to additional external information. The paper also details the vocabulary used and the tagging criteria, and describes the annotation process followed to tag the tweets.
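As an illustration of publishing such annotations as RDF, the sketch below serializes one annotated tweet as N-Triples lines. The `http://example.org/mas/` namespace, the property names (`brand`, `emotion`, `marketingMix`, `funnelStage`), and the `tweet_to_ntriples` helper are hypothetical and are not the vocabulary released with the corpus; emotion, marketing-mix, and funnel-stage labels are assumed to already be IRI-safe local names.

```python
from urllib.parse import quote

# Hypothetical namespace and property names, for illustration only;
# they are not the vocabulary published with the corpus.
BASE = "http://example.org/mas/"


def iri(path):
    """Wrap a path under BASE in N-Triples IRI brackets."""
    return f"<{BASE}{path}>"


def tweet_to_ntriples(tweet_id, brand, emotions, mix_elements, funnel_stage):
    """Serialize one annotated tweet as N-Triples lines: exactly one brand,
    any number of emotions and marketing-mix elements, and one
    purchase-funnel stage. Label arguments are assumed IRI-safe."""
    subject = iri(f"tweet/{quote(str(tweet_id), safe='')}")
    # Brand names may contain spaces, so they are percent-encoded.
    pairs = [(iri("vocab#brand"), iri(f"brand/{quote(brand, safe='')}"))]
    pairs += [(iri("vocab#emotion"), iri(f"vocab#{e}")) for e in emotions]
    pairs += [(iri("vocab#marketingMix"), iri(f"vocab#{m}")) for m in mix_elements]
    pairs.append((iri("vocab#funnelStage"), iri(f"vocab#{funnel_stage}")))
    return "\n".join(f"{subject} {p} {o} ." for p, o in pairs)


print(tweet_to_ntriples(1, "Brand X", ["happiness"], ["price", "promotion"], "purchase"))
```

Keeping exactly one brand triple per tweet mirrors the constraint stated above that each post is related to only one brand.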

International Conference on Knowledge Capture | 2017

Repairing Hidden Links in Linked Data: Enhancing the quality of RDF knowledge graphs

Nandana Mihindukulasooriya; Mariano Rico; Idafen Santana-Perez; Raúl García-Castro; Asunción Gómez-Pérez

Knowledge Graphs (KGs) are becoming core components of most artificial intelligence applications. Linked Data, as a method of publishing KGs, allows applications to traverse within, and even out of, the graph thanks to global dereferenceable identifiers (IRIs) denoting entities. However, as we show in this work after analyzing several popular datasets (namely DBpedia, LOD Cache, and Web Data Commons JSON-LD data), many entities are represented using literal strings where IRIs should be used, diminishing the advantages of using Linked Data. To remedy this, we propose an approach for identifying such strings and replacing them with their corresponding entity IRIs. The approach identifies relations between entities using both ontological axioms and data profiling information, and converts strings to entity IRIs based on the types of entities linked by each relation. Our approach showed 98% recall and 76% precision in identifying such strings and 97% precision in converting them to their corresponding IRIs in the considered KG. Further, we analyzed how the connectivity of the KG increases when new relevant links are added to the entities as a result of our method. Our experiments on a subset of the Spanish DBpedia data show that it could add 25% more links to the KG and improve overall connectivity by 17%.
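The conversion step described above can be sketched as follows, assuming the expected entity type of each relation is already known and that a label index maps (type, normalized label) pairs to IRIs. In the paper these inputs come from ontological axioms and data profiling; here `RANGE_OF`, `LABEL_INDEX`, `Literal`, and `repair` are hypothetical stand-ins, and label matching is a plain case-insensitive lookup rather than the authors' method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Literal:
    """A plain string literal in object position (as opposed to an IRI)."""
    value: str


# Hypothetical inputs: the entity type expected by each relation, and an
# index from (type, lower-cased label) to the IRI of a known entity.
RANGE_OF = {"birthPlace": "City", "team": "SportsTeam"}
LABEL_INDEX = {
    ("City", "madrid"): "http://example.org/resource/Madrid",
    ("SportsTeam", "real madrid"): "http://example.org/resource/Real_Madrid",
}


def repair(triples, range_of=RANGE_OF, label_index=LABEL_INDEX):
    """Replace literal objects of entity-valued relations with entity IRIs
    when an entity of the expected type has a matching label; otherwise
    keep the triple unchanged."""
    repaired = []
    for subject, predicate, obj in triples:
        if isinstance(obj, Literal) and predicate in range_of:
            key = (range_of[predicate], obj.value.strip().lower())
            obj = label_index.get(key, obj)
        repaired.append((subject, predicate, obj))
    return repaired


data = [
    ("ex:Alice", "birthPlace", Literal("Madrid")),
    ("ex:Alice", "name", Literal("Alice")),
]
print(repair(data))
```

Restricting conversion to relations whose expected type is an entity class leaves genuinely textual properties, such as names, untouched.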

International Conference on High Performance Computing and Simulation | 2012

Semantic scheduling of virtualized infrastructures for scientific workflows

Idafen Santana-Perez; María S. Pérez-Hernández

Virtualized infrastructures are a promising way of providing flexible and dynamic computing solutions for resource-consuming tasks. Scientific workflows are one such kind of task, as they need a large amount of computational resources during certain periods of time. To provide the best infrastructure configuration for a workflow, it is necessary to explore as many providers as possible, taking into account criteria such as Quality of Service, pricing, response time, and network latency. Moreover, each of these new resources must be tuned to provide the tools and dependencies required by each step of the workflow. Working with different infrastructure providers, public or private, each using its own concepts and terms, together with a set of heterogeneous applications, requires a framework for integrating all the information about these elements. This work proposes using semantic technologies to describe and integrate all the information about the different components of the overall system, together with a set of user-defined policies. Based on this information, a scheduling process is performed to generate an infrastructure configuration defining the set of virtual machines that must be run and the tools that must be deployed on them.
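A minimal sketch of a final provider-selection step is shown below, assuming the semantic integration has already normalized each provider's offer into common attributes. The `Offer` fields, the hard policy thresholds, and the linear weighted cost are illustrative assumptions, not the scheduling process defined in the paper.

```python
from dataclasses import dataclass


@dataclass
class Offer:
    provider: str
    price_per_hour: float  # price per VM-hour (illustrative unit)
    latency_ms: float      # network latency in milliseconds
    qos: float             # 0..1, higher is better (illustrative metric)


def choose_offer(offers, max_price, min_qos, weights):
    """Drop offers that violate the user's hard policies, then return the
    one with the lowest weighted cost, or None if nothing qualifies."""
    eligible = [o for o in offers
                if o.price_per_hour <= max_price and o.qos >= min_qos]
    if not eligible:
        return None

    def cost(o):
        # Weights also rescale the differing units; QoS lowers the cost.
        return (weights["price"] * o.price_per_hour
                + weights["latency"] * o.latency_ms
                - weights["qos"] * o.qos)

    return min(eligible, key=cost)


OFFERS = [
    Offer("public-a", 0.12, 40.0, 0.95),
    Offer("public-b", 0.08, 90.0, 0.90),
    Offer("private-c", 0.05, 10.0, 0.70),
]
WEIGHTS = {"price": 1.0, "latency": 0.01, "qos": 1.0}

best = choose_offer(OFFERS, max_price=0.10, min_qos=0.8, weights=WEIGHTS)
print(best.provider)
```

With these example policies, `public-a` is excluded by the price cap and `private-c` by the QoS floor, so `public-b` is selected; separating hard policies from soft weighted criteria is one simple way to combine user-defined rules with ranking.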

--- | reproducibility@XSEDE: An XSEDE14 Workshop | 14 July 2014 | Atlanta, GA | 2014