David A. Monge
National University of Cuyo
Publications
Featured research published by David A. Monge.
ieee international conference on high performance computing data and analytics | 2014
David A. Monge; Carlos García Garino
This paper deals with the problem of autoscaling for cloud computing scientific workflows. Autoscaling is a process in which infrastructure scaling (i.e. determining the number and type of instances to acquire for executing an application) interleaves with the scheduling of tasks to reduce the time and monetary cost of executions. This work proposes a novel strategy called Spot Instances Aware Autoscaling (SIAA), designed for the optimized execution of scientific workflow applications. SIAA takes advantage of the lower prices of Amazon EC2-like spot instances to achieve better performance and cost savings. To deal with execution efficiency, SIAA uses a novel heuristic scheduling algorithm to optimize workflow makespan and reduce the effect of the task failures that may occur when using spot instances. Experiments were carried out using several types of real-world scientific workflows. Results demonstrated that SIAA is able to greatly outperform state-of-the-art autoscaling mechanisms in terms of makespan (up to 88.0%) and cost of execution (up to 43.6%).
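The spot-versus-on-demand trade-off at the heart of the paper can be illustrated with a toy expected-cost model (this is an illustrative sketch with hypothetical prices and a simplified retry model, not the actual SIAA heuristic):

```python
# Illustrative sketch (not the SIAA algorithm): choosing between spot and
# on-demand instances for a task, weighing the lower spot price against
# the expected cost of re-running the task if the spot instance is lost.

def expected_cost(price_per_hour, runtime_hours, failure_prob=0.0):
    """Expected monetary cost of a task under a simplified model where a
    failed run is repeated once, inflating cost by the failure probability."""
    base = price_per_hour * runtime_hours
    return base + failure_prob * base

def pick_instance(task_runtime_hours, on_demand_price, spot_price, spot_failure_prob):
    """Return the cheaper option in expectation: 'spot' or 'on-demand'."""
    spot_cost = expected_cost(spot_price, task_runtime_hours, spot_failure_prob)
    od_cost = expected_cost(on_demand_price, task_runtime_hours)
    return "spot" if spot_cost < od_cost else "on-demand"

# Cheap but risky spot vs. pricier but reliable on-demand.
print(pick_instance(2.0, on_demand_price=0.10, spot_price=0.03, spot_failure_prob=0.2))
```

With these illustrative numbers the spot instance wins in expectation; as the failure probability or spot price rises, the decision flips to on-demand.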
ieee international conference on high performance computing data and analytics | 2017
Josef Spillner; Cristian Mateos; David A. Monge
The adoption of cloud computing facilities and programming models differs vastly between different application domains. Scalable web applications, low-latency mobile backends and on-demand provisioned databases are typical cases for which cloud services on the platform or infrastructure level exist and are convincing when considering technical and economic arguments. Applications with specific processing demands, including high-performance computing, high-throughput computing and certain flavours of scientific computing, have historically required special configurations such as compute- or memory-optimised virtual machine instances. With the rise of function-level compute instances through Function-as-a-Service (FaaS) models, the fitness of generic configurations needs to be re-evaluated for these applications. We analyse several demanding computing tasks with regard to how FaaS models compare against conventional monolithic algorithm execution. Besides the comparison, we contribute a refined FaaSification process for legacy software and provide a roadmap for future work.
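The "FaaSification" idea can be sketched as splitting a monolithic computation into small, stateless, function-sized invocations (a hypothetical toy decomposition; function names and chunking are illustrative, not the paper's process):

```python
# Hypothetical sketch of FaaSification: a monolithic computation split
# into stateless function-level units, as one might when porting legacy
# code to a Function-as-a-Service platform.

def monolithic_sum_of_squares(n):
    return sum(i * i for i in range(n))

def faas_handler(event):
    """A stateless function-level unit: processes one chunk of the range."""
    lo, hi = event["lo"], event["hi"]
    return sum(i * i for i in range(lo, hi))

def faasified_sum_of_squares(n, chunk=1000):
    # Each chunk would be a separate FaaS invocation in a real deployment.
    return sum(faas_handler({"lo": lo, "hi": min(lo + chunk, n)})
               for lo in range(0, n, chunk))

assert monolithic_sum_of_squares(10_000) == faasified_sum_of_squares(10_000)
```

The decomposed form produces the same result while each unit fits the memory and time limits typical of FaaS platforms; the paper's point is that the fitness of such generic function-level configurations varies with the workload.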
Cluster Computing | 2015
David A. Monge; Matěj Holec; Filip Železný; Carlos García Garino
The adequate management of scientific workflow applications strongly depends on the availability of accurate performance models of sub-tasks. Numerous approaches use machine learning to generate such models autonomously, thus alleviating the human effort associated with this process. However, these standalone models may lack robustness, leading to a decay in the quality of information provided to workflow systems on top. This paper presents a novel approach for learning ensemble prediction models of task runtimes. The ensemble-learning method entitled bootstrap aggregating (bagging) is used to produce robust ensembles of M5P regression trees with better predictive performance than could be achieved by standalone models. Our approach has been tested on gene expression analysis workflows. The results show that the ensemble method leads to significant prediction-error reductions when compared with learned standalone models. This is the first initiative using ensemble learning for generating performance prediction models. These promising results encourage further research in this direction.
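The bagging procedure described above can be sketched in a few lines; this minimal illustration uses a trivial 1-nearest-neighbour base learner standing in for the M5P regression trees of the paper, with toy (input size, runtime) data:

```python
# Minimal illustration of bootstrap aggregating (bagging) for runtime
# prediction. The base learner here is a toy 1-nearest-neighbour model,
# not the M5P regression trees used in the paper.
import random

def fit_1nn(sample):
    """Base learner: remember the sample, predict the nearest point's runtime."""
    def predict(x):
        return min(sample, key=lambda pair: abs(pair[0] - x))[1]
    return predict

def bagging_fit(data, n_models=25, rng=None):
    rng = rng or random.Random(0)
    models = []
    for _ in range(n_models):
        # Bootstrap: resample the training data with replacement.
        sample = [rng.choice(data) for _ in data]
        models.append(fit_1nn(sample))
    return models

def bagging_predict(models, x):
    # Aggregate: average the base models' predictions.
    preds = [m(x) for m in models]
    return sum(preds) / len(preds)

# Toy data: (input size, runtime) pairs with a roughly linear trend.
data = [(s, 2.0 * s + random.Random(s).uniform(-1, 1)) for s in range(1, 21)]
models = bagging_fit(data)
print(bagging_predict(models, 10))  # close to the true trend value of 20
```

Averaging over bootstrap replicas smooths out the noise that any single base model picks up, which is the robustness effect the paper exploits.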
ieee international conference on high performance computing data and analytics | 2014
David A. Monge; Matěj Holec; Filip Železný; Carlos García Garino
Workflow applications for in-silico experimentation involve the processing of large amounts of data. One of the core issues for the efficient management of such applications is the prediction of task performance. This paper proposes a novel approach that enables the construction of models for predicting the running times of tasks in data-intensive scientific workflows. Ensemble machine learning techniques are used to produce robust combined models with high predictive accuracy. Information derived from workflow systems and the characteristics and provenance of the data are exploited to guarantee the accuracy of the models. The proposed approach has been tested on bioinformatics workflows for gene expression analysis over homogeneous and heterogeneous computing environments. The obtained results highlight the convenience of using ensemble models in comparison with single/standalone prediction models. Ensemble learning techniques permitted reductions of the prediction error of up to 24.9% in comparison with single-model strategies.
Computers & Electrical Engineering | 2017
David A. Monge; Elina Pacini; Cristian Mateos; Carlos García Garino
Cloud computing is the delivery of on-demand computing resources over the Internet on a pay-per-use basis and is very useful for executing scientific experiments such as parameter sweep experiments (PSEs). When PSEs are executed it is important to reduce both the makespan and the monetary cost. We propose a novel tri-objective formulation of the PSE autoscaling problem considering unreliable virtual machines (VMs), pursuing the minimization of makespan, monetary cost and probability of failures. We also propose a new autoscaler based on the Non-dominated Sorting Genetic Algorithm II (NSGA-II) able to automatically determine the right number of VMs of each type and pricing scheme, as well as the bid prices for the spot instances. Experiments show that the proposed autoscaler achieves great improvements in terms of makespan and cost when compared against Scaling First and Spot Instances Aware Autoscaling.
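The core of any NSGA-II-based approach is the Pareto-dominance test over the objective vectors, here (makespan, monetary cost, failure probability), all minimised. A sketch with illustrative candidate scaling plans (the numbers are hypothetical, not the paper's results):

```python
# Sketch of the Pareto-dominance test underlying NSGA-II for the
# tri-objective formulation (makespan, cost, failure probability);
# all three objectives are minimised. Plan values are illustrative.

def dominates(a, b):
    """True if a is no worse than b in every objective and strictly
    better in at least one (all objectives minimised)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(solutions):
    """Keep only the non-dominated solutions."""
    return [s for s in solutions
            if not any(dominates(o, s) for o in solutions if o is not s)]

# (makespan_h, cost_usd, failure_prob) for a few candidate scaling plans.
plans = [(10, 5.0, 0.05), (12, 4.0, 0.05), (11, 6.0, 0.10), (10, 5.0, 0.20)]
print(pareto_front(plans))  # [(10, 5.0, 0.05), (12, 4.0, 0.05)]
```

The surviving plans represent genuine trade-offs (faster but pricier vs. slower but cheaper); NSGA-II layers such fronts repeatedly to rank an evolving population.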
ieee international conference on high performance computing data and analytics | 2017
Yisel Garí; David A. Monge; Cristian Mateos; Carlos García Garino
Spot instances are extensively used to take advantage of large-scale cloud infrastructures at lower prices than traditional on-demand instances. Autoscaling scientific workflows in the cloud considering both spot and on-demand instances presents a major challenge, as the autoscalers have to determine the proper number and type of virtual machine instances to acquire, dynamically adjusting the number of instances under each pricing model (spot or on-demand) depending on the workflow needs. Under budget constraints, this adjustment is performed by an assignment policy that determines the suitable proportion of the available budget intended for each model. We propose an approach to derive an adaptive budget assignment policy able to reassign the budget at any point in the workflow execution. Given the inherent variability of the resources in a cloud, we formalize the described problem as a Markov Decision Process and derive adaptive policies based on other baseline policies. Experiments demonstrate that our policies outperform all the baseline policies in terms of makespan and most of them in terms of cost. These promising results encourage the future study of new strategies aiming to find optimal budget policies applied to the execution of workflows on the cloud.
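The Markov Decision Process formulation can be illustrated with a toy backward-induction (value iteration) sketch. The states, actions and reward numbers below are hypothetical stand-ins, not the paper's actual model: each workflow stage chooses what fraction of the remaining budget goes to spot instances.

```python
# Toy value-iteration sketch for a budget-assignment MDP (hypothetical
# model): at each workflow stage the agent picks the fraction of the
# remaining budget assigned to spot instances.

actions = [0.0, 0.5, 1.0]  # candidate spot-budget fractions

def reward(action):
    # Illustrative expected payoff: more spot budget saves money (linear
    # term) but raises the expected cost of reclamations (quadratic term).
    return 1.0 * action - 0.8 * action ** 2

def value_iteration(n_stages, gamma=1.0):
    V = 0.0  # value of having no stages left
    policy = []
    for _ in range(n_stages):  # backward induction over stages
        q = {a: reward(a) + gamma * V for a in actions}
        best = max(q, key=q.get)
        policy.append(best)
        V = q[best]
    return list(reversed(policy)), V

policy, value = value_iteration(3)
print(policy)  # stage-wise spot-budget fractions
```

Under these toy numbers the optimum is a mixed assignment (half the budget to spot) at every stage; in the paper the policy is adaptive because the state captures the observed execution progress and resource variability.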
IEEE Latin America Transactions | 2015
David A. Monge; Matej Holec; Filip Zelezny; Carlos García Garino
One of the central issues for the efficient management of scientific workflow applications is the prediction of task performance. This paper proposes a novel approach for constructing performance models for tasks in data-intensive scientific workflows in an autonomous way. Ensemble machine learning techniques are used to produce robust combined models with high predictive accuracy. Information derived from workflow systems and the characteristics and provenance of the data are exploited to guarantee the accuracy of the models. A gene-expression analysis workflow application was used as a case study over homogeneous and heterogeneous computing environments. Experimental results show noticeable improvements when using ensemble models in comparison with single/standalone prediction models. Ensemble learning techniques made it possible to reduce the prediction error relative to single-model strategies by values ranging from 14.47 to 28.36 percent for the homogeneous case, and from 8.34 to 17.18 percent for the heterogeneous case.
computer based medical systems | 2011
Jiri Belohradsky; David A. Monge; Filip Zelezny; Matej Holec; Carlos García Garino
We propose a technique for semi-automatic construction of gene expression data analysis workflows by grammar-like inference based on predefined workflow templates. The templates represent routinely used sequences of procedures such as normalization, data transformation, classifier learning, etc. Variations of such workflows (such as different instantiations to specific algorithms) may entail significant variance in the quality of the analysis results, and our formalism enables such variations to be explored automatically. Adhering to proven templates helps preserve the sanity of explored workflows and prevents the combinatorial explosion encountered by fully automatic workflow planners. Here we propose the basic principles of template-based workflow construction and demonstrate their use in the publicly available tool XGENE.ORG for multi-platform gene expression analysis.
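Template-based enumeration of workflow variants can be sketched as a Cartesian product over per-step algorithm choices (the step and algorithm names below are hypothetical examples, not the paper's templates):

```python
# Illustrative sketch of template-based workflow enumeration: a template
# fixes the sequence of steps, and each step may be instantiated with one
# of several concrete algorithms (names are hypothetical examples).
from itertools import product

template = [
    ("normalization", ["quantile", "loess"]),
    ("transformation", ["log2", "none"]),
    ("classifier", ["svm", "decision-tree", "naive-bayes"]),
]

def instantiations(template):
    names = [step for step, _ in template]
    choices = [options for _, options in template]
    for combo in product(*choices):
        yield dict(zip(names, combo))

workflows = list(instantiations(template))
print(len(workflows))  # 2 * 2 * 3 = 12 concrete workflows
```

Because the template fixes the step order, the search space stays at the product of per-step choices rather than exploding over all possible step sequences, which is the containment argument made above.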
Computing and Informatics / Computers and Artificial Intelligence | 2014
David A. Monge; Carlos García Garino
Computer Systems: Science & Engineering | 2017
David A. Monge; Yisel Garí; Cristian Mateos; Carlos García Garino