Anna Bánáti
Óbuda University
Publication
Featured research published by Anna Bánáti.
International Convention on Information and Communication Technology, Electronics and Microelectronics | 2015
Anna Bánáti; Péter Kacsuk; Miklós Kozlovszky
One of the most pressing challenges in the scientific community is the reproducibility of workflow execution. To reproduce the results of an experiment, provenance information must be collected on the one hand, and the dependencies of the execution must be eliminated on the other. Concerning the workflow execution environment, we differentiate four levels of provenance: infrastructural, environmental, workflow, and data provenance. Components at any of these levels can change between executions, and capturing data at each level addresses a different problem: storing the environmental and infrastructural parameters, for example, enables the portability of workflows between different parallel and distributed systems (grid, HPC, cloud), while the descriptors of the workflow model make it possible to track the different versions of the workflow and their impact on the execution. Our goal is to capture a set of parameters that is optimal in both number and type, and to reconstruct the way the data was produced independently of the environment. In this paper we investigate the necessary and sufficient parameters of workflow reproducibility and give a mathematical formula to determine the rate of reproducibility. These measurements allow the scientist to decide on the next steps toward creating reproducible workflows.
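The formula itself is not reproduced in this listing. Purely as an illustration of the idea, a reproducibility rate could be scored as the weighted fraction of required descriptors actually captured at each of the four provenance levels named above; the weights and descriptor sets below are hypothetical, not the paper's actual formula.

```python
# Illustrative sketch only: a weighted capture ratio over the four
# provenance levels named in the abstract. Weights and descriptor sets
# are invented for the example.
LEVELS = ("infrastructural", "environmental", "workflow", "data")

def reproducibility_rate(required, captured, weights=None):
    """Fraction of required descriptors captured, weighted per level."""
    weights = weights or {level: 1.0 for level in LEVELS}
    total = sum(weights[level] for level in LEVELS)
    score = 0.0
    for level in LEVELS:
        need = required.get(level, set())
        have = captured.get(level, set())
        # A level with nothing required counts as fully reproducible.
        ratio = 1.0 if not need else len(need & have) / len(need)
        score += weights[level] * ratio
    return score / total

required = {"infrastructural": {"cpu_arch", "node_count"},
            "environmental": {"os_version", "lib_versions"},
            "workflow": {"wf_version"},
            "data": {"input_hash"}}
captured = {"infrastructural": {"cpu_arch"},
            "environmental": {"os_version", "lib_versions"},
            "workflow": {"wf_version"},
            "data": set()}

print(f"rate = {reproducibility_rate(required, captured):.2f}")  # rate = 0.62
```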
International Convention on Information and Communication Technology, Electronics and Microelectronics | 2014
Eszter Kail; Anna Bánáti; Krisztián Karóczkai; Péter Kacsuk; Miklós Kozlovszky
Scientific workflow systems aim to provide user-friendly, end-to-end solutions for automating and simplifying computationally or data-intensive tasks. A number of workflow environments have been developed in recent years to support the specification and execution of scientific workflows. Purely static workflows, however, cope poorly with the ever-changing status of existing distributed systems: during workflow enactment, unforeseen scenarios may arise that cause significant delays, failed executions, or incorrect results. Manual workflow enactment has obvious limitations, yet automatic failover mechanisms require, at a minimum, accurate information about the workflow tasks and about the status of the underlying processing infrastructure. Dynamism can be defined at different abstraction levels and in different phases of the workflow lifecycle. In this paper we identify the requirements of dynamic workflows in general and provide a thorough survey of the dynamic workflow handling capabilities of gUSE/WS-PGRADE.
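As a minimal sketch of why automatic failover needs accurate status information, consider resubmitting a failed task only to resources currently reported as up. This is not the gUSE/WS-PGRADE API; all names here are invented for illustration.

```python
# Hypothetical failover loop: prefer resources reported as up and fall
# back through alternatives on failure. Resource/submit are made up.
import random

random.seed(0)  # deterministic demo

class Resource:
    def __init__(self, name, up=True):
        self.name, self.up = name, up

def submit(task, resource):
    """Pretend execution that can fail, and always fails on a down node."""
    if not resource.up or random.random() < 0.3:
        raise RuntimeError(f"{task} failed on {resource.name}")
    return f"{task} done on {resource.name}"

def run_with_failover(task, resources, retries=3):
    for attempt in range(retries):
        for res in [r for r in resources if r.up]:  # use status info
            try:
                return submit(task, res)
            except RuntimeError:
                continue  # try the next known-good resource
    raise RuntimeError(f"{task} could not be placed")

pool = [Resource("cluster-a", up=False), Resource("cloud-b"), Resource("grid-c")]
print(run_with_failover("blast-job", pool))  # blast-job done on cloud-b
```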
International Conference on Intelligent Engineering Systems | 2015
Anna Bánáti; Péter Kacsuk; Miklós Kozlovszky
The reproducibility of an in-silico experiment is a great challenge because of the parallel and distributed execution environment and the complexity of scientific workflows. To address this, provenance data has to be captured about the dataflow, the ancestry of the results, and the environment of the execution; descriptive data also has to be collected from the scientist about the essential details of the experiment, the types and samples of its input/output data, and its operation. The ultimate goal of our work is to propose a minimal dataset for recording and reporting scientific workflow-based experiments, which will facilitate the reproducibility of such experiments, support public repositories, and enable scientific results to be shared and reused. One part of the dataset can be filled in manually by the scientist, another part automatically by the system, and the rest from provenance data.
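A record along these lines might be organized by how each part is filled in. The field names below are illustrative assumptions, not the proposed minimal dataset.

```python
# Sketch of a minimal experiment record, grouped by how each part is
# populated (manually, by the system, or from provenance). Field names
# are invented for the example.
from dataclasses import dataclass, field

@dataclass
class ExperimentRecord:
    # Filled in manually by the scientist
    title: str
    description: str
    input_samples: list = field(default_factory=list)
    output_samples: list = field(default_factory=list)
    # Filled in automatically by the system
    workflow_version: str = ""
    execution_host: str = ""
    # Filled in from provenance data
    input_hashes: dict = field(default_factory=dict)
    result_ancestry: list = field(default_factory=list)

rec = ExperimentRecord(
    title="Protein docking run",
    description="Docking of ligand set L against target T.",
)
rec.workflow_version = "v2.1"                    # captured by the gateway
rec.input_hashes["ligands.sdf"] = "sha256-of-input"  # from provenance
print(rec.title, rec.workflow_version)
```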
Archive | 2017
Anna Bánáti; Péter Kacsuk; Miklós Kozlovszky
Scientific workflows are efficient tools for specifying and automating compute- and data-intensive in-silico experiments. An important challenge related to their usage is reproducibility. Many factors can influence or even prevent reproduction: missing descriptions and samples; missing provenance data about environmental parameters and data dependencies; and executions that depend on special hardware, on changing or volatile third-party services, or on randomly generated values. Some of these factors (called dependencies) can be eliminated by careful design or by heavy resource usage, but most cannot be bypassed. Our investigation deals with these critical dependencies of execution. In this paper we set up a mathematical model to evaluate the results of a workflow, and we provide a mechanism, based on provenance data and statistical tools, to make the workflow reproducible.
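The paper's model is not given in this abstract. As a hedged sketch of the statistical idea, a re-executed result could be checked against the distribution of provenance-recorded outcomes from earlier runs; the two-sigma acceptance rule and the numbers are illustrative choices, not the paper's model.

```python
# Accept a re-executed result if it lies within k standard deviations
# of the outcomes recorded in provenance for earlier runs.
import statistics

def consistent_with_provenance(new_value, past_values, k=2.0):
    """True if new_value is within k sigma of the past distribution."""
    mean = statistics.mean(past_values)
    sd = statistics.stdev(past_values)
    return abs(new_value - mean) <= k * sd

past = [0.91, 0.93, 0.90, 0.94, 0.92]           # metric from earlier runs
print(consistent_with_provenance(0.93, past))   # True
print(consistent_with_provenance(0.40, past))   # False
```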
International Symposium on Intelligent Systems and Informatics | 2016
Anna Bánáti; Péter Kárász; Péter Kacsuk; Miklós Kozlovszky
Applying scientific workflows to perform in-silico experiments is an increasingly prevalent practice in the scientific community. Because of the data- and compute-intensive behavior of scientific workflows, parallel and distributed systems (grids, clusters, clouds, and supercomputers) are required to execute them. The complexity of these infrastructures and the continuously changing environment, however, significantly encumber or even prevent the repeatability or reproducibility that is often needed for sharing results or judging scientific claims. The data and parameters necessary for re-execution can originate from different sources (infrastructural, third-party, or related to the binaries), which may change or become unavailable over the years. Our ultimate goal is to compensate for the lack of the original parameters by replacing, evaluating, or simulating the values in question. To create these methods, we determined the levels of re-execution and defined a descriptor space that collects all the parameters needed for reproducibility. Although these procedures incur some extra cost, the average reproducibility cost can be computed or at least estimated. In this paper we give a method to estimate the average cost of making a workflow reproducible when exact computation is not possible.
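As an illustration of such an estimate (the probabilities and costs below are invented), the average extra cost can be written as a sum over descriptors of the chance that the original value is unavailable at re-execution time, times the cost of compensating for it.

```python
# Expected compensation cost over the descriptor space. The numbers
# are hypothetical; the paper's actual cost model is not shown here.
def average_reproducibility_cost(descriptors):
    """descriptors: iterable of (p_unavailable, compensation_cost)."""
    return sum(p * cost for p, cost in descriptors)

descriptors = [
    (0.05, 10.0),   # infrastructural: re-provision a similar node
    (0.30, 25.0),   # third-party service: simulate its response
    (0.10, 5.0),    # binary: rebuild from archived sources
]
print(f"expected extra cost: {average_reproducibility_cost(descriptors):.2f}")
# 0.05*10 + 0.30*25 + 0.10*5 = 8.50
```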
International Symposium on Computational Intelligence and Informatics | 2016
Anna Bánáti; Péter Kacsuk; Miklós Kozlovszky
The complexity of scientific workflows and the continuously changing nature of the execution environment make it hard, or even impossible, to reproduce workflows or share their results within the scientific community. The main factors influencing reproducibility are the descriptors (the resources, system variables, code variables, inputs, parameters, etc. required to re-execute the workflow), which change continuously over time or become unavailable over the years (typically third-party resources). The ultimate goal of my research is to reveal the behavior of the crucial descriptors that can prevent reproducibility and to find the relationship between changes in the descriptors and changes in the result. Based on the nature of a change and its relation to the result, an evaluation can be performed to replace missing descriptor values with simulated ones. In this way we intend to support scientists in creating reproducible scientific workflows. In addition, by determining the probability of reproducibility, we can help scientists find the most suitable scientific workflow in a repository to reuse as the basis of their own experiments.
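One simple way to picture such a probability (assuming, purely for illustration, that descriptors survive independently) is the product of per-descriptor availability probabilities; a repository could then rank candidate workflows by this score. The independence assumption and the numbers are not from the paper.

```python
# Toy model: the workflow reproduces only if every crucial descriptor
# is still available, so under independence the probability is a product.
from math import prod

def reproducibility_probability(availabilities):
    """availabilities: per-descriptor survival probabilities in [0, 1]."""
    return prod(availabilities)

# e.g. input data 0.99, code 0.95, third-party service 0.60
print(f"{reproducibility_probability([0.99, 0.95, 0.60]):.3f}")  # 0.564
```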
International Convention on Information and Communication Technology, Electronics and Microelectronics | 2016
Anna Bánáti; Péter Kacsuk; Miklós Kozlovszky
One of the most pressing challenges in the scientific community is the reproducibility of a workflow execution. The parameters necessary for execution (we call them descriptors) can be external, depending for example on the computing infrastructure (grids, clusters, and clouds) or on third-party resources, or internal, belonging to the code of the workflow, such as its variables. During re-execution these parameters may change or become unavailable, and they can ultimately prevent the workflow from being reproduced. In most cases, however, the lack of the original parameters can be compensated for by replacing, evaluating, or simulating the values of the descriptors at some extra cost. Our goal in this paper is to classify scientific workflows based on the method by which, and the cost at which, they can be made reproducible.
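A sketch of one possible classification in this spirit follows; the class names and rules are illustrative assumptions, not the paper's taxonomy.

```python
# Classify a workflow by whether its descriptors are still available
# and, if not, whether they can be compensated for.
from enum import Enum

class ReproClass(Enum):
    REPRODUCIBLE = "all descriptors still available"
    REPLACEABLE = "missing descriptors can be replaced or simulated"
    IRREPRODUCIBLE = "some descriptor cannot be compensated for"

def classify(descriptors):
    """descriptors: list of (available, compensable) flags."""
    if all(avail for avail, _ in descriptors):
        return ReproClass.REPRODUCIBLE
    if all(avail or comp for avail, comp in descriptors):
        return ReproClass.REPLACEABLE
    return ReproClass.IRREPRODUCIBLE

print(classify([(True, True), (False, True)]))   # ReproClass.REPLACEABLE
print(classify([(True, True), (False, False)]))  # ReproClass.IRREPRODUCIBLE
```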
Symposium on Applied Computational Intelligence and Informatics | 2015
Krisztián Karóczkai; Anna Bánáti; Péter Kacsuk; Miklós Kozlovszky
Smart system applications are gaining significant attention, especially in health monitoring research. Data captured from sensors arrive continuously, and stream processing is the ideal solution for handling this large volume of data efficiently [6]. Stream processing means that data are processed as they arrive from the sensors, without intermediate storage. Storing and downloading data is time-consuming, and our investigation has shown that data download time has a significant impact on the overall execution of our health monitoring smart application. We have examined the most common science gateway solutions to trace the path that data take from the outer storage to the executor node or infrastructure, and we have defined an execution model for smart system applications in which data movement is minimal.
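The execution model can be pictured as a generator pipeline that consumes each reading as it arrives instead of staging data in outer storage and downloading it to the executor. The sensor feed and the anomaly rule below are made up for illustration.

```python
# Streaming sketch: each reading flows straight through the pipeline,
# so nothing is written to or downloaded from intermediate storage.
def sensor_feed():
    """Stand-in for a continuous stream of heart-rate readings."""
    for bpm in [72, 75, 71, 140, 74, 73]:
        yield bpm

def detect_anomalies(stream, threshold=120):
    # No intermediate storage: each reading is consumed exactly once.
    for reading in stream:
        if reading > threshold:
            yield reading

for alarm in detect_anomalies(sensor_feed()):
    print(f"alert: {alarm} bpm")   # alert: 140 bpm
```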
Doctoral Conference on Computing, Electrical and Industrial Systems | 2015
Anna Bánáti; Eszter Kail; Péter Kacsuk; Miklós Kozlovszky
Scientific workflow management systems are mainly dataflow-oriented and face several challenges due to the huge amount of data and the required computational capacity, neither of which can be predicted before enactment. Further problems may arise from dynamic access to data storages and other data sources, and from the distributed nature of scientific workflow computing infrastructures (cloud, cluster, grid, HPC), whose status may change even while a single workflow instance is running. Many of these failures could be avoided by workflow management systems that provide provenance-based dynamism and adaptivity to the unforeseen scenarios arising during enactment. In this work we summarize and categorize the failures that can arise in a cloud environment during enactment and show how, with dynamic and provenance support, such failures can be predicted and avoided.
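As a loose sketch of the categorize-and-predict idea, failures recorded in past provenance can flag risky resources before scheduling on them. The failure categories, counts, and the 0.2 threshold are all invented for the example.

```python
# Flag resources whose historical failure rate, mined from provenance
# records, exceeds a threshold.
from collections import Counter

FAILURES = [  # (resource, category) pairs from past provenance logs
    ("vm-1", "storage_timeout"), ("vm-1", "storage_timeout"),
    ("vm-2", "out_of_memory"), ("vm-1", "network"), ("vm-2", "network"),
]
RUNS = {"vm-1": 10, "vm-2": 25}  # total past executions per resource

def risky_resources(threshold=0.2):
    fails = Counter(res for res, _ in FAILURES)
    return [res for res, n in RUNS.items() if fails[res] / n > threshold]

print(risky_resources())  # ['vm-1']: 3 failures in 10 runs
```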
Symposium on Applied Computational Intelligence and Informatics | 2018
Eszter Kail; Anna Bánáti; László Erdődi; Miklós Kozlovszky