George Papastefanatos
National Technical University of Athens
Network
Latest external collaboration on country level. Dive into details by clicking on the dots.
Publication
Featured researches published by George Papastefanatos.
data warehousing and knowledge discovery | 2007
George Papastefanatos; Panos Vassiliadis; Alkis Simitsis; Yannis Vassiliou
In this paper, we deal with the problem of performing what-if analysis for changes that occur in the schema/structure of the data warehouse sources. We abstract software modules, queries, reports and views as (sequences of) queries in SQL enriched with functions. Queries and relations are uniformly modeled as a graph that is annotated with policies for the management of evolution events. Given a change at an element of the graph, our method detects the parts of the graph that are affected by this change and indicates the way they are tuned to respond to it.
Journal on Data Semantics | 2009
George Papastefanatos; Panos Vassiliadis; Alkis Simitsis; Yannis Vassiliou
In this paper, we discuss the problem of performing impact prediction for changes that occur in the schema/structure of the data warehouse sources. We abstract Extract-Transform-Load (ETL) activities as queries and sequences of views. ETL activities and its sources are uniformly modeled as a graph that is annotated with policies for the management of evolution events. Given a change at an element of the graph, our method detects the parts of the graph that are affected by this change and highlights the way they are tuned to respond to it. For many cases of ETL source evolution, we present rules so that both syntactical and semantic correctness of activities are retained. Finally, we experiment with the evaluation of our approach over real-world ETL workflows used in the Greek public sector.
international conference on conceptual modeling | 2008
George Papastefanatos; Panos Vassiliadis; Alkis Simitsis; Yannis Vassiliou
During data warehouse design, the designer frequently encounters the problem of choosing among different alternatives for the same design construct. The behavior of the chosen design in the presence of evolution events is an important parameter for this choice. This paper proposes metrics to assess the quality of the warehouse design from the viewpoint of evolution. We employ a graph-based model to uniformly abstract relations and software modules, like queries, views, reports, and ETL activities. We annotate the warehouse graph with policies for the management of evolution events. The proposed metrics are based on graph-theoretic properties of the warehouse graph to assess the sensi tivity of the graph to a set of possible events. We evaluate our metrics with experiments over alternative configurations of the same warehouse schema.
international conference on data engineering | 2010
George Papastefanatos; Panos Vassiliadis; Alkis Simitsis; Yannis Vassiliou
HECATAEUS is an open-source software tool for enabling impact prediction, what-if analysis, and regulation of relational database schema evolution. We follow a graph theoretic approach and represent database schemas and database constructs, like queries and views, as graphs. Our tool enables the user to create hypothetical evolution events and examine their impact over the overall graph before these are actually enforced on it. It also allows definition of rules for regulating the impact of evolution via (a) default values for all the nodes of the graph and (b) simple annotations for nodes deviating from the default behavior. Finally, HECATAEUS includes a metric suite for evaluating the impact of evolution events and detecting crucial and vulnerable parts of the system.
advances in databases and information systems | 2009
George Papastefanatos; Panos Vassiliadis; Alkis Simitsis; Timos K. Sellis; Yannis Vassiliou
In this paper, we visit the problem of the management of inconsistencies emerging on ETL processes as results of evolution operations occurring at their sources. We abstract Extract-Transform-Load (ETL) activities as queries and sequences of views. ETL activities and its sources are uniformly modeled as a graph that is annotated with rules for the management of evolution events. Given a change at an element of the graph, our framework detects the parts of the graph that are affected by this change and highlights the way they are tuned to respond to it. We then present the system architecture of a tool called Hecataeus that implements the main concepts of the proposed framework.
conference on software maintenance and reengineering | 2008
George Papastefanatos; Fotini Anagnostou; Yannis Vassiliou; Panos Vassiliadis
Databases are continuously evolving environments, where design constructs are added, removed or updated rather often. Small changes in the database configurations might impact a large number of applications and data stores around the system: queries and data entry forms can be invalidated, application programs might crash. HECATAEUS is a tool, which represents the database schema along with its dependent workload, mainly queries and views, as a uniform directed graph. The tool enables the user to create hypothetical evolution events and examine their impact over the overall graph as well as to define rules so that both syntactical and semantic correctness of the affected workload is retained.
international conference on data engineering | 2016
Nikos Bikakis; John Liagouris; Maria Krommyda; George Papastefanatos; Timos K. Sellis
We present a novel platform for the interactive visualization of very large graphs. The platform enables the user to interact with the visualized graph in a way that is very similar to the exploration of maps at multiple levels. Our approach involves an offline preprocessing phase that builds the layout of the graph by assigning coordinates to its nodes with respect to a Euclidean plane. The respective points are indexed with a spatial data structure, i.e., an R-tree, and stored in a database. Multiple abstraction layers of the graph based on various criteria are also created offline, and they are indexed similarly so that the user can explore the dataset at different levels of granularity, depending on her particular needs. Then, our system translates user operations into simple and very efficient spatial operations (i.e., window queries) in the backend. This technique allows for a fine-grained access to very large graphs with extremely low latency and memory requirements and without compromising the functionality of the tool. Our web-based prototype supports three main operations: (1) interactive navigation, (2) multi-level exploration, and (3) keyword search on the graph metadata.
international conference on conceptual modeling | 2013
Petros Manousis; Panos Vassiliadis; George Papastefanatos
Data-intensive ecosystems are conglomerations of data repositories surrounded by applications that depend on them for their operation. To support the graceful evolution of the ecosystems components we annotate them with policies for their response to evolutionary events. In this paper, we provide a method for the adaptation of ecosystems based on three algorithms that i assess the impact of a change, ii compute the need of different variants of an ecosystems components, depending on policy conflicts, and iii rewrite the modules to adapt to the change.
Journal on Data Semantics | 2012
George Papastefanatos; Panos Vassiliadis; Alkis Simitsis; Yannis Vassiliou
The Extract-Transform-Load (ETL) flows are essential for the success of a data warehouse and the business intelligence and decision support mechanisms that are attached to it. During both the ETL design phase and the entire ETL lifecycle, the ETL architect needs to design and improve an ETL design in a way that satisfies both performance and correctness guarantees and often, she has to choose among various alternative designs. In this paper, we focus on ways to predict the maintenance effort of ETL workflows and we explore techniques for assessing the quality of ETL designs under the prism of evolution. We focus on a set of graph-theoretic metrics for the prediction of evolution impact and we investigate their fit into real-world ETL scenarios. We present our experimental findings and describe the lessons we learned working on real-world cases.
Sprachwissenschaft | 2016
Nikos Bikakis; George Papastefanatos; Melina Skourla; Timos K. Sellis
Data exploration and visualization systems are of great importance in the Big Data era, in which the volume and heterogeneity of available information make it difficult for humans to manually explore and analyse data. Most traditional systems operate in an offline way, limited to accessing preprocessed (static) sets of data. They also restrict themselves to dealing with small dataset sizes, which can be easily handled with conventional techniques. However, the Big Data era has realized the availability of a great amount and variety of big datasets that are dynamic in nature; most of them offer API or query endpoints for online access, or the data is received in a stream fashion. Therefore, modern systems must address the challenge of on-the-fly scalable visualizations over large dynamic sets of data, offering efficient exploration techniques, as well as mechanisms for information abstraction and summarization. In this work, we present a generic model for personalized multilevel exploration and analysis over large dynamic sets of numeric and temporal data. Our model is built on top of a lightweight tree-based structure which can be efficiently constructed on-the-fly for a given set of data. This tree structure aggregates input objects into a hierarchical multiscale model. Considering different exploration scenarios over large datasets, the proposed model enables efficient multilevel exploration, offering incremental construction and prefetching via user interaction, and dynamic adaptation of the hierarchies based on user preferences. A thorough theoretical analysis is presented, illustrating the efficiency of the proposed model. The proposed model is realized in a web-based prototype tool, called SynopsViz that offers multilevel visual exploration and analysis over Linked Data datasets.