
Maria Kontaki

Aristotle University of Thessaloniki

Publication

Featured research published by Maria Kontaki.

international conference on data engineering | 2011

Continuous monitoring of distance-based outliers over data streams

Maria Kontaki; Anastasios Gounaris; Apostolos N. Papadopoulos; Kostas Tsichlas; Yannis Manolopoulos

Anomaly detection is considered an important data mining task, aiming at the discovery of elements (also known as outliers) that show significant diversion from the expected case. More specifically, given a set of objects the problem is to return the suspicious objects that deviate significantly from the typical behavior. As in the case of clustering, the application of different criteria leads to different definitions for an outlier. In this work, we focus on distance-based outliers: an object x is an outlier if there are fewer than k objects lying at distance at most R from x. The problem offers significant challenges when a stream-based environment is considered, where data arrive continuously and outliers must be detected on-the-fly. There are a few research works studying the problem of continuous outlier detection. However, none of these proposals meets the requirements of modern stream-based applications for the following reasons: (i) they demand a significant storage overhead, (ii) their efficiency is limited and (iii) they lack flexibility. In this work, we propose new algorithms for continuous outlier monitoring in data streams, based on sliding windows. Our techniques are able to reduce the required storage overhead, run faster than previously proposed techniques and offer significant flexibility. Experiments performed on real-life as well as synthetic data sets verify our theoretical study.
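
To make the definition above concrete, the following is a minimal sketch of distance-based outlier detection over a count-based sliding window. It is a naive quadratic baseline, not the algorithms proposed in the paper; the function names, the Euclidean distance, and the count-based window are assumptions made for illustration.

```python
from collections import deque
import math


def distance_outliers(window, R, k):
    """Return the objects that have fewer than k neighbors within distance R.

    Naive O(n^2) scan; Euclidean distance is an assumption of this sketch.
    """
    outliers = []
    for i, x in enumerate(window):
        neighbors = sum(
            1
            for j, y in enumerate(window)
            if i != j and math.dist(x, y) <= R
        )
        if neighbors < k:
            outliers.append(x)
    return outliers


def monitor(stream, window_size, R, k):
    """Yield the current outliers after each arrival (count-based window)."""
    window = deque(maxlen=window_size)  # the oldest object expires automatically
    for point in stream:
        window.append(point)
        yield distance_outliers(window, R, k)


if __name__ == "__main__":
    stream = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0)]
    for step, result in enumerate(monitor(stream, window_size=4, R=1.0, k=2)):
        print(step, result)
```

Recomputing every neighbor count on each arrival is exactly the cost that a continuous algorithm tries to avoid, which is why this sketch is useful only as a reference for correctness.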

panhellenic conference on informatics | 2008

Continuous Top-k Dominating Queries in Subspaces

Maria Kontaki; Apostolos N. Papadopoulos; Yannis Manolopoulos

Dominating queries are significant tools for preference-based query processing in databases and decision support applications. An important preference-based query is the top-k dominating query, which reports the k most important objects according to their domination capabilities (score). In this paper, we address the following issues to tackle two limitations of previously proposed approaches: (i) we allow dominating queries to be expressed in a subset of the available dimensions and (ii) we provide the necessary techniques to enable continuous processing of multiple queries. We use a grid-based indexing scheme to facilitate efficient search and update operations, avoiding expensive reorganization costs. In addition, several optimizations are proposed to enhance efficiency. Performance evaluation results, based on real-life and synthetic data sets, show the efficiency and scalability of the proposed scheme.
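
As an illustration of the query described above, here is a brute-force sketch of a top-k dominating query restricted to a subspace. It does not reproduce the grid-based scheme of the paper; the function names and the convention that smaller values are preferred are assumptions of this example.

```python
def dominates(p, q, dims):
    """True if p dominates q on the chosen dimensions.

    Assumption: smaller values are better. p must be no worse than q on
    every selected dimension and strictly better on at least one.
    """
    return all(p[d] <= q[d] for d in dims) and any(p[d] < q[d] for d in dims)


def top_k_dominating(objects, k, dims):
    """Return the k (score, object) pairs with the highest domination score.

    The score of an object is the number of objects it dominates in the
    subspace given by dims. Brute force, O(n^2) domination checks.
    """
    scored = [
        (sum(dominates(p, q, dims) for q in objects), p)
        for p in objects
    ]
    scored.sort(key=lambda pair: -pair[0])  # stable sort keeps input order on ties
    return scored[:k]


if __name__ == "__main__":
    data = [(1, 1, 9), (2, 2, 1), (3, 3, 5), (4, 1, 2)]
    print(top_k_dominating(data, k=2, dims=(0, 1)))
    print(top_k_dominating(data, k=1, dims=(2,)))
```

The same data set yields different answers for different subspaces, which is why a continuous scheme has to handle each query's dimension subset explicitly.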

data and knowledge engineering | 2007

Adaptive similarity search in streaming time series with sliding windows

Maria Kontaki; Apostolos N. Papadopoulos; Yannis Manolopoulos

The challenge in a database of evolving time series is to provide efficient algorithms and access methods for query processing, taking into consideration the fact that the database changes continuously as new data become available. Traditional access methods that continuously update the data are considered inappropriate, due to significant update costs. In this paper, we use the IDC-Index (Incremental DFT Computation - Index), an efficient technique for similarity query processing in streaming time series. The index is based on a multidimensional access method enhanced with a deferred update policy and an incremental computation of the Discrete Fourier Transform (DFT), which is used as a feature extraction method. We focus on both range and nearest-neighbor queries, since both types are frequently used in modern applications. An important characteristic of the proposed approach is its ability to adapt to the update frequency of the data streams. By using a simple heuristic approach, we manage to keep the update frequency at a specified level to guarantee efficiency. In order to investigate the efficiency of the proposed method, experiments have been performed for range queries and k-nearest-neighbor queries on real-life data sets. The proposed method manages to reduce the number of false alarms examined, achieving a high answers-to-candidates ratio. Moreover, the results have shown that the new techniques exhibit consistently better performance in comparison to previously proposed approaches.
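
The incremental DFT computation mentioned above can be sketched with the standard sliding-window identity: when the window drops its oldest value and gains a new one, each coefficient is updated in constant time instead of being recomputed. The sketch below uses the unnormalized DFT and hypothetical function names; the exact normalization and update policy of the IDC-Index are not taken from the abstract.

```python
import cmath


def dft(window, num_coeffs):
    """First num_coeffs unnormalized DFT coefficients, computed from scratch."""
    n = len(window)
    return [
        sum(x * cmath.exp(-2j * cmath.pi * f * t / n) for t, x in enumerate(window))
        for f in range(num_coeffs)
    ]


def slide(coeffs, oldest, newest, n):
    """Update the coefficients after the window of length n slides by one.

    Identity used: X'_f = (X_f - oldest + newest) * exp(2*pi*i*f / n).
    """
    return [
        (c - oldest + newest) * cmath.exp(2j * cmath.pi * f / n)
        for f, c in enumerate(coeffs)
    ]


if __name__ == "__main__":
    series = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0]
    n = 4
    coeffs = dft(series[:n], num_coeffs=3)
    updated = slide(coeffs, oldest=series[0], newest=series[n], n=n)
    recomputed = dft(series[1 : n + 1], num_coeffs=3)
    print(max(abs(a - b) for a, b in zip(updated, recomputed)))
```

Keeping only the first few coefficients is what makes the DFT usable as a low-dimensional feature vector for a multidimensional index.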

international conference on management of data | 2013

Continuous outlier detection in data streams: an extensible framework and state-of-the-art algorithms

Dimitrios Georgiadis; Maria Kontaki; Anastasios Gounaris; Apostolos N. Papadopoulos; Kostas Tsichlas; Yannis Manolopoulos

Anomaly detection is an important data mining task, aiming at the discovery of elements that show significant diversion from the expected behavior; such elements are termed outliers. One of the most widely employed criteria for determining whether an element is an outlier is based on the number of neighboring elements within a fixed distance (R), against a fixed threshold (k). Such outliers are referred to as distance-based outliers and are the focus of this work. In this demo, we show both an extensible framework for outlier detection algorithms and specific outlier detection algorithms for the demanding case where outlier detection is continuously performed over a data stream. More specifically: (i) we demonstrate a novel flavor of an open-source, publicly available tool for Massive Online Analysis (MOA) that is endowed with capabilities to encapsulate algorithms that continuously detect outliers and (ii) we present four online outlier detection algorithms. Two of these algorithms have been designed by the authors of this demo, with a view to improving on key aspects related to outlier mining, such as running time, flexibility and space requirements.

statistical and scientific database management | 2012

Discovery of top-k dense subgraphs in dynamic graph collections

Elena Valari; Maria Kontaki; Apostolos N. Papadopoulos

Dense subgraph discovery is a key issue in graph mining, due to its importance in several applications, such as correlation analysis, community discovery in the Web, gene co-expression and protein-protein interactions in bioinformatics. In this work, we study the discovery of the top-k dense subgraphs in a set of graphs. After the investigation of the problem in its static case, we extend the methodology to work with dynamic graph collections, where the graph collection changes over time. Our methodology is based on lower and upper bounds of the density, resulting in a reduction of the number of exact density computations. Our algorithms do not rely on user-defined threshold values and the only input required is the number of dense subgraphs in the result (k). In addition to the exact algorithms, an approximation algorithm is provided for top-k dense subgraph discovery, which trades result accuracy for speed. We show that a significant number of exact density computations is avoided, resulting in efficient monitoring of the top-k dense subgraphs.

statistical and scientific database management | 2004

Efficient similarity search in streaming time sequences

Maria Kontaki; Apostolos N. Papadopoulos

Query processing in data streams is a very important research direction. The challenge in a database of data streams is to provide efficient algorithms and access methods for query processing, taking into consideration the fact that the database changes continuously as new data arrive. Traditional access methods that continuously update the data are considered inefficient, due to the significant update costs. In this paper we present IDC-Index, an efficient technique for similarity query processing in streaming time sequences, which is based on a multidimensional access method enhanced with a deferred update policy and an incremental computation of the discrete Fourier transform (DFT), which is used as a feature extraction method. The method manages to reduce the number of false alarms examined and therefore achieves a high answers/candidates ratio. Moreover, an extensive performance evaluation based on synthetic random walk and real time sequences has shown that the proposed technique significantly outperforms existing approaches for similarity range query processing.

conference on current trends in theory and practice of informatics | 2009

Continuous Processing of Preference Queries in Data Streams

Maria Kontaki; Apostolos N. Papadopoulos; Yannis Manolopoulos

Preference queries have received considerable attention in the recent past, due to their use in selecting the most preferred objects, especially when the selection criteria are contradictory. Nowadays, a significant number of applications require the manipulation of time evolving data and therefore the study of continuous query processing has recently attracted the interest of the data management community. The goal of continuous query processing is to continuously evaluate long-running queries by using incremental algorithms and thus to avoid query evaluation from scratch, if possible. In this paper, we examine the characteristics of important preference queries, such as skyline, top-k and top-k dominating and we review algorithms proposed for the evaluation of continuous preference queries under the sliding window streaming model.

data warehousing and knowledge discovery | 2008

Continuous Trend-Based Clustering in Data Streams

Maria Kontaki; Apostolos N. Papadopoulos; Yannis Manolopoulos

Trend analysis of time series is an important problem since trend identification enables the prediction of the near future. In streaming time series the problem is more challenging due to the dynamic nature of the data. In this paper, we propose a method for continuously clustering a number of streaming time series based on their trend characteristics. Each streaming time series is transformed to a vector by means of the Piecewise Linear Approximation (PLA) technique. The PLA vector comprises pairs of values (timestamp, trend) denoting the starting time of the trend and the type of the trend (either UP or DOWN) respectively. A distance metric for PLA vectors is introduced. We propose split and merge criteria to continuously update the clustering information. Moreover, the proposed method handles outliers. Performance evaluation results, based on real-life and synthetic data sets, show the efficiency and scalability of the proposed scheme.

acm symposium on applied computing | 2008

Continuous k-dominant skyline computation on multidimensional data streams

Maria Kontaki; Apostolos N. Papadopoulos; Yannis Manolopoulos

Skyline queries are important due to their usefulness in many application domains. However, by increasing the number of attributes, the probability that a tuple dominates another one is reduced significantly. To attack this problem, k-dominant skylines have been proposed, relaxing the definition of domination. In this paper, we study the problem of continuous monitoring of k-dominant skylines, where multiple queries are running concurrently. The proposed method divides the space into pairs of attributes. For each pair, we compute skyline tuples, exploit them to eliminate candidate tuples of the queries, and combine the partial results. The proposed scheme uses only simple domination checks and it is applicable to the streaming case as well as to ad-hoc insertions and deletions. Experiments, based on different data distributions, show the efficiency of the proposed scheme in comparison to existing methods.
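
The relaxed notion of domination can be illustrated with a brute-force sketch. It assumes the commonly used definition, which the abstract does not spell out: p k-dominates q if p is no worse than q on at least k attributes and strictly better on at least one of them. Smaller values are assumed to be preferred, the function names are hypothetical, and the attribute-pair method of the paper is not reproduced here.

```python
def k_dominates(p, q, k):
    """True if p k-dominates q (assumed definition, smaller is better).

    Any attribute where p is strictly better is also one where p is no
    worse, so it can always be included among the k chosen attributes.
    """
    no_worse = sum(a <= b for a, b in zip(p, q))
    strictly_better = any(a < b for a, b in zip(p, q))
    return no_worse >= k and strictly_better


def k_dominant_skyline(points, k):
    """Return the points that are not k-dominated by any other point."""
    return [
        p
        for p in points
        if not any(k_dominates(q, p, k) for q in points if q is not p)
    ]


if __name__ == "__main__":
    data = [(1, 5, 3), (2, 2, 2), (3, 1, 4)]
    print(k_dominant_skyline(data, k=3))  # k = number of attributes: ordinary skyline
    print(k_dominant_skyline(data, k=2))
```

With k equal to the number of attributes the result is the ordinary skyline; lowering k makes domination easier and the result smaller.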

Information Systems | 2016

Efficient and flexible algorithms for monitoring distance-based outliers over data streams

Maria Kontaki; Anastasios Gounaris; Apostolos N. Papadopoulos; Kostas Tsichlas; Yannis Manolopoulos

Anomaly detection is considered an important data mining task, aiming at the discovery of elements (known as outliers) that show significant diversion from the expected case. More specifically, given a set of objects the problem is to return the suspicious objects that deviate significantly from the typical behavior. As in the case of clustering, the application of different criteria leads to different definitions for an outlier. In this work, we focus on distance-based outliers: an object x is an outlier if there are fewer than k objects lying at distance at most R from x. The problem offers significant challenges when a stream-based environment is considered, where data arrive continuously and outliers must be detected on-the-fly. There are a few research works studying the problem of continuous outlier detection. However, none of these proposals meets the requirements of modern stream-based applications for the following reasons: (i) they demand a significant storage overhead, (ii) their efficiency is limited and (iii) they lack flexibility in the sense that they assume a single configuration of the k and R parameters. In this work, we propose new algorithms for continuous outlier monitoring in data streams, based on sliding windows. Our techniques are able to reduce the required storage overhead, are more efficient than previously proposed techniques and offer significant flexibility with regard to the input parameters. Experiments performed on real-life and synthetic data sets verify our theoretical study.

Highlights: We prove a linear space lower bound. A novel continuous algorithm is presented, which has two versions (COD). To support different views of outliers, we propose an extension (ACOD). We also propose algorithms based on micro-clusters (MCOD/AMCOD). Performance evaluation results are based on both real-life and synthetic data.
